Together for free and open web search: OWS.EU Community Meet-Up was a great success!

From 9 to 11 October, #ossym24 brought together the open web search community to discuss technical, legal, ethical and economic aspects of free, unbiased and transparent access to information on the internet.

This year, the OWS.EU consortium organised a “Community Meet-Up” for the first time, on the evening of the first conference day. Many participants accepted the invitation and mingled with the experts from the project. Tables were dedicated to specific topics such as “Infrastructure”, “Sustainability”, “Ethics”, “Legal” and “Web Index and Crawling”.

Megi Sharikadze (Leibniz Supercomputing Centre), who is heavily involved in the management and coordination of the EU-funded OpenWebSearch.EU project, explains the idea behind the gathering: “Usually the people involved in open web search initiatives are spread across Europe and beyond. It’s not always easy to stay in touch. The opportunity to meet the community locally, exchange ideas, inspire others and share common motivations is therefore of the utmost importance. As we build an open European web search ecosystem together, our community is growing and diversifying. We remain committed to motivating others and finding new allies – especially as we pass the halfway point of the OpenWebSearch.EU project. We have made great strides over the past few years, bringing on board new partners selected through our Community Programme. Now it is time to go out with our results and spread the word about how free and open web search can enhance our sovereignty and innovation potential in Europe”.

How to join the community

Are you a researcher, entrepreneur, inventor, politician or anyone else interested in open web search, its opportunities and challenges? Don’t miss our monthly community meetings, where the latest developments in open web search are presented and discussed. The 45-minute community updates take place online on the BigBlueButton platform every first Monday of the month. The next Community Meet-Up will take place on 4 November 2024 in conjunction with our partner project NGI Search. Find out more about the event and registration in our community area.

 

OWS.EU Community Meet-Up @ #ossym24

The 6th International Open Search Symposium #ossym24 invites open search enthusiasts to discuss and promote ideas and concepts of Open Web Search at the Leibniz Supercomputing Centre (LRZ) in Garching from 9 to 11 October 2024. OWS.EU hosts a Community Meet-Up during the conference.

The #ossym24 is organised by the Open Search Foundation in collaboration with the Leibniz Supercomputing Centre (LRZ). OWS.EU is proud to be involved in this year’s symposium and to invite attendees to the first OWS.EU Community Meet-Up @ #ossym24 on Wednesday, 9 October, at 7 PM. Members of the OWS.EU project will be waiting for you to discuss all aspects of a European Open Web Search over a catered dinner. The community meet-up is a get-together for everyone involved in the OWS.EU project and its Community Programme, but also for people who want to join or learn more about the quest for a better European Web Search that enables free, unbiased and transparent access to information.

#Ossym24 with a multifaceted programme

Members of the OWS.EU project are also heavily involved in the conference programme. Our colleagues will present their work on technical, legal and ethical aspects of piloting a European Open Web Search. All details and times can be found on the event website.

Sign up and save your spot

The #ossym24 will take place in a hybrid format, both on site and online. Registration is required, and participation in both formats is free of charge. There are 100 places available for on-site participation at the Leibniz Supercomputing Centre in Garching near Munich. Save your spot now!

OWS.EU Partner in Focus: University of Passau

The University of Passau coordinates the OpenWebSearch.EU project and is also responsible for providing the Open Web Index (OWI), which includes developing the technology for coordinating crawlers, building the OWI and enabling its download. Building the OWI is one of the key milestones in the OWS.EU project, since it will accelerate further use of and research towards open web search.

Prof. Michael Granitzer leads the OWS.EU project and holds the Chair of Data Science at the University of Passau. Together with Jelena Mitrović, Professor of Legal Informatics and Natural Language Processing and leader of the Junior Research Group CAROLL, he supervises the research team working on the Open Web Index. We talked to three researchers from their team about the work they do in the OWS.EU project: Saber Zerhoudi, Mahmoud Istaiti and Mohammed Al-Maamari.

How is the project progressing so far?

Saber: Very well. We have made considerable progress over the past months. Our team has developed scalable, distributed crawling software that is currently deployed across three data centers. To keep users informed about the content being crawled and provide them with filtering options, we have also created a monitoring dashboard that can be accessed at https://dashboard.ows.eu/.

Can you explain what the dashboard does?

Saber: One of the key features of the dashboard is its ability to display near real-time information about the crawling process. Users can easily track the progress of the crawling tasks and view statistics on the number of pages crawled. This transparency ensures that users are always informed about the status of our crawling pipeline.

Furthermore, the dashboard offers users the flexibility to filter the crawled content based on various criteria, such as domain, keyword, or date range. This functionality allows users to focus on specific subsets of data that are relevant to their needs, saving time and effort in analyzing the collected information.

In addition to monitoring and filtering capabilities, the dashboard provides users with the ability to actively contribute to the crawling process. Users can submit lists of URLs they wish to have crawled, expanding the scope of our data collection efforts. This feature enables users to tailor the crawling process to their specific requirements, ensuring that the most relevant and valuable data is collected.
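To make the filtering criteria mentioned in this answer more concrete, here is a minimal Python sketch of how crawled records could be narrowed down by domain, keyword and date range. It is only an illustration under our own assumptions: the `CrawlRecord` fields and the `filter_records` function are hypothetical names and do not describe the dashboard's actual data model or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical record shape; the real dashboard schema is not described here."""
    url: str
    text: str
    crawled_on: date


def filter_records(
    records: Iterable[CrawlRecord],
    domain: Optional[str] = None,
    keyword: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[CrawlRecord]:
    """Keep only the records that match every criterion that was given."""
    selected = []
    for record in records:
        host = (urlparse(record.url).hostname or "").lower()
        if domain is not None:
            wanted = domain.lower()
            # Match the domain itself or any of its subdomains.
            if host != wanted and not host.endswith("." + wanted):
                continue
        if keyword is not None and keyword.lower() not in record.text.lower():
            continue
        if start is not None and record.crawled_on < start:
            continue
        if end is not None and record.crawled_on > end:
            continue
        selected.append(record)
    return selected
```

Calling `filter_records(records, domain="uni-passau.de", start=date(2024, 5, 1))` would, under these assumptions, return only pages from that domain or its subdomains crawled on or after 1 May 2024. Criteria left as `None` are simply ignored.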

But how does this look from the perspective of a website owner? Will they have the option to manage their data?

Saber: Yes, to address the important aspects of data privacy and intellectual property rights, we have integrated takedown request and website ownership verification functionalities into the dashboard. Through our third-party partners, users can easily submit takedown requests for content they believe infringes upon their rights. Similarly, website owners can verify their ownership, establishing a clear line of communication and ensuring that any concerns or requests are promptly addressed.

By combining scalable, distributed crawling software with a user-friendly monitoring dashboard, we have created a powerful tool for data collection and management. The ability to monitor, filter, and contribute to the crawling process, along with the integration of takedown request and website ownership verification features, positions our system as a comprehensive solution for users seeking to gather and analyze web data efficiently and responsibly.

What other milestones did you achieve in the project so far?

Mahmoud: My role involves enhancing the crawling process by implementing various filters and features, as well as integrating different data sources into our process. Additionally, I am working on developing machine learning models to extract information from privacy policies.

One major accomplishment is that we can now label crawled websites as either spam or high-quality content by verifying their presence in datasets like Wikipedia external links or Curlie.
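As a rough illustration of labelling by list membership, the following Python sketch checks whether a crawled URL's host appears in a set of hosts collected from curated sources. It is not the project's implementation: the function names are hypothetical, the host lists are placeholders, and because the interview does not spell out how spam is determined, the sketch only distinguishes "high-quality" from "unlisted".

```python
from typing import Iterable, Set
from urllib.parse import urlparse


def normalize_host(url: str) -> str:
    """Return the lower-cased host of a URL without a leading 'www.'."""
    if "://" not in url:
        url = "//" + url  # lets urlparse recognise scheme-less URLs
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_trusted_hosts(curated_urls: Iterable[str]) -> Set[str]:
    """Collect hosts from curated lists (for example, exported directory links)."""
    return {normalize_host(u) for u in curated_urls if normalize_host(u)}


def label_url(url: str, trusted_hosts: Set[str]) -> str:
    """Assumption: listed hosts count as high-quality; everything else is only 'unlisted'."""
    return "high-quality" if normalize_host(url) in trusted_hosts else "unlisted"
```

A set lookup keeps each check cheap even for very large curated lists, which matters when every crawled URL has to be labelled.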

Mohammed: I specialize in Machine Learning and Data Science. My responsibilities include building and processing datasets, training machine learning models (such as URL classification models), and enhancing model modularity.

Key milestones we achieved so far include developing and comparing various URL classification models and building and open-sourcing several datasets.

What are the challenges you face in your work?

Saber: Navigating the diverse infrastructure setups, guidelines, and technology stacks unique to each of the three data centers that currently host the OWI can be a significant challenge. Each data center has its own distinct configuration of hardware, software, and networking components, which requires a deep understanding of the specific environment to effectively manage and maintain.

Moreover, data centers often have their own set of best practices, policies, and procedures that must be followed to ensure smooth operations and compliance with industry standards and regulations. These guidelines cover various aspects, from physical security and access control to data backup and disaster recovery protocols.

Mahmoud: Same here. I often encounter issues related to the infrastructure, which can be a challenge at times.

Mohammed: For me it’s often challenging to effectively test the trained machine learning models.

What are the next steps from here? 

Saber: In the coming months, our goal is to streamline the crawling process across the various data centers using a centralized control center. This automation will enhance efficiency and consistency in data collection. Additionally, we are exploring methods to integrate embeddings seamlessly into our crawling-preprocessing-indexing pipeline.

Mohammed: In the coming months, I aim to optimize and improve the machine learning models, particularly the URL classification model.

Mahmoud: I plan to finish the integration of data from the Mastodon platform into our process.

 

Thank you for the interview!

Read more about the University of Passau: https://openwebsearch.eu/partners/university-of-passau/