OWS.EU Partner in Focus: SUMA-EV

SUMA-EV is the next partner we are introducing. The German non-profit organization is committed to promoting free access to knowledge and protecting online privacy. It pursues these goals through conferences, funding and support of promising projects, and talks at educational institutions.
A central part of its work is operating MetaGer, a privacy-focused meta search engine that has been running since 1996 in cooperation with the University of Hanover. As a partner of OWS.eu, SUMA-EV is a driving force behind the Open Web Index movement and played a key role in supporting the launch of the Open Search Foundation in 2020.
Phil Höfer, the organization’s technical counselor, and his team contribute their extensive experience in running a large meta search engine and in processing search results from index-based search systems.

Thank you, Phil, for taking the time to share your insights with us.

Please describe your organization’s tasks in the project. What is your field of expertise that you bring to the project?

Phil: SUMA-EV has been running search engine projects for several decades and has been working towards an Open Web Index throughout those years. We aim to further support the transformation of the OWI from research prototype to public amenity by focusing on adoption and application support.

How is the project progressing? Which major milestones did you achieve?

Phil: As the project nears its conclusion, we’re glad to see the pieces coming together. We’ve succeeded in integrating the OWI data into our infrastructure and built an independent functional open-source implementation of the higher search stack elements.

What are the challenges you have been facing (regarding your tasks)?

Phil: Developing applications against a changing data format was one of the main challenges. The same is true of missing documentation. These are common challenges when, so to speak, building the rocket while launching it.

Which milestones do you plan to achieve in the remaining months?

Phil: My hope is that we can present a public-facing service demonstrating the ease of integration with OWS data before the end of the project. A part of that relies on fixing remaining compatibility bugs in MOSAIC and publishing our framework for working with OWS index data.

What makes the OWS project special to you?

Phil: In contrast to the failed Quaero project, this is the first time Europe has decided to explicitly push for sovereignty in web search and web data analysis. Only in this way can we ensure the availability and accessibility of web search as a public resource.

Do you already have plans for the time after the project ends?

Phil: For us, the end of the project is only the beginning. While the question of how to keep the index going is most important, we also strongly believe that the index is useless if it isn’t being used. Thus, we plan to provide tooling and infrastructure to build search-related projects on top of OWS index data.

Thank you for the interview!

Read more about SUMA-EV: SUMA-EV

Parliamentary Breakfast in Brussels with lots of food for thought

Earlier this month, part of our team travelled to Brussels for a special occasion: at a Parliamentary Breakfast in the European Parliament, hosted by MEPs Alexandra Geese (The Greens/EFA) and Elena Sancho Murillo (S&D), we were given the chance to explain to parliamentarians, accredited assistants, media representatives, researchers and industry stakeholders why we are urging Europe to implement a European Web Data Infrastructure – a crucial step towards digital sovereignty and competitive European Web services, including in the domain of AI.

Over coffee and breakfast, the event was kicked off with a strong statement by Elena Sancho Murillo, who emphasized that the European Web Data Infrastructure is a precondition for Europe’s AI sovereignty. In her view, Europe must not accept that AI foundations are built solely outside Europe. Renate Nikolay (Deputy Director-General for Communications Networks, Content and Technology at DG Connect) highlighted that direct access to data is the fuel for everything that is to be done in AI. Alexandra Geese stated that the Open Web Search Initiative is seen as a cornerstone not only of tech sovereignty but also of democracy in Europe. She therefore voiced her concern that the OpenWebSearch initiative still has to look for funding in Brussels at this point, instead of being backed without further delay in its important work on the European Web Data Infrastructure and the Open Web Index.

Economic need for direct access to web data

An industry perspective was presented by Per Öster, who spoke on behalf of LUMI AI Factory. He argued for taking back control over web data and using it for the benefit of individuals, industry and research. For industrial players, the power of data lies in making use of it, so it is important to be able to process the data.
On behalf of OpenWebSearch.eu, our spokesman Stefan Voigt called for a clear legal basis and secure long-term funding of a European Web Data Infrastructure, explaining the manifold opportunities such an infrastructure offers for Europe’s SMEs, industrial corporations and start-ups. To boost digital sovereignty and competitiveness, Europe needs to enable sovereign large-scale access to Web data, and this is what the European Web Data Infrastructure ensures.

Pursuing a holistic approach

The lively discussion that followed addressed various aspects, such as the current legal framework, micropayments for publishers and content creators, the need for talent in Europe that can make use of data, data sharing obligations pursuant to the Digital Markets Act, and the importance of objective data for democracy in the context of a multinational and multilingual European Union.

The journey continues

After the event, our team took countless elevator rides in the Parliament building to introduce the project to further parliamentarians and their staff at their desks. Fortunately, the topic was well received.
We are now following up with the aim of bringing the European Web Data Infrastructure into the Parliament’s Committee on Industry, Research and Energy (ITRE) and the European Competitiveness Fund (ECF).

Our time in Brussels was a great opportunity to again highlight the importance of a European Web Data Infrastructure for Europe’s sovereignty and competitiveness. Its importance has been understood and acknowledged, but we also need to see some action now, especially with regard to funding.

Europe must act now – fast and boldly!

OpenWebSearch.eu offers entire web directory Curlie.org as free download

As of today, the huge human-edited web directory Curlie.org is publicly available for download, thanks to the OpenWebSearch.eu initiative.

With over 2.9 million well-structured entries, Curlie.org is a clear guide to the Internet. The download now enables operators of niche websites to offer website catalogues on their topic. This is also good news for operators of alternative search engines. The trustworthy entries in DMOZ (Curlie’s predecessor project) have long been Google’s secret sauce for displaying spam-free and relevant search results.

The download of the Curlie database under an open source license is made possible by the European Open Web Search initiative via OpenWebSearch.eu project partner Leibniz Supercomputing Centre (LRZ). From now on, LRZ, a provider of scientific IT services in Munich, Germany and Europe, will provide a continuously updated dump of the entire Curlie directory.

OpenWebSearch.eu is already offering the pilot version of an Open Web Index, which contains roughly 1.3 billion website entries. This index should serve as the basis for an expandable search infrastructure that complies with European democratic values, legal regulations and standards. It enables the creation of alternative search engines that do not have to rely on an index from the big tech companies. 

Category data from Curlie.org is already integrated into the Open Web Index. Curlie data also supports the identification of high-quality websites and the corresponding guidance of website crawlers. Some 45,000 categories containing geographic labelling open the door to enriching location-aware apps.

For the open search community, the cooperation opens up new ways to judge information, says Laura Brown at Curlie.org: ‘We only include high-quality websites in our directory that provide useful information. This is ensured by our experienced and specialised volunteer editors in the individual categories. That’s the advantage we humans have over chat language models: We can assess whether websites are trustworthy. With Curlie, you can always see the source of the information.’

‘We want to enable free, unbiased and transparent access to information. By working together, we are taking a big step towards greater data transparency and data democracy on the World Wide Web,’ explains Michael Granitzer, project manager at OpenWebSearch.eu. The computer science professor at the University of Passau sees many use cases: ‘For example, the combined knowledge of the Curlie editors can be easily leveraged to exclude AI-generated websites from search results – or to flag them. This would give search engine users more transparency about their search results.’

The Curlie directory is now available for free download at 

https://curlie.org/download