Questions? Answers! In conversations with people at webinars, conferences and other events, we get asked a lot of questions. We answer some of them here. If you want to know more, please write to ows@openwebsearch.eu.

FAQs

What is the current status of the Open Web Index?

You can find current information on our status page.

For Open Calls please refer to the Third-party Calls page.

OWS.eu and the Open Web Index: What’s the problem anyway?

In recent years, the Internet and the digital world have become extremely important in our everyday life. Almost every aspect of our daily routine is tied to digital resources.

Those resources are currently curated by a small number of non-European gatekeepers. In web search, this results in limited access to information, provided by companies that focus primarily on commercial success rather than on individual needs and societal values.

With the recent emergence of AI, the situation has become even more difficult.
The sheer amount of AI-generated disinformation on the web is influencing public debate. As a result, EU citizens cannot determine which sources and messages they can trust.

This can lead to polarisation and poses a threat to democracy itself. Consequently, information as a public good, with freely accessible and transparent content, is no longer under public control.

Furthermore, the imbalance and monopolisation of the search engine market by a few big tech conglomerates not only endangers democracy but also limits the innovative potential of Europe’s research landscape and economy.^{1, 2}

¹Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G. Parker, and Munmun De Choudhury (2023): Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions. New York, USA, Article 436, pp. 1–20. https://doi.org/10.1145/3544548.358131

²Nick Robins-Early (2023): Disinformation reimagined: how AI could erode democracy in the 2024 US elections, online available at https://www.theguardian.com/us-news/2023/jul/19/ai-generated-disinformation-us-elections [Accessed 27.11.23]

What is an (Open) Web Index?

A Web Index is the heart of every search engine. It can be seen as a giant library catalogue for the internet that keeps track of all the web pages, documents, pictures and videos on the internet.
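To make the catalogue analogy concrete, the following Python sketch shows the core data structure behind most web indices: an inverted index that maps each term to the documents containing it. It is a deliberately minimal illustration, assuming naive tokenization and in-memory storage; the example URLs are hypothetical, and this is not the Open Web Index's actual implementation.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split into word characters (deliberately naive).
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical example documents.
docs = {
    "https://example.org/a": "Open web search for Europe",
    "https://example.org/b": "Web index basics",
    "https://example.org/c": "Search engines and the open web index",
}
index = build_index(docs)
print(sorted(search(index, "open web")))
```

Looking up a query then only touches the entries for its terms instead of scanning every document, which is what makes an index useful at web scale.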

The objective of the Open Web Search Initiative (OWS.eu) is to create an Open Web Index: open in contrast to the proprietary, closed web indices of the large commercial providers. An Open Web Index means building the index on open-source code while also ensuring that its content is moderated publicly and transparently.

Who is behind OpenWebSearch.eu (ows.eu) and who is funding it?

OpenWebSearch.eu is an international research project dedicated to the development of an open European infrastructure for web search. It is funded with 8.5 million euros from the EU’s Horizon Europe research framework programme (No. 101070014). The project will run for three years, from August 2022 to August 2025.

The OWS.eu team consists of 14 partners representing diverse entities across seven European countries. This consortium includes data centres, research institutions, universities, industry and non-governmental organisations. (Meet the people behind the organisations on the OWS.eu team page.)

Who owns the Open Web Index?

Throughout the OpenWebSearch.eu project, the prototype of the index is jointly owned by the 14 project partners.

The forthcoming operational Open Web Index, which is to be built on the research carried out within the OpenWebSearch.eu project, is planned as public infrastructure. This means that an open European web index will be a public good and thus should not be transformed into a purely commercial entity. This requires support from policymakers at both national and EU level.

One of the project’s objectives is to evaluate various organisational structures and legal forms to establish the most suitable one. One potential option is to create an EU-based non-profit organisation that would subsequently take ownership of the Open Web Index and serve the European public.

Why do you think you can achieve something that so many others before you have not achieved?

Firstly, it is vital to act now:

Having its own Web Index is an important pillar of European sovereignty.

In an era marked by conflicts and crises, transparent access to information matters more than ever, and the EU itself is striving to become more autonomous in the global political landscape. This goes hand in hand with the duty to safeguard citizens’ personal data. Consequently, establishing the EU’s own digital infrastructure is now crucial.¹

Secondly, we offer a European and cooperative approach:

Knowing that monopolies in the digital sphere entail risks for users, can hinder innovation and have negative effects on society, we approach the search engine market differently from the individual players who tried to enter it before and failed.
Unlike previous approaches, we aim to build the Open Web Index cooperatively, positioning it as public infrastructure. The basis for this is fostering collaboration and growing an extensive European ecosystem.

In this ecosystem, which is currently being developed, the many small and medium-sized enterprises that characterise the EU’s economic landscape will be strengthened to develop new business models.

Furthermore, we use the existing infrastructure of supercomputing centres in the EU, a distinctive approach that builds on the benefits of cooperation while reducing costs. Our approach is to embrace the diversity of culture and language within the EU instead of seeing it as an obstacle.^{2, 3, 4}

¹Council of the European Union (2023): EU ministers boost research for a more autonomous, self-sufficient Europe, online available at https://spanish-presidency.consilium.europa.eu/en/news/informal-ministerial-meeting-competitiveness-research-santander-28-july/ [Accessed 27.11.23]

²Hagen Krämer (2019): Digitalisierung, Monopolbildung und wirtschaftliche Ungleichheit, in: Wirtschaftsdienst 613-978X, Volume 99, Issue 1, Springer, Heidelberg 2019, pp. 47–52.

³Hubert Burda Media Holding Kommanditgesellschaft (2021): Burda gibt Suchtechnologie und Kernteam an US-Browser Brave ab, online available at https://www.burda.com/de/news/burda-gibt-suchtechnologie-und-kernteam-us-browser/ [Accessed 28.11.23]

⁴European Parliament (2023): Small and medium-sized enterprises, online available at https://www.europarl.europa.eu/factsheets/en/sheet/63/kleine-und-mittlere-unternehmen [Accessed 28.11.23]

How is the Open Web Index supposed to be funded long-term?

Covering the costs of upholding European values and maintaining openness requires a combination of private and public funding; public investment is indispensable for this purpose.

The primary expenses come from infrastructure. OpenWebSearch.eu aims to reduce these costs by using existing infrastructure in Europe. The concept envisages that European research institutions will provide capacities of their data centres in order to jointly build and operate the Open Web Index.

One of the project’s tasks is to conduct a market assessment and formulate revenue models to recover parts of the costs.

Why is it necessary to build our own European infrastructure in addition to regulating big tech?

Regulation is important, but it inevitably lags behind the rapid pace of technological progress, because it reacts to new developments and legislation takes time.
Relying on regulation alone could leave Europe without technical solutions of its own and entirely dependent on companies outside the EU. As recent crises have shown, this poses a high risk to EU countries in matters of national security and societal well-being.

By developing such an infrastructure, the EU can ensure its self-reliance, reduce its dependence on foreign tech giants and promote European innovation.¹

¹Richard Baldwin and Rebecca Freeman (2022): Global supply chain risk and resilience, VoxEU, online available at https://cepr.org/voxeu/columns/global-supply-chain-risk-and-resilience [Accessed 27.11.23]

Are search engines and web search still relevant given the growing influence of AI technology?

The majority of internet navigation is still done via search. Search engines, not AI models, are still the primary gatekeepers through which most citizens access information, so web search and search engines retain their relevance. Additionally, AI models, particularly Large Language Models, rest on a foundation similar to that of a search engine: processed web data. For instance, Large Language Models require high-quality, enriched and cleaned data for effective training.
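As a rough illustration of what "cleaned" data can mean, here is a minimal Python sketch of two common preprocessing steps for web text: dropping very short pages and removing exact duplicates after normalization. The threshold (`min_words=5`) and the helper names are hypothetical choices for this example; real training pipelines use far more elaborate filters, and this is not a description of the OWS.eu pipeline.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies match.
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_corpus(pages: list[str], min_words: int = 5) -> list[str]:
    """Drop very short pages and exact duplicates (after normalization)."""
    seen: set[str] = set()
    kept: list[str] = []
    for page in pages:
        norm = normalize(page)
        if len(norm.split()) < min_words:
            continue  # too little text to be useful training data
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of a page that was already kept
        seen.add(digest)
        kept.append(page)
    return kept


pages = [
    "The Open Web Index is built from crawled pages.",
    "the open   web index is built from crawled pages.",
    "Click here",
    "Search engines remain the main gateway to information.",
]
print(len(clean_corpus(pages)))
```

Hashing the normalized text keeps memory use small, since only one fixed-size digest per kept page is stored.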

The data required for training generative AI is currently held by a small number of providers, mostly outside Europe. This data is generated by search engines, among other things. A few commercial search engine providers, which act as gatekeepers, collect and store an enormous amount of user and usage data that can be exploited not only for the advertising business, but also for the training of AI models. This gives these non-European providers major advantages in the training of AI models.

Therefore, an Open Web Index would hold significant potential in two distinct areas of our digital sphere: it would give European citizens the opportunity to access information through various EU-based search engines, and it would deliver data that helps build sovereign European generative AI systems. Such systems would be subject to EU legislation and jurisdiction, maintaining control over citizens’ data and mitigating the high risk of malicious use.

Why is it so hard to design an Open Web Index? Why don’t the scrapers and crawlers just run off and build the OWI?

Answer by Prof. Dr. Michael Granitzer, scientific director of OpenWebSearch.eu.

“The basic approach is to simply crawl, then analyze and index the crawl. However, there are many details to consider, in our case especially the size, if we assume that commercial indexes comprise several hundred petabytes. On top of that comes the raw data that has to be stored and the data generated during analysis. This puts us at a scale of computing capacity that can no longer simply be rented and used via cloud services.

Efficiency becomes a key factor and organization a core question, both of which the project needs to answer. That is why leading HPC (high-performance computing) and infrastructure partners who can handle such sizes are on board, for example the Leibniz Supercomputing Centre in Germany, CSC in Finland, which operates Europe’s largest supercomputer, IT4Innovations in the Czech Republic, and CERN, the European Organization for Nuclear Research.

This required scalability is also the biggest obstacle preventing smaller companies and innovators from simply crawling the web themselves. In addition, each individual step involves many technical details: crawling should be both efficient and “polite”. This means, for example, that crawling strategies have to be considered: when and how often content is retrieved. We are also considering accepting crawl results from third parties, for example to allow data centers to crawl themselves and contribute the data to the OWI. The analysis of the crawls is also very detail-driven, and it is not just a technical problem. Services like Google’s Search Console allow website operators to optimize their pages for Google Search; Google thus crowdsources robust parsing without making this information available to third parties. Here we are already at a disadvantage compared to such commercial indices.

This already shows how many individual components are necessary on the technical level alone to obtain good-quality data, and quality is crucial for every application of an open web index. In addition, there are legal, ethical and social aspects, for example which legal principles have to be taken into account when operating an OWI.”
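One ingredient of the "polite" crawling described in this answer is not overloading any single website. The Python sketch below is a minimal illustration under stated assumptions: it keeps a queue per host and hands out a URL only once that host's delay has elapsed. The class name `PoliteFrontier` and the fixed per-host `delay` are hypothetical; production crawlers also honour robots.txt, adapt delays to server response times and persist their state.

```python
from collections import deque
from typing import Optional
from urllib.parse import urlsplit


class PoliteFrontier:
    """Hands out URLs so that each host is contacted at most once per `delay` seconds.

    The current time is passed in explicitly (`now`) so the scheduler can be
    tested without sleeping.
    """

    def __init__(self, delay: float = 1.0) -> None:
        self.delay = delay
        self.queues: dict[str, deque] = {}        # host -> pending URLs (FIFO)
        self.next_allowed: dict[str, float] = {}  # host -> earliest next fetch time

    def add(self, url: str) -> None:
        host = urlsplit(url).netloc.lower()
        self.queues.setdefault(host, deque()).append(url)
        self.next_allowed.setdefault(host, 0.0)

    def next_url(self, now: float) -> Optional[str]:
        """Return a URL whose host may be fetched at time `now`, or None."""
        ready = [h for h, q in self.queues.items() if q and self.next_allowed[h] <= now]
        if not ready:
            return None
        # Prefer the host that has been allowed to be fetched the longest.
        host = min(ready, key=lambda h: self.next_allowed[h])
        self.next_allowed[host] = now + self.delay
        return self.queues[host].popleft()


frontier = PoliteFrontier(delay=2.0)
for u in ["https://a.example/1", "https://a.example/2", "https://b.example/1"]:
    frontier.add(u)
order = [frontier.next_url(0.0), frontier.next_url(0.0), frontier.next_url(0.0), frontier.next_url(2.0)]
print(order)
```

At time 0 the second page of `a.example` is held back even though it is queued; it is only released once the two-second delay for that host has passed.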

What new web services / innovations based on the OWI could be possible?

Answer by Prof. Dr. Michael Granitzer, scientific director of OpenWebSearch.eu

“New content analysis methods for information quality or annotation of hate speech.
Development of specialized language models, e.g. to provide up-to-date language models based on the data
Search paradigms such as Argumentation Search (prototype: https://www.args.me/search.html?query=is+climate+change+real).
Web site registration: collecting information about web sites and allowing webmasters to define usage variants (partly covered by existing standards like robots.txt), but also to see how the data was used, who wants which data removed where, etc. (right to be forgotten)”
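The existing robots.txt standard already lets webmasters express a simple kind of usage variant: different rules for different crawlers. The Python sketch below uses the standard library's `urllib.robotparser` to evaluate a hypothetical robots.txt in which a site admits a search crawler but opts out of an AI-training crawler. The bot names are invented for illustration; robots.txt cannot express the richer preferences or the usage reporting described in the answer, which would need additional mechanisms.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the search crawler may index everything except
# /private/, the AI-training crawler is excluded entirely, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /private/

User-agent: ExampleAITrainingBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ExampleSearchBot", "ExampleAITrainingBot", "SomeOtherBot"):
    for url in ("https://example.org/articles/1", "https://example.org/private/x"):
        print(agent, url, parser.can_fetch(agent, url))
```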

Which search algorithms / search factors are discussed, for example? Which decisions have to be made?

Answer by Prof. Dr. Michael Granitzer, scientific director of OpenWebSearch.eu.

“Currently, the main factors under discussion are content-based factors (texts, anchor text, information quality or genres), link structures (classic PageRank), and technical factors such as response times. What is currently not taken into account are user clicks: they are important, but also critical in terms of privacy. The question here is also to what extent user clicks can be replaced by other usage data or by simulated user queries. However, it is essential for search algorithms that the choice, combination and weighting of search factors are controllable and adjustable. Only in this way can the index and the search be open. In principle, every search engine based on the OWI is free to use additional factors.”
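Two of the ideas in this answer can be sketched in a few lines of Python: classic PageRank as a link-based factor, and a ranking score whose factor weights remain adjustable by whoever builds on the index. This is a minimal sketch, assuming a weighted linear combination of factors already normalised to [0, 1]; the factor names and weights are hypothetical, and the OWI's actual ranking design is not described here.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Classic PageRank by power iteration; dangling pages spread their rank evenly."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Page without outgoing links: distribute its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


def score(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of normalised ranking factors; the weights are user-adjustable."""
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())


links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
pr = pagerank(links)

# The same hypothetical document factors under two different weightings.
doc_factors = {"text_match": 0.8, "pagerank": 0.4}
print(score(doc_factors, {"text_match": 0.7, "pagerank": 0.3}))
print(score(doc_factors, {"text_match": 0.2, "pagerank": 0.8}))
```

Keeping the weights as an explicit input, rather than hard-coding them, is one simple way to make the combination of factors "controllable and adjustable" in the sense of the answer.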

How does a reasonable distribution of the Open Web Index (OWI) across different servers work?

Answer by Prof. Dr. Michael Granitzer, scientific director of OpenWebSearch.eu.

“Index generation requires data locality; the resulting index is usually smaller in size and can be distributed or reorganized. The distribution of the index is then probably not only a technical decision, but also an organizational one. Different organizations might take over the “maintenance” of different sub-indices and then host them, e.g. an index for geosciences, an index for financial markets and so on.

It is important to understand that our primary goal is not to have our own search engine, but to collect web data, process it and make it shareable. Portions of the index can be used to create one’s own services – these can be search services or LLMs, but also any other services.”
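The two ways of splitting the index mentioned in this answer can be contrasted in a short Python sketch: a purely technical split that hashes each document's host to one of several shards, and an organizational split into topical sub-indices that different organizations could maintain. Both functions, the `topic` field and the example data are hypothetical illustrations, not the OWI's actual partitioning scheme.

```python
import hashlib
from urllib.parse import urlsplit


def shard_for(url: str, num_shards: int) -> int:
    """Assign a document to a shard by hashing its host, so one site stays on one server."""
    host = urlsplit(url).netloc.lower()
    # A cryptographic hash gives the same result in every process,
    # unlike Python's built-in hash(), which is randomized per run.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_by_topic(docs: list[dict], default: str = "general") -> dict[str, list[str]]:
    """Group document URLs into topical sub-indices, e.g. one per maintaining organization."""
    sub_indices: dict[str, list[str]] = {}
    for doc in docs:
        sub_indices.setdefault(doc.get("topic", default), []).append(doc["url"])
    return sub_indices


docs = [
    {"url": "https://geo.example/quake", "topic": "geosciences"},
    {"url": "https://markets.example/bonds", "topic": "finance"},
    {"url": "https://blog.example/post"},
]
print(shard_for("https://geo.example/quake", 4))
print(partition_by_topic(docs))
```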

Will Search Engine Optimization (SEO) still be necessary in the future with publicly available OWI? Or can website operators then concentrate on offering information in a meaningful environment and in a well-structured way?

Answer by Prof. Dr. Michael Granitzer, scientific director of OpenWebSearch.eu.

“We hope that SEO would then be more about content than about pure marketing and sales. An open index should allow for many different search engines, from a Pokemon search engine to Google Search, which also means SEO can no longer focus on the top 3 results of a single provider.

As a result, the SEO structure will have to change, and it can be assumed that search optimization will then be more up to the individual search engine operators.”

How often could the OWI be updated? How often should it be updated?

The index itself should be updated continuously.
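Continuous updating usually means that not every page is revisited at the same rate. As a minimal sketch, assuming a simple adaptive heuristic (halve a page's revisit interval when it has changed, double it when it has not, within fixed bounds), the Python class below keeps pages in a priority queue ordered by their next due time. The class name, the default intervals and the halving/doubling rule are hypothetical choices; the OWI's actual recrawl policy is not specified here.

```python
import heapq


class RecrawlScheduler:
    """Priority queue of pages ordered by their next due time (in seconds).

    Pages that changed since the last visit are revisited sooner, unchanged
    pages later, bounded by min_interval and max_interval.
    """

    def __init__(self, min_interval: float = 3600.0, max_interval: float = 30 * 86400.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval: dict[str, float] = {}
        self.heap: list[tuple[float, str]] = []

    def add(self, url: str, now: float, interval: float = 86400.0) -> None:
        self.interval[url] = interval
        heapq.heappush(self.heap, (now + interval, url))

    def due(self, now: float) -> list[str]:
        """Pop every URL whose due time has been reached."""
        urls = []
        while self.heap and self.heap[0][0] <= now:
            urls.append(heapq.heappop(self.heap)[1])
        return urls

    def record_visit(self, url: str, now: float, changed: bool) -> None:
        """Reschedule a URL after fetching it, adapting its revisit interval."""
        current = self.interval[url]
        new = current / 2 if changed else current * 2
        new = max(self.min_interval, min(self.max_interval, new))
        self.interval[url] = new
        heapq.heappush(self.heap, (now + new, url))


sched = RecrawlScheduler()
sched.add("https://news.example/", now=0.0)
sched.add("https://docs.example/", now=0.0)
due_at_day1 = sched.due(86400.0)
sched.record_visit("https://news.example/", 86400.0, changed=True)
sched.record_visit("https://docs.example/", 86400.0, changed=False)
next_due = sched.due(130000.0)
print(due_at_day1, next_due)
```

After one visit, the frequently changing news page is due again after half a day, while the unchanged documentation page waits two days.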

Can a publicly managed Index compete with an Index run by a private company?

OpenWebSearch.eu is not trying to compete with other big search engines for several reasons.

Firstly, it is not the objective of OpenWebSearch.eu to enter the targeted advertising market, which is usually a key focus of the major search engines.
Secondly, the aim is not to build a single large search engine, but to develop a comprehensive ecosystem of search engines and additional applications.
Thirdly, the Open Web Index differs significantly from US-centered search engines in that it focuses on the European Union and emphasises European values, languages, and cultural diversity. Additionally, it is subject to European law and jurisdiction.

The ows.eu project participants always emphasize that it is not about a European alternative to Google or Bing. At the same time, they are aiming to include half the text-based Internet (~5 petabytes) and, in the long term, even the entire text-based Internet in the OWI. Surely a search engine operator should then be able to build a serious alternative even to Google relatively quickly?

Answer by Prof. Dr. Michael Granitzer, scientific director of OpenWebSearch.eu

“Yes, absolutely, but we don’t see this search engine as the main task of the OpenWebSearch.eu project. We want to do the heavy lifting of the data and we hope that other players – search engines, AI start-ups, scientists – would then build on it. The data and the index are the core elements of web search that are the most technically complex. And this is exactly what we want to open up so that others can take over.”

The project goal is to cover about 50% of the current text-based web in the OWI (about 5 petabytes). Does this mean that an Open Web Index of this size should be available at the end of the EU project period?

Answer by Prof. Dr. Michael Granitzer, scientific director of OpenWebSearch.eu

“Yes, that is at least the goal. When the processes and systems are up and running, it should be “only” a question of providing infrastructure, i.e. for storage. The question is much more how to ensure sustainability and what the index will be used for afterwards. For multimedia data and also social media the integration is technically more complex.”

How will I as a developer or company be able to use the Open Web Index?

The Open Web Index aims to empower companies and non-commercial organisations of all sizes across Europe to develop applications, including search engines and Artificial Intelligence (AI) applications (such as LLMs), utilising the index.

Companies, NGOs, scientists and other organisations will be able to download customizable segments of the index to create an application.
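What "customizable segments" could look like in practice is sketched below in Python: records of an index export are filtered by language and top-level domain to produce a smaller, application-specific slice. The JSON Lines layout and the field names (`url`, `language`, `title`) are hypothetical assumptions for this example; the actual export format of the Open Web Index is not described in this FAQ.

```python
import json
from typing import Iterable, Iterator, Optional, Set
from urllib.parse import urlsplit


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (JSON Lines)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_segment(records: Iterable[dict],
                   languages: Optional[Set[str]] = None,
                   tld: Optional[str] = None) -> Iterator[dict]:
    """Yield only the records that match the requested languages and top-level domain."""
    for rec in records:
        if languages is not None and rec.get("language") not in languages:
            continue
        if tld is not None:
            host = urlsplit(rec["url"]).hostname or ""
            if not host.endswith("." + tld):
                continue
        yield rec


# Hypothetical sample of an index export.
SAMPLE = """
{"url": "https://www.example.de/a", "language": "de", "title": "Beispiel"}
{"url": "https://www.example.fr/b", "language": "fr", "title": "Exemple"}
{"url": "https://www.example.com/c", "language": "de", "title": "Noch ein Beispiel"}
"""
segment = list(select_segment(read_jsonl(SAMPLE.splitlines()), languages={"de"}, tld="de"))
print([r["url"] for r in segment])
```

Because both functions are generators, the same filter could stream over a large download without loading it into memory at once.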

If you have an interest in collaborating as a data centre, researcher, company, individual developer, or a non-profit organisation, please send an email to join@openwebsearch.eu.

How can I help/support and become part of the community?

There are several possibilities:

To get involved with the EU project, you can apply to open calls and get funded under the OpenWebSearch.eu Community Programme. Successful third parties will be integrated into ongoing and future activities for sustainable Research and Development on Open Web Search.

To support the Open Web Search initiative, check out this list of organisations and people we are looking for:

Data centres to help host a distributed Open Web Index, which could also serve specific communities, regions or purposes,
Researchers and technical innovators to develop new search and retrieval paradigms or content analysis algorithms,
Industry and business partners to discover the commercial potential of an Open Web Index in new (or old) business models,
Policy makers to help shape the governance of an unbiased, fair and privacy-preserving open search ecosystem.

To keep in touch with these possibilities or to join us, send an email to join@openwebsearch.eu.