WOWS Calls for Participation are open!

The International Workshop on Open Web Search takes place for the third time this year, from … to …, as a side event of ECIR 2026.

The Third International Workshop on Open Web Search (WOWS) aims to promote and discuss ideas and approaches to open up the web search ecosystem so that small research groups and young startups can leverage the web to foster an open and diverse search market.

Therefore, the workshop, which takes place at ECIR 2026, has two calls that support collaborative and open web search: (1) a call for scientific contributions, and (2) a call for participation in the WOWS-Eval shared task for collaborative evaluations of the Open Web Index.

The organizers are now calling for papers and participation (deadline 26 February).

Topics include 

  • Crawling for an Open Web Index, collaborative crawling
  • Web deployment of search engines
  • Standards for search and interoperability
  • Large-scale web data pre-processing components and pipelines
  • Pre-processing and enrichment
  • Indexing and search architectures
  • Open infrastructures for evaluation
  • Open-source search engines
  • Open-source replicability

To name just a few.

Find all details here: 

https://opensearchfoundation.org/events-osf/wows2026/

Making Open Maps Richer: Inside the OMMS project

The OMMS project (Open Mobile Maps Search) was conducted by E Foundation with the aim of enriching OpenStreetMap data with web data from the Open Web Index and feeding the combined data into a competitive open-source maps app.

OpenStreetMap data is comprehensive in many areas of the world, but for a maps app that aims to compete with Google’s or Apple’s offerings, data freshness, accuracy, and richness all leave much to be desired. Additionally, OpenStreetMap almost without exception lacks authoritative information from business owners about their points of interest (POIs). 

However, most larger businesses have invested some effort in Search Engine Optimization (SEO), which involves surfacing this information online so it can be crawled and indexed by search engines. For this project, E Foundation used the OpenWebSearch tools to crawl web-based POI information about businesses and surface it to users in E Foundation’s open-source, open-data mobile maps application.
The goal was to create a compelling mobile maps experience that allows users to confidently explore, learn about, and navigate to nearby points of interest. 

As a concrete solution, E Foundation set out to:
  • send the OpenWebSearch.eu team a list of URLs of interest so that OpenWebSearch can provide fresh crawl data as Parquet files,
  • create a web API that accepts a point of interest’s URL and returns information about that point of interest in a format that a maps app can easily ingest, and
  • create a proxy that augments API responses from Pelias with metadata parsed from structured data provided by OpenWebSearch.

“We will start with opening hours and contact information, and from there expand to images, services offered, FAQs and anything that we feel may enrich the user’s experience in a mobile UI,” stated the project team at the start of the project. 

Results

By the end of the project, E Foundation had developed the following pieces of software: 

  • URL list generator to iterate through an OpenStreetMap extract and create a list of URLs to be crawled for structured data.
  • Batch processing program to transform the resulting Parquet files into .osc (OpenStreetMap changeset) files which amend OSM features to include fresh opening hours. This is option 1 for consuming crawl data.
  • Batch processing program to ingest the resulting Parquet files into a PostgreSQL database for use in a Point of Interest information server. This is option 2 for consuming crawl data.
  • POI Server. This connects to the PostgreSQL database and serves freshly updated opening hours, contact information, and FAQs for clients via an HTTP API. 
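
As a hedged sketch of the “option 2” path above: the snippet below looks up a POI record by its website URL and serializes it as JSON, the kind of payload a POI server might return over HTTP. The record fields and the in-memory store are illustrative stand-ins for the project’s PostgreSQL database; none of the names come from the project’s codebase.

```python
import json

# Illustrative stand-in for the PostgreSQL-backed POI store described above.
POI_STORE = {
    "https://example-cafe.test": {
        "name": "Example Cafe",
        "opening_hours": "Mo-Fr 08:00-18:00",
        "phone": "+1-555-0100",
    },
}

def poi_response(url: str) -> str:
    """Return a JSON payload for a POI URL, or a JSON error if unknown."""
    record = POI_STORE.get(url)
    if record is None:
        return json.dumps({"error": "not found", "url": url})
    return json.dumps({"url": url, **record})
```

A real deployment would sit behind an HTTP framework and query PostgreSQL, but the lookup-and-serialize core would stay the same.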

Difficulties

The team had initially started by crawling websites associated with points of interest in the metropolitan area surrounding Seattle, Washington, USA. Once they validated that the software worked, they moved on to crawling POI websites for the entire planet. They found that approximately 12% of the websites attached to OpenStreetMap points of interest contain structured data. This amount varies by the type of POI. Websites for department stores and fast food restaurants contain structured data more often than other POIs. 
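
Structured data of the kind counted in the 12% figure is typically embedded as schema.org JSON-LD in a page’s HTML. A minimal sketch of detecting it, assuming JSON-LD in `<script type="application/ld+json">` blocks (real crawl processing would use a proper HTML parser; the regex here is a simplification):

```python
import json
import re

# Matches <script type="application/ld+json"> ... </script> blocks.
LDJSON = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def extract_structured_data(html: str) -> list[dict]:
    """Return all parseable JSON-LD blocks found in an HTML page."""
    blocks = []
    for match in LDJSON.findall(html):
        try:
            blocks.append(json.loads(match))
        except json.JSONDecodeError:
            continue  # malformed blocks are skipped, not fatal
    return blocks

def has_structured_data(html: str) -> bool:
    return bool(extract_structured_data(html))
```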

The team ran into some trouble parsing opening hours. Some points of interest have opening hours that don’t conform to the standard format, but more often points of interest have opening hours listed for different parts of the same store. For example, grocery stores with pharmacies attached may have hours listed separately on the website, e.g. Mo-Fr 08:00-20:00, Mo-Fr 09:00-17:00. To avoid updating hours incorrectly, ambiguous data such as these were discarded. 
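
The ambiguity check described above can be sketched as follows: if two comma-separated opening-hours rules cover overlapping days (e.g. a store and its pharmacy both listing Mo-Fr), the data is discarded rather than merged incorrectly. The simplified format handled here (one day range plus one time span per rule) is an assumption for the sketch; the real OSM `opening_hours` syntax is far richer.

```python
import re

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
# Simplified rule: a day range followed by a single time span.
RULE = re.compile(
    r"(Mo|Tu|We|Th|Fr|Sa|Su)-(Mo|Tu|We|Th|Fr|Sa|Su)\s+\d{2}:\d{2}-\d{2}:\d{2}"
)

def covered_days(rule: str) -> set[str]:
    """Expand a day range like 'Mo-Fr' into the set of days it covers."""
    m = RULE.match(rule.strip())
    if not m:
        raise ValueError(f"unsupported rule: {rule!r}")
    start, end = DAYS.index(m.group(1)), DAYS.index(m.group(2))
    return set(DAYS[start : end + 1])

def is_ambiguous(hours: str) -> bool:
    """True if any two comma-separated rules claim the same day."""
    seen: set[str] = set()
    for rule in hours.split(","):
        days = covered_days(rule)
        if days & seen:
            return True
        seen |= days
    return False
```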

What’s next?

While pleased with the results, the team acknowledges that many opportunities remain to improve the completeness and accuracy of POI data. Their main recommendations concern two further services: ranking and backlinks. 

Open-data geocoders often struggle to rank points of interest for textually ambiguous queries because they lack context on which POIs are more commonly searched for. Traditional ranking functions like BM25 mitigate this to some degree, but they’re far from perfect.
Search giants use past user behavior to help rank results, but open-data geocoders don’t have this luxury. However, OpenWebSearch is well-positioned to publish POI rankings based on PageRank or a similar algorithm. Open-data geocoders could ingest this and use it to augment their existing ranking algorithms. 
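
One way a geocoder could ingest such a popularity signal is to blend it with its existing text-relevance score. The sketch below is a hedged illustration of that idea; the linear blend, log damping, and weight are assumptions of this sketch, not deliverables of the project.

```python
import math

def blended_score(text_score: float, popularity: float, weight: float = 0.3) -> float:
    """Combine text relevance (e.g. BM25) with a log-damped popularity prior
    (e.g. a published PageRank-style score per POI)."""
    return (1 - weight) * text_score + weight * math.log1p(popularity)

def rerank(candidates: list[tuple[str, float, float]]) -> list[str]:
    """candidates: (poi_name, text_score, popularity); returns names, best first."""
    ranked = sorted(candidates, key=lambda c: blended_score(c[1], c[2]), reverse=True)
    return [name for name, _, _ in ranked]
```

With equal text relevance, the more commonly referenced POI wins; with a strong text match, popularity only breaks near-ties.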

Another problem with open-data maps apps is they generally lack the richness that comes from years of collecting user-generated data like reviews and photos. Fortunately, much of this information is published to the public internet on e.g. food blogs and travel information websites. 

“We would like to explore using backlinks as an imperfect substitute for user-generated reviews and ratings. For example, a point of interest information page for a restaurant could contain links to a few food blog posts about that particular restaurant. Discovery of these backlinks is something that OpenWebSearch is uniquely positioned to do, and we think this is a very promising line of work,” summarizes the OMMS team. 

To read the full report, go here: https://zenodo.org/records/17815218

The OMMS project was funded under the OpenWebSearch.EU initiative (Horizon Europe, Grant Agreement 101070014, Call #2).

Plugging a university supercomputer into Europe’s Open Search infrastructure: The NordLink project

The University of Oldenburg connects its data centre and HPC cluster to OpenWebSearch.eu, demonstrating how academic infrastructure can contribute to a distributed European web index

A European open web search infrastructure should not depend on a single data centre or a single organisation. Instead, the idea is to connect different institutions, different countries, and different kinds of computing resources, all contributing to the same shared infrastructure. The NordLink project, carried out by the University of Oldenburg following the OpenWebSearch.EU third-party funding call, is a concrete step in that direction: it connects a university’s high-performance computing resources to the existing OpenWebSearch network.

What NordLink brings to the table

The University of Oldenburg’s contribution is not trivial. The resources committed to the project include 50 terabytes of S3-compatible cloud storage, two dedicated physical servers with 200 terabytes of combined storage, three virtual machines for testing and deployment, and – most notably – access to the university’s HPC cluster. This is a serious piece of computing infrastructure: 161 nodes, over 20,000 CPU cores, 145 terabytes of RAM, and 36 high-end NVIDIA GPUs (including A100 and H100 models) with a combined peak GPU performance exceeding 2 TFlop/s. The storage subsystem provides more than 4 petabytes of capacity.

For context, this is the kind of computing power typically used for large-scale scientific simulations, machine learning training runs, and data-intensive research. Making it available to the OpenWebSearch.eu project beautifully demonstrates that European academic HPC centres can play a meaningful role in search infrastructure – a domain traditionally dominated by commercial tech companies.

The Integration Challenge

The primary technical challenge for NordLink was integrating the university’s resources with the OWS infrastructure through HEAppE, a middleware system designed to provide HPC-as-a-Service capabilities. This middleware allows remote users and automated systems to submit computing jobs to the university’s cluster without needing direct access to the local systems.

The NordLink team deployed HEAppE on both virtual machines and physical servers, set up comprehensive monitoring using Prometheus and Grafana, configured the university’s S3 storage as a data staging area for the project, and provided the IP addresses of both physical servers and virtual machines for whitelisting to enable web crawling. A functional account was created to link the physical infrastructure to the HPC cluster, enabling job submission from the OWS network.

Challenges to consider

The team reported that the documentation for the HEAppE middleware was incomplete and difficult to follow, making deployment more laborious. Notably, the infrastructure provider EXAION reported the same issue in its final report.

Why university infrastructure matters for Open Search

European universities collectively operate enormous computing resources. HPC clusters, large-scale storage systems, high-bandwidth network connections, and skilled technical teams exist across hundreds of institutions. Most of this capacity is used for scientific research – climate modelling, genomics, particle physics, engineering simulations. But much of it also has periods of underutilisation, and the skills required to operate it overlap significantly with those needed for a web search infrastructure.

NordLink demonstrates that these resources can be connected to a shared infrastructure with reasonable effort.

What’s Next

Beyond the formal project period, the University of Oldenburg plans to maintain all committed infrastructure for a while – the S3 storage, physical servers, and virtual machines – and to complete the HEAppE integration with the HPC cluster. The team is also considering provisioning additional VMs running search index software such as OpenSearch or Vespa.ai, which would allow the university to host a searchable subset of the Open Web Index locally.

In conjunction with EXAION’s contribution from France, NordLink underpins the kind of infrastructure network that OpenWebSearch.eu is building: a distributed system where European organisations of different types – universities, data centre operators, research institutions – contribute to a shared, sovereign search infrastructure.

To read the full technical report, go here: https://zenodo.org/records/18259771

The project was funded under the OpenWebSearch.EU initiative (Horizon Europe, Grant Agreement 101070014, Call #3).

Building sovereign infrastructure for Open Web Search: Inside the EEI project

The French eco-responsible infrastructure provider EXAION (hence the project name EEI) provides GPU-powered computing to the OpenWebSearch.eu project

The OpenWebSearch.eu research project aims to create and maintain an independent, open web search infrastructure based in Europe. In order to establish powerful, sustainable and reliable open web search services, a robust physical infrastructure is a basic requirement. The servers needed for crawling the web and for processing and indexing billions of pages have to exist somewhere. And where they exist, and who controls them, matters. In this context, the EEI project, funded under the OpenWebSearch.eu initiative, contributes high-performance computing infrastructure hosted in France, managed by European teams, and operated under European regulatory frameworks.

What EXAION provides

EXAION committed to providing GPU-accelerated bare-metal servers and virtual machines in its data centres. The hardware includes servers equipped with NVIDIA RTX A6000 GPUs – powerful graphics processing units increasingly used not just for rendering but for the computationally intensive tasks that modern search infrastructure demands, from training machine learning models to running web crawlers at scale.

The company commits to using circular-economy IT equipment – refurbished or second-life hardware – wherever possible, and to deploying open-source solutions in line with the broader ethos of the OpenWebSearch.eu project. All operations comply with GDPR and relevant European regulations.

What was done

The project unfolded in two phases over the course of a year. The first phase (September 2024 to March 2025) focused on setting up the infrastructure: deploying virtual machines, establishing Grafana monitoring systems, and assessing the feasibility of various integration options with the OWS technology stack. Some planned deployments, such as HPC middleware, were deferred because the matching use cases had not yet materialised.

The second phase (April to August 2025) delivered the project’s core objective: deploying and running the MASTODON crawler – one of the crawling components of the OpenWebSearch.eu infrastructure – on EXAION’s GPU servers. The experiment was tested and validated by Prof. Michael Granitzer from the University of Passau, who coordinates the overall OpenWebSearch.eu project. The crawler ran on five virtual machine instances, demonstrating that real OWS workloads can be effectively executed on sovereign European infrastructure.

Why it matters

Data sovereignty is not just about where data is stored but about who controls the infrastructure that processes it. The OpenWebSearch.eu project is designed as a distributed, cooperative infrastructure from the ground up. Having computing resources available in multiple European locations, operated by different organisations, reduces single points of failure and concentration risk. Moreover, EXAION’s commitment to circular-economy hardware and direct management without subcontractors demonstrates that sovereign infrastructure can also be sustainable infrastructure.

What’s next?

Current ideas include extending the partnership to cover high-performance computing use cases with SIMVIA.

Find the full project report here: https://zenodo.org/records/17777285

The project was funded under the OpenWebSearch.eu initiative (Horizon Europe, Grant Agreement 101070014, Call #3).