How Dutch municipalities are sharing search intelligence to serve citizens better: Inside the CIFFIL Service project

The CIFFIL Service project shows that open web index standards can help small municipalities improve their search quality by borrowing index statistics from larger ones

Search engines work best when they have a lot of data to learn from. The more documents in a collection, the better the system can distinguish between common words and genuinely informative ones – and therefore the better it can identify what is relevant to a query. This is a well-known principle in information retrieval, and it creates an obvious problem for anyone who needs to search a small collection of documents: the search results are simply not as good as they could be.

The CIFFIL Service project, funded under the European OpenWebSearch.EU project, tackled exactly this problem – in a setting with direct consequences for citizens. Spinque, a Dutch search technology company, builds search systems for municipalities that allow council members and residents to search through publicly available government documents. Some of these municipal collections are small, containing fewer than 10,000 documents, and are full of domain-specific jargon. The result is that search quality suffers. The CIFFIL project set out to fix this by allowing municipalities to share their search index data with one another through an open standard.

The problem: Small collections, unreliable statistics

Most search engines use some variant of a ranking algorithm called BM25. At its core, BM25 judges the relevance of a document to a query by looking at how often the query terms appear in the document and how rare those terms are across the collection as a whole. A term that appears in only a few documents is treated as a stronger signal of relevance than one that appears almost everywhere.
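To make the two ingredients concrete, here is a minimal sketch of one common BM25 variant in Python. It is an illustration only, not the implementation used in the project: the function names, the toy documents, and the parameter defaults (k1 = 1.2, b = 0.75) are assumptions chosen for the example.

```python
import math
from collections import Counter


def idf(doc_freq: int, num_docs: int) -> float:
    """Rarity weight: the fewer documents contain a term, the higher the value."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query, doc, doc_freqs, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Score one tokenised document against a tokenised query."""
    term_counts = Counter(doc)
    # Longer-than-average documents are penalised slightly via this factor.
    length_norm = 1 - b + b * len(doc) / avg_doc_len
    score = 0.0
    for term in query:
        tf = term_counts[term]
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        tf_part = tf * (k1 + 1) / (tf + k1 * length_norm)
        score += idf(doc_freqs[term], num_docs) * tf_part
    return score


# Toy collection (hypothetical): "council" is in every document, "zoning" in one.
docs = [
    ["council", "budget", "meeting"],
    ["council", "parking", "permit"],
    ["council", "zoning", "plan", "zoning"],
]
doc_freqs = Counter(term for d in docs for term in set(d))
avg_len = sum(len(d) for d in docs) / len(docs)

query = ["council", "zoning"]
scores = [bm25_score(query, d, doc_freqs, len(docs), avg_len) for d in docs]
```

In this toy collection the third document scores highest, because it is the only one that contains the rare term "zoning"; the ubiquitous term "council" adds little to any document's score.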

This is where small collections often fall short. When a collection has only a few thousand documents, the estimates of how common or rare a term is become very unreliable. The ranking algorithm, relying on these skewed statistics, makes poor decisions about what is relevant. The result for the user is a search experience that feels hit-or-miss.

The solution: Sharing index data through an open standard

The CIFFIL team’s approach is simple. If a small municipality’s search system suffers from unreliable statistics because its collection is too small, why not supplement those statistics with data from a larger municipality that deals with similar types of documents? After all, Dutch municipal documents share a common vocabulary of administrative, legal, and policy language.

The technical mechanism for this sharing is the Common Index File Format, or CIFF – an open standard developed in the information retrieval research community for exchanging inverted index data between systems. An inverted index is the core data structure behind a search engine: it maps every term in a collection to the documents in which that term appears, along with statistics such as how often it appears and in how many documents.
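As a rough picture of what such an index holds, the following Python sketch builds a tiny in-memory inverted index. It illustrates the data structure only and does not reproduce the CIFF serialisation; the function name and the toy documents are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency) pairs."""
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        # Count each term once per document, then append one posting per term.
        for term, tf in sorted(Counter(tokens).items()):
            index[term].append((doc_id, tf))
    return dict(index)


# Toy collection (hypothetical), already tokenised.
docs = [
    ["council", "budget", "meeting"],
    ["council", "parking", "permit"],
    ["council", "zoning", "plan", "zoning"],
]

index = build_inverted_index(docs)

# Document frequency: in how many documents each term appears.
doc_freq = {term: len(postings) for term, postings in index.items()}
```

Here `index["zoning"]` holds a single posting, `(2, 2)`, meaning the term occurs twice in document 2, while `doc_freq["council"]` is 3. These per-term statistics are the kind of information a ranking function needs, and they can be exchanged without the document texts themselves.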

Spinque integrated CIFF support into its search platform, Spinque Desk. This involved building a CIFF reader (to import index data), a CIFF writer (to export index data), and – critically – a modified BM25 ranking component that can combine the statistics from a local collection with those from an external CIFF index. When a small municipality’s search system uses this combined approach, it effectively “borrows” the larger municipality’s understanding of which terms are common and which are rare, while still searching its own documents.
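The article does not spell out how the modified ranking component merges the two sets of statistics. One plausible way, shown here purely as an assumption, is to add the background collection's document counts to the local ones before computing the rarity weight. All names and numbers below are hypothetical and serve only to show the effect.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Rarity weight used by a common BM25 variant."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def combined_idf(term, local_df, local_n, background_df, background_n):
    """Assumed merge rule: sum document frequencies and collection sizes."""
    df = local_df.get(term, 0) + background_df.get(term, 0)
    return idf(df, local_n + background_n)


# Hypothetical statistics: in the tiny local collection both terms look common.
local_df = {"gemeente": 40, "bestemmingsplan": 38}
local_n = 50

# Hypothetical background statistics, as might be read from a larger index.
background_df = {"gemeente": 95_000, "bestemmingsplan": 4_000}
background_n = 100_000

local = {t: idf(df, local_n) for t, df in local_df.items()}
combined = {
    t: combined_idf(t, local_df, local_n, background_df, background_n)
    for t in local_df
}

# How much more weight the specific term gets than the generic one.
local_ratio = local["bestemmingsplan"] / local["gemeente"]
combined_ratio = combined["bestemmingsplan"] / combined["gemeente"]
```

With local statistics alone, the generic term "gemeente" (municipality) and the specific term "bestemmingsplan" (zoning plan) receive almost the same weight. With the borrowed statistics, the specific term is weighted far more heavily. In a full ranking function, only this rarity weight would change; term counts and document lengths would still come from the local documents being searched.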

Proof of concept

The team tested the functionality built for this project in two ways. They ran manual experiments with CIFF exports to see whether they could replicate effectiveness results on open datasets, and they wrote unit tests to ensure that the parser and writer produce indices that conform to the CIFF specification.

The results were clear. The small collection performed substantially worse than the baseline, confirming that skewed statistics degrade search quality. But when the small collection borrowed statistics from the larger one, performance not only recovered but actually slightly exceeded the baseline – because the small collection, now ranked with accurate statistics, contained a higher concentration of relevant documents.

In practice

The project created CIFF indices for four major Dutch municipalities: Amsterdam, Utrecht, Nijmegen, and Almere. A live deployment was initiated for the municipality of Nieuwegein, a smaller city near Utrecht, using the Utrecht index as the background collection. Evaluation of the real-world impact on user experience is ongoing.

All of the CIFF tools developed during the project have been released as open-source software, and the export service ensures that published indices are automatically updated when the underlying data changes.

Why it matters

The CIFFIL project illustrates a principle that is central to OpenWebSearch.EU: that open, interoperable standards can enable forms of cooperation that proprietary systems cannot. By sharing index statistics through CIFF, municipalities can improve their search quality without sharing their actual documents, without depending on a single commercial provider, and without each needing to build a large collection of their own. It is a form of search infrastructure as a public good.

The approach is also notable for its simplicity. It does not require neural models, large language models, or expensive computational resources. It works by making better use of data that already exists, through a well-understood ranking algorithm and an open file format.

What’s next

The immediate priorities are completing the open publication of all four municipal indices, conducting user-experience evaluations in the live deployments, and publishing the experimental findings as a research paper. Longer-term, the approach could be extended to additional municipalities and to other domains where small document collections need better search – such as cultural heritage institutions, local archives, or specialised libraries. The underlying principle – that sharing standardised index data can improve search quality without centralising control – has broad applicability wherever open, cooperative search infrastructure is valued.

Find the full project report here: https://zenodo.org/records/17750643

The CIFFIL Service project was funded under the OpenWebSearch.EU project (Horizon Europe, Grant Agreement 101070014, Call #2).