Fighting Misinformation with the Open Web Index: Inside the VERITAS project
How a European research team built a browser-based fact-checking assistant powered by the #OpenWebIndex
In an information environment where misleading claims spread in real time, the ability to verify what you read online is a necessity. The VERITAS project, conducted by DEXAI and funded under the European OpenWebSearch.EU initiative, set out to build a practical tool for exactly this purpose: an AI-powered assistant that sits in your browser, answers your questions with sourced evidence, and draws its knowledge not from a proprietary index controlled by a single corporation, but from an open, European web search infrastructure.
The Problem: Misinformation and the Limits of Conventional Search
The War in Ukraine has been accompanied by an unprecedented volume of online misinformation – from fabricated reports and manipulated imagery to subtly misleading narratives. For journalists trying to verify claims, researchers analysing media coverage, and ordinary citizens attempting to understand what is actually happening, conventional search engines offer limited help. They return ranked lists of links, but they do not assess the credibility of sources, provide citations for specific claims, or explain the basis for their answers. The burden of verification falls entirely on the user.
At the same time, the emergence of AI chatbots has introduced a new set of problems. Large language models can produce fluent, confident-sounding answers that are entirely fabricated – a phenomenon known as hallucination. Without mechanisms to ground their outputs in verifiable evidence, these systems risk becoming part of the misinformation problem rather than the solution.
The Approach: Retrieval-Augmented Generation via the Open Web Index
The DEXAI/VERITAS team adopted an approach known as retrieval-augmented generation (RAG). The core idea: instead of prompting an AI model to generate answers from whatever it absorbed during training, the system first retrieves relevant documents from a trusted knowledge base and then asks the model to compose its answer based specifically on those documents. Every claim in the response can thus be traced back to an identifiable source.
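To make the grounding idea concrete, the sketch below shows one plausible way a RAG prompt can be assembled so that every claim is answerable against a numbered source. All names, URLs, and the prompt wording are illustrative assumptions, not taken from the VERITAS codebase.

```python
# Hedged sketch: assembling a grounded prompt for a RAG system.
# Everything here (function name, document fields, wording) is
# illustrative, not the actual VERITAS implementation.

def build_grounded_prompt(question: str, documents: list[dict]) -> str:
    """Compose a prompt instructing the model to answer only from the
    retrieved documents and to cite them by number."""
    sources = "\n".join(
        f"[{i + 1}] {doc['url']}\n{doc['text']}"
        for i, doc in enumerate(documents)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "Cite each claim with its source number, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )

docs = [
    {"url": "https://example.org/report", "text": "Example passage."},
]
prompt = build_grounded_prompt("What happened?", docs)
```

Because the sources are numbered inside the prompt itself, any citation marker the model emits can be mapped back to a concrete URL shown to the user.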
Rather than relying on a commercial search engine or a static dataset, the system draws its evidence from the Open Web Index (OWI).
In practice, the system works as follows. The VERITAS pipeline fetches recently crawled web pages from the OWI, covering the most recent 30 days of content. These pages are then indexed using a semantic embedding model, which converts text passages into numerical vectors that capture their meaning. When a user poses a question, the system embeds it into the same vector space, finds the most relevant passages in the index, and passes them – together with the question – to a language model (LLaMA 3.1), which generates a grounded response.
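The retrieval step described above can be sketched in a few lines. Note the hedge: a real system would use a trained neural embedding model; the bag-of-words "embedding" below is only a self-contained stand-in so the example runs without external dependencies, and the passages are invented.

```python
# Hedged sketch of vector retrieval: passages and the question are
# embedded as vectors, and the closest passages are selected by cosine
# similarity. The Counter-based "embedding" is a toy stand-in for a
# real semantic embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a word-count vector (stand-in for a neural model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

passages = [
    "Grain exports resumed through the Black Sea corridor.",
    "The embedding model converts passages into vectors.",
    "A new bakery opened in Lisbon last week.",
]
top = retrieve("When did grain exports through the Black Sea resume?", passages, k=1)
```

In the production pipeline, the retrieved passages would then be handed to the language model together with the question, as described above.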
What They Built: A Fact-Checking Assistant in Your Browser
The finished product is a Chrome browser extension. Once installed, it provides a small popup where users can type questions in natural language and receive answers accompanied by source references.
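For readers unfamiliar with how such extensions are packaged: a Chrome popup extension is declared in a small manifest file. The fragment below is a generic Manifest V3 skeleton, not the actual VERITAS manifest; the name, version, and API host are placeholders.

```json
{
  "manifest_version": 3,
  "name": "VERITAS Fact-Checking Assistant",
  "version": "0.1.0",
  "description": "Ask questions and receive answers with source references.",
  "action": {
    "default_popup": "popup.html"
  },
  "host_permissions": ["https://api.example.org/*"]
}
```

The `default_popup` page holds the question box and the answer view; `host_permissions` would grant the popup access to whatever backend serves the retrieval and generation pipeline.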
The system is designed to serve different types of users in different ways. Journalists receive background information with explicit citations. Researchers get metadata-rich summaries. Members of the general public are given concise, jargon-free explanations. In the current prototype, the system is focused specifically on the War in Ukraine – a deliberate scoping decision that allowed the team to develop and validate the approach within a well-defined domain.
Why It Matters: Open Infrastructure for Trustworthy Information
VERITAS is significant not only for what it does, but for how it does it. By building on the Open Web Index rather than a proprietary data source, the project demonstrates that open search infrastructure can serve as the foundation for practical applications.
The RAG approach itself addresses one of the most persistent criticisms of AI-generated text: the lack of verifiability. By requiring the model to base its answers on retrieved documents and by presenting those documents to the user, VERITAS moves away from the “trust me” paradigm of conventional chatbots towards a “check for yourself” model of AI-assisted information access.
What’s next?
Future development could introduce user feedback mechanisms, allowing the quality of responses to be improved over time, as well as streaming responses for a more interactive user experience. Perhaps most importantly, the VERITAS approach could be applied to other domains where information verification is critical – from public health to climate science to electoral integrity.
An outstanding challenge overall lies in the growing complexity of multi-model ecosystems. As retrieval systems, ranking components, embedding models, and large language models are increasingly combined – often across organisational and infrastructural boundaries – the integrity of the final answer depends on the entire chain. Outputs generated by one model may be ingested, summarised, or re-ranked by another, creating feedback loops that are difficult to detect and audit. In such environments, disinformation cascades can emerge when misleading or low-quality content propagates across interconnected systems, gaining credibility through repetition and algorithmic reinforcement. Ensuring traceability, cross-model accountability, and robust provenance mechanisms will be essential to prevent systemic amplification of false or manipulated claims.
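One way to think about such provenance mechanisms is a hash chain over pipeline stages: each stage appends a record that commits to its output and to the preceding record, so a later tampering or silent re-ranking breaks the chain. This is a generic illustration of the idea, not a mechanism described in the VERITAS report.

```python
# Hedged sketch of a provenance chain across pipeline stages. Each
# record stores a hash over its stage output plus the previous record's
# hash; an intact chain means no record was inserted or replaced
# without re-linking. Illustrative only, not from the VERITAS report.
import hashlib
import json

def record_stage(chain: list[dict], stage: str, output: str) -> list[dict]:
    prev_hash = chain[-1]["hash"] if chain else ""
    payload = json.dumps(
        {"stage": stage, "output": output, "prev": prev_hash}, sort_keys=True
    )
    entry = {
        "stage": stage,
        "prev": prev_hash,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    }
    return chain + [entry]

def chain_is_intact(chain: list[dict]) -> bool:
    # Verify each record points at the hash of its predecessor.
    return all(
        chain[i]["prev"] == chain[i - 1]["hash"] for i in range(1, len(chain))
    )

chain = record_stage([], "retrieval", "top-5 passage ids")
chain = record_stage(chain, "generation", "grounded answer text")
```

A full design would also carry model identities and signatures so that accountability crosses organisational boundaries, but even this minimal linkage makes undetected substitution of an intermediate output harder.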
The full technical report is available at: https://zenodo.org/records/17588890
The VERITAS project was funded under the OpenWebSearch.EU project (Horizon Europe, Grant Agreement 101070014, Call #2).




