Building trustworthy access to medical information: Inside the TILDE project

The TILDE project builds a health search system that doesn’t just find answers – it checks them for bias, explains the underlying reasoning, and lets users explore the evidence visually.

Search for a health question online and you will get plenty of results. But how much can you trust what you find? Are the top results there because they are the most accurate, or because they happen to be the most popular? Are they showing you the full picture, or a skewed one – biased toward certain demographics, viewpoints, or types of sources?

The TILDE project – Trustworthy Access to Knowledge from the Indexed Web – funded under the European OpenWebSearch.eu project and carried out by Know Center Research GmbH in Austria, tackles this issue directly. It builds a health-domain search system on top of the Open Web Index that goes beyond finding relevant documents to actively examining search results for bias, ensuring viewpoint diversity, and providing visual tools to help users explore the evidence for themselves.

The Problem: Bias in health search

Health information is one of the most searched-for categories on the web, and also one of the most consequential. A search for COVID-19 treatment options, for example, should ideally return results that are medically accurate, drawn from credible sources, and representative of different perspectives – official health guidance, clinical research, patient experiences. In practice, standard search systems optimise for another kind of relevance, which is often approximated by popularity and click behaviour. This can systematically favour certain types of content while marginalising others.
The problem is compounded when large language models are involved. LLM-based systems, including RAG (retrieval-augmented generation) pipelines, inherit and can amplify biases present in both their training data and the documents they retrieve.
A search result list that is geographically skewed, lacks viewpoint diversity, or reinforces stereotypes about particular demographic groups is not just an academic concern – it can directly affect how people understand their health options.

The Approach: Three modules for trustworthy search

TILDE addresses this through three integrated modules, each tackling a different dimension of the problem.

The first module extracts medical knowledge from the Open Web Index. Starting from approximately 200,000 health-related websites identified in the OWI, the team extracted medical entities – diseases, symptoms, drugs, procedures – using a named entity recognition model, then standardised these entities against the UMLS clinical ontology (a comprehensive medical terminology system). This creates a structured knowledge layer on top of the raw web content. The extracted entities and their relationships form a medical knowledge graph that links websites to each other and to clinical concepts. A hybrid search engine combines entity-based retrieval (finding pages that mention specific medical concepts) with semantic similarity search (finding pages whose content is meaningfully related to the query), fusing the results to balance precision and recall.
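The article does not say which fusion method the hybrid engine uses, but a common and simple choice for merging two ranked lists is reciprocal rank fusion (RRF). As a hedged illustration only – the document IDs and the choice of RRF are assumptions, not TILDE's documented implementation – the idea can be sketched as:

```python
def rrf_fuse(entity_ranking, semantic_ranking, k=60):
    """Reciprocal Rank Fusion: merge two ranked lists of doc IDs.

    Each document scores 1 / (k + rank) in each list it appears in;
    documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in (entity_ranking, semantic_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers
entity_hits = ["d1", "d3", "d5"]      # pages mentioning the queried concepts
semantic_hits = ["d3", "d2", "d1"]    # pages semantically close to the query
print(rrf_fuse(entity_hits, semantic_hits))  # → ['d3', 'd1', 'd2', 'd5']
```

Because "d3" and "d1" appear in both lists, they outrank documents found by only one retriever – which is exactly the precision/recall balancing the text describes.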

The second module checks search results for fairness and trustworthiness. This is TILDE’s most distinctive contribution. Built on DSPy, a Stanford framework for programmatic LLM pipelines, the trustworthiness module processes search results through three stages. First, each candidate document is enriched with fairness-related attributes: its viewpoint (official guidance, patient narrative, investigative journalism), its source credibility (from high-authority institutional sources down to user-generated content), whether its content is factual or anecdotal, and a gender neutrality score. Second, an intelligent re-ranker uses these attributes to reorder results according to a strict hierarchy: maximise fairness first, then filter for credibility, then ensure viewpoint diversity. The system uses chain-of-thought reasoning, meaning it explains its re-ranking decisions step by step. Third, a stereotype audit inspired by established bias benchmarks checks both the system’s internal reasoning and its user-facing output for harmful stereotypes – a safety net against the system itself introducing bias.
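The actual module is built as an LLM pipeline in DSPy, which cannot be reduced to a few deterministic lines. But the strict re-ranking hierarchy it enforces – fairness first, then a credibility filter, then viewpoint diversity – can be illustrated with plain Python over hypothetical document attributes (the field names, score scales, and round-robin diversity strategy below are illustrative assumptions, not TILDE's code):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    neutrality: float   # hypothetical gender-neutrality score in [0, 1]
    credibility: int    # 0 = user-generated content … 3 = institutional source
    viewpoint: str      # e.g. "official", "patient", "journalism"

def rerank(docs, min_credibility=1):
    # Stage 1 – maximise fairness: order by neutrality score.
    pool = sorted(docs, key=lambda d: d.neutrality, reverse=True)
    # Stage 2 – filter for credibility: drop low-authority sources.
    pool = [d for d in pool if d.credibility >= min_credibility]
    # Stage 3 – ensure viewpoint diversity: cycle through viewpoints,
    # preserving the fairness order within each viewpoint.
    ranked, shown = [], []
    while pool:
        for d in pool:
            if d.viewpoint not in shown:
                ranked.append(d)
                shown.append(d.viewpoint)
                pool.remove(d)
                break
        else:  # every remaining viewpoint already shown: start a new cycle
            shown = []
    return ranked

docs = [
    Doc("a", 0.9, 3, "official"),
    Doc("b", 0.8, 0, "patient"),    # filtered out: user-generated
    Doc("c", 0.7, 2, "patient"),
    Doc("d", 0.6, 2, "official"),
]
print([d.doc_id for d in rerank(docs)])  # → ['a', 'c', 'd']
```

In the real pipeline each stage is carried out by an LLM step whose chain-of-thought reasoning is recorded, so the "why" behind each reordering decision is visible rather than hard-coded as above.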

The third module provides visual aids to help users understand the evidence. Rather than presenting search results as a flat list of links, the visual web interface allows users to explore medical information through multiple lenses: highlighted medical concepts within document text, faceted search by entity type, tag clouds and bar charts showing the frequency of different symptoms or drugs across results, co-occurrence matrices revealing relationships between medical concepts, and an interactive knowledge graph that can be expanded and filtered.
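The co-occurrence matrices mentioned above rest on a simple computation: counting how often two medical entities appear in the same document across the result set. A minimal sketch, with hypothetical entity sets standing in for TILDE's extracted annotations:

```python
from itertools import combinations
from collections import Counter

def cooccurrence(doc_entities):
    """Count how often each pair of entities appears in the same document."""
    counts = Counter()
    for entities in doc_entities:
        # sorted() gives each pair a canonical order, so (a, b) == (b, a)
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical entities extracted from three result documents
docs = [
    {"fever", "ibuprofen", "covid-19"},
    {"fever", "covid-19"},
    {"ibuprofen", "headache"},
]
matrix = cooccurrence(docs)
print(matrix[("covid-19", "fever")])  # → 2
```

Rendered as a heatmap in the interface, such counts let users see at a glance which symptoms, drugs, and conditions the retrieved evidence tends to link together.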

Why It Matters: Making fairness operational

There is no shortage of academic research on bias in search systems. What is less common is work that takes established fairness metrics – like the NFaiRR measure of retrieval fairness – and turns them into actionable components within a working search pipeline. TILDE does exactly this. The re-ranking module does not merely measure bias after the fact; it actively uses fairness criteria in real time to reorder results, while maintaining credibility and diversity as additional constraints. The chain-of-thought reasoning makes the process transparent: users and auditors can see why results were ranked the way they were.
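To make the NFaiRR idea concrete: in simplified form, FaiRR sums per-document neutrality scores with a position discount (so bias near the top of the ranking weighs more), and NFaiRR normalises that sum by the best achievable ordering. This is a rough sketch of that intuition, not the exact published formula – the log2 discount and score scale are assumptions:

```python
import math

def fairr(neutrality_scores):
    """Position-discounted sum of per-document neutrality scores,
    given scores in ranked order (rank 1 first)."""
    return sum(s / math.log2(i + 2) for i, s in enumerate(neutrality_scores))

def nfairr(neutrality_scores):
    """Normalise by the ideal ordering (most neutral documents first),
    so a perfectly fair ranking scores 1.0."""
    ideal = fairr(sorted(neutrality_scores, reverse=True))
    return fairr(neutrality_scores) / ideal if ideal else 0.0

# Putting a biased document first lowers the score
print(nfairr([0.9, 0.2]) > nfairr([0.2, 0.9]))  # → True
```

The operational point is the direction of use: rather than computing such a score after the fact as an audit, TILDE's re-ranker applies the underlying criteria while ordering results.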

The health domain is the testbed, but the approach is not limited to it. The same pipeline – entity extraction, hybrid retrieval, fairness-aware re-ranking with transparent reasoning, and visual analytics – could be applied to any domain where search results carry real-world consequences: legal information, financial advice, educational content, public policy. The fact that it is built on the Open Web Index, rather than on a proprietary search engine, means the underlying data is open and the approach is reproducible.

What’s Next

The immediate next step is completing the integration of the hybrid search across all components of the visual interface. Longer-term priorities include optimising the trustworthiness pipeline for real-time performance, extending the approach to additional health sub-domains, and conducting user studies to understand how fairness-aware re-ranking and visual analytics actually affect the way people seek and evaluate health information.

The full technical report is available at https://zenodo.org/records/17542369

The TILDE project was funded under the OpenWebSearch.eu initiative (Horizon Europe, Grant Agreement 101070014, Call #2).