Making Open Maps Richer: Inside the OMMS project

The OMMS (Open Mobile Maps Search) project was conducted by E Foundation to enrich OpenStreetMap data with web data from the #OpenWebIndex and to feed the combined data into a competitive open-source maps app.

OpenStreetMap data is comprehensive in many areas of the world, but for a maps app that aims to compete with Google’s or Apple’s offerings, its freshness, accuracy, and richness all leave much to be desired. Additionally, OpenStreetMap almost without exception lacks authoritative information from business owners about their points of interest (POIs).

However, most larger businesses have invested some energy in Search Engine Optimization (SEO), which involves surfacing this information online so that search engines can crawl and index it. For this project, E Foundation used the OpenWebSearch tools to crawl web-based POI information about businesses and present it to users in its open-source, open-data mobile maps application.
The goal was to create a compelling mobile maps experience that lets users confidently explore, learn about, and navigate to nearby points of interest.

As a concrete solution, E Foundation set out to:
  • send the OpenWebSearch.eu team a list of URLs of interest so that OpenWebSearch can provide fresh crawl data as Parquet files.
  • create a web API that accepts a point of interest’s URL and returns information about that point of interest in a format that a maps app can easily ingest.
  • create a proxy that augments API responses from Pelias with metadata parsed from structured data provided by OpenWebSearch.
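
The proxy idea in the last item can be illustrated with a minimal sketch. It assumes the geocoder returns a GeoJSON FeatureCollection whose features carry a "gid" property, and it attaches the crawled metadata under a hypothetical "web_metadata" key; the function name and the shape of the metadata lookup are illustrative assumptions, not the project's actual implementation.

```python
import copy


def augment_features(geocoder_response, poi_metadata_by_gid):
    """Return a copy of a GeoJSON response with crawled POI metadata attached.

    `poi_metadata_by_gid` is a hypothetical lookup from feature id to metadata
    parsed from structured web data (opening hours, phone number, ...).
    """
    augmented = copy.deepcopy(geocoder_response)  # leave the upstream response untouched
    for feature in augmented.get("features", []):
        props = feature.setdefault("properties", {})
        extra = poi_metadata_by_gid.get(props.get("gid"))
        if extra:
            # "web_metadata" is an assumed key name chosen for this sketch.
            props["web_metadata"] = dict(extra)
    return augmented
```

Features without crawled metadata pass through unchanged, so a client that ignores the extra key sees the same response as before.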

“We will start with opening hours and contact information, and from there expand to images, services offered, FAQs and anything that we feel may enrich the user’s experience in a mobile UI,” stated the project team at the start of the project.

Results

By the end of the project, E Foundation had developed the following pieces of software:

  • URL list generator to iterate through an OpenStreetMap extract and create a list of URLs to be crawled for structured data.
  • Batch processing program to transform the resulting Parquet files into .osc (OpenStreetMap change) files which amend OSM features to include fresh opening hours. This is option 1 for consuming crawl data.
  • Batch processing program to ingest the resulting Parquet files into a PostgreSQL database for use in a Point of Interest information server. This is option 2 for consuming crawl data.
  • POI Server. This connects to the PostgreSQL database and serves freshly updated opening hours, contact information, and FAQs for clients via an HTTP API. 
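
As an illustration of option 1, the following sketch builds a small osmChange document that rewrites the opening_hours tag of nodes whose crawled hours differ from the existing ones. The input shapes (a list of node dictionaries and a lookup of new hours by node id), the generator name, and the choice to increment the version number are assumptions made for this example; reading the Parquet files is left out.

```python
import xml.etree.ElementTree as ET


def build_osc(nodes, new_hours):
    """Build an osmChange XML string updating opening_hours on the given nodes.

    `nodes` is a list of dicts with id, version, lat, lon and tags;
    `new_hours` maps node id to the freshly crawled opening_hours string.
    """
    root = ET.Element("osmChange", version="0.6", generator="omms-sketch")
    modify = ET.SubElement(root, "modify")
    for node in nodes:
        hours = new_hours.get(node["id"])
        if hours is None or node["tags"].get("opening_hours") == hours:
            continue  # nothing crawled, or nothing changed
        element = ET.SubElement(
            modify,
            "node",
            id=str(node["id"]),
            version=str(node["version"] + 1),  # assumption: consumer expects a bumped version
            lat=str(node["lat"]),
            lon=str(node["lon"]),
        )
        # A modify block replaces the element, so all existing tags are written out again.
        for key, value in dict(node["tags"], opening_hours=hours).items():
            ET.SubElement(element, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```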

Difficulties

The team started by crawling websites associated with points of interest in the metropolitan area surrounding Seattle, Washington, USA. Once they had validated that the software worked, they moved on to crawling POI websites for the entire planet. They found that approximately 12% of the websites attached to OpenStreetMap points of interest contain structured data. This share varies by the type of POI: websites for department stores and fast food restaurants contain structured data more often than those of other POIs.
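
To make "structured data" concrete, here is a minimal sketch that pulls the schema.org openingHours property out of JSON-LD script tags in a page. It assumes the structured data is embedded as JSON-LD, which is only one of several possible encodings, and it works on raw HTML rather than on the Parquet crawl output; the class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks instead of failing the whole page
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_opening_hours(html):
    """Return all openingHours strings found in top-level JSON-LD objects."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    hours = []
    for item in parser.items:
        if isinstance(item, dict) and "openingHours" in item:
            value = item["openingHours"]
            hours.extend(value if isinstance(value, list) else [value])
    return hours
```

A page without any JSON-LD block simply yields an empty list, which corresponds to the majority of POI websites in the figures above.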

The team ran into some trouble parsing opening hours. Some points of interest have opening hours that don’t conform to the standard format, but more often a website lists opening hours for different parts of the same store. For example, a grocery store with a pharmacy attached may list the two sets of hours separately, e.g. Mo-Fr 08:00-20:00, Mo-Fr 09:00-17:00. To avoid updating hours incorrectly, ambiguous data such as these were discarded.
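
One way to flag such cases is sketched below. It is a simplified heuristic, not the rule the team used: it understands only rules of the form "Mo-Fr 08:00-20:00" separated by commas or semicolons, treats anything it cannot parse as ambiguous, and discards a value when two rules overlap in time on the same day.

```python
import re
from itertools import combinations

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
DAY = "(Mo|Tu|We|Th|Fr|Sa|Su)"
RULE = re.compile(rf"^{DAY}(?:-{DAY})?\s+(\d{{2}}):(\d{{2}})-(\d{{2}}):(\d{{2}})$")


def parse_rule(rule):
    """Parse one rule into (set of days, open minute, close minute), or None."""
    match = RULE.match(rule.strip())
    if match is None:
        return None
    first, last, h1, m1, h2, m2 = match.groups()
    start = DAYS.index(first)
    end = DAYS.index(last) if last else start
    opens, closes = int(h1) * 60 + int(m1), int(h2) * 60 + int(m2)
    if end < start or closes <= opens:
        return None  # wrap-around days and overnight hours are out of scope here
    return set(DAYS[start:end + 1]), opens, closes


def should_discard(hours):
    """Return True if the hours string is unparseable or self-overlapping."""
    rules = [parse_rule(part) for part in re.split(r"[;,]", hours)]
    if any(rule is None for rule in rules):
        return True
    for (days_a, open_a, close_a), (days_b, open_b, close_b) in combinations(rules, 2):
        if days_a & days_b and open_a < close_b and open_b < close_a:
            return True  # two different time ranges claim the same day and time
    return False
```

With this heuristic, the grocery store example above is discarded, while a lunch break such as Mo-Fr 08:00-12:00, Mo-Fr 13:00-17:00 is kept because the two ranges do not overlap.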

What’s next?

While pleased with the results achieved, the team acknowledges that many opportunities remain to improve the completeness and accuracy of POI data. Its main recommendations concern further services around ranking and backlinks.

Open-data geocoders often struggle to rank points of interest in textually ambiguous queries because they don’t have context on which POIs are more commonly searched for. Traditional ranking systems like BM25 typically prevent this from happening, but they’re far from perfect.
Search giants use past user behavior to help rank results, but open-data geocoders don’t have this luxury. However, OpenWebSearch is well-positioned to publish POI rankings based on PageRank or a similar algorithm. Open-data geocoders could ingest this and use it to augment their existing ranking algorithms. 
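
A geocoder could combine such a published score with its existing text relevance along the following lines. This is a sketch under stated assumptions: the candidate format, the popularity lookup, and the logarithmic boost with its weight are all hypothetical choices for illustration, not a formula proposed by the project.

```python
import math


def rerank(candidates, popularity, weight=0.5):
    """Sort candidates by text score boosted by a link-based popularity score.

    `candidates` is a list of dicts with "id" and "text_score" (e.g. from BM25);
    `popularity` maps POI id to a non-negative PageRank-like score.
    """
    def combined(candidate):
        # log1p dampens very large popularity values; unknown POIs get no boost.
        boost = math.log1p(popularity.get(candidate["id"], 0.0))
        return candidate["text_score"] * (1.0 + weight * boost)

    return sorted(candidates, key=combined, reverse=True)
```

Because the boost is multiplicative, popularity mainly breaks ties between textually similar candidates and does not lift a poor text match above a good one unless the popularity gap is large.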

Another problem with open-data maps apps is that they generally lack the richness that comes from years of collecting user-generated data like reviews and photos. Fortunately, much of this information is published on the public internet, for example on food blogs and travel information websites.

“We would like to explore using backlinks as an imperfect substitute for user-generated reviews and ratings. For example, a point of interest information page for a restaurant could contain links to a few food blog posts about that particular restaurant. Discovery of these backlinks is something that OpenWebSearch is uniquely positioned to do, and we think this is a very promising line of work to explore,” summarizes the OMMS team.

To read the full report, go here: https://zenodo.org/records/17815218

The OMMS project was funded under the OpenWebSearch.EU initiative (Horizon Europe, Grant Agreement 101070014, Call #2).