Third-party Call #2
Update as of 4 April 2024: submission for Call 2 closed on 4 April 2024, 17:00 CEST.
Following the first third-party call in 2023, the OpenWebSearch.eu Community Programme once again calls for third-party contributions.
In this call, the OpenWebSearch.eu consortium asks for ideas for new search and discovery applications, content analysis methods, search paradigms, data products, or relevant extensions of the platform.
Funding of up to 100,000 EUR can be granted for a period of up to 12 months. See modalities for more details.
Dates and Modalities
Opening date: 8th February 2024
Closing date: 4th April 2024, 17:00 CEST
Notification date: June 2024
Start of projects: June/July 2024
Possible funding: 50,000 – 100,000 EUR (with a limit for a single organisation of EUR 150,000 of cumulative funding across all OpenWebSearch.eu calls)
What?
Third-party projects should explore topics closely related to the project and help advance the development of the Open Web Index (OWI).
Call #2 particularly addresses proposals for applications of the Open Web Index. The OpenWebSearch.eu project will provide access to both pre-processed and indexed data in the terabyte range and deliver continuous, daily updates until the end of the project.
Who?
In particular, we are targeting smaller companies (e.g. SMEs, start-ups), individual innovators, individual researchers or research teams (e.g. doctoral or post-doctoral researchers) from renowned universities.
Eligible applicants are individuals residing in EU Member States or Horizon Europe Associated Countries, or organisations registered in EU Member States or Horizon Europe Associated Countries.
Third-party Funding
Successful applications can request funding between 50,000 and 100,000 EUR in this second call for a funding period of up to 12 months.
Note that there is a limit for a single organisation of EUR 150,000 of cumulative funding across all OpenWebSearch.eu calls.
Call topic: Applications of an Open Web Index
OpenWebSearch.eu aims to build and pilot a foundation for commercially exploitable applications based on a European Open Web Index. Call #2 hence particularly addresses proposals for applications of the Open Web Index. The OpenWebSearch.eu project will provide access to both pre-processed and indexed data in the terabyte range and deliver continuous, daily updates until the end of the project. Successful applicants should use and exploit the data provided in innovative application scenarios or research topics.
The OpenWebSearch.eu consortium asks for technical, algorithmic and software contributions focused on developing potential components and applications of the pilot open search infrastructure, including machine learning or AI models.
Successful applications will technically define and develop new search and discovery applications, content analysis methods, search paradigms or data products, or extend the platform in relevant ways.
We address innovators and businesses that either (i) build search verticals on top of our pilot infrastructure, thus demonstrating its applicability, (ii) extend the pilot infrastructure to relevant areas, such as new content analysis methods, or (iii) develop interesting data products on top of the Open Web Index. Results must be made available as Open Source and/or Open Data, with documentation or experimental work published as Open Access.
Topics may belong to, but are not limited to, the following areas:
- Innovative vertical search applications (e.g. Kids search, mobile search, science search, argument search);
- Retrieval-augmented generation, conversational search and search based on large language models;
- Geo-location based search and hybrid search (i.e. search settings that combine specialised data and web data);
- New search scenarios like personal search, human-centric search, corporate search, hybrid search, privacy-aware search;
- Approaches for transparent and privacy-aware searching;
- New search paradigms, search interactions and search user interfaces;
- Evaluation or simulation of search systems;
- Standards for Open Search;
- Efficient pre-processing and indexing methods like vector embeddings, content quality estimates;
- Web analytics at different scales.
Target Audience
The calls especially target smaller companies (e.g., SMEs, start-ups), individual innovators, individual researchers or research teams (e.g., doctoral or post-doctoral researchers) from renowned universities.
The eligible applicants for this opportunity are either:
- Individuals who are citizens or residents of any EU Member State or any of the countries associated with Horizon Europe; or
- Organisations that are registered in any EU Member State or any of the countries associated with Horizon Europe.
Please note that the list of associated countries may change over time, and it is recommended to check the latest list of eligible countries before applying.
The third-party calls particularly focus on the following categories of applicants:
- Academic researchers and research groups in universities or research centres or R&D focused organisations;
- Renowned experts, individuals and scholars or associations;
- High-tech start-ups, SMEs, or industry with a focus on Web technology or software development;
- Outstanding individual open-source innovators / researchers and experienced individual developers / researchers;
- Other multidisciplinary actors.
Applications can also involve teams of different organisations or teams of natural persons. In case of team applications (i.e., multiple natural persons without an organisational entity, or multiple organisational entities), one team member must take the role of main contact point and legally responsible party.
Submission procedure
Applicants must submit their appropriately formatted proposal by email to the call management by the given deadline. Please use the proposal template for applications. Applicants can submit at most one application per call. English is the main language for communication with the OpenWebSearch.eu consortium, and all submitted documents must be written in English.
The call management will acknowledge the submission; only after this confirmation is the proposal considered submitted. Applicants are advised not to wait until the last moment to submit. Note that multiple submissions of the same project are not accepted. If the submission is not confirmed within 2 days, and provided the call deadline has not yet passed, you may contact the Grantor at call2@openwebsearch.eu to request information and ask for re-submission.
Get the Call Package here
The package consists of two documents: the information for applicants and the proposal template.
- The “Information for applicants” contains all information about the call, the procedures and the application process as well as the legal information.
- The “Proposal Template” contains the actual application form. For full functionality, open this proposal template in Adobe Acrobat or Adobe Acrobat Reader.
FAQs Third-party Call #2
Status Crawling
We have crawled around 1.3 billion URLs in 185 different languages from 28 million hosts. That’s around 60 TiB in total.
Currently we are crawling 1 TiB per day. This will be further expanded. An integration of Common Crawl is planned.
Further information on our initial crawler setup is accessible via our Zenodo community and the OWLer Crawler Web Page.
(March 2024)
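To put these figures in proportion, the following back-of-the-envelope calculation derives a few averages from them. It is only a rough sketch: it assumes the quoted 60 TiB covers exactly the 1.3 billion crawled URLs and that the crawl rate stays constant at 1 TiB per day, neither of which is guaranteed by the status note.

```python
# Figures quoted in the crawling status note (March 2024).
TIB = 2**40  # bytes in one tebibyte

crawled_urls = 1.3e9     # "around 1.3 billion URLs"
crawled_hosts = 28e6     # "28 million hosts"
total_bytes = 60 * TIB   # "around 60 TiB in total"
daily_bytes = 1 * TIB    # "1 TiB per day"

# Average stored volume per crawled URL, in KiB (assumption: 60 TiB covers all URLs).
avg_kib_per_url = total_bytes / crawled_urls / 1024

# Average number of crawled URLs per host.
avg_urls_per_host = crawled_urls / crawled_hosts

# Days needed to add another 60 TiB (assumption: constant crawl rate).
days_to_double = total_bytes / daily_bytes

print(f"average volume per URL: {avg_kib_per_url:.1f} KiB")
print(f"average URLs per host:  {avg_urls_per_host:.1f}")
print(f"days to add 60 TiB:     {days_to_double:.0f}")
```

Under these assumptions, a crawled URL accounts for roughly 50 KiB on average, and the current volume would double in about two months.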
Status Preprocessing
Our preprocessing pipeline, details of which are available in our open-source repository (or on Zenodo), extracts various features from the web pages. This includes plain text, titles, structured metadata, genre, and domain labels, with ongoing efforts to expand the types of metadata extracted.
We will also include named entities and geo-locations in the data and are exploring ways to identify pages restricted from use in generative AI applications.
Our comprehensive strategy and future plans for the OpenWebSearch Engine Hub, dedicated to declarative search engines, are documented at https://zenodo.org/records/10369512, and guidance on using the first version of our toolchain for creating a local index is provided here.
(March 2024)
Accessing the Data
We are currently working on releasing daily snapshots of our data, aiming for a one-day lag.
The data will be provided in two formats:
(i) parquet files containing cleaned text and metadata and
(ii) CIFF files containing an inverted index that can be used in search libraries like Lucene or PyTerrier.
A preview of this offering can be seen in a small excerpt of one day’s worth of crawled, preprocessed, cleaned, and indexed data.
(March 2024)
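For readers unfamiliar with the term, an inverted index maps each term to the list of documents that contain it. The sketch below builds a toy in-memory version of this data structure in plain Python. It does not read or write CIFF files, and the function names and sample documents are purely illustrative, not part of any OpenWebSearch.eu tooling.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency) pairs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # Naive tokenisation: lowercase and split on whitespace.
        term_counts = Counter(docs[doc_id].lower().split())
        for term, tf in term_counts.items():
            index[term].append((doc_id, tf))
    return dict(index)


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [
        {doc_id for doc_id, _ in index.get(term, [])}
        for term in query.lower().split()
    ]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "open web index for europe",
    2: "an open search infrastructure",
    3: "web search and web analytics",
}
index = build_inverted_index(docs)
print(index["web"])               # postings list for the term "web"
print(search(index, "open web"))  # documents containing both terms
```

Search libraries work with the same idea at much larger scale, adding compression, ranking functions and more careful tokenisation.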
Status Open Web Infrastructure
We are currently setting up the download facilities for third-party partners.
As a research project, we currently don’t have the capacity to offer an API for accessing the indexed data, but there are initial tutorials demonstrating the use of OWS.eu. We are constantly working on expanding them.
Data access will be facilitated through the LEXIS Portal, which supports high-bandwidth downloads, and a dashboard currently in development that will enable the transfer of smaller data excerpts to other systems.
(March 2024)
General Project Status
For more information on the project’s status, please visit the status page.
Are UK entities eligible for OWS.eu call #2?
Yes.
Can I apply if I have already applied for a previous call?
Yes, you can. But please note that there is a limit for a single organisation of EUR 150,000 of cumulative funding across all OpenWebSearch.eu calls.
Contact
If you have further questions on the application procedure and third-party call related activities of the OpenWebSearch.eu project, feel free to contact us.
We also maintain a Community Mattermost Channel with more in-depth discussion and can onboard you there if you are interested.