Opting Out
Your control over your online presence is paramount. If you prefer to keep our web crawler from accessing your site, you can do so by updating your website’s robots.txt file. Just add our user-agent identifier Owler, which represents both our main and our experimental crawlers. Adding Owler to the file prevents any current or future versions of our crawler from accessing your site.
In light of recent developments regarding web publishers’ control over their content, we also support the user-agent identifier GenAI, which represents any data use for the purpose of training generative AI models. GenAI is conceptually similar to Google’s proposed Google-Extended user-agent. Whereas Google-Extended, GPTBot, Anthropic-AI, etc. are data scrapers restricted to powering the respective company’s AI products, GenAI is meant to provide a more general way to opt out of data use related to the development of generative AI applications. OpenWebSearch.EU forwards the publishers’ usage preferences to the users of our web index and of all additional data products we publish, through an INDEX and a GENAI metadata field, both represented as boolean values.
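As an illustration, a consumer of the index might filter records on these two flags. The sketch below is hypothetical: only the field names INDEX and GENAI come from the description above; the record structure and everything else are assumptions for illustration.

```python
# Hypothetical sketch: filtering index records by the INDEX and GENAI
# boolean metadata fields. The record layout is assumed, not part of
# the actual data format.
records = [
    {"url": "https://a.example/", "INDEX": True,  "GENAI": True},
    {"url": "https://b.example/", "INDEX": True,  "GENAI": False},
    {"url": "https://c.example/", "INDEX": False, "GENAI": False},
]

# Pages that may appear in a search application built on the index.
searchable = [r["url"] for r in records if r["INDEX"]]

# Pages whose publishers did not opt out of generative AI training.
genai_ok = [r["url"] for r in records if r["GENAI"]]
```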
Please follow the step-by-step guide below.
Guidelines for Updating Your robots.txt File
Adding our user-agent identifiers to your robots.txt file is a simple and effective way to control the access of our web crawler to your site. Here’s a step-by-step guide on how to do it:
1. Access Your Website’s robots.txt File
This file is usually located in the root directory of your site. For example, if your website is www.example.com, you can find the robots.txt file at www.example.com/robots.txt.
2. Edit Your robots.txt File
Open the file with a text editor. This could be any program that lets you view and edit text files – Notepad on Windows, TextEdit on macOS, or a dedicated code editor like Sublime Text or Visual Studio Code.
3. Add Our User-Agent Identifiers
To block our current web crawlers, add the following lines to your robots.txt file. Note that user-agent matching is case-insensitive, so owler (all lowercase) works as well.
User-agent: Owler
Disallow: /
If you want your web pages to be indexed by any search application built on top of the Open Web Index, but you still want to prevent your web data from being used to train generative AI models, add the following lines to your robots.txt file.
User-agent: GenAI
Disallow: /
4. Save Your Changes
After you’ve added these lines, save your robots.txt file and upload it back to your website’s root directory, if necessary.
Remember: the “Disallow: /” line tells the specified user-agent not to crawl any pages on your site. If you want to block only certain pages, you can specify their paths instead of “/”. For example, “Disallow: /private” prevents the crawler from accessing any URL on your site whose path starts with /private, such as www.example.com/private.
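If you want to sanity-check your rules before deploying them, Python’s standard urllib.robotparser module interprets robots.txt rules much as crawlers do. The snippet below is a local sketch; the example URLs and the combined ruleset are placeholders, not a recommendation:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt combining the rules discussed above: keep Owler
# out of /private only, and block GenAI from the whole site.
robots_txt = """\
User-agent: Owler
Disallow: /private

User-agent: GenAI
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Owler may fetch public pages but nothing under /private.
print(rp.can_fetch("Owler", "https://www.example.com/about"))         # True
print(rp.can_fetch("Owler", "https://www.example.com/private/page"))  # False

# GenAI is disallowed everywhere.
print(rp.can_fetch("GenAI", "https://www.example.com/about"))         # False
```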
Feel free to refer to our GitLab repository for any further clarification. If you have additional questions or need assistance, don’t hesitate to reach out.