
On 6 June, the OpenWebSearch.eu consortium released a pilot of a new infrastructure that aims to make European web search fairer, more transparent and commercially unbiased. With strong participation by CERN, the European Open Web Index (OWI) is now open for use by academic, commercial and independent teams under a general research licence, with commercial options in development on a case-by-case basis.
The OpenWebSearch.eu initiative was launched in 2022, with a consortium made up of 14 leading research institutions from across Europe, including CERN. The project aims to build a public web index that offers an alternative to existing indexes held by companies like Google (USA), Microsoft (USA), Baidu (China) and Yandex (Russia). Web indexes provide the back-end data infrastructure behind search engines, and today the companies that manage them determine what content is searchable and how it is ranked. Currently, Europe does not have a search index of its own, making it vulnerable to digital dependence.
The OWI offers a clear alternative based on European values. The project's cross-disciplinary nature, with continuous dialogue between technical teams and legal, ethical and social experts, ensures that fairness and privacy are built into the OWI from the start. "Over thirty years since the World Wide Web was created at CERN and released to the public, our commitment to openness continues," says Noor Afshan Fathima, IT research fellow at CERN. "Search is the next logical step in democratising digital access, especially as we enter the AI era." The OWI facilitates AI capabilities, allowing web search data to be used for training large language models (LLMs), generating embeddings and powering chatbots.

The CERN team has built key parts of the infrastructure that power the OWI's crawling and indexing capabilities. In practice, this infrastructure keeps track of which webpages should be scanned. The system handles about 9 million URLs per hour, which equates to roughly 3 terabytes of public web data a day, with the aim of indexing 30-50% of the text-based web by the end of 2025. "We have already hit our target of indexing one petabyte of openly licensed web data, and our public dashboard helps users monitor that progress," says Noor.
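To put these throughput figures in perspective, the short calculation below works out what they imply per day and per URL. It is only a back-of-the-envelope sketch: it assumes the rounded figures quoted above, a constant rate over 24 hours, decimal terabytes (10^12 bytes) and that the daily data volume corresponds to the URLs handled that day. The per-URL average is an implied value, not a number published by the project, and all names in the code are illustrative.

```python
# Rounded figures quoted in the article.
URLS_PER_HOUR = 9_000_000
TERABYTES_PER_DAY = 3

# Assumption: decimal terabytes (1 TB = 10**12 bytes).
BYTES_PER_TERABYTE = 10**12
HOURS_PER_DAY = 24


def urls_per_day(urls_per_hour: int) -> int:
    """URLs handled in one day, assuming a constant hourly rate."""
    return urls_per_hour * HOURS_PER_DAY


def implied_bytes_per_url(terabytes_per_day: float, urls_per_hour: int) -> float:
    """Average data volume per URL implied by the two quoted figures."""
    total_bytes = terabytes_per_day * BYTES_PER_TERABYTE
    return total_bytes / urls_per_day(urls_per_hour)


daily_urls = urls_per_day(URLS_PER_HOUR)
avg_bytes = implied_bytes_per_url(TERABYTES_PER_DAY, URLS_PER_HOUR)

print(f"URLs per day: {daily_urls:,}")
print(f"Implied average per URL: {avg_bytes / 1000:.1f} kB")
```

Under these assumptions, the system handles about 216 million URLs a day, with an implied average of roughly 14 kilobytes of data per URL.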
CERN is also contributing to other parts of the project. For example, it is scanning its own public physics content to enhance the OWI, as well as developing an internal index and its own search tools and services. Currently, a prototype of a use case for the OWI is in development: known as "Nooon", this research-driven search engine is dedicated to people with disabilities who require search engines that surface structured, accessible and representative information while ensuring privacy in both access and contribution.
The release of the OWI, which has received funding from the European Union's Horizon research and innovation programme, comes at a pivotal time. The European Commission's InvestAI initiative is set to mobilise 200 billion euros for artificial intelligence, and the OWI offers a powerful foundation of open data for innovation. Furthermore, as Microsoft plans to retire access to the Bing index, the OWI will be able to offer an alternative index for European search engines.
After two and a half years of intensive research and development, anybody can now request access to the OWI by signing up at openwebindex.eu/auth/login. Note that the project provides a web index, and not a search engine or API, and users wishing to build their own search engines or chatbots will need a working knowledge of how to apply web index data.