The case for data catalogues in materials science

September 23, 2025

Data cataloguing is the process of organising and describing datasets so they can be easily found, understood, and reused.

Materials science generates vast amounts of data, and only a fraction of it is published. The data can come from research and development or from manufacturing, spans both experiments and simulations, and may originate in many different laboratories. As a result, the metadata (for example, how measurements were taken or how simulations were set up) is often missing or inconsistent, and experimental protocols vary. Data is often lost or inaccessible, and so cannot be reused.

Be FAIR

We urgently need to make these swathes of data more coherent and make the most of the knowledge and insights that can be gained. Thorough data cataloguing can lead to significant improvements. And we are beginning to see changes, as many groups and organisations strive to develop standard repositories, shared metadata, ontologies and federated data systems.

Recently, Gerhard gave a keynote talk on behalf of the European Materials Modelling Council (EMMC) entitled: EMMO Ontology: enabling AI-based innovative advanced materials development: the CoBRAIN Knowledge Base for Hardmetal Thermal Spraying Coatings.

The presentation highlights the challenges of FAIR (findable, accessible, interoperable, reusable) data management in materials science, the experience of an EU project (CoBRAIN) in building a knowledge base from its project data, the lessons learnt and the future outlook. Much data is lost, and unless this is addressed urgently, materials science will not be able to make the most of the opportunities offered by AI.

The talk also provides details about a data community effort to develop a ‘data cataloguing standard’ for materials data (a so-called Materials DCAT-AP).
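To make the idea of a catalogue entry concrete, here is a minimal sketch in Python of one dataset description and a check for missing metadata. It is an illustration only: the Materials DCAT-AP is still being developed, so nothing here is taken from it. The property names are borrowed from the general W3C DCAT and Dublin Core vocabularies, while the list of required fields, the `missing_fields` helper and all the values in the record are hypothetical.

```python
import json

# Hypothetical minimum set of descriptive fields (an assumption for this
# sketch, not taken from any published application profile).
REQUIRED_FIELDS = (
    "dct:title",
    "dct:description",
    "dct:publisher",
    "dcat:keyword",
    "dcat:distribution",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record; every value is an invented placeholder.
record = {
    "@type": "dcat:Dataset",
    "dct:title": "Example hardness measurements",
    "dct:description": "Placeholder note on how the measurements were taken.",
    "dcat:keyword": ["hardness", "coating"],
    "dcat:distribution": [],  # no downloadable file registered yet
}

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print("Missing metadata:", missing_fields(record))
```

A check like this, run whenever a dataset is registered, is one simple way to catch the missing or inconsistent metadata described earlier before the data becomes hard to find or reuse.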

Cooperative solutions

Similarly, the Research Data Alliance (RDA) aims to make it easier for researchers to share and reuse data by building both social and technical bridges. The RDA develops community-driven standards, tools and best practices that promote data sharing and data-driven research, and it aims to foster interoperability, collaboration and inclusivity across disciplines.

Anyone who wants to learn more about data cataloguing, the FAIR maturity of materials data and best practices may wish to attend a workshop entitled “Data Cataloguing for Materials Science and related domains”, organised by the working group (WG) on Harmonised terminologies and schemas for FAIR data in materials science and related domains.

The meeting will take place online on Wednesday 24 September 2025, 11.30am–1.30pm UTC, and will be supported by RDA TIGER and RDA Europe. For full information about the event, including joining instructions, visit the post on the working group’s website.

Here is a copy of Gerhard’s presentation:

Acknowledgement

The work described in the presentation, which was given on behalf of the EMMC, has received funding from the European Commission under the European Union’s Horizon Research and Innovation programme, GA Nos. 101092211 (CoBRAIN), 101091912 (AID4GREENEST) and 101137725 (BatCAT), and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee, GA No. 10091190 (BatCAT).
