How do we get there? A 7* scheme for getting open FAIR publications

April 28, 2024

Conference:	3rd World Biodiversity Forum
Location:	Davos, Switzerland
Date and Time:	18 Jun 2024, 12:30 PM UTC
Session:	SCICOM_15.1 Opening up and preparing scientific publications for the chatGPT-age

Time:	2:30 PM CEST
Presenters:	Donat Agosti

Research results are published as scientific articles. Together they form an intricate network of citations and facts that represents existing knowledge as billions of statements. In biodiversity, this corpus is estimated at 500 million pages. A small but growing part is published in a semantically enhanced open access format, but the overwhelming majority sits behind multiple barriers, from being print-only to closed-access Portable Document Format (PDF) files. To make use of emerging AI tools, this corpus needs to be made available in a machine-actionable way, and at least part of it has to be curated to serve as training material for AI and machine learning. This presentation describes the steps towards fully machine-actionable data, rated from one to seven stars: print (1*), print with metadata (2*), scan-based PDF with metadata (3*), text-based PDF with metadata (4*), ASCII – standard structured XML with metadata (5*), ASCII – XML with semantic enhancements and metadata (6*), and ASCII – XML with semantic enhancements, attributes and metadata (7*). To serve the wider community, publications have to be open access; infrastructures such as the Biodiversity Literature Repository need to be expanded to allow FAIRizing of data, including specific blocks of text such as taxonomic treatments, recommendations or illustrations; and vocabularies have to be developed and maintained to enable semantic enhancement where none exist yet.
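
The seven steps listed in the abstract can be read as an ordered scale. The following is a minimal Python sketch of that reading, not the presenter's implementation. It assumes one star per step in the order given (the abstract does not spell out the star counts), and the names `StarLevel`, `Publication` and `star_level` as well as the boolean feature flags are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class StarLevel(IntEnum):
    """Assumed mapping: one star per step, in the order the abstract lists them."""
    PRINT = 1
    PRINT_WITH_METADATA = 2
    SCANNED_PDF_WITH_METADATA = 3
    TEXT_PDF_WITH_METADATA = 4
    STRUCTURED_XML_WITH_METADATA = 5
    SEMANTIC_XML_WITH_METADATA = 6
    SEMANTIC_XML_WITH_ATTRIBUTES_AND_METADATA = 7


@dataclass
class Publication:
    """Hypothetical feature flags describing how a publication is available."""
    has_metadata: bool = False
    is_scanned_pdf: bool = False
    is_text_pdf: bool = False
    is_structured_xml: bool = False
    has_semantic_enhancements: bool = False
    has_attributes: bool = False


def star_level(pub: Publication) -> StarLevel:
    """Return the highest level whose requirements the publication meets."""
    # Every level above the first requires metadata. Anything without
    # metadata is treated as level 1 here, which is a simplifying assumption.
    if not pub.has_metadata:
        return StarLevel.PRINT
    if pub.is_structured_xml:
        if pub.has_semantic_enhancements and pub.has_attributes:
            return StarLevel.SEMANTIC_XML_WITH_ATTRIBUTES_AND_METADATA
        if pub.has_semantic_enhancements:
            return StarLevel.SEMANTIC_XML_WITH_METADATA
        return StarLevel.STRUCTURED_XML_WITH_METADATA
    if pub.is_text_pdf:
        return StarLevel.TEXT_PDF_WITH_METADATA
    if pub.is_scanned_pdf:
        return StarLevel.SCANNED_PDF_WITH_METADATA
    return StarLevel.PRINT_WITH_METADATA


if __name__ == "__main__":
    article = Publication(has_metadata=True, is_structured_xml=True,
                          has_semantic_enhancements=True)
    level = star_level(article)
    # Render the rating as a run of asterisks next to the level name.
    print(level.name, "*" * level)
```

Using an integer-valued enum keeps the levels comparable, so a repository could, for example, select everything at or above `StarLevel.STRUCTURED_XML_WITH_METADATA` as candidate material for machine processing.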