Access to taxonomic treatments mentioned in press releases

At year-end, it is common to review the past year's highlights, as we (Plazi) also did, featuring a few of the almost 9000 species discovered in 2021. Several of the world's leading taxonomic institutions did the same for their own recent discoveries. We are interested not only in finding out which species were discovered but also in learning about them in more detail. We therefore decided to test how easy it was to locate the original taxonomic treatments of the species mentioned in these press releases, in the scientific papers that described them.

We took three press releases, one each from the California Academy of Sciences (CAS), the Royal Botanic Gardens, Kew and the Natural History Museum in London (NHM). Together they mention 29 species described in 29 different scientific articles.

As scientists with at least a basic understanding of how taxonomic works are published, we read the press releases and measured the time it took us to locate the original sources of the 29 species. In all three press releases, most species are mentioned with their scientific names, which could be copied into a search engine (Ecosia) to find the original source paper. When the original articles were open access, we located most of them within two minutes. However, some species were mentioned only by their vernacular or descriptive names, not their scientific names (for example, São Tomé's caecilian). In such cases, the search took up to 8 minutes.

All three press releases mentioned species whose taxonomic data were not openly accessible, because of paywalls or registration barriers that sometimes failed due to technical issues. In two cases, typos in the taxonomic names made it harder to locate the original publication, taking up to 15 minutes.

Arguably more important, none of the press releases provided author citations or direct links to the original source publications. Had direct links to the source publications been provided, typographic errors or vernacular names would not have mattered.
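To illustrate how little is needed to provide such a link, here is a minimal sketch that turns a DOI into a resolvable link via the doi.org resolver and prints a species mention with its scientific name and that link. The function names and the DOI used in the example are hypothetical placeholders, not taken from any of the press releases or source articles.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_link(doi: str) -> str:
    """Return a resolvable URL for a DOI, accepting bare DOIs, 'doi:' or resolver URLs."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        # Compare case-insensitively so that 'DOI:' is handled as well.
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Percent-encode characters that are unsafe in a URL, keeping the '/' separator.
    return DOI_RESOLVER + quote(doi, safe="/")


def press_release_line(vernacular: str, scientific: str, doi: str) -> str:
    """Format one species mention with its scientific name and a direct link (hypothetical helper)."""
    return f"{vernacular} ({scientific}), original description: {doi_link(doi)}"


# Hypothetical DOI, for illustration only.
print(press_release_line("Ghost orchid", "Didymoplexis stella-silvae", "doi:10.1234/example.5678"))
```

Because the link points to the publication itself, a misspelled or vernacular name in the surrounding text no longer stands between the reader and the source.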

This is only half the story. A taxonomic name refers to a section of a scientific publication, the taxonomic treatment, which the author delimits to present and discuss the results of the discovery of a new species. In the age of the internet, this section could be cited, allowing interested readers to learn directly and immediately about the new species, including its figures and further links to the specimens in the digital collection. This would make the scientific collection more usable. While writing her article, the scientist has this structure in mind. The current publishing process, with a few notable exceptions, removes this structure by publishing a long text that is understandable by humans but machine-processable only at great cost. As a result, a press officer or journalist writing about new species cannot easily link to the respective treatments of those species.

We therefore explored what it takes to make these treatments and the cited specimens (holotypes) open access, citable via the Biodiversity Literature Repository (BLR, hosted on Zenodo), and reusable by the Global Biodiversity Information Facility (GBIF), where biodiversity observations increasingly aggregate.

The conversion of the PDFs and the annotation and dissemination of the data are automated as much as possible, but errors discovered during quality control have to be fixed manually. Since the 29 articles examined in this exercise were published in 24 different journals, most of them very domain-specific, automation is not easy. Furthermore, processing covers the entire article, not just the target treatment, so the processing times reported here are for whole articles.

For journals where a high degree of automation is possible because of their known, consistent layout, e.g. the European Journal of Taxonomy or Zootaxa, processing takes between 1 and 2.5 minutes per page. In this exercise, the average across all journals was 3 minutes per page, with a maximum of 9.5 minutes. The average processing time per article, for articles ranging from 2 to 93 pages, was 57 minutes. Together, these articles comprise 891 pages, 170 taxonomic treatments and 408 figures, which are accessible through BLR and GBIF.

Clearly, the digital age offers more than telling interesting stories about research in natural history institutions. Linking taxonomic names to their taxonomic treatments is a way to provide access to the fantastic results their scientists produce.

| Press release | Vernacular name | Taxonomic name | Source article and accessibility | Plazi-mediated links |
|---|---|---|---|---|
| CAS | Easter egg weevil | Pachyrhynchus obumanuvu | DOI (Open Access via ResearchGate) | |
| CAS | pygmy pipehorse | Cylix tupareomanaia | DOI (Open Access) | |
| CAS | scorpion | Centruroides catemacoensis | DOI (Open Access via ResearchGate) | |
| CAS | São Tomé caecilian | | DOI (Closed Access) | |
| CAS | Guitarfish | Acroteriobatus andysabini | DOI (Open Access) | |
| CAS | sea star | Uokeaster ahi | DOI (Closed Access) | |
| KEW | Killer tobacco plant | Nicotiana insecticida | DOI (Closed Access) | |
| KEW | hidden banana seed fungus | Fusarium chuoi | DOI (Open Access) | |
| KEW | Ghost orchid | Didymoplexis stella-silvae | DOI (Open Access) | |
| KEW | blue Barleria | Barleria thunbergiiflora | DOI (Open Access) | |
| KEW | Cape primrose | Streptocarpus malachiticola | DOI (Open Access) | |
| KEW | Firework flower | Ardisia pyrotechnica | DOI (Access via ResearchGate) | |
| KEW | Bolivian periwinkle | Philibertia woodii | DOI (Open Access) | |
| KEW | tooth-fungus | Hydnellum nemorosum | DOI (Open Access) | |
| KEW | Voodoo lily | Pseudohydrosme ebo | DOI (Open Access) | |
| KEW | bright-blue-fruited rainforest shrubs | Chassalia northiana | DOI (Open Access) | |
| NHM | ankylosaur | Spicomellus afer | DOI (Closed Access) | |
| NHM | chunky sauropod | Rhomaleopakhus turpanensis | DOI (Open Access) | |
| NHM | | Brighstoneus simmondsi | DOI (Open Access) | |
| NHM | | Pendraig milnerae | DOI (Open Access) | |
| NHM | | Megalomys camerhogne | DOI (Access via ResearchGate) | |
| NHM | Jurassic mouse | Borealestes cuillinensis | DOI (Closed Access) | |
| NHM | | Amazops amazops | DOI (Open Access) | |
| NHM | moth | Xanthopan praedicta | DOI (Access via ResearchGate) | |
| NHM | | Mecopoda sismondoi | DOI (Open Access) | |
| NHM | deep sea polychaete worm | Neanthes goodayi | DOI (Open Access) | |
| NHM | giant amphipod | Eurythenes atacamensis | DOI (Open Access) | |
| NHM | jewelweed | Impatiens versicolor | DOI (Access via ResearchGate) | |
| NHM | Joseph's racer | Platyceps josephi | DOI (Open Access) | |