At year-end, it is common to review the past year’s highlights. We (Plazi) did so too, highlighting a few of the almost 9000 species discovered in 2021, and several of the world’s leading taxonomic institutions did the same with their recent discoveries. Since we are interested not just in finding out which species were discovered but also in learning about them in detail, we decided to test how easy it is to locate the original taxonomic treatments in the scientific papers that describe the species mentioned in the press releases.
We took three press releases, one each from the California Academy of Sciences (CAS), the Royal Botanic Gardens Kew and the Natural History Museum of London (NHM). Together they mention 29 species described in 29 different scientific articles.
We scientists, with at least a basic understanding of how taxonomic works are published, read the press releases and measured the time we took to locate the original sources of the 29 species. In all three press releases, most of the species are mentioned along with their scientific names, which could be copied into Ecosia, a search engine, to find the original source paper. If the original articles were open access, we were able to locate most of them within two minutes. However, some species were mentioned only by their vernacular or descriptive names, not their scientific names (for example, São Tomé’s caecilian). In such cases, the search took up to 8 minutes.
All three press releases referred to species whose taxonomic data were not openly accessible because of paywalls or registration obstacles, and the registration itself sometimes failed due to technical issues. In two cases, typos in the taxonomic names made it more difficult to locate the original publication (up to 15 minutes).
Arguably more important, none of the press releases provided author citations or direct links to the original source publications. If direct links to the source publications had been provided, typographic errors or vernacular names would not have mattered.
This is only half the story. A taxonomic name refers to a section of a scientific publication, a taxonomic treatment, delimited by the author to present and discuss the results of the discovery of a new species. In the age of the internet, this section could be cited directly, allowing the interested reader to learn immediately about the facts of the new species, including figures and further links to the specimens in the digital collection. This would make the scientific collection more usable. While writing her article, the scientist has this structure in mind. The current publishing process, with a few notable exceptions, removes this structure by publishing a long text that, while understandable by humans, is machine processable only at great cost. As a result, a press officer or journalist writing an article about new species is unable to easily link to the respective treatments of those species.
We therefore explored what it takes to make these treatments and the cited specimens (holotypes) open access, citable via the Biodiversity Literature Repository (BLR on Zenodo), and reusable by the Global Biodiversity Information Facility (GBIF), where observations on biodiversity are increasingly aggregated.
The PDF conversion, annotation and dissemination of the data are automated as much as possible, but errors discovered during the quality control process have to be fixed manually. Since the 29 articles examined in this exercise were published in 24 different journals, most of them very domain-specific, automation is not easy. Furthermore, processing covers the entire article, not just the target text, so the time reported reflects the processing of the entire article.
For journals where a high degree of automation is possible because of their known, consistent layout, e.g. European Journal of Taxonomy or Zootaxa, processing takes between 1 and 2.5 minutes per page. In this exercise, the average across all journals was 3 minutes per page, with a maximum of 9.5 minutes. The average processing time per article, for articles ranging from 2 to 93 pages, was 57 minutes. Together, these articles comprise 891 pages, 170 taxonomic treatments and 408 figures, all now accessible through BLR and GBIF.
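To put these figures in proportion, the short Python sketch below derives a few totals from the numbers quoted above. It assumes that the 57 minutes is a mean per article, which the text implies but does not state outright; the variable names are ours and purely illustrative.

```python
# Figures quoted in the text.
ARTICLES = 29
PAGES = 891
TREATMENTS = 170
FIGURES = 408
# Assumption: 57 minutes is the mean processing time per article.
AVG_MINUTES_PER_ARTICLE = 57

# Total effort implied by the per-article mean.
total_minutes = ARTICLES * AVG_MINUTES_PER_ARTICLE
total_hours = total_minutes / 60

# Pooled rate: all minutes divided by all pages.
pooled_minutes_per_page = total_minutes / PAGES

# Yield per article.
treatments_per_article = TREATMENTS / ARTICLES
figures_per_article = FIGURES / ARTICLES

print(f"total processing time: {total_minutes} min ({total_hours:.1f} h)")
print(f"pooled rate: {pooled_minutes_per_page:.2f} min per page")
print(f"treatments per article: {treatments_per_article:.1f}")
print(f"figures per article: {figures_per_article:.1f}")
```

Under that assumption the whole exercise amounts to roughly 27.5 hours of processing, or a little under 2 minutes per page when all pages are pooled. That pooled rate is lower than the 3 minute average across journals, which would be consistent with the journal average weighting every journal equally regardless of how many pages it contributed.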
Clearly, the digital age is about more than telling interesting stories about research in natural history institutions. Taxonomic names linked to their taxonomic treatments are a way to provide access to the fantastic results the scientists at these institutions produce.
| press release | vernacular name | taxonomic name | source article and accessibility | Plazi mediated links |
|---|---|---|---|---|
| CAS | Easter egg weevil | Pachyrhynchus obumanuvu | DOI (Open Access via ResearchGate) | |
| CAS | pygmy pipehorse | Cylix tupareomanaia | DOI (Open Access) | |
| CAS | scorpion | Centruroides catemacoensis | DOI (Open Access via ResearchGate) | |
| CAS | São Tomé caecilian | | DOI (Closed Access) | |
| CAS | Guitarfish | Acroteriobatus andysabini | DOI (Open Access) | |
| CAS | sea star | Uokeaster ahi | DOI (Closed Access) | |
| KEW | Killer tobacco plant | Nicotiana insecticida | DOI (Closed Access) | |
| KEW | hidden banana seed fungus | Fusarium chuoi | DOI (Open Access) | |
| KEW | Ghost orchid | Didymoplexis stella-silvae | DOI (Open Access) | |
| KEW | blue Barleria | Barleria thunbergiiflora | DOI (Open Access) | |
| KEW | Cape primrose | Streptocarpus malachiticola | DOI (Open Access) | |
| KEW | Firework flower | Ardisia pyrotechnica | DOI (Access via ResearchGate) | |
| KEW | Bolivian periwinkle | Philibertia woodii | DOI (Open Access) | |
| KEW | tooth-fungus | Hydnellum nemorosum | DOI (Open Access) | |
| KEW | Voodoo lily | Pseudohydrosme ebo | DOI (Open Access) | |
| KEW | bright-blue-fruited rainforest shrubs | Chassalia northiana | DOI (Open Access) | |
| NHM | ankylosaur | Spicomellus afer | DOI (Closed Access) | |
| NHM | chunky sauropod | Rhomaleopakhus turpanensis | DOI (Open Access) | |
| NHM | | Brighstoneus simmondsi | DOI (Open Access) | |
| NHM | | Pendraig milnerae | DOI (Open Access) | |
| NHM | | Megalomys camerhogne | DOI (Access via ResearchGate) | |
| NHM | Jurassic mouse | Borealestes cuillinensis | DOI (Closed Access) | |
| NHM | | Amazops amazops | DOI (Open Access) | |
| NHM | moth | Xanthopan praedicta | DOI (Access via ResearchGate) | |
| NHM | | Mecopoda sismondoi | DOI (Open Access) | |
| NHM | deep sea polychaete worm | Neanthes goodayi | DOI (Open Access) | |
| NHM | giant amphipod | Eurythenes atacamensis | DOI (Open Access) | |
| NHM | jewelweed | Impatiens versicolor | DOI (Access via ResearchGate) | |
| NHM | Joseph's racer | Platyceps josephi | DOI (Open Access) | |
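
The accessibility column of the table can be tallied to check the observation that every press release included species behind closed access. The Python sketch below is only an illustration: it transcribes the access label of each of the 29 rows in table order, and the constant and variable names are ours.

```python
from collections import Counter

# Access labels as used in the table (capitalisation normalised).
OPEN = "Open Access"
OPEN_RG = "Open Access via ResearchGate"
RG = "Access via ResearchGate"
CLOSED = "Closed Access"

# One (press release, access label) pair per table row, in table order.
ROWS = (
    [("CAS", a) for a in (OPEN_RG, OPEN, OPEN_RG, CLOSED, OPEN, CLOSED)]
    + [("KEW", a) for a in (CLOSED, OPEN, OPEN, OPEN, OPEN,
                            RG, OPEN, OPEN, OPEN, OPEN)]
    + [("NHM", a) for a in (CLOSED, OPEN, OPEN, OPEN, RG, CLOSED, OPEN,
                            RG, OPEN, OPEN, OPEN, RG, OPEN)]
)

# How many source articles fall under each access label.
by_access = Counter(access for _, access in ROWS)

# How many closed access articles each press release refers to.
closed_by_release = Counter(rel for rel, access in ROWS if access == CLOSED)

for label, count in by_access.most_common():
    print(f"{label}: {count}")
print("closed access per press release:", dict(closed_by_release))
```

In this transcription, 18 of the 29 source articles are plainly open access, 6 are reachable only via ResearchGate, and 5 are closed access, with at least one closed access article in each of the three press releases.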