Rita Pucci


I am a researcher in computational biodiversity with a particular interest in advanced multimodal machine learning algorithms that can improve, support and speed up biodiversity monitoring on our planet. At Naturalis, I collaborate with biologists and experts across disciplines to put new AI technology to good use.

Keywords

Biodiversity monitoring, AI, Multimodal Models, Nature awareness

Research interest

Machine learning and biodiversity monitoring are increasingly intertwined because of the flood of data collected by devices. Over the past few decades, biologists have collected more than a billion data points in the wild with remote sensing devices, and innovative technologies are accelerating this pace. How do we deal with such an amount of data? How can we obtain a rapid taxonomic classification of the images to begin studying biodiversity? How can we identify patterns across data modalities (e.g. images and DNA)? Time is ticking in nature; we can no longer think only on the scale of a human lifetime.

To address these questions, I develop machine-learning methods for computational biodiversity, with a focus on bioimaging and multimodal data integration. I work on models that fuse images, genomic sequences, morphometrics and, when available, audio to uncover cross-modal relationships, improve species identification and delimitation, and enable scalable biodiversity monitoring. I design computer vision pipelines that leverage images from field surveys, museum and herbarium collections, and citizen science platforms, and I create integrated tools to support systematics and integrative taxonomy.

My research is grounded in computer vision. Visual information and object representation are major sources of information for biodiversity monitoring and an active field of research in computer science. Deep learning is one of the most prolific areas of computer vision, providing competitive algorithms to analyse and classify such data in an attempt to emulate the human ability to observe. We study these algorithms on natural images taken with mobile phones, where the main providers of such data are citizen scientists. We will investigate modern algorithms and propose new ones to improve automatic animal monitoring and contribute to the study of nature.


Project collaborations
DNN for Species Classification

MAMBO (Modern Approaches to the Monitoring of BiOdiversity) is a Horizon Europe research and innovation project, funded by the European Commission with 5 million euros. The project is a collaboration between ten partners. Scientists Koos Biesmeijer and Vincent Kalkman lead two work packages. One aims to align newly developed techniques properly with existing data and ICT infrastructure. The other focuses on the application of image and sound recognition to biomonitoring. The latter includes, among other things, further development of image and sound recognition for mobile phones, with the aim of making it available to citizen scientists throughout Europe for virtually all policy-relevant species.

 

TETTRIS: In this project, we tested whether it is possible to build image recognition models for molluscs based on images from multiple collections. We did so for 17 species of Vertigo and 100 species of mollusc from Tenerife. The genus Vertigo (snails of about 2 mm) was selected because it includes species listed in the EU Habitats Directive. The molluscs of Tenerife were selected because Tenerife is part of the Mediterranean biodiversity hotspot.

The models performed well, with an accuracy of 96 per cent for Vertigo and 76 per cent for the Tenerife molluscs. Most incorrect predictions involve species for which training data was limited, or confusion within larger species radiations. The results show that it is possible to build an image recognition model for all European molluscs. The main limitation is the availability of training data, which can be addressed by aligning digitisation efforts between natural history museums.
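As an illustration of how errors can be related to the amount of training data, the following is a minimal sketch that computes overall accuracy alongside per-species accuracy and training-set size. It is not the project's evaluation code: the function name `per_species_report`, the labels `species_a` and `species_b`, and all counts are hypothetical and made up for the example.

```python
from collections import Counter, defaultdict


def per_species_report(train_labels, true_labels, predicted_labels):
    """Return overall accuracy and, per species, (training count, accuracy)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have equal length")

    train_counts = Counter(train_labels)  # number of training images per species
    total = defaultdict(int)              # test images per true species
    correct = defaultdict(int)            # correctly predicted test images per species

    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    overall = sum(correct.values()) / len(true_labels) if true_labels else 0.0
    per_species = {
        species: (train_counts[species], correct[species] / total[species])
        for species in total
    }
    return overall, per_species


# Toy, made-up labels purely for illustration (not data from the project).
train = ["species_a"] * 50 + ["species_b"] * 5
truth = ["species_a"] * 4 + ["species_b"] * 2
preds = ["species_a"] * 5 + ["species_b"]

overall, report = per_species_report(train, truth, preds)
print(f"overall accuracy: {overall:.2f}")
# List species from least to most training data to spot data-limited classes.
for species, (n_train, acc) in sorted(report.items(), key=lambda kv: kv[1][0]):
    print(f"{species}: {n_train} training images, accuracy {acc:.2f}")
```

Sorting the report by training count makes it easy to check whether the lowest-scoring species are also the ones with the fewest training images.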

Key publications


Teaching
Multimodal models for ecology and biodiversity

The course is at MSc level and is intended for all MSc Computer Science specialisations at the Leiden Institute of Advanced Computer Science, Leiden University, in particular Bioinformatics and Computer Science. Students must be highly proficient in Python programming and report writing, and have an interest in machine learning and in interdisciplinary biodiversity and ecology topics. Students are expected to be comfortable with implementing, training, and testing machine learning models.