Naturalis leads the development of the Distributed System of Scientific Collections (DiSSCo), a new world-class Research Infrastructure (RI) for natural science collections. DiSSCo brings together 136 museums across 21 European countries to unify and serve genomic, geographical, morphological and taxonomic knowledge for the 1.5 billion physical objects held in European collections. DiSSCo envisages a novel FAIR Digital Object architecture (see fairdo.org) to provide Digital Specimens as digital twins of the physical objects in the future data fabric of interdisciplinary scientific data.
To develop and demonstrate the concept, Naturalis is looking for a lead programmer to develop DiSSCo Digital Specimen Architecture pilots. The objective of this position is to further develop and transition the nsidr.org demonstrator into a Digital Specimen pilot that links information about natural history specimens together and enables community curation and annotation.
Data derived from collection specimens is fundamental to scientific biodiversity and geodiversity research aimed at the understanding and conservation of the natural riches of our Earth. In Europe, Naturalis leads the development of the Distributed System of Scientific Collections (DiSSCo), a new world-class Research Infrastructure (RI) for natural science collections. DiSSCo brings together 136 museums across 21 European countries to unify and serve genomic, geographical, morphological and taxonomic knowledge for the 1.5 billion physical objects held in European collections. The new RI introduces a step change by massively improving the way scientists discover, access and analyse complex and previously disconnected information deriving from the study of the vast European natural science collections. DiSSCo is embarking on a complex preparation and construction programme, executed through a series of innovation, consolidation and construction projects across multiple European stakeholders. With implementation beginning in 2024, DiSSCo’s full operations are planned to commence in 2026. Nevertheless, DiSSCo is already starting to work on early e-Services and pilots to demonstrate the added value and feasibility of its plans for a Digital Specimen Architecture (DSArch).
Within the ICEDIG project, a demonstrator NSIDR registry (nsidr.org) was developed that shows how a registry of digital specimen objects can be built using CORDRA, Handles and the Digital Object Interface Protocol (DOIP). It also demonstrates how such a registry can supply enriched specimen data by implementing links to GBIF data, Catalogue of Life (for taxon names), EBI (for sequence data) and Wikidata. Furthermore, the implementation was used to demonstrate a direct connection through DOIP with a collection management system (Memorix) and to showcase the handling of provenance/attribution events.
The objective of this position is to further develop and transition the nsidr.org demonstrator into a Digital Specimen pilot that showcases and tests the novel principles of a FAIR Digital Object infrastructure (FAIR DO), links information about natural history specimens together and enables community curation and annotation. More information about the current implementation and concept can be found here. The DiSSCo technical team has created a wish list for further development of this demonstrator, which involves several inter-related developments: in the schema and mechanisms of the repository, in the simpleUI, and in service/app software. These developments will likely need Elasticsearch, SPARQL and Neo4j graph database implementations as well as the implementation of APIs, and may benefit from machine learning too.
The pilot needs to support and trial the OpenDS specification for Digital Specimens, currently being created in the DiSSCo Prepare project. It should also be loaded with a real dataset, scaled up from a few specimens to millions of specimens, both to gain experience with operation at scale and to make it interesting for early adopters. The pilot will need to support stories that showcase the advantages the new infrastructure will bring. Further components need to be developed to implement a DOI PID scheme for the objects and to support early implementation in DiSSCo e-Services already in development, such as ELViS, the specimen data refinery services and the digitisation dashboard. This includes implementing a local handle server and eventually a data type registry (DTR).
Demonstrating these functionalities at an early stage of the development of the DiSSCo Research Infrastructure is essential for adoption of the novel DO architecture by the community and for further services development. The work will be carried out in close collaboration with the DiSSCo Data Architect and the international technical team in a distributed work environment. As lead developer you will work with them on further development of the novel DS Architecture to position DiSSCo as a leader in FAIR DO infrastructure implementation, which is seen as the future for a data fabric of scientific data in Europe. Development will include:
- Evolution of the current nsidr.org demonstrator to implement and test the OpenDS specification;
- Transitioning of the demonstrator into a pilot with datasets that include links to external sources and the data required for DiSSCo e-Services. This may require migration from MongoDB to Amazon S3;
- Implementation of IIIF (iiif.io) manifests in Digital Specimens and enhancement of the frontend with an IIIF viewer and an Elasticsearch-based search function across multiple CORDRA repositories;
- Deployment of a local handle server and required kernel information profiles;
- CORDRA-based repositories for e-Services under construction (ELViS, Collection Digitisation Dashboard, Specimen Data Refinery) and for pilots such as an AAI pilot and a link prediction pilot;
- JSON-LD data ingestion and exchange pipelines for CETAF registry API and GBIF API;
- Documentation of developed components and configurations.
Optionally, depending on external developments and progress with formation of a distributed team of developers, the work may include:
- Deployment of a local Data Type Registry and definition of data types in a DTR;
- DOIP-based data exchange with JACQ and Specify (collection management systems) and batch operations;
- Demonstration of MS Excel, MS Access and FileMaker connections;
- Support for developers wanting to create demonstrators connecting to the DiSSCo CORDRA instances;
- Visualisation of the knowledge graph;
- Visibility in Google Search and Google Dataset Search.
Required qualifications, experience and skills (must-have)
The lead developer we are looking for should:
- Have a solid understanding of the processes, limitations and technical solutions in development of data infrastructure for international science, in particular on data processing, linking and indexing;
- Think in terms of innovative solutions and rapid prototyping;
- Be able to travel internationally several times a year (if the COVID-19 situation allows this);
- Have excellent knowledge of the English language (written and verbal);
- Be able to present and pitch architectural concepts and implementation choices to both technical and non-technical audiences;
- Have experience in Java programming (preferably also Kotlin) and one other programming language (preferably Python or Ruby/Rails);
- Be familiar with standard web application development (JSON, XML, REST, APIs);
- Be familiar with the concept of Persistent Identifiers, e.g. Handles, DOIs, DO, DOIPv2, ORCID iDs;
- Be self-motivated, feel responsible and be able to work independently.
Desired qualifications, experience and skills (good-to-have)
- Have experience in search engines and graph data processing, preferably with Elasticsearch, SPARQL and Neo4j;
- Have experience with continuous integration, DevOps, Docker, Kubernetes and AWS;
- Have experience in open source programming and engineering in complex international science innovation or infrastructure projects;
- Hold a university degree, preferably in software engineering or a similar technical field;
- Have knowledge of the Dutch language;
- Have an affinity with the community and field of natural sciences and biology;
- Be familiar with Research Data Alliance recommendations and TDWG standards;
- Be familiar with data serialization as well as exchange and discovery solutions (such as Protobuf, Avro, Bioschemas);
- Have an affinity with user-friendly GUIs.
We offer a contract (36 hours per week, 32 is possible) for a period of one year, extendable by one year after a successful first-year evaluation, and a monthly gross salary between € 3,405 and € 4,576, depending on relevant experience. You also get an allowance for travel expenses, a holiday allowance (8%) and a year-end bonus (3.4%). Naturalis Biodiversity Center offers an inspiring working atmosphere and advanced ICT infrastructure. The Naturalis offices in Leiden are easily accessible by public transport from Amsterdam, Rotterdam, Utrecht and The Hague.
Applicants are invited to submit their application, including a cover letter and CV, by using this application form. Feel free to contact Wouter Addink (Coordinator Research-data & E-infrastructure) with questions about the position: firstname.lastname@example.org.
Naturalis endorses the Cultural Diversity Code. In the case of equal suitability, preference is given to the candidate who reinforces diversity within the team.
Unsolicited offers from recruitment agencies in response to this vacancy are not appreciated.