ETL pipelines for the RKI Metadata Exchange.
The Metadata Exchange (MEx) project is committed to improving the retrieval of RKI research data and projects. How? By focusing on metadata: instead of providing the actual research data directly, the MEx metadata catalog captures descriptive information about research data and activities. On this basis, we want to make the data FAIR[^1] so that it can be shared with others.
Via MEx, metadata will be made findable, accessible and shareable, as well as available for further research. The goal is to get an overview of what research data is available, understand its context, and know what needs to be considered for subsequent use.
RKI cooperated with D4L data4life gGmbH for a pilot phase where the vision of a FAIR metadata catalog was explored and concepts and prototypes were developed. The partnership has ended with the successful conclusion of the pilot phase.
After an internal launch, the metadata will also be made publicly available, allowing external researchers as well as the interested (professional) public to find research data from the RKI.
For further details, please consult our project page.
## Contact
For more information, please feel free to email us at mex@rki.de.
Robert Koch-Institut
Nordufer 20
13353 Berlin
Germany
The mex-extractors package implements a variety of ETL pipelines that extract
metadata from primary data sources using a range of different technologies and
protocols. The metadata is then transformed into a standardized format using the models
provided by mex-common. The last step in this process is to load the harmonized
metadata into a sink (file output, API upload, etc.).
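The extract-transform-load flow described above can be sketched roughly as follows. All names here (`HarmonizedItem`, `extract`, `transform`, `load`) are hypothetical illustrations, not the actual mex-common or mex-extractors API:

```python
from dataclasses import dataclass

# Hypothetical raw records as they might come from a primary data source.
raw_records = [
    {"titel": "Influenza Survey", "jahr": "2021"},
    {"titel": "Biospecimen Register", "jahr": "2019"},
]

@dataclass
class HarmonizedItem:
    """Stand-in for a standardized metadata model (in MEx, mex-common provides these)."""
    title: str
    year: int

def extract() -> list[dict]:
    # Extract: read metadata from a primary source (file, API, database, ...).
    return raw_records

def transform(records: list[dict]) -> list[HarmonizedItem]:
    # Transform: map source-specific fields onto the standardized model.
    return [HarmonizedItem(title=r["titel"], year=int(r["jahr"])) for r in records]

def load(items: list[HarmonizedItem], sink: list) -> None:
    # Load: write harmonized items to a sink (here a list; in MEx,
    # file output or API upload).
    sink.extend(items)

sink: list[HarmonizedItem] = []
load(transform(extract()), sink)
print(len(sink))  # 2
```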
This package is licensed under the MIT license. All other software components of the MEx project are open-sourced under the same license as well.
- install python on your system
- on unix, run `make install`
- on windows, run `.\mex.bat install`
- run all linters with `make lint` or `.\mex.bat lint`
- run unit and integration tests with `make test` or `.\mex.bat test`
- run just the unit tests with `make unit` or `.\mex.bat unit`
- update boilerplate files with `cruft update`
- update global requirements in `requirements.txt` manually
- update git hooks with `pre-commit autoupdate`
- update package dependencies using `uv sync --upgrade`
- update github actions in `.github/workflows/*.yml` manually
- run `mex release RULE` to release a new version, where `RULE` determines which part of the version to update and is one of `major`, `minor`, `patch`
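Conceptually, the three rules follow semantic versioning. The sketch below illustrates the rule semantics only; it is not the `mex release` implementation:

```python
def bump(version: str, rule: str) -> str:
    """Illustrative semver bump: 'major' resets minor and patch, 'minor' resets patch."""
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":
        return f"{major + 1}.0.0"
    if rule == "minor":
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown rule: {rule}")

print(bump("1.4.2", "minor"))  # 1.5.0
```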
- build image with `make image`
- run directly using docker with `make run`
- start with docker compose with `make start`
- run `uv run {command} --help` to print instructions
- run `uv run {command} --debug` for interactive debugging
- run `uv run dagster dev` to launch a local dagster UI
- `uv run all-extractors` executes all extractors (execute only in local or dev environment)
- `uv run artificial` creates deterministic artificial sample data (execute only in local or dev environment)
- `uv run biospecimen` extracts sources from the Biospecimen excel files
- `uv run blueant` extracts sources from the Blue Ant project management software
- `uv run confluence-vvt` extracts sources from the VVT confluence page
- `uv run consent-mailer` sends emails to collect publishing consents
- `uv run contact-point` extracts default contact points
- `uv run datscha-web` extracts sources from the datscha web app
- `uv run endnote` extracts sources from endnote XML files
- `uv run ff-projects` extracts sources from the FF Projects excel file
- `uv run grippeweb` extracts grippeweb metadata from the grippeweb database
- `uv run ifsg` extracts sources from the ifsg database
- `uv run international-projects` extracts sources from the international projects excel file
- `uv run kvis` extracts KVIS metadata from the KVIS database
- `uv run odk` extracts ODK survey data from excel files
- `uv run open-data` extracts Open Data sources from the Zenodo API
- `uv run seq-repo` extracts sources from seq-repo JSON files
- `uv run sumo` extracts sumo data from xlsx files
- `uv run synopse` extracts synopse data from report-server exports
- `uv run voxco` extracts voxco data from voxco JSON files
- `uv run publisher` gets merged items from the backend and publishes them into a sink
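As a rough illustration of what a publisher step does, the sketch below pulls merged items page by page and writes them to a sink. The function names, page layout, and in-memory sink are hypothetical stand-ins, not the real mex-backend API:

```python
import json

# Hypothetical paginated backend responses holding merged metadata items.
PAGES = [
    {"items": [{"identifier": "a1"}, {"identifier": "b2"}], "next": 1},
    {"items": [{"identifier": "c3"}], "next": None},
]

def fetch_page(page: int) -> dict:
    # Stand-in for an HTTP GET against a merged-items endpoint.
    return PAGES[page]

def publish(sink: list[str]) -> None:
    # Pull every page of merged items and publish each one into the sink
    # (here: JSON lines in a list; in MEx: file output or API upload).
    page = 0
    while page is not None:
        response = fetch_page(page)
        for item in response["items"]:
            sink.append(json.dumps(item))
        page = response["next"]

sink: list[str] = []
publish(sink)
print(len(sink))  # 3
```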
[^1]: FAIR references the so-called FAIR data principles – guidelines to make data Findable, Accessible, Interoperable and Reusable.