MEx extractors

ETL pipelines for the RKI Metadata Exchange.

Project

The Metadata Exchange (MEx) project is committed to improving the retrieval of RKI research data and projects. How? By focusing on metadata: instead of providing the actual research data directly, the MEx metadata catalog captures descriptive information about research data and activities. On this basis, we want to make the data FAIR¹ so that it can be shared with others.

Via MEx, metadata will be made findable, accessible and shareable, as well as available for further research. The goal is to get an overview of what research data is available, understand its context, and know what needs to be considered for subsequent use.

RKI cooperated with D4L data4life gGmbH for a pilot phase where the vision of a FAIR metadata catalog was explored and concepts and prototypes were developed. The partnership has ended with the successful conclusion of the pilot phase.

After an internal launch, the metadata will also be made publicly available, so that external researchers and the interested (professional) public can find research data from the RKI.

For further details, please consult our project page.

Contact
For more information, please feel free to email us at mex@rki.de.

Publisher

Robert Koch-Institut
Nordufer 20
13353 Berlin
Germany

Package

The mex-extractors package implements a variety of ETL pipelines to extract metadata from primary data sources using a range of different technologies and protocols. Then, we transform the metadata into a standardized format using models provided by mex-common. The last step in this process is to load the harmonized metadata into a sink (file output, API upload, etc).
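The three steps described above can be sketched in a few lines of Python. This is only an illustration of the extract, transform, load pattern: `HarmonizedItem`, `extract`, `transform` and `load` are hypothetical names and do not reflect the actual classes or functions of mex-extractors or mex-common.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class HarmonizedItem:
    """Hypothetical stand-in for a standardized model (not a mex-common class)."""

    identifier: str
    title: str


def extract(raw_rows: Iterable[dict[str, str]]) -> Iterator[dict[str, str]]:
    """Extract step: yield raw records from a primary source (here an in-memory list)."""
    yield from raw_rows


def transform(records: Iterable[dict[str, str]]) -> Iterator[HarmonizedItem]:
    """Transform step: map source-specific fields onto the standardized shape."""
    for record in records:
        yield HarmonizedItem(
            identifier=record["id"].strip(),
            title=record["name"].strip(),
        )


def load(items: Iterable[HarmonizedItem]) -> list[str]:
    """Load step: serialize items as JSON lines, standing in for a file or API sink."""
    return [
        json.dumps({"identifier": item.identifier, "title": item.title})
        for item in items
    ]


# Assumed sample input; real extractors read from files, databases or APIs.
source = [{"id": " 42 ", "name": "Example study "}]
lines = load(transform(extract(source)))
```

Chaining generators keeps each step independent, so a different sink could replace `load` without touching the extract or transform code.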

License

This package is licensed under the MIT license. All other software components of the MEx project are open-sourced under the same license.

Development

Installation

install python on your system
on unix, run make install
on windows, run .\mex.bat install

Linting and testing

run all linters with make lint or .\mex.bat lint
run unit and integration tests with make test or .\mex.bat test
run just the unit tests with make unit or .\mex.bat unit

Updating dependencies

update boilerplate files with cruft update
update global requirements in requirements.txt manually
update git hooks with pre-commit autoupdate
update package dependencies using uv sync --upgrade
update github actions in .github/workflows/*.yml manually

Creating release

run mex release RULE to release a new version where RULE determines which part of the version to update and is one of major, minor, patch.
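As a rough illustration of what the three rules mean, the sketch below bumps a version string, assuming a plain `MAJOR.MINOR.PATCH` scheme. `bump_version` is a hypothetical helper written for this example; the actual release command may handle versions differently.

```python
def bump_version(version: str, rule: str) -> str:
    """Return the next version for a rule of "major", "minor" or "patch".

    Assumes a plain MAJOR.MINOR.PATCH string without pre-release suffixes.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":
        # A major bump resets minor and patch.
        return f"{major + 1}.0.0"
    if rule == "minor":
        # A minor bump resets patch only.
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown rule: {rule!r}")
```

For example, under these assumptions `bump_version("1.2.3", "minor")` yields `"1.3.0"`.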

Container workflow

build image with make image
run directly using docker with make run
start with docker compose using make start

Commands

run uv run {command} --help to print instructions
run uv run {command} --debug for interactive debugging

dagster

uv run dagster dev to launch a local dagster UI

all extractors

uv run all-extractors executes all extractors
execute only in local or dev environment

artificial extractor

uv run artificial creates deterministic artificial sample data
execute only in local or dev environment
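"Deterministic" here means that repeated runs produce the same sample data. One common way to achieve this is a seeded random number generator, sketched below. `make_sample_titles`, its word list and its seed parameter are assumptions made for illustration, not the approach the artificial extractor necessarily takes.

```python
import random


def make_sample_titles(count: int, seed: int = 0) -> list[str]:
    """Generate artificial titles that are identical for identical seeds."""
    # A dedicated Random instance avoids touching the global random state.
    rng = random.Random(seed)
    words = ["survey", "cohort", "registry", "panel", "study"]
    return [f"{rng.choice(words)}-{rng.randint(1000, 9999)}" for _ in range(count)]


first_run = make_sample_titles(3, seed=1)
second_run = make_sample_titles(3, seed=1)
```

Because both calls use the same seed, `first_run` and `second_run` contain the same titles, which makes such data suitable for reproducible tests.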

biospecimen extractor

uv run biospecimen extracts sources from the Biospecimen excel files

blueant extractor

uv run blueant extracts sources from the Blue Ant project management software

confluence-vvt extractor

uv run confluence-vvt extracts sources from the VVT confluence page

consent-mailer

uv run consent-mailer sends emails to collect publishing consents

contact-point

uv run contact-point extracts default contact points

datscha-web extractor

uv run datscha-web extracts sources from the datscha web app

endnote extractor

uv run endnote extracts from endnote XML files

ff-projects extractor

uv run ff-projects extracts sources from the FF Projects excel file

grippeweb extractor

uv run grippeweb extracts grippeweb metadata from the grippeweb database

ifsg extractor

uv run ifsg extracts sources from the ifsg database

international-projects extractor

uv run international-projects extracts sources from the international projects excel file

kvis extractor

uv run kvis extracts KVIS metadata from the KVIS database

odk extractor

uv run odk extracts ODK survey data from excel files

open-data extractor

uv run open-data extracts Open Data sources from the Zenodo API

seq-repo extractor

uv run seq-repo extracts sources from the seq-repo JSON file

sumo extractor

uv run sumo extracts sumo data from xlsx files

synopse extractor

uv run synopse extracts synopse data from report-server exports

voxco extractor

uv run voxco extracts voxco data from voxco JSON files

publisher

uv run publisher gets merged items from backend and publishes them into sink

¹ FAIR refers to the so-called FAIR data principles: guidelines to make data Findable, Accessible, Interoperable and Reusable.
