Project 1: A shape-driven visual interface to integrate heterogeneous biomedical databases into knowledge graphs
Building purpose-specific biomedical knowledge graphs (KGs) remains difficult for users with domain expertise but limited coding experience, and the process is inconsistent even among developers. Understanding knowledge base schemas is often non-trivial, and even with good documentation it is still necessary to write suitable SPARQL queries and data transformation rules, and to document the end product with its own shape constraints. In previous biohackathons, we developed BioDataFuse as a resource that integrates several ELIXIR Recommended Interoperability Resources (RIRs) and ELIXIR Core Data Resources (CDRs) into KGs, but writing annotators is still a manual task.
This project aims to develop a methodology that provides similarity scores and alignment steps between two shapes (or sets of shapes), building on the BioDataFuse work on automated graph schema mining from the 2023 and 2024 hackathons. This methodology will be the basis for a graphical interface for low-code or no-code construction of modular KGs from RDF sources, in which users define the expected structure of their target graphs by visually editing or creating SHACL or ShEx shapes.
The outcome of the project will be a method to (1) quantify shape similarity across resources and (2) propose alignment transformations with associated confidence scores. These transformations will enable reuse or adaptation of existing schema components based on source constraints automatically extracted using tools such as ShExer and VoID-generator.
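As a rough illustration of outcomes (1) and (2), the sketch below scores two shapes with a Jaccard index over their constraints and proposes predicate-level alignments with a confidence value. Everything in it is an assumption made for illustration and not the project's method: the representation of a shape as a set of (predicate, value type) pairs, the function names, the example predicates, and the confidence weights of 1.0 and 0.5 are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical model: a shape is a set of (predicate, value type) constraints.
Constraint = Tuple[str, str]
Shape = Set[Constraint]


def jaccard(a: Shape, b: Shape) -> float:
    """Jaccard index of two constraint sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def propose_alignments(source: Shape, target: Shape) -> List[Tuple[str, str, str, float]]:
    """Pair target constraints with source constraints that share a predicate.

    Returns (predicate, source type, target type, confidence) tuples.
    The weights are arbitrary illustrative values: 1.0 for an exact match,
    0.5 when only the predicate matches and the value type would need a
    transformation.
    """
    proposals = []
    for t_pred, t_type in sorted(target):
        for s_pred, s_type in sorted(source):
            if s_pred == t_pred:
                confidence = 1.0 if s_type == t_type else 0.5
                proposals.append((s_pred, s_type, t_type, confidence))
    return proposals


# Made-up example shapes; the ex: predicates do not come from any real resource.
gene_source: Shape = {
    ("rdfs:label", "xsd:string"),
    ("ex:encodes", "IRI"),
    ("ex:taxon", "IRI"),
}
gene_target: Shape = {
    ("rdfs:label", "xsd:string"),
    ("ex:taxon", "xsd:string"),
}

score = jaccard(gene_source, gene_target)
proposals = propose_alignments(gene_source, gene_target)
```

In this toy case the two shapes share one of four distinct constraints, and the `ex:taxon` constraint is proposed with lower confidence because its value type differs between source and target. A real implementation would presumably work on parsed SHACL or ShEx shapes and account for cardinalities and class hierarchies.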
Project repository link: https://github.com/BioDataFuse/elixir_biohackathon_2025
Javier Millán Acosta, Tooba Abbassi-Daloii, Yojana Gadiya