Generate RO-Biolink predicate mappings based on a particular Biolink model#104
Also deleted `ro-to-biolink-mappings.tsv`, which contained all the mappings. This file should be updated by the Makefile.
Also added `ro-to-biolink-predicate-mappings-all.tsv`.
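To make the relationship between the two generated files concrete, here is a minimal Scala CLI-style sketch: every mapping goes into the `-all` file, and only mappings whose source predicate is RO or GOREL go into `ro-to-biolink-predicate-mappings.tsv`. The `PredicateMapping` case class, the `MappingTables` object, the column layout, and the `slot=value` qualifier encoding are all hypothetical placeholders, not the Biolink model's schema or the format the real script emits.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Hypothetical, simplified stand-in for one Biolink predicate mapping.
// Field names are illustrative assumptions, not the Biolink model's schema.
final case class PredicateMapping(
    mappedPredicate: String,                       // source predicate CURIE, e.g. an RO or GOREL term
    biolinkPredicate: String,                      // target Biolink predicate
    qualifiers: Seq[(String, String)] = Seq.empty  // qualifier slot -> value
)

object MappingTables {
  // Source prefixes kept in the filtered file (assumed from "RO/GOREL").
  val keptPrefixes: Set[String] = Set("RO", "GOREL")

  // "GOREL:0000040" -> "GOREL"
  def prefixOf(curie: String): String = curie.takeWhile(_ != ':')

  // One tab-separated row; the "slot=value;..." qualifier encoding is a made-up placeholder.
  def toRow(m: PredicateMapping): String = {
    val qs = m.qualifiers.map { case (k, v) => s"$k=$v" }.mkString(";")
    Seq(m.mappedPredicate, m.biolinkPredicate, qs).mkString("\t")
  }

  // Rows for the "-all" file: every mapping, in input order.
  def allRows(ms: Seq[PredicateMapping]): Seq[String] = ms.map(toRow)

  // Rows for the filtered file: only mappings whose source predicate is RO or GOREL.
  def roGorelRows(ms: Seq[PredicateMapping]): Seq[String] =
    ms.filter(m => keptPrefixes.contains(prefixOf(m.mappedPredicate))).map(toRow)

  // Write rows as UTF-8, one per line, overwriting any existing file.
  def writeTsv(path: Path, rows: Seq[String]): Unit =
    Files.write(path, (rows.mkString("\n") + "\n").getBytes(StandardCharsets.UTF_8))
}
```

For example, `MappingTables.writeTsv(Paths.get("ro-to-biolink-predicate-mappings-all.tsv"), MappingTables.allRows(mappings))` would write the unfiltered file, and the same call with `roGorelRows` would write the RO/GOREL subset, where `mappings` is whatever the parsing step produced.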
@balhoff I've now added checks that (1) look for duplication between the local mappings file and the generated predicate mappings files, and (2) look for Biolink predicates that are not present in the Biolink model. So far, I'm just printing out the concerning `PredicateMappings` (which is based on the predicate mappings file generated as part of the Biolink model), so unfortunately the output isn't very readable. Here's what the output looks like right now, with 15 warnings: [...] We can ignore the CTD mappings, since we currently don't export those at all. However, it looks like the following terms are duplicated: [...]
I've deleted RO:0002313 from the local mappings in 797ff28.
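A minimal sketch of the two checks described above, assuming each mappings file has already been parsed into (source predicate, Biolink predicate) pairs and that the set of valid Biolink predicates has been extracted from the model. `MappingChecks` and its methods are hypothetical names for illustration; the sorted, one-line warnings are just one way to make the output easier to read than printing raw mapping objects.

```scala
// Hypothetical sketch of the two sanity checks. Mappings are modeled as
// (source predicate CURIE, Biolink predicate) pairs; this is an assumption.
object MappingChecks {
  type Mapping = (String, String)

  // (1) Source predicates that appear in both the local and the generated mappings.
  def duplicatedPredicates(local: Seq[Mapping], generated: Seq[Mapping]): Set[String] =
    local.map(_._1).toSet.intersect(generated.map(_._1).toSet)

  // (2) Biolink predicates used by any mapping but absent from the model's predicate list.
  def unknownBiolinkPredicates(mappings: Seq[Mapping], knownPredicates: Set[String]): Set[String] =
    mappings.map(_._2).toSet.diff(knownPredicates)

  // Human-readable warnings, sorted so the output is stable between runs.
  def warnings(local: Seq[Mapping], generated: Seq[Mapping], knownPredicates: Set[String]): Seq[String] = {
    val dups = duplicatedPredicates(local, generated).toSeq.sorted
      .map(p => s"WARNING: $p is mapped in both the local and the generated mappings")
    val unknown = unknownBiolinkPredicates(local ++ generated, knownPredicates).toSeq.sorted
      .map(p => s"WARNING: $p is not a predicate in the Biolink model")
    dups ++ unknown
  }
}
```

Calling `MappingChecks.warnings(local, generated, known).foreach(println)` would print one line per problem instead of a full mapping object.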
Hi @balhoff -- just wanted to poke you to review this PR. If you need help incorporating it into the changes you've made to re-add CTD, please let me know.
Adds `scripts/generate_ro_biolink_mapping.sc`, a Scala CLI script for generating a list of mappings between RDF predicates and Biolink predicates, downloaded from two sources:

- The Biolink model (https://github.com/biolink/biolink-model/blob/68d4e3d7612275d0d7e832a9919bf8666e1d5fde/biolink-model.yaml)
- [...]

These are written into the `ro-to-biolink-predicate-mappings.tsv` file (which I've included in this PR). If you want to see all the predicate mappings (not just the RO/GOREL ones), they are in `ro-to-biolink-predicate-mappings-all.tsv` (https://github.com/ExposuresProvider/cam-pipeline/blob/e1d6dd063c43de31ac736dbd0ce1ee57008f64fc/ro-to-biolink-predicate-mappings-all.tsv).

This file is then used by
`scripts/kg_edges.dl` to add "qualifiers" to `kg.tsv`. This does seem to work currently, producing output like:

```
GO:0004842 biolink:regulates GO:0004842 http://model.geneontology.org/R-HSA-9645460 infores:go-cam
GO:0004842 biolink:regulates GO:0004842 http://model.geneontology.org/R-HSA-9645460 infores:go-cam {"biolink:object_direction_qualifier":"upregulated"}
GO:0004842 biolink:regulates GO:0004842 http://model.geneontology.org/R-HSA-937042 infores:go-cam
GO:0004842 biolink:regulates GO:0004842 http://model.geneontology.org/R-HSA-937042 infores:go-cam {"biolink:object_direction_qualifier":"upregulated"}
GO:0004842 biolink:regulates GO:0004842 http://model.geneontology.org/R-HSA-983168 infores:go-cam
GO:0004842 biolink:regulates GO:0004842 http://model.geneontology.org/R-HSA-983168 infores:go-cam {"biolink:object_direction_qualifier":"upregulated"}
GO:0004842 biolink:regulates GO:0004674 http://model.geneontology.org/62b4ffe300004589 infores:go-cam
GO:0004842 biolink:regulates GO:0004674 http://model.geneontology.org/62b4ffe300004589 infores:go-cam {"biolink:object_direction_qualifier":"upregulated"}
[...]
GO:0022857 biolink:affects CHEBI:641 http://model.geneontology.org/5d29221b00001552 infores:go-cam {"biolink:qualified_predicate":"biolink:causes"}||{"biolink:object_aspect_qualifier":"transport"}||{"biolink:object_direction_qualifier":"increased"}
GO:0051640 biolink:affects GO:0140494 http://model.geneontology.org/5ee8120100001898 infores:go-cam {"biolink:qualified_predicate":"biolink:causes"}||{"biolink:object_aspect_qualifier":"transport"}||{"biolink:object_direction_qualifier":"increased"}
GO:0031503 biolink:affects ComplexPortal:CPX-532 http://model.geneontology.org/5df932e000000551 infores:go-cam {"biolink:qualified_predicate":"biolink:causes"}||{"biolink:object_aspect_qualifier":"transport"}||{"biolink:object_direction_qualifier":"increased"}
GO:0034504 biolink:affects MGI:MGI:3036269 http://model.geneontology.org/5df932e000003298 infores:go-cam {"biolink:qualified_predicate":"biolink:causes"}||{"biolink:object_aspect_qualifier":"transport"}||{"biolink:object_direction_qualifier":"increased"}
GO:0016197 biolink:affects GO:0005770 http://model.geneontology.org/5ee8120100000250 infores:go-cam {"biolink:qualified_predicate":"biolink:causes"}||{"biolink:object_aspect_qualifier":"transport"}||{"biolink:object_direction_qualifier":"increased"}
```

Things to do:
- [...] `.asJson` from Circe to work. Help?
- [...] `ro-to-biolink-local-mappings.tsv` and `ro-to-biolink-predicate-mappings.tsv` -- any examples in the original list should be deleted so that only the qualified predicate is used.
- [...] `ro-to-biolink-local-mappings.tsv` for any predicates that have been deleted -- we can temporarily add those directly to `scripts/generate_ro_biolink_mappings.sc`, but eventually we should get those into the Biolink model.

This PR also adds the command for generating `ro-to-biolink-predicate-mappings.tsv`, although at the moment this will never be run, since the predicate mappings file is already committed to the GitHub repo.

WIP: will close #95 once implemented.
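The qualifier column in the `kg.tsv` sample output above is a sequence of single-key JSON objects joined with `||`. As one possible workaround for the Circe item in the to-do list, here is a standard-library-only Scala sketch that renders that column by hand. `QualifierColumn` and its methods are hypothetical names, and the escaping deliberately covers only quotes, backslashes, and control characters.

```scala
// Hypothetical sketch: render kg.tsv qualifiers without a JSON library.
object QualifierColumn {
  // Minimal JSON string encoding: quotes, backslashes, and control characters are escaped.
  def jsonString(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append("u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }

  // One single-key JSON object per qualifier, joined with "||" as in the sample output.
  def render(qualifiers: Seq[(String, String)]): String =
    qualifiers.map { case (k, v) => s"{${jsonString(k)}:${jsonString(v)}}" }.mkString("||")

  // Append the qualifier column to an edge row only when there are qualifiers.
  def edgeRow(fields: Seq[String], qualifiers: Seq[(String, String)]): String =
    if (qualifiers.isEmpty) fields.mkString("\t")
    else (fields :+ render(qualifiers)).mkString("\t")
}
```

Rendering the three qualifiers from the `biolink:affects` rows (`biolink:qualified_predicate`, `biolink:object_aspect_qualifier`, `biolink:object_direction_qualifier`) in that order reproduces the `{...}||{...}||{...}` string shown in the sample. If the Circe problem turns out to be a missing import, note that `.asJson` is an extension method brought into scope by `import io.circe.syntax._`; the hand-rolled version avoids the dependency but is easier to get wrong for unusual strings.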