Remove machinery for fetching non-core sheets and BioPortal submissions #380

@turbomam

Problem

The Makefile contains substantial infrastructure for downloading secondary Google Sheet tabs, historical BioPortal submissions, and external BioPortal ontologies. None of this machinery is part of the ontology build; it was used for one-time research and analysis. It adds complexity and maintenance burden, and it obscures what the build actually does.

Relates to #379 (assess non-core tabs for deletion) — once the tabs are gone, the download code has no purpose.
Relates to #378 (deprecated entities in OWL) — some of this machinery was part of the deprecation research workflow.

What to remove

1. Non-core sheet downloads (~40 lines in Makefile)

The download-all-sheets target fetches 7 sheets, but only classes.tsv and properties.tsv are used in the build. Remove the download targets for the other five:

  • bactotraits.tsv
  • more_synonyms.tsv
  • more_classes___inconsistent.tsv
  • metabolic_and_respiratory_robot.tsv
  • metabolic_and_respiratory_llm.tsv

Also remove the secondary and deprecated sections in sheets.yaml, once #379 confirms those tabs are deleted.
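
To scope the roughly 40 lines involved, something like the following could list every Makefile line that mentions one of the five non-core sheets. This is a minimal sketch, not part of the repository: the function name `lines_mentioning` and the `SAMPLE` excerpt are hypothetical, and the real Makefile may reference these files with directory prefixes (a substring match tolerates that).

```python
# Sheet names taken from the list above; everything else is illustrative.
NON_CORE_SHEETS = [
    "bactotraits.tsv",
    "more_synonyms.tsv",
    "more_classes___inconsistent.tsv",
    "metabolic_and_respiratory_robot.tsv",
    "metabolic_and_respiratory_llm.tsv",
]


def lines_mentioning(makefile_text, names=NON_CORE_SHEETS):
    """Return (line_number, name, stripped_line) for each mention of a non-core sheet."""
    hits = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        for name in names:
            if name in line:
                hits.append((number, name, line.strip()))
    return hits


# Hypothetical Makefile excerpt, for illustration only.
SAMPLE = """\
download-all-sheets: classes.tsv properties.tsv bactotraits.tsv more_synonyms.tsv
classes.tsv:
\t@echo fetch classes
bactotraits.tsv:
\t@echo fetch bactotraits
"""

if __name__ == "__main__":
    for hit in lines_mentioning(SAMPLE):
        print(hit)
```

In practice the text would come from the repository, for example `lines_mentioning(open("Makefile").read())`. Recipe lines that follow a matching rule still need to be reviewed by hand, since they may not repeat the file name.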

2. BioPortal historical submission downloads (~30 lines)

The METPO_SUBMISSIONS list and the download-all-bioportal-submissions target fetch 16 historical METPO OWL submissions from BioPortal. This was one-time provenance research for the ID allocation audit. Remove:

  • METPO_SUBMISSIONS variable
  • download-all-bioportal-submissions target
  • clean-bioportal-submissions target
  • external/metpo_historical/ directory references

3. External BioPortal ontology downloads (~50 lines)

The NON_OLS_BIOPORTAL_ONTOLOGIES list and the download-external-bioportal-ontologies target fetch 6 ontologies (D3O, EDAM, MCRO, PRIDE, AMPHIMEDON, PDO) for embedding comparison. This was one-time research for the ChromaDB evaluation (#364). Remove:

  • NON_OLS_BIOPORTAL_ONTOLOGIES variable
  • download-external-bioportal-ontologies target
  • clean-external-bioportal-ontologies target
  • external/ontologies/bioportal/ directory references
  • Associated SPARQL extraction rules for data/pipeline/non-ols-terms/%.tsv
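
After the removals in sections 2 and 3, a quick check along these lines could confirm that none of the named variables or targets is still defined. It is a sketch under simplifying assumptions: the helper name `still_defined` and the `SAMPLE` text are hypothetical, each target is assumed to be defined on its own line, and less common GNU make assignment operators such as `::=` are not handled.

```python
import re

# Names taken from sections 2 and 3 above.
VARIABLES = ["METPO_SUBMISSIONS", "NON_OLS_BIOPORTAL_ONTOLOGIES"]
TARGETS = [
    "download-all-bioportal-submissions",
    "clean-bioportal-submissions",
    "download-external-bioportal-ontologies",
    "clean-external-bioportal-ontologies",
]


def still_defined(makefile_text):
    """Return the listed variables and targets that the Makefile text still defines."""
    found = []
    for name in VARIABLES:
        # Matches "NAME =", "NAME :=", "NAME ?=" and "NAME +=" at the start of a line.
        if re.search(rf"^{re.escape(name)}\s*[:?+]?=", makefile_text, re.MULTILINE):
            found.append(name)
    for name in TARGETS:
        # Matches "NAME:" at the start of a line, but not the assignment "NAME :=".
        if re.search(rf"^{re.escape(name)}\s*:(?!=)", makefile_text, re.MULTILINE):
            found.append(name)
    return found


# Hypothetical Makefile excerpt, for illustration only.
SAMPLE = """\
METPO_SUBMISSIONS := 1 2 3
download-all-bioportal-submissions:
\t@echo downloading
clean-external-bioportal-ontologies:
\trm -rf external/ontologies/bioportal
"""

if __name__ == "__main__":
    print(still_defined(SAMPLE))
```

An empty list from the real Makefile would indicate the definitions are gone; references to the external/metpo_historical/ and external/ontologies/bioportal/ directories would still need a separate text search.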

4. BactoTraits/Madin MongoDB infrastructure

Assess whether the MongoDB import and reconciliation targets (import-bactotraits, import-madin, bactotraits-metpo-set-diff, etc.) are still used. If the analysis is complete, remove:

  • import-bactotraits, import-bactotraits-metadata, import-madin, import-madin-metadata targets
  • clean-bactotraits-db, clean-madin-db targets
  • Reconciliation report targets that depend on MongoDB
  • Associated Python scripts if unused elsewhere
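
One way to approach the "still used" question is to see which other targets list the MongoDB targets as prerequisites. The sketch below does a rough parse of rule lines for that purpose. The function name `dependents` and the `SAMPLE` excerpt (including the dependency it shows) are hypothetical, and the parser ignores includes, pattern rules, variable expansion, and line continuations, so its output is a starting point rather than proof that a target is unused.

```python
# Target names taken from section 4 above.
MONGO_TARGETS = {
    "import-bactotraits",
    "import-bactotraits-metadata",
    "import-madin",
    "import-madin-metadata",
    "clean-bactotraits-db",
    "clean-madin-db",
}


def dependents(makefile_text, watched=MONGO_TARGETS):
    """Map each watched target to the other targets that list it as a prerequisite."""
    used_by = {name: [] for name in sorted(watched)}
    for line in makefile_text.splitlines():
        # Skip blank lines, recipe lines (tab-indented), and comments.
        if not line.strip() or line.startswith(("\t", "#")):
            continue
        head, sep, tail = line.partition(":")
        if not sep or tail.startswith("=") or "=" in head:
            continue  # not a rule line (for example a variable assignment)
        prerequisites = tail.split("#", 1)[0].split()
        for target in head.split():
            if target.startswith("."):
                continue  # special targets such as .PHONY are not real dependents
            for prerequisite in prerequisites:
                if prerequisite in used_by and prerequisite != target:
                    used_by[prerequisite].append(target)
    return used_by


# Hypothetical Makefile excerpt; the dependency shown is an assumption.
SAMPLE = """\
.PHONY: import-bactotraits import-madin
import-bactotraits:
\t@echo import
bactotraits-metpo-set-diff: import-bactotraits
\t@echo diff
"""

if __name__ == "__main__":
    for name, users in dependents(SAMPLE).items():
        print(name, "<-", users)
```

Targets that come back with an empty list are not prerequisites of anything in the parsed file, but they may still be invoked manually or from CI, so that would need to be checked separately.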

Goal

After this cleanup, the Makefile's sheet-related code should do only three things: download the classes and properties tabs, run robot template, and merge the result into OWL. Everything else is the ontology's own scripts and SPARQL queries.
