Problem
The Makefile contains substantial infrastructure for downloading secondary Google Sheet tabs, BioPortal historical submissions, and external BioPortal ontologies. This machinery is not part of the ontology build — it was used for one-time research/analysis. It adds complexity, maintenance burden, and confusion about what the build actually does.
Relates to #379 (assess non-core tabs for deletion) — once the tabs are gone, the download code has no purpose.
Relates to #378 (deprecated entities in OWL) — some of this machinery was part of the deprecation research workflow.
What to remove
1. Non-core sheet downloads (~40 lines in Makefile)
The download-all-sheets target fetches 7 sheets, but only classes.tsv and properties.tsv are used in the build. Remove download targets for:
- bactotraits.tsv
- more_synonyms.tsv
- more_classes___inconsistent.tsv
- metabolic_and_respiratory_robot.tsv
- metabolic_and_respiratory_llm.tsv
Also remove the secondary and deprecated sections in sheets.yaml (after #379 confirms those tabs are deleted).
2. BioPortal historical submission downloads (~30 lines)
The METPO_SUBMISSIONS list and the download-all-bioportal-submissions target fetch 16 historical METPO OWL submissions from BioPortal. This was one-time provenance research for the ID allocation audit. Remove:
- METPO_SUBMISSIONS variable
- download-all-bioportal-submissions target
- clean-bioportal-submissions target
- external/metpo_historical/ directory references
3. External BioPortal ontology downloads (~50 lines)
The NON_OLS_BIOPORTAL_ONTOLOGIES list and the download-external-bioportal-ontologies target fetch 6 ontologies (D3O, EDAM, MCRO, PRIDE, AMPHIMEDON, PDO) for embedding comparison. This was one-time research for ChromaDB evaluation (#364). Remove:
- NON_OLS_BIOPORTAL_ONTOLOGIES variable
- download-external-bioportal-ontologies target
- clean-external-bioportal-ontologies target
- external/ontologies/bioportal/ directory references
- Associated SPARQL extraction rules for data/pipeline/non-ols-terms/%.tsv
4. BactoTraits/Madin MongoDB infrastructure
Assess whether the MongoDB import and reconciliation targets (import-bactotraits, import-madin, bactotraits-metpo-set-diff, etc.) are still used. If the analysis is complete, remove:
- import-bactotraits, import-bactotraits-metadata, import-madin, import-madin-metadata targets
- clean-bactotraits-db, clean-madin-db targets
- Reconciliation report targets that depend on MongoDB
- Associated Python scripts if unused elsewhere
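One way to run the "still used?" assessment is a throwaway audit target that greps the repository for each candidate name. This is only a sketch: the audit-retired-targets target and the RETIRED_TARGETS variable are hypothetical names, not part of the current Makefile, and the list only covers the targets named above. Recipe lines must be indented with a tab.

```make
# Hypothetical audit helper; not part of the existing Makefile.
# Target names are copied from the removal list above.
RETIRED_TARGETS := import-bactotraits import-bactotraits-metadata \
	import-madin import-madin-metadata \
	clean-bactotraits-db clean-madin-db

.PHONY: audit-retired-targets
audit-retired-targets:
	@for t in $(RETIRED_TARGETS); do \
		echo "== $$t =="; \
		grep -rn --exclude-dir=.git -- "$$t" . || true; \
	done
```

Hits outside the Makefile itself (CI configs, docs, scripts) suggest a target is still referenced somewhere. Matching is by substring, so import-bactotraits also reports the import-bactotraits-metadata lines.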
Goal
After this cleanup, the Makefile's sheet-related code should be: download classes + properties tabs, run robot template, merge into OWL. Everything else is the ontology's own scripts and SPARQL queries.
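The end state described above could look roughly like the following. This is a sketch under stated assumptions, not the project's actual rules: SHEET_ID, the gid values, the components/ directory, and the metpo.owl output name are placeholders, and the real robot template call may need extra options (prefixes, an input ontology) that are not shown here.

```make
# Sketch only. SHEET_ID, the *_GID values, and all paths are placeholders.
SHEET_ID := REPLACE_WITH_SHEET_ID
CLASSES_GID := 0
PROPERTIES_GID := 1
SHEET_URL = https://docs.google.com/spreadsheets/d/$(SHEET_ID)/export?format=tsv&gid=

# 1. Download only the two core tabs.
classes.tsv:
	curl -fsSL -o $@ '$(SHEET_URL)$(CLASSES_GID)'

properties.tsv:
	curl -fsSL -o $@ '$(SHEET_URL)$(PROPERTIES_GID)'

# 2. Turn each tab into an OWL component with robot template.
components/%.owl: %.tsv
	mkdir -p $(dir $@)
	robot template --template $< --output $@

# 3. Merge the components into one OWL file.
metpo.owl: components/classes.owl components/properties.owl
	robot merge --input components/classes.owl \
		--input components/properties.owl --output $@
```

With rules of this shape, `make metpo.owl` would exercise the whole sheet-related path: two downloads, two template runs, one merge.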