
Feat/issue triggered curation agent #138

Draft
lwaldron wants to merge 6 commits into master from feat/issue-triggered-curation-agent


@lwaldron (Member)

Title

feat: issue-triggered metadata curation agent + Excel prefill API

Summary

This PR adds an issue-triggered GitHub Actions curation flow with human-in-the-loop (HITL) mapping review, and refactors Excel template generation to use explicit metadata prefilling.
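The prefill_metadata argument accepts NULL, a data.frame, or a CSV/TSV path. The following is a minimal Python sketch of how such a three-way argument can be normalized. It is not the package's R code. A list of row dicts stands in for a data.frame, and the function name load_prefill is hypothetical; the R helper in this PR is load_prefill_metadata(). Choosing the delimiter from the file extension is also an assumption.

```python
import csv
from pathlib import Path
from typing import Optional, Union

# A list of row dicts stands in for an R data.frame in this sketch.
Rows = list[dict[str, str]]

def load_prefill(prefill: Union[None, Rows, str, Path]) -> Optional[Rows]:
    """Hypothetical normalizer: None -> blank template, rows -> as-is, path -> parsed file."""
    if prefill is None:
        return None  # no prefilled rows; the template is generated empty
    if isinstance(prefill, list):
        return prefill  # already tabular, pass through unchanged
    path = Path(prefill)
    suffix = path.suffix.lower()
    # Assumption: the delimiter is inferred from the file extension.
    if suffix == ".csv":
        delimiter = ","
    elif suffix == ".tsv":
        delimiter = "\t"
    else:
        raise ValueError(f"Unsupported prefill file type: {suffix!r}")
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))
```

Keeping NULL as an explicit "no prefill" value mirrors the test change from example_data=FALSE to prefill_metadata=NULL.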

What changed

  • Replaced the example_data argument of generate_data_entry_excel() with prefill_metadata
    • Supports NULL, data.frame, or CSV/TSV path
    • Defaults to package example prefill file in inst/extdata
  • Added package example prefill dataset (cMD_prefill_example.csv)
  • Added curation issue form template for BioProject + attachment + DOI intake
  • Added full workflow for:
    • issue intake + metadata lookup
    • column mapping via GitHub Models API (gpt-4o-mini) with heuristic fallback
    • HITL question/response loop via issue comments
    • state persistence on curation-state/issue-<N> branch
    • Excel artifact generation after mapping resolution
  • Added agent scripts:
    • agent_state.R
    • agent_metadata_lookup.R
    • agent_column_mapping.R
    • agent_prefill_excel.R
  • Added fallback loading of the helper scripts for runtime contexts where package exports are unavailable
  • Updated docs and script usage to the new prefill_metadata API
  • Added tests for CLI agent scripts
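The commit notes describe the heuristic fallback as adist-based, meaning it uses edit distance. The Python sketch below shows one way such a fallback can work. Each source column is normalized and matched to the closest target column by Levenshtein distance. Weak matches are left unmapped and become HITL questions for the curator. The normalization rule, the relative-distance threshold of 0.3, and the names heuristic_map and normalize are all assumptions, not the agent_column_mapping.R implementation.

```python
import re
from typing import Optional

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def normalize(name: str) -> str:
    """Assumed normalization: lower-case and drop non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def heuristic_map(source_cols, target_cols, max_rel_dist=0.3):
    """Map each source column to its nearest target; weak matches become questions."""
    mapping: dict[str, Optional[str]] = {}
    questions: list[str] = []
    for src in source_cols:
        s = normalize(src)
        best, best_d = None, None
        for tgt in target_cols:
            d = levenshtein(s, normalize(tgt))
            if best_d is None or d < best_d:  # first target wins ties
                best, best_d = tgt, d
        # Scale distance by the longer string so short names are not over-matched.
        if best is not None and best_d / max(len(s), len(normalize(best)), 1) <= max_rel_dist:
            mapping[src] = best
        else:
            mapping[src] = None       # leave unmapped
            questions.append(src)     # ask the curator in the issue thread
    return mapping, questions
```

For example, "Sample_ID" and "sampl_id" both map to "sample_id", while an unrelated column such as "weird_col" is returned in questions for a curator to resolve.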

Testing

  • devtools::test(filter = "excel-generation") passed
  • devtools::test(filter = "agent-scripts") passed (46 passed, 0 failed, 0 warnings)
  • Script parse checks and end-to-end smoke flow completed successfully

Notes

  • Workflow uses lightweight ubuntu-latest + r-lib/actions setup (no Bioconductor Docker).
  • LLM mapping runs only when GITHUB_TOKEN is available; otherwise heuristic mapping is used.
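The Python sketch below shows how the token-gated choice between LLM and heuristic mapping can be expressed. The mapper callables are placeholders, and no GitHub Models API call is shown. Falling back to the heuristic when the LLM call raises is an assumption; the PR only states that the heuristic is used when GITHUB_TOKEN is unavailable. The name map_columns is hypothetical.

```python
import os
from typing import Callable, Mapping, Optional

Mapper = Callable[[list[str], list[str]], dict[str, Optional[str]]]

def map_columns(source: list[str], target: list[str],
                llm_mapper: Mapper, heuristic_mapper: Mapper,
                env: Optional[Mapping[str, str]] = None):
    """Return (mapping, method); method is "llm" or "heuristic"."""
    env = os.environ if env is None else env
    if env.get("GITHUB_TOKEN", "").strip():
        try:
            return llm_mapper(source, target), "llm"
        except Exception:
            # Assumption: any LLM failure degrades to the heuristic mapper.
            pass
    return heuristic_mapper(source, target), "heuristic"
```

Returning the method name alongside the mapping lets the workflow report in the issue which path produced the mapping.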

AI Attribution

This PR was developed with assistance from GitHub Copilot (GPT-5.3-Codex) "Planning" Agent.

These changes have NOT yet been tested manually; they have been exercised only by the AI-generated automated tests. Many features can only be tested through GitHub Actions and Issues.

- Refactor generate_data_entry_excel(): replace example_data bool with
  prefill_metadata param (NULL | data.frame | CSV/TSV path); default
  loads inst/extdata/cMD_prefill_example.csv
- Add load_prefill_metadata() and split_prefill_values() internal helpers
- Add inst/extdata/cMD_prefill_example.csv (4 example rows)
- Update man/, inst/scripts/generate_excel_template.R, README_EXCEL_GENERATION.md
- Update tests: example_data=FALSE -> prefill_metadata=NULL; add data.frame prefill test
- Add .github/ISSUE_TEMPLATE/cMD_sra_curation.yml (BioProject, attachment, DOI fields)
- Add .github/workflows/metadata-curation-agent.yml (HITL loop: intake-and-map +
  process-curator-response jobs; ubuntu-latest + r-lib/actions; state on
  curation-state/issue-N branch; LLM via GitHub Models API gpt-4o-mini)
- Add inst/scripts/agent_state.R (init/update_mapping/apply_response/status)
- Add inst/scripts/agent_metadata_lookup.R (SRA/BioSample fetch + attachment join)
- Add inst/scripts/agent_column_mapping.R (LLM + adist heuristic fallback)
- Add inst/scripts/agent_prefill_excel.R (state -> prefill df -> Excel)
- Add tests/testthat/test-agent-scripts.R (46 tests, 0 failures)
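agent_state.R exposes init, update_mapping, apply_response, and status subcommands. The Python sketch below shows one way the question/response loop can be modeled; it is not that script. Unmapped columns become pending questions. A curator comment resolves them, and the state flips to resolved once nothing is pending. The comment format ("source_column -> target_column" per line), the status values, and the JSON shape are hypothetical.

```python
import json
import re
from typing import Optional

# Hypothetical curator reply format: one "source_column -> target_column" per line.
ANSWER = re.compile(r"^\s*(\S+)\s*->\s*(\S+)\s*$")

def init_state(issue: int, mapping: dict[str, Optional[str]]) -> dict:
    """Create state; every column left unmapped (None) becomes a pending question."""
    pending = sorted(col for col, tgt in mapping.items() if tgt is None)
    return {
        "issue": issue,
        "mapping": dict(mapping),
        "pending": pending,
        "status": "awaiting_curator" if pending else "resolved",
    }

def apply_response(state: dict, comment: str) -> dict:
    """Return a new state with the curator's answers applied; the input is not mutated."""
    new = json.loads(json.dumps(state))  # deep copy of JSON-serializable state
    for line in comment.splitlines():
        match = ANSWER.match(line)
        if not match:
            continue  # ignore free-text lines in the comment
        src, tgt = match.groups()
        if src in new["pending"]:
            new["mapping"][src] = tgt
            new["pending"].remove(src)
    new["status"] = "awaiting_curator" if new["pending"] else "resolved"
    return new
```

Because the state is plain JSON, json.dumps(state) is the kind of payload that could be committed to the curation-state/issue-N branch between workflow runs. A "resolved" status would then gate the Excel generation step.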
@github-actions

Metadata Validation Report

Date: $(date -u +"%Y-%m-%d %H:%M:%S UTC")
Trigger: pull_request
Branch: 138/merge

Schema Information

  • Repository: shbrief/OmicsMLRepoCuration
  • Commit: 52cdc8e

Results

  • Total Files: 130
  • Status: ❌ FAIL
