RNA Module Function Index

Quick reference table for all Python functions in the METAINFORMANT RNA module.

Amalgkit Step Functions

High-level wrappers for amalgkit CLI subcommands.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| metadata | metainformant.rna.amalgkit | Retrieve RNA-seq metadata from NCBI SRA/ENA | 01_metadata.md |
| config | metainformant.rna.amalgkit | Generate configuration files | 02_config.md |
| select | metainformant.rna.amalgkit | Filter SRA entries by quality | 03_select.md |
| getfastq | metainformant.rna.amalgkit | Download and convert SRA to FASTQ | 04_getfastq.md |
| integrate | metainformant.rna.amalgkit | Integrate FASTQ paths into metadata | 05_integrate.md |
| quant | metainformant.rna.amalgkit | Quantify transcript abundances | 06_quant.md |
| merge | metainformant.rna.amalgkit | Merge quantification results | 07_merge.md |
| cstmm | metainformant.rna.amalgkit | Cross-species TMM normalization | 08_cstmm.md |
| curate | metainformant.rna.amalgkit | Quality control and batch correction | 09_curate.md |
| csca | metainformant.rna.amalgkit | Cross-species correlation analysis | 10_csca.md |
| sanity | metainformant.rna.amalgkit | Validate workflow outputs | 11_sanity.md |
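
The numbered documentation files in the table mirror the order of the steps. The following is a minimal sketch that relies only on what the table shows (step names and file numbering); `STEP_ORDER`, `doc_file_for`, and `steps_from` are hypothetical helper names for illustration and are not part of metainformant.

```python
from __future__ import annotations

# Step names in the order given by the table above.
STEP_ORDER = [
    "metadata", "config", "select", "getfastq", "integrate", "quant",
    "merge", "cstmm", "curate", "csca", "sanity",
]


def doc_file_for(step: str) -> str:
    """Return the documentation file name for a step, e.g. '06_quant.md'."""
    position = STEP_ORDER.index(step) + 1  # raises ValueError for unknown steps
    return f"{position:02d}_{step}.md"


def steps_from(start: str) -> list[str]:
    """Return the given step and every step listed after it."""
    return STEP_ORDER[STEP_ORDER.index(start):]
```

For example, `doc_file_for("quant")` gives `06_quant.md`, matching the table row for `quant`.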

Step Runner Functions

Amalgkit steps are invoked via CLI wrappers in metainformant.rna.amalgkit. Step execution logic lives in metainformant.rna.engine.workflow_steps, FASTQ retrieval in metainformant.rna.engine.sra_extraction, and pipeline orchestration in metainformant.rna.engine.pipeline.

Workflow Functions

Workflow planning and execution.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| load_workflow_config | metainformant.rna.engine.workflow | Load configuration from YAML | API.md |
| plan_workflow | metainformant.rna.engine.workflow | Plan workflow steps (dry-run) | API.md |
| plan_workflow_with_params | metainformant.rna.engine.workflow | Plan workflow with overrides | API.md |
| execute_workflow | metainformant.rna.engine.workflow | Execute complete workflow | API.md |
| AmalgkitWorkflowConfig | metainformant.rna.engine.workflow | Configuration dataclass | API.md |
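
To illustrate the idea of planning a workflow as a dry run, with optional per-step overrides, here is a minimal self-contained sketch. It does not use the real API: `SketchWorkflowConfig`, `plan_steps`, and the parameter names (`work_dir`, `per_step`, `out_dir`, `threads`) are assumptions made for the example, and the actual signatures are documented in API.md.

```python
from dataclasses import dataclass, field


@dataclass
class SketchWorkflowConfig:
    """Hypothetical stand-in for a workflow config (not AmalgkitWorkflowConfig)."""
    work_dir: str
    species: str
    per_step: dict = field(default_factory=dict)  # step name -> parameter dict


def plan_steps(config, step_order, overrides=None):
    """Return (step, params) pairs without executing anything (a dry run)."""
    overrides = overrides or {}
    plan = []
    for step in step_order:
        params = {"out_dir": config.work_dir}       # shared default
        params.update(config.per_step.get(step, {}))  # values from the config
        params.update(overrides.get(step, {}))        # caller overrides win
        plan.append((step, params))
    return plan
```

The precedence shown here (defaults, then config values, then overrides) is a design choice of the sketch; a planner that only returns data is easy to inspect before anything is run.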

Genome Preparation Functions

Genome download, transcriptome preparation, and indexing.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| prepare_genome_for_quantification | metainformant.rna.genome_prep | Complete genome setup pipeline | genome_preparation.md |
| prepare_transcriptome_for_kallisto | metainformant.rna.genome_prep | Extract RNA FASTA from genome | genome_preparation.md |
| build_kallisto_index | metainformant.rna.genome_prep | Build kallisto index | genome_preparation.md |
| find_rna_fasta_in_genome_dir | metainformant.rna.genome_prep | Locate RNA FASTA in genome dir | genome_preparation.md |
| download_rna_fasta_from_ftp | metainformant.rna.genome_prep | Download RNA FASTA from FTP | genome_preparation.md |
| download_cds_fasta_from_ftp | metainformant.rna.genome_prep | Download CDS FASTA from FTP | genome_preparation.md |
| extract_transcripts_from_gff | metainformant.rna.genome_prep | Extract transcripts from GFF | genome_preparation.md |
| get_expected_index_path | metainformant.rna.genome_prep | Get expected index path | genome_preparation.md |
| verify_genome_status | metainformant.rna.genome_prep | Check genome/index status | genome_preparation.md |
| orchestrate_genome_setup | metainformant.rna.genome_prep | Run genome setup for all species | genome_setup_guide.md |
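
As a rough illustration of the indexing step, the sketch below assembles a `kallisto index` command line without running it. The `-i` and `-k` options are standard kallisto options; the helper names and the directory layout in `sketch_index_path` are assumptions for this example and may differ from what `get_expected_index_path` and `build_kallisto_index` actually do.

```python
from pathlib import Path


def sketch_index_path(work_dir, species):
    """Hypothetical layout: <work_dir>/index/<species>_transcripts.idx."""
    return Path(work_dir) / "index" / f"{species}_transcripts.idx"


def kallisto_index_command(fasta, index_path, kmer_size=31):
    """Build (but do not run) a kallisto index command as an argument list."""
    return [
        "kallisto", "index",
        "-i", str(index_path),  # output index file
        "-k", str(kmer_size),   # k-mer length (kallisto's default is 31)
        str(fasta),             # transcriptome FASTA
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` once kallisto is confirmed to be installed.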

Orchestration Functions

Multi-species workflow management and monitoring.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| discover_species_configs | metainformant.rna.orchestration | Find all species configs | API.md |
| run_workflow_for_species | metainformant.rna.orchestration | Execute workflow for one species | API.md |
| check_workflow_status | metainformant.rna.orchestration | Check workflow completion status | API.md |
| cleanup_unquantified_samples | metainformant.rna.orchestration | Clean up quantified FASTQs | API.md |
| monitor_workflows | metainformant.rna.orchestration | Monitor multiple workflows | API.md |
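
Discovering per-species configuration files usually comes down to globbing a directory. The sketch below shows that idea only; the file-name convention (`amalgkit_<species>.yaml`) and the helper name `sketch_discover_configs` are assumptions for illustration, not the documented behavior of `discover_species_configs`.

```python
from pathlib import Path


def sketch_discover_configs(config_dir, prefix="amalgkit_", suffix=".yaml"):
    """Map species name -> config path for files named <prefix><species><suffix>.

    The naming convention is an assumption made for this sketch.
    """
    found = {}
    for path in sorted(Path(config_dir).glob(f"{prefix}*{suffix}")):
        species = path.name[len(prefix):-len(suffix)]
        if species:  # skip a file named exactly <prefix><suffix>
            found[species] = path
    return found
```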

Utility Functions

CLI interaction and parameter handling.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| check_cli_available | metainformant.rna.amalgkit | Check if amalgkit is on PATH | API.md |
| ensure_cli_available | metainformant.rna.amalgkit | Ensure amalgkit available (auto-install) | API.md |
| build_cli_args | metainformant.rna.amalgkit | Convert params to CLI args | API.md |
| build_amalgkit_command | metainformant.rna.amalgkit | Build complete command | API.md |
| run_amalgkit | metainformant.rna.amalgkit | Execute amalgkit subcommand | API.md |
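
The sketch below shows one plausible way to check for a CLI on PATH and to turn a parameter dictionary into command-line arguments. It is not the metainformant implementation: the `sketch_*` names are hypothetical, and the conversion rules (skip `None`, render booleans as `yes`/`no`, repeat the flag for list values) are assumptions chosen for the example.

```python
import shutil


def sketch_cli_available(tool="amalgkit"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(tool) is not None


def sketch_build_args(params):
    """Convert {'out_dir': 'work', 'redo': False} into ['--out_dir', 'work', ...]."""
    args = []
    for key, value in params.items():
        if value is None:
            continue  # unset parameters are omitted entirely
        flag = f"--{key}"
        if isinstance(value, bool):
            args.extend([flag, "yes" if value else "no"])  # assumed yes/no style
        elif isinstance(value, (list, tuple)):
            for item in value:
                args.extend([flag, str(item)])  # repeat the flag per item
        else:
            args.extend([flag, str(value)])
    return args


def sketch_build_command(subcommand, params):
    """Assemble a full argument list suitable for subprocess.run."""
    return ["amalgkit", subcommand, *sketch_build_args(params)]
```

Building the command as a list rather than a string avoids shell quoting problems when paths contain spaces.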

Processing Functions

Sample-level processing pipelines.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| quantify_sample | metainformant.rna.engine.workflow_steps | Quantify single sample | API.md |
| convert_sra_to_fastq | metainformant.rna.engine.sra_extraction | Convert SRA to FASTQ | API.md |
| delete_sample_fastqs | metainformant.rna.engine.sra_extraction | Delete sample FASTQs | API.md |
| run_download_quant_workflow | metainformant.rna.engine.pipeline | Unified download-quantify workflow (sequential/parallel) | API.md |

Monitoring Functions

Workflow progress and sample status tracking.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| count_quantified_samples | metainformant.rna.monitoring | Count quantified and total samples | API.md |
| get_sample_status | metainformant.rna.monitoring | Get detailed status for a single sample | API.md |
| analyze_species_status | metainformant.rna.monitoring | Comprehensive analysis of species workflow status | API.md |
| find_unquantified_samples | metainformant.rna.monitoring | Find all unquantified samples | API.md |
| check_active_downloads | metainformant.rna.monitoring | Check for samples currently being downloaded | API.md |
| check_workflow_progress | metainformant.rna.monitoring | Get workflow progress summary | API.md |
| assess_all_species_progress | metainformant.rna.monitoring | Assess progress for all species | API.md |
| initialize_progress_tracking | metainformant.rna.monitoring | Initialize progress tracking | API.md |
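
Counting quantified samples can be pictured as checking each sample directory for a quantification output file. The sketch below assumes a layout of `<quant_dir>/<sample_id>/abundance.tsv` (`abundance.tsv` is kallisto's standard output name; the directory layout and the `sketch_*` helper names are assumptions and may not match how the monitoring module decides status).

```python
from pathlib import Path


def sketch_is_quantified(quant_dir, sample_id):
    """A sample counts as quantified if its abundance.tsv exists (assumed layout)."""
    return (Path(quant_dir) / sample_id / "abundance.tsv").is_file()


def sketch_count_quantified(quant_dir, sample_ids):
    """Return (number quantified, total number of samples)."""
    done = sum(1 for s in sample_ids if sketch_is_quantified(quant_dir, s))
    return done, len(sample_ids)


def sketch_find_unquantified(quant_dir, sample_ids):
    """Return the sample IDs that have no abundance.tsv yet, in input order."""
    return [s for s in sample_ids if not sketch_is_quantified(quant_dir, s)]
```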

Environment Functions

Tool availability and environment validation.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| check_amalgkit | metainformant.rna.environment | Check if amalgkit is available | API.md |
| check_kallisto | metainformant.rna.environment | Check if kallisto is installed | API.md |
| check_metainformant | metainformant.rna.environment | Check if metainformant package is installed | API.md |
| check_virtual_env | metainformant.rna.environment | Check if running inside a virtual environment | API.md |
| check_rscript | metainformant.rna.environment | Check if Rscript is available | API.md |
| check_dependencies | metainformant.rna.environment | Check all required dependencies | API.md |
| validate_environment | metainformant.rna.environment | Comprehensive environment validation | API.md |

Note: SRA Toolkit (check_sra_toolkit) is no longer a required dependency. Direct download from ENA with wget is the primary download method; SRA Toolkit is used only as an automatic fallback.
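
Most of the checks in this section amount to asking whether an executable is on PATH or whether the interpreter runs inside a virtual environment. The following is a minimal sketch of both using only the standard library; the `sketch_*` names, the return shapes, and the default tool list are assumptions for illustration and not the signatures of the functions in metainformant.rna.environment.

```python
import shutil
import sys


def sketch_check_tool(name):
    """Return (available, detail) where detail is the path or an error message."""
    path = shutil.which(name)
    if path is None:
        return False, f"{name} not found on PATH"
    return True, path


def sketch_in_virtual_env():
    """True when sys.prefix differs from the base interpreter prefix (venv active)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def sketch_check_dependencies(tools=("amalgkit", "kallisto", "Rscript")):
    """Check several executables and return {tool: (available, detail)}."""
    return {tool: sketch_check_tool(tool) for tool in tools}
```

Returning a message alongside the boolean lets a caller print an actionable report instead of a bare pass/fail.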

Cleanup Functions

Partial download cleanup and file naming fixes.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| cleanup_partial_downloads | metainformant.rna.cleanup | Clean up partial downloads | API.md |
| fix_abundance_naming | metainformant.rna.cleanup | Create symlink for abundance file naming | API.md |
| fix_abundance_naming_for_species | metainformant.rna.cleanup | Fix abundance naming for all samples | API.md |
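
The symlink-based naming fix can be sketched as follows. This assumes that the quantifier writes `abundance.tsv` in the sample directory and that a downstream step expects `<sample_id>_abundance.tsv`; both file names and the helper name `sketch_fix_abundance_naming` are assumptions for the example, not a description of what `fix_abundance_naming` does internally.

```python
from pathlib import Path


def sketch_fix_abundance_naming(sample_dir, sample_id):
    """Create <sample_id>_abundance.tsv as a symlink to abundance.tsv.

    Returns True if the prefixed name is usable afterwards, False if the
    source file is missing.
    """
    sample_dir = Path(sample_dir)
    source = sample_dir / "abundance.tsv"
    link = sample_dir / f"{sample_id}_abundance.tsv"
    if not source.is_file():
        return False
    if link.is_symlink() and not link.exists():
        link.unlink()  # remove a dangling link before recreating it
    if not link.exists():
        link.symlink_to(source.name)  # relative link survives moving the directory
    return True
```

A relative link target keeps the pair of files valid if the whole sample directory is moved or copied.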

Discovery Functions

Species discovery and configuration generation.

| Function | Module | Description | Documentation |
| --- | --- | --- | --- |
| search_species_with_rnaseq | metainformant.rna.discovery | Search NCBI SRA for species with RNA-seq data | API.md |
| get_genome_info | metainformant.rna.discovery | Get genome assembly information | API.md |
| generate_config_yaml | metainformant.rna.discovery | Generate amalgkit YAML configuration | API.md |

Quick Links