graph LR
    Genomic_Data_Sampler["Genomic Data Sampler"]
    Genomic_Region_Sampler["Genomic Region Sampler"]
    Genomic_Position_Sampler["Genomic Position Sampler"]
    Genomic_Track_Sampler["Genomic Track Sampler"]
    BigWig_Dataset["BigWig Dataset"]
    Core_Utilities["Core Utilities"]
    Genomic_Data_Sampler -- "composed of" --> Genomic_Region_Sampler
    Genomic_Data_Sampler -- "composed of" --> Genomic_Position_Sampler
    Genomic_Data_Sampler -- "composed of" --> Genomic_Track_Sampler
    Genomic_Data_Sampler -- "provides input to" --> BigWig_Dataset
    BigWig_Dataset -- "uses" --> Genomic_Region_Sampler
    BigWig_Dataset -- "uses" --> Genomic_Position_Sampler
    BigWig_Dataset -- "uses" --> Genomic_Track_Sampler
    Genomic_Region_Sampler -- "uses" --> Core_Utilities
    Genomic_Position_Sampler -- "uses" --> Core_Utilities
    Genomic_Track_Sampler -- "uses" --> Core_Utilities
    click Genomic_Data_Sampler href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/bigwig-loader/Genomic_Data_Sampler.md" "Details"


Details

This subsystem defines the "what" and "where" of the genomic data to be loaded: it is the primary interface through which the downstream data loading pipeline is told which regions of interest and which data sources to use. Its modular design follows the Strategy Pattern, so sampling strategies can be swapped or extended without changing the code that consumes them.
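
The Strategy Pattern mentioned above can be illustrated with a minimal sketch. Every name here (`PositionStrategy`, `FixedPositions`, `ShuffledPositions`, `collect`) is hypothetical and chosen for illustration only; none of them is taken from the library's API.

```python
import random
from typing import Iterator, List, Protocol, Sequence, Tuple

Position = Tuple[str, int]  # (chromosome, 0-based position)


class PositionStrategy(Protocol):
    """Hypothetical strategy interface: anything iterable over positions."""

    def __iter__(self) -> Iterator[Position]: ...


class FixedPositions:
    """Strategy 1: replay a fixed list of positions in order."""

    def __init__(self, positions: Sequence[Position]) -> None:
        self._positions = list(positions)

    def __iter__(self) -> Iterator[Position]:
        return iter(self._positions)


class ShuffledPositions:
    """Strategy 2: yield the same positions in a seeded random order."""

    def __init__(self, positions: Sequence[Position], seed: int = 0) -> None:
        self._positions = list(positions)
        self._seed = seed

    def __iter__(self) -> Iterator[Position]:
        shuffled = self._positions[:]
        random.Random(self._seed).shuffle(shuffled)
        return iter(shuffled)


def collect(strategy: PositionStrategy, limit: int) -> List[Position]:
    """Consumer code depends only on the interface, not on a concrete class."""
    out: List[Position] = []
    for i, pos in enumerate(strategy):
        if i >= limit:
            break
        out.append(pos)
    return out
```

Because `collect` only relies on iteration, either strategy can be passed in without changing the consumer, which is the kind of extensibility the paragraph describes.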

Genomic Data Sampler

This is the overarching component. It orchestrates the generation of genomic positions or intervals, the selection of relevant BigWig files (tracks), and the sampling of genomic sequences. In doing so it defines the "regions of interest" and "data sources" for subsequent data extraction and batching, which makes it the foundational input to the high-performance data loading pipeline. It acts as the control plane for data selection: the BigWig Dataset gets from it the context needed to load the correct genomic data efficiently.

Related Classes/Methods:

Genomic Region Sampler

This sub-component is responsible for generating and managing the overarching genomic regions (e.g., chromosome, start, and end coordinates) that will be used for data extraction. It provides mechanisms for both single genomic sequence sampling (GenomicSequenceSampler) and batch-oriented genomic sequence sampling (GenomicSequenceBatchSampler), so that data can be requested one interval at a time or in fixed-size groups.
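
The difference between per-interval and batch-oriented sampling can be sketched as follows. This is a simplified illustration, not the behaviour of `GenomicSequenceSampler` or `GenomicSequenceBatchSampler`; the helper names `iter_intervals` and `iter_interval_batches`, the fixed `window` size, and the half-open coordinate convention are all assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def iter_intervals(
    centers: Iterable[Tuple[str, int]], window: int
) -> Iterator[Interval]:
    """Turn (chromosome, center) pairs into fixed-width intervals, one at a time."""
    for chrom, center in centers:
        start = center - window // 2
        yield chrom, start, start + window


def iter_interval_batches(
    intervals: Iterable[Interval], batch_size: int
) -> Iterator[List[Interval]]:
    """Group intervals into lists of at most batch_size; the last may be shorter."""
    it = iter(intervals)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

A caller would typically chain the two, for example `iter_interval_batches(iter_intervals(centers, 1000), 32)`.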

Related Classes/Methods:

Genomic Position Sampler

This sub-component focuses on the precise sampling of individual positions within the broader genomic regions defined by the Genomic Region Sampler. It often incorporates randomness to select specific data points, which is vital for training machine learning models that require diverse input examples and for tasks like point-wise prediction.
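
One common way to draw random positions from a set of regions is to pick a region with probability proportional to its length and then pick a base uniformly inside it. The sketch below shows that idea only; `sample_position` is a hypothetical helper, and whether the library samples this way is an assumption, not something the text above confirms.

```python
import random
from typing import Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open


def sample_position(regions: Sequence[Region], rng: random.Random) -> Tuple[str, int]:
    """Draw one (chromosome, position) uniformly over all bases in `regions`.

    Assumes every region has end > start.
    """
    # Weight each region by its length so longer regions are hit more often.
    weights = [end - start for _, start, end in regions]
    chrom, start, end = rng.choices(regions, weights=weights, k=1)[0]
    # Uniform choice of a base inside the selected region.
    return chrom, rng.randrange(start, end)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible when a seed is fixed, which is useful when comparing training runs.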

Related Classes/Methods:

Genomic Track Sampler

This sub-component manages the selection of specific BigWig files, often referred to as "tracks," from a larger collection. It allows the system to filter and choose which types of genomic data (e.g., different epigenetic marks, gene expression levels) are relevant for a particular sampling operation, enabling flexible data source management.
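
Track selection as described above can be reduced to two steps: filter the collection down to the relevant files, then draw a subset. The following is a sketch under assumptions; `filter_tracks` and `sample_track_indices` are hypothetical names, and matching on a substring of the file name is just one possible filtering rule.

```python
import random
from typing import List, Sequence


def filter_tracks(paths: Sequence[str], keyword: str) -> List[str]:
    """Keep only the track files whose name contains `keyword`."""
    return [p for p in paths if keyword in p]


def sample_track_indices(n_tracks: int, k: int, rng: random.Random) -> List[int]:
    """Pick k distinct track indices; sorted so the output order is stable.

    Raises ValueError if k > n_tracks (behaviour of random.sample).
    """
    return sorted(rng.sample(range(n_tracks), k))
```

Returning indices rather than paths lets the caller apply the same selection to any per-track metadata it keeps alongside the files.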

Related Classes/Methods:

BigWig Dataset

This component handles the actual data access and loading based on the information (regions, positions, tracks) provided by the Genomic Data Sampler and its sub-components. It translates the sampled genomic coordinates and selected tracks into concrete data retrieval operations from BigWig files.
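
How sampled intervals and selected tracks combine into a retrieval step can be sketched with an in-memory stand-in. Here a plain dictionary plays the role of the BigWig files so the example stays self-contained; `load_batch` and the data layout are hypothetical and say nothing about how the library actually reads or decodes files.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open
Track = Dict[str, List[float]]   # chromosome -> per-base signal values


def load_batch(
    tracks: Dict[str, Track],
    track_names: Sequence[str],
    intervals: Sequence[Interval],
) -> List[List[List[float]]]:
    """Return values indexed as [interval][track][position]."""
    batch = []
    for chrom, start, end in intervals:
        # One slice per selected track, in the order the tracks were requested.
        per_track = [tracks[name][chrom][start:end] for name in track_names]
        batch.append(per_track)
    return batch
```

The resulting nesting corresponds to a batch of shape (intervals, tracks, interval length) when all intervals have the same width.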

Related Classes/Methods:

Core Utilities

This component provides general helper functions used across the library, including common data structures and validation routines that support the sampling process.
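
As an example of the kind of validation routine meant here, the sketch below checks an interval against known chromosome sizes. The function name `validate_interval` and the `chrom_sizes` mapping are assumptions made for illustration; they are not taken from the library.

```python
from typing import Mapping


def validate_interval(
    chrom: str, start: int, end: int, chrom_sizes: Mapping[str, int]
) -> None:
    """Raise ValueError unless 0 <= start < end <= size of `chrom`."""
    if chrom not in chrom_sizes:
        raise ValueError(f"unknown chromosome: {chrom!r}")
    if start < 0 or end <= start:
        raise ValueError(f"invalid coordinates: start={start}, end={end}")
    if end > chrom_sizes[chrom]:
        raise ValueError(
            f"interval end {end} exceeds {chrom} length {chrom_sizes[chrom]}"
        )
```

Running such a check before sampling means malformed regions fail early with a clear message instead of surfacing later as empty or truncated reads.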

Related Classes/Methods: