Pre-processing Pipeline and Imaging Equipment Impact the Performance of CNN for Breast Cancer Detection in Mammograms: Evidence from CBIS-DDSM and VinDr-Mammo
code/pt/learners
- local_mammo_learner.py - Main script that manages the model's lifecycle, including functions for training, loading, and saving weights. It also serves as the entry point for running the preprocessing tests.
code/pt/utils
- birads_categories.json - Configuration file that defines the mapping of BI-RADS categories, used for model class reduction.
- img_utils.py - Module with utility functions for image manipulation and transformation, applied during the preprocessing stage.
- preprocess_dicom - Script responsible for processing DICOM files, including loading the file, extracting image data, and applying preprocessing routines.
- preprocess_json.py - Module containing utility functions for reading and parsing the dataset's JSON file.
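As an illustration of the class reduction that birads_categories.json drives, here is a minimal sketch. The schema and the grouping below are assumptions made purely for the example (the actual file contents are not shown in this README), and `reduce_label` is a hypothetical helper, not a function from the repository.

```python
import json

# Hypothetical mapping: the real contents of birads_categories.json may differ.
EXAMPLE_MAPPING_JSON = """
{
  "BI-RADS 1": 0,
  "BI-RADS 2": 0,
  "BI-RADS 3": 0,
  "BI-RADS 4": 1,
  "BI-RADS 5": 1
}
"""


def reduce_label(birads_label, mapping):
    """Return the reduced class index for a BI-RADS label, or raise if unmapped."""
    if birads_label not in mapping:
        raise ValueError(f"No reduced class defined for label {birads_label!r}")
    return mapping[birads_label]


mapping = json.loads(EXAMPLE_MAPPING_JSON)
print(reduce_label("BI-RADS 2", mapping))
print(reduce_label("BI-RADS 5", mapping))
```

Keeping the mapping in a JSON file means the number of target classes can be changed without touching the training code.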
data/datasets
Stores the JSON manifest files for all datasets. Each file holds the metadata and image paths for one dataset and corresponds to a specific test scenario.
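Reading such a manifest could look like the sketch below. The layout used here (a top-level `data` key holding a list of entries with `image` and `label` fields) is an assumption for illustration only; the real manifests may be organized differently, and `load_manifest` is a hypothetical helper rather than part of preprocess_json.py.

```python
import json
import tempfile
from pathlib import Path


def load_manifest(manifest_path, list_key="data"):
    """Read a JSON manifest and return its list of entries.

    list_key is an assumed field name, not taken from the repository.
    """
    with open(manifest_path, "r", encoding="utf-8") as handle:
        manifest = json.load(handle)
    if not isinstance(manifest, dict) or not isinstance(manifest.get(list_key), list):
        raise ValueError(f"Expected a JSON object with a list under {list_key!r}")
    return manifest[list_key]


# Write a tiny made-up manifest to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    manifest_path = Path(tmp_dir) / "example_manifest.json"
    example = {"data": [{"image": "case_001.dcm", "label": "BI-RADS 2"}]}
    manifest_path.write_text(json.dumps(example), encoding="utf-8")
    entries = load_manifest(manifest_path)

print(len(entries), entries[0]["image"])
```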
data/preprocessed
Output directory for the final images generated by the pre-processing pipeline.
The main functions for configuring test scenarios are pipelines and preprocessing:
- pipelines - Builds the test scenario by generating 25 random pipelines to validate on a specific dataset.
- preprocessing - Executes a predefined set of preprocessing steps to test the following stages: normalization/standardization, image resizing, and filter application.
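To make the idea of randomly generated pipelines concrete, the sketch below draws 25 configurations by picking one option per stage. The stage names follow the stages listed above, but the option values, the `STAGE_OPTIONS` table, and the `generate_pipelines` helper are all hypothetical; the repository's pipelines function may sample differently.

```python
import random

# Hypothetical options per stage; the real choices live in the repository code.
STAGE_OPTIONS = {
    "intensity": ["none", "normalization", "standardization"],
    "resize": [(512, 512), (1024, 1024)],
    "filter": ["none", "median", "gaussian"],
}


def generate_pipelines(count=25, seed=None):
    """Draw `count` pipeline configurations, choosing one option per stage.

    Draws are independent, so the same configuration may appear more than once.
    """
    rng = random.Random(seed)
    return [
        {stage: rng.choice(options) for stage, options in STAGE_OPTIONS.items()}
        for _ in range(count)
    ]


sampled = generate_pipelines(count=25, seed=0)
print(len(sampled))
print(sampled[0])
```

Passing a fixed seed makes the drawn set reproducible, which helps when the same scenarios need to be rerun on another dataset.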
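To run the script, pass the path to the dataset JSON file and the output folder as positional arguments:
python3 ./code/pt/learners/local_mammo_learner.py '/home/nfferreira/data/dataset_site-1_DDSM.json' './data/preprocess/'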
pipelines(debug_datalist=argumentos[1], debug_dataset_root=argumentos[2])
- debug_datalist: path to the dataset JSON file.
- debug_dataset_root: output folder for image processing.
preprocessing(debug_datalist=argumentos[1], debug_dataset_root=argumentos[2])
- debug_datalist: path to the dataset JSON file.
- debug_dataset_root: output folder for image processing.
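The calls above index into `argumentos`, which presumably holds the raw command-line arguments. A named-argument alternative could be sketched with argparse as follows; `parse_args` and the `--scenario` option are hypothetical additions for illustration and are not part of the current script.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional paths plus a hypothetical scenario switch."""
    parser = argparse.ArgumentParser(description="Run a preprocessing test scenario.")
    parser.add_argument("debug_datalist", help="path to the dataset JSON file")
    parser.add_argument("debug_dataset_root", help="output folder for image processing")
    # Hypothetical option: selects which of the two functions to call.
    parser.add_argument(
        "--scenario",
        choices=["pipelines", "preprocessing"],
        default="preprocessing",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real command line.
args = parse_args(["dataset.json", "./data/preprocess/", "--scenario", "pipelines"])
print(args.debug_datalist, args.debug_dataset_root, args.scenario)
```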
# Run the preprocessing scenario for several datasets in sequence.
lista = ['/home/nfferreira/data/dataset_site-1_VINDR_DDSM.json', '/home/nfferreira/data/dataset_site-1_DDSM.json', '/home/nfferreira/data/dataset_site-1_VINDR_ALLMAN.json']
for i in lista:
    preprocessing(debug_datalist=i, debug_dataset_root=argumentos[2])