Many of the Java classes here are used by the main cbioportal/cbioportal codebase, and the metaImport.py Python scripts are all used for importing. We moved them into a single repo so that we can deprecate them once we have a good plan for replacing them.
This repo contains:
- many old Java classes for interacting with the database
- the metaImport.py Python script used for importing
The cbioportal-core code is currently included in the final Docker image during the Docker build process: https://github.com/cBioPortal/cbioportal/blob/master/docker/web-and-data/Dockerfile#L48
Build the Docker image with:
docker build -t cbioportal-core .
Import gene panels
docker run -it -v $(pwd)/tests/test_data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenePanel.pl --data /data/study_es_0/data_gene_panel_testpanel1.txt
docker run -it -v $(pwd)/tests/test_data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenePanel.pl --data /data/study_es_0/data_gene_panel_testpanel2.txt
Import gene sets and supplementary data
docker run -it -v $(pwd)/src/test/resources/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenesetData.pl --data /data/genesets/study_es_0_genesets.gmt --new-version msigdb_7.5.1 --supp /data/genesets/study_es_0_supp-genesets.txt
Import gene set hierarchy data
docker run -it -v $(pwd)/src/test/resources/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenesetHierarchy.pl --data /data/genesets/study_es_0_tree.yaml
Import study
docker run -it -v $(pwd)/tests/test_data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
python importer/metaImport.py -s /data/study_es_0 -p /data/api_json_system_tests -o
To add or update specific patient, sample, or molecular data in an already loaded study, you can perform an incremental upload. This process is quicker than reloading the entire study.
To execute an incremental upload, use the -d (or --data_directory) option instead of -s (or --study_directory). Here is an example command:
docker run -it -v $(pwd)/data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core python importer/metaImport.py -d /data/study_es_0_inc -p /data/api_json -o
Note: The directory should adhere to the standard cBioPortal file formats and study structure, but incremental uploads are not supported for all data types. For instance, uploading study metadata, resources, or GSVA data incrementally is currently unsupported.
This method ensures efficient updates without the need for complete study reuploads, saving time and computational resources.
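The choice between a full load and an incremental upload comes down to a single flag. The following is a minimal Python sketch of that choice, not part of the repo: the helper name `build_import_args` is hypothetical, and the flags (`-s`, `-d`, `-p`, `-o`) are copied from the example commands above.

```python
def build_import_args(directory, portal_info_dir, incremental=False):
    """Return the argument list for importer/metaImport.py (hypothetical helper)."""
    # -d marks the directory as an incremental upload; -s loads the whole study.
    directory_flag = "-d" if incremental else "-s"
    return [
        "python", "importer/metaImport.py",
        directory_flag, directory,
        "-p", portal_info_dir,
        "-o",  # copied unchanged from the example commands
    ]
```

Calling `build_import_args("/data/study_es_0_inc", "/data/api_json", incremental=True)` yields the same arguments as the incremental example command.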
This section guides you through the process of running integration tests by setting up a cBioPortal MySQL database environment using Docker. Please follow these steps carefully to ensure your testing environment is configured correctly.
Integration tests now start a MySQL 5.7 container via Testcontainers. When you run mvn integration-test, the test bootstrap:
- downloads cgds.sql for the cbioportal.version in pom.xml into target/test-db/
- starts a MySQL 5.7 container pre-loaded with cgds.sql and src/test/resources/seed_mini.sql
Docker is required for integration tests. To use an existing MySQL instance instead, set CBIOPORTAL_TEST_DB_SKIP=true and provide connection overrides via JVM system properties (for example -Ddb.test.host=... -Ddb.test.port=... -Ddb.test.username=... -Ddb.test.password=...).
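As an illustration of the external-database option, the sketch below assembles the Maven command and environment in Python. The function name `external_db_test_command` is hypothetical; the environment variable and the four `db.test.*` properties are the ones named above, and other overrides may exist that are not covered here.

```python
import os


def external_db_test_command(host, port, username, password):
    """Build the mvn command and environment for an existing MySQL instance."""
    # Tell the test bootstrap not to start its own Testcontainers database.
    env = dict(os.environ, CBIOPORTAL_TEST_DB_SKIP="true")
    command = [
        "mvn", "integration-test",
        f"-Ddb.test.host={host}",
        f"-Ddb.test.port={port}",
        f"-Ddb.test.username={username}",
        f"-Ddb.test.password={password}",
    ]
    return command, env
```

The returned pair can be passed to `subprocess.run(command, env=env, check=True)` from the project root.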
Optional manual startup (matches the Testcontainers config, assuming target/test-db/cgds.sql exists; download it with curl if needed):
curl -o target/test-db/cgds.sql https://raw.githubusercontent.com/cBioPortal/cbioportal/<cbioportal.version>/src/main/resources/db-scripts/cgds.sql
Replace <cbioportal.version> with the value from pom.xml.
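If you prefer not to copy the version by hand, the substitution can be scripted. This is a rough sketch that assumes `cbioportal.version` is declared as an XML element (for example a Maven property) in pom.xml; the function names are hypothetical and the URL pattern is the one from the curl command above.

```python
import xml.etree.ElementTree as ET

URL_TEMPLATE = (
    "https://raw.githubusercontent.com/cBioPortal/cbioportal/"
    "{version}/src/main/resources/db-scripts/cgds.sql"
)


def cbioportal_version(pom_xml_text):
    """Return the text of the first <cbioportal.version> element in pom.xml."""
    root = ET.fromstring(pom_xml_text)
    for element in root.iter():
        # Drop a namespace prefix such as {http://maven.apache.org/POM/4.0.0}.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "cbioportal.version" and element.text:
            return element.text.strip()
    raise ValueError("cbioportal.version not found in pom.xml")


def cgds_sql_url(pom_xml_text):
    """Fill the version from pom.xml into the cgds.sql download URL."""
    return URL_TEMPLATE.format(version=cbioportal_version(pom_xml_text))
```

For example, `cgds_sql_url(open("pom.xml").read())` returns the URL to pass to curl.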
docker run -p 3306:3306 \
-v $(pwd)/src/test/resources/seed_mini.sql:/docker-entrypoint-initdb.d/seed.sql:ro \
-v $(pwd)/target/test-db/cgds.sql:/docker-entrypoint-initdb.d/cgds.sql:ro \
-e MYSQL_ROOT_PASSWORD=root \
-e MYSQL_USER=cbio_user \
-e MYSQL_PASSWORD=somepassword \
-e MYSQL_DATABASE=cgds_test \
mysql:5.7
With the database up and running, you are now ready to execute the integration tests.
Use Maven to run the integration tests. Ensure you are in the root directory of your project and run the following command:
mvn integration-test
To contribute to cbioportal-core, ensure you have the following tools installed:
- Python 3: Required for study validation and orchestration scripts. These scripts utilize the underlying loader jar.
- Perl: Specify the version required based on script compatibility. Necessary for data loading scripts interfacing with lookup tables.
- JDK 21: Essential for developing the data loader component.
- Maven 3.8.3: Used to compile and test the loader jar. Review this issue before starting.
- Create a Python virtual environment (first-time setup):
python -m venv .venv
- Activate the virtual environment:
source .venv/bin/activate
- Install required Python dependencies (first-time setup or when dependencies have changed):
pip install -r requirements.txt
After you are done with the setup, you can build and test the project.
- Execute tests through the provided script:
./test_scripts.sh
- Build the loader jar using Maven (includes testing):
mvn clean package
Note: The Maven configuration is set to place the jar in the project's root directory to ensure consistent paths in both development and production.
The loader requires specific properties set to establish a connection to the database. These properties should be defined in the application.properties file within your project.
- Begin by creating your application.properties file. This can be done by copying from an example or template provided in the project:
cp application.properties.example application.properties
- Open application.properties in your preferred text editor and modify the properties to match your database configuration and other environment-specific settings.
The PORTAL_HOME environment variable should be set to the directory containing your application.properties file, typically the root of your project:
export PORTAL_HOME=$(pwd)
Ensure this command is run in the root directory of your project, where the application.properties file is located. This setup is crucial for the loader to correctly access the required properties.
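A quick pre-flight check can catch a missing PORTAL_HOME or properties file before the loader runs. The sketch below is illustrative only: `locate_application_properties` is a hypothetical helper, not something the loader provides, and it checks only that the file exists, not its contents.

```python
import os
from pathlib import Path


def locate_application_properties(environ=os.environ):
    """Return the application.properties path under PORTAL_HOME, or raise."""
    portal_home = environ.get("PORTAL_HOME")
    if not portal_home:
        raise RuntimeError("PORTAL_HOME is not set; run: export PORTAL_HOME=$(pwd)")
    properties_file = Path(portal_home) / "application.properties"
    if not properties_file.is_file():
        raise FileNotFoundError(
            f"{properties_file} not found; copy application.properties.example first"
        )
    return properties_file
```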
TODO: Document role of maven.properties file.
To run scripts that require the loader jar, ensure the jar file is in the project root.
The script will search for core-*.jar in the root of the project:
python scripts/importer/metaImport.py -s tests/test_data/study_es_0 -p tests/test_data/api_json_unit_tests -o
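To confirm that a jar is in place before running the scripts, a glob over the project root is enough. This sketch only mirrors the `core-*.jar` pattern described above; it is not the lookup code the scripts use, `find_loader_jar` is a hypothetical name, and picking the last match in sorted order is an arbitrary choice for the case where several jars are present.

```python
from pathlib import Path


def find_loader_jar(project_root="."):
    """Return a core-*.jar from the project root, or raise if none is built yet."""
    candidates = sorted(Path(project_root).glob("core-*.jar"))
    if not candidates:
        raise FileNotFoundError(
            f"no core-*.jar in {Path(project_root).resolve()}; build it with: mvn clean package"
        )
    # Arbitrary tie-break: take the last name in lexicographic order.
    return candidates[-1]
```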