This repository contains data derived from publications about synthetic lethality in cancer together with Python code to extract and transform the data into a common format and to create a TSV output file.
To run the script, just enter
$ python parse_human_SLI.py
The script will download the file protein-coding gene.txt
from HGNC, which it uses to find NCBI Gene ids and Ensembl ids.
We have extract relevant data from publications about synthetic
lethality (mainly from Supplemental Tables etc.). The script will
create an output file called SL_data.tsv with positive and
negative (i.e., excluded) synthetic lethal interactions.
The package has a few requirements. The easiest way to set things up is to use a virtual environment.
virtualenv py3
source py3/bin/activate
pip install -r requirements.txt and before each use of the script:
source py3/bin/activateActivate the virtual environment as above, and then install the nose package
source py3/bin/activate
pip install nose
nosetestsBy default, the script will emit a file with the following fields. The current version of the file is to be found here.
| Column | Example |
|---|---|
| geneA | EGFR |
| geneA.ncbi-id | NCBIGene:1956 |
| geneA.ensembl-id | ENSG00000146648 |
| geneB | ANXA6 |
| geneB.ncbi-id | NCBIGene:309 |
| geneB.ensembl-id | ENSG00000197043 |
| geneA.perturbation | inhibitory antibody |
| geneB.perturbation | siRNA |
| assay | cell viability assay |
| cell.line | A-431 |
| cellosaurus.id | CVCL_0037 |
| pmid | 20858866 |