Commit fed31ec

Merge pull request #7 from Russel88/dev
0.1.18
2 parents 96add80 + 4c3706c commit fed31ec

11 files changed

Lines changed: 70 additions & 41 deletions

README.md

Lines changed: 17 additions & 7 deletions
````diff
@@ -58,17 +58,18 @@ maginator ... --cluster qsub --cluster_info "-l nodes=1:ppn={cores}:thinnode,mem
 
 ## Test data
 
-A test set can be found in the test_data directory.
+A test set can be found in the maginator/test_data directory.
 
 1. Download the 3 samples used for the test at SRA: https://www.ncbi.nlm.nih.gov/sra?LinkName=bioproject_sra_all&from_uid=715601 with the ID's dfc99c_A, f9d84e_A and 221641_A
-2. Change the paths to the read-files in reads.csv
-3. Unzip the contigs.fasta.gz
-4. Run MAGinator
+2. Clone repo: git clone https://github.com/Russel88/MAGinator.git
+3. Change the paths to the read-files in reads.csv
+4. Unzip the contigs.fasta.gz
+5. Run MAGinator
 
 MAGinator has been run on the test data on a slurm server with the following command:
-```
+```sh
 maginator --vamb_clusters clusters.tsv --reads reads.csv --contigs contigs.fasta --gtdb_db data/release207_v2/ --output test_out --cluster slurm --cluster_info "-n {cores} --mem {mem_gb}gb -t {runtime}" --max_mem 180
 ```
-The expected output can be found in test_data/test_out (excluding the GTDB-tk folders, phylogeny alignments and BAM-files due to size limitations)
+The expected output can be found as a zipped file on Zenodo: https://doi.org/10.5281/zenodo.8279036
 
 ## Recommended workflow
 
@@ -88,14 +89,23 @@ sed 's/@/_/g' vamb/clusters.tsv > clusters.tsv
 
 Now you are ready to run MAGinator.
 
+## Functional Annotation
+
 To generate the functional annotation of the genes we recommend using EggNOG mapper (https://github.com/eggnogdb/eggnog-mapper).
 
 You can download it and try to run it on the test data
-```
+```sh
 mkdir test_out/functional_annotation
 emapper.py -i test/genes/all_genes_rep_seq.fasta --output test_out/functional_annotation -m diamond --cpu 38
 ```
 
+The eggNOG output can be merged with clusters.tsv and further processed to obtain functional annotations of the MAG, cluster or sample levels with the following command:
+```sh
+(echo -e '#sample\tMAG_cluster\tMAG\tfunction'; join -1 1 -2 1 <(awk '{print $2 "\t" $1}' clusters.tsv | sort) <(tail -n +6 annotations.tsv | head -n -3 | cut -f1,15 | grep -v '\-$' | sed 's/_[[:digit:]]\+\t/\t/' | sed 's/,/\n/g' | perl -lane '{$q = $F[0] if $#F > 0; unshift(@F, $q) if $#F == 0}; print "$F[0]\t$F[1]"' | sed 's/\tko:/\t/' | sort) | awk '{print $2 "\t" $2 "\t" $3}' | sed 's/_/\t/' | sort -k1,1 -k2,2n) > MAGfunctions.tsv
+```
+In this case the KEGG ortholog column 15 was picked from the eggNOG-mapper output. But by cutting e.g. column number 13, one would obtain GO terms instead. Refer to the header of the eggNOG-mapper output for other available functional annotations e.g. KEGG pathways, Pfam, CAZy, COGs, etc.
+
+
 ## MAGinator workflow
 
 This is what MAGinator does with your input (if you want to see all parameters run maginator --help):
````
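The functional-annotation command added to the README picks one eggNOG-mapper column (15 for KEGG orthologs, or 13 for GO terms according to the README) and splits its comma-separated entries into one line per gene and term. The column-extraction part of that idea can be written as a single `awk` pass. This is a minimal, hypothetical sketch: the function name `extract_annotations` is illustrative and not part of MAGinator or eggNOG-mapper. It assumes that comment and header lines start with `#`, that missing annotations are written as `-`, and that gene IDs are the contig name plus a `_<number>` suffix, as the README's `sed 's/_[[:digit:]]\+\t/\t/'` step implies.

```sh
#!/usr/bin/env bash
# Hypothetical helper (not part of MAGinator or eggNOG-mapper).
# Prints "contig<TAB>annotation" pairs for one column of an eggNOG-mapper
# annotations table. Assumptions: comment/header lines start with '#',
# missing annotations are '-', and gene IDs are "<contig>_<number>".
extract_annotations() {
  local col="$1" file="$2"
  awk -F'\t' -v c="$col" '
    /^#/ { next }                     # skip "##" comment lines and the "#query" header
    $c == "-" || $c == "" { next }    # skip genes with no annotation in this column
    {
      contig = $1
      sub(/_[0-9]+$/, "", contig)     # gene "contig_12" -> contig "contig"
      n = split($c, terms, ",")       # one output line per comma-separated term
      for (i = 1; i <= n; i++) print contig "\t" terms[i]
    }' "$file"
}
```

For example, `extract_annotations 15 annotations.tsv` mirrors the README's column choice. Column numbers can differ between eggNOG-mapper versions, so it is worth checking the header line of your own output first. Unlike the README command, this sketch keeps prefixes such as `ko:` and does not join the result against clusters.tsv.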

conda_build/meta.yaml

Lines changed: 0 additions & 31 deletions
This file was deleted.
Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+name: checkm-genome
+channels:
+- bioconda
+dependencies:
+- checkm-genome
```

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+library(BSgenome.Hsapiens.UCSC.hg19.masked)
+genome <- BSgenome.Hsapiens.UCSC.hg19
+out_file <- file.path(snakemake@output[["hg19"]])
+export(genome, out_file)
```

Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+name: metabat2
+channels:
+- bioconda/label/cf201901
+dependencies:
+- metabat2
```

Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+channels:
+- bioconda
+- conda-forge
+- r
+dependencies:
+- biopython=1.79
+- pandas=1.4
+- bbmap=38.96
+- sickle-trim=1.33
+- spades=3.15.5
+- samtools=1.10
+- bwa-mem2=2.2.1
+- bioconductor-bsgenome.hsapiens.ucsc.hg19.masked=1.3.993
```

Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+name: samtools
+channels:
+- bioconda
+dependencies:
+- samtools
```

Lines changed: 14 additions & 0 deletions
```diff
@@ -0,0 +1,14 @@
+name: vamb
+channels:
+- pytorch
+- conda-forge
+- bioconda
+dependencies:
+- pytorch
+- pip
+- torchvision
+- cudatoolkit=10.2
+- pysam
+- numpy=1.20
+- pip:
+- git+https://github.com/RasmussenLab/vamb@v3.0.8
```

maginator/workflow/envs/phylo.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 channels:
-- bioconda
 - conda-forge
+- bioconda
 - biobuilds
 dependencies:
 - biopython=1.79
```

package.sh

Lines changed: 5 additions & 1 deletion
```diff
@@ -1,4 +1,8 @@
+# New version
+## 1) Update version in setup.py and commit and push
+## 2) Pull request of dev into main
+## 3) Make release on GitHub
+## 4) Run this code:
 rm -r maginator.egg-info/ dist/ build/
 python setup.py sdist
-python setup.py install
 twine upload dist/*
```
