GWAS rsID annotation and LiftOver

Summary

This repository contains workflow / scripts for:

(1) annotating GWAS with rsIDs (e.g. in regenie output, when only CHR and GENPOS columns are present)

(2) performing liftover of genome build (38 <--> 37)

Annotation and liftover can be done as a part of one workflow or completely independently.

Workflow

Please note:

remember to make all .sh files executable (chmod 777 file.sh)
it is recommended to do this on a server, as it might be very slow in a local environment

Obtaining genome reference data

Both datasets will be downloaded into refdata/ folder in the working directory (i.e. this cloned repo).

Please note: if you are in Exeter, the data download step can be skipped - simply create a symbolic link to data in /slade/projects/Public_Ref_Datasets/dbsnp/annotate_rsids_and_liftover/refdata/ (ln -s)

Data for rsID annotation

Download dbSNP v157 data in VCF format (for the specified genome build 37/38), and subset it to only required for annotation columns in .txt format.

To run: (recommended to run with nohup or screen as it might take a while)

./00_download_ref_data_annotation.sh 38 # specify build 38 or 37

Output: refdata/dbSNP157_refdata_build38.txt

Data for LiftOver

Download the chain files required for performing liftover; must specify the starting genome build (38 or 37).

To run:

./00_download_ref_data_liftover.sh 38 # specify build 38 or 37

Output: refdata/hg38ToHg19.over.chain or refdata/hg19ToHg38.over.chain

(1) Genome annotation

Script for performing rsID annotation; The main .sh script internally calls .R script:

├── 01_annotate_GWAS_with_rsids.sh
    └── 01_annotate_helper.R

Assumptions:

The input GWAS file is in the regenie format (i.e. the specific column order and names; modify those in your file before running the script if needed): CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ INFO N TEST BETA SE CHISQ LOG10P EXTRA
The input GWAS file name does not contain dots in the file name; only to separate the file extension:
- my_GWAS.txt.gz - ok
- my.GWAS.txt.gz - not ok
If running on a server, make sure to load the necessary R modules: (e.g. for Exeter, uncomment line 42 in 01_annotate_GWAS_with_rsids.sh)
The script uses the dbSNP annotation data (refdata/dbSNP151_refdata_build38.txt) generated earlier using 00_download_ref_data_annotation.sh, so it assumes that the input GWAS data is in build 38.

To run:

Provide full paths to annotaion file and input GWAS file:

./01_annotate_GWAS_with_rsids.sh /path/to/refdata/dbSNP157_refdata_build38.txt /path/to/your/GWAS.txt.gz

Output: /path/to/your/GWAS_rsids.txt.gz

(2) Genome liftover

Script for performing liftover; The main .sh script internally calls .R script:

├── 02_liftover_GWAS.sh
    └── 02_lifover_helper.R

Assumptions:

The same assumptions as for (1) apply here;
If running on a server, make sure you load the necessary R module: (e.g. for Exeter, uncomment line 22 in 02_liftover_GWAS.sh )

To run:

Provide full paths to input GWAS file (output from (1) or a separate file in regenie format), path to folder containing the downloaded chain files, and the starting genome build (i.e. if need b38 -> b37, add '38'):

./02_liftover_GWAS.sh /path/to/your/GWAS_rsids.txt.gz /path/to/refdata/ 38

Output: /path/to/your/GWAS_rsids_b37.txt.gz

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
supplementary		supplementary
.gitignore		.gitignore
00_download_ref_data_annotation.sh		00_download_ref_data_annotation.sh
00_download_ref_data_liftover.sh		00_download_ref_data_liftover.sh
01_annotate_GWAS_with_rsids.sh		01_annotate_GWAS_with_rsids.sh
01_annotate_helper.R		01_annotate_helper.R
02_liftover_GWAS.sh		02_liftover_GWAS.sh
02_liftover_helper.R		02_liftover_helper.R
README.md		README.md
fig.png		fig.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GWAS rsID annotation and LiftOver

Summary

Workflow

Obtaining genome reference data

Data for rsID annotation

Data for LiftOver

(1) Genome annotation

(2) Genome liftover

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GWAS rsID annotation and LiftOver

Summary

Workflow

Obtaining genome reference data

Data for rsID annotation

Data for LiftOver

(1) Genome annotation

(2) Genome liftover

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages