This repository contains the scripts necessary to replicate the polygenic risk score comparisons on the ADSP r5 dataset using GenoPred. Postprocessing scripts are also included.
Step 1: Calculate polygenic risk scores (PRS) for each person using GenoPred and combine those scores into a single file called final_merged_output.csv (see the GenoPred directory).
Step 2: Separate cases and controls from other data in ADSP using separateCaseControl.py.
Step 3: Extract the needed columns from the ADSP dataset for cases and controls using createCSVfromSeparatedPheno.py.
Step 4: Extract the needed columns from ADNI using readADNI.py.
Step 5: Add the ADNI output to the ADSP output using addADNItoADSP.sh.
Step 6: Combine the phenotype data with the PRS data using mergePRS_pdata.py.
Step 7: Separate the combined data by genetic ancestry using the ancestry projections from GenoPred and separateByGeneticAncestry.py.
Step 8: Remove highly correlated features from each population-separated CSV using assessFeatureCorrelation.py, which measures correlation as the absolute value of Spearman's rho. The correlation threshold is configurable and defaults to 0.7.
Step 9: Determine the threshold that maximizes precision for each PRS in each population using findOptimalThreshold.py. An individual meets a threshold when their PRS is greater than or equal to the threshold value. By default, at least 10 individuals must have a PRS greater than or equal to a threshold for it to be considered.
Step 10: Plot the best PRS at every threshold, stratified by APOE diplotype, using plot_feature.py.
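
The correlation filter in Step 8 can be illustrated with a minimal sketch. This is not the code in assessFeatureCorrelation.py; it only shows one way to filter on the absolute value of Spearman's rho with a 0.7 default. The function names, the greedy "keep the first feature seen" rule, the choice to drop only when |rho| is strictly above the threshold, and the column names prs_a, prs_b, and prs_c are all assumptions made for illustration. It uses only the standard library so it runs without extra dependencies.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / (var_x * var_y) ** 0.5


def drop_correlated(features, threshold=0.7):
    """Return the names of features to keep.

    features maps a column name to its list of values. A feature is kept only
    if |rho| with every previously kept feature is at most the threshold
    (assumed tie-breaking rule: the first feature seen wins).
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman_rho(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: prs_b is a monotonic transform of prs_a (rho = 1).
example = {
    "prs_a": [1, 2, 3, 4, 5, 6],
    "prs_b": [1, 8, 27, 64, 125, 216],
    "prs_c": [3, 6, 1, 4, 2, 5],
}
kept_features = drop_correlated(example)
print(kept_features)  # ['prs_a', 'prs_c']
```

Because Spearman's rho works on ranks, prs_b is removed even though its relationship to prs_a is nonlinear, while prs_c (rho of about 0.03 with prs_a) is retained.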
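
The threshold search in Step 9 can be sketched in the same spirit. This is a hedged illustration, not the implementation in findOptimalThreshold.py: it assumes candidate thresholds are the observed PRS values, that precision is the fraction of cases among individuals with a PRS greater than or equal to the threshold, and that ties in precision are resolved in favor of the lowest threshold. The function name, argument names, and toy data are hypothetical.

```python
def find_optimal_threshold(scores, is_case, min_count=10):
    """Return (threshold, precision) for the precision-maximizing threshold.

    scores:    PRS value per individual.
    is_case:   1 for a case, 0 for a control, aligned with scores.
    min_count: minimum number of individuals with PRS >= threshold.
    Returns None if no threshold has at least min_count individuals.
    """
    best = None
    for threshold in sorted(set(scores)):
        # Labels of everyone whose PRS is greater than or equal to the threshold.
        selected = [c for s, c in zip(scores, is_case) if s >= threshold]
        if len(selected) < min_count:
            continue
        precision = sum(selected) / len(selected)
        # Strict comparison keeps the lowest threshold on ties (an assumption).
        if best is None or precision > best[1]:
            best = (threshold, precision)
    return best


# Hypothetical toy data: 20 individuals with PRS values 1..20.
toy_scores = list(range(1, 21))
toy_labels = [0] * 8 + [1, 0] + [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
result = find_optimal_threshold(toy_scores, toy_labels)
print(result)  # (11, 0.9)
```

In this toy data, thresholds above 11 would select fewer than 10 individuals and are skipped, so the search settles on 11, where 9 of the 10 selected individuals are cases.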