Skip to content

Latest commit

 

History

History
44 lines (38 loc) · 4.55 KB

File metadata and controls

44 lines (38 loc) · 4.55 KB

Overview

This repository contains code that builds upon Quercus_IUCN_samp_sims, a previous simulation project by Kaylee Rosenberger. The goal of this subproject is to assess the variation in minimum sample size estimates (MSSEs) required to maintain genetic diversity of various ex situ oak collections. In this subproject, we calculate prediction intervals around MSSEs required for 95% allelic representation. We also explore how the 95% MSSE changes based on sampling of few to many loci for alleles of different frequency categories using MSAT and SNP genetic marker datasets.

Repository structure

Scripts

  1. MSSE_confidenceIntervals.R
  • Script calculates the confidence interval (CI) values around the 95% MSSE using IUCN 14 oaks dataset, and builds a matrix that stores the CI values. Script visualizes allelic representation at a number of randomly sampled individuals for each species with scatterplots, and saves images in .pdf format.
    • Inputs
      • IUCN 14 oaks: quercus_final_results_orig.Rdata--a resampling array containing the total allelic representation values for oaks simulated by Kaylee Rosenberger; source code found in the Quercus_IUCN_samp_sims repo
    • Outputs
      • Quercus14_CI_values.csv
      • 14CIplots.pdf
      • 14CIWidthplots.pdf
      • 14CIWidthplotshigh.pdf
      • 14CIWidthplotslow.pdf
  1. QUAC_MSSE_Quantiles.R
  • Script calculates MSSE means and quantiles, and generates plots for the total allelic representation (and other categories of allelic frequency) in order to create confidence intervals around 95% minimum sample size estimates. The approach used in this script for calculating allelic representation confidence intervals is improved upon by using the predict function (see MSSE_PredictionIntervals.R).
    • Inputs
      • QUAC_Subset_resampArrs folder--resampling arrays built from Quercus acerifolia (QUAC) microsatellite (MSAT) and single nucleotide polymorphism (SNP) genetic data (for SNPs, R0 and R80). These datasets are all subset to the same number of samples, to allow for greater comparability between marker types and missing data levels.
  1. MSSE_PredictionIntervals.R
  • Script calculates the prediction interval (PI) values around the 95% MSSE using two different datasets (QUAC and IUCN 14 oaks), and builds a matrix that stores the PI values
    • Inputs
      • QUAC: QUAC_Subset_resampArrs folder--resampling arrays built from Quercus acerifolia (QUAC) microsatellite (MSAT) and single nucleotide polymorphism (SNP) genetic data (for SNPs, R0 and R80). These datasets are all subset to the same number of samples, to allow for greater comparability between marker types and missing data levels.
      • IUCN 14 oaks: quercus_final_results_orig.Rdata--a resampling array containing the total allelic representation values for oaks simulated by Kaylee Rosenberger; source code found in the Quercus_IUCN_samp_sims repo
    • Outputs
      • QUAC: QUAC_PI_values.csv
      • IUCN 14 oaks: Quercus14_PI_values.csv
  1. QUAC_QUBO_loci_bootstrapping.R
  • Script builds resampling arrays based on different ranges of randomly sampled loci, calculates the prediction intervals around the 95% MSSEs, and builds a matrix that stores the PI values
    • Inputs
      • LociBootstrapping_Datasets folder--genpop objects for wild populations of Q. acerifolia (QUAC) and Q. boyntonii (QUBO), saved as R objects.
    • Outputs
      • QUAC_MSSE_Quantiles.csv

Datasets

This folder contains the input files read in by the analyses in the Scripts folder (see outline of Inputs above). These files are typically either resampling arrays (sets of allelic representation values, for a given number of randomly drawn samples) or genpop objects (read in using the adegenet library) from which resampling arrays are built.

Outputs

This folder contains the CSV outputs generated by the analyses in the Scripts folder (see outline of Outputs above). Generally, the contents of these CSVs are minimum sample size estimates and upper/lower the confidence intervals (CI) or prediction intervals (PI) bounding them.

Archive

This folder contains one archived R script and one archived .csv file that stores the original for loop necessary to loci bootstrap and analyze a resampling array that calculates prediciton intervals around the 95% MSSE, and builds a matrix that stores the PI values.