Below is a summary of the 20 pages of notes the workshop observers took at the first beta pilot workshop teaching this lesson (2022-05-25-aucenter-online). Issues are listed by episode and are described in an actionable way wherever possible. In a few cases, more discussion may be needed within the developer team before an action can be decided.
Follow-up activity on the list below will be split among the lesson developers. To claim one or more of the items, place your name in bold at the start of the item. After you have created a PR for the item, add a link to your PR at the end of the item. Check off an item once a PR has been merged.
Introduction to the example dataset and file type
Introducing R and RStudio IDE
- For workshop admin team: When providing instance addresses for learners, include :8787 in the link so learners can click the link directly from the spreadsheet rather than copy/pasting into browser and adding :8787. Also make the spreadsheet READ ONLY for learners to prevent learners accidentally deleting AWS instances. Instructors will need write access to assign instances to individual learners. #181
- Include instructions somewhere to Instructors to add learner names next to each instance in the spreadsheet. Learners need to only use the instance they are assigned. #240
- Add a warning box that the AWS link may not work for some browsers (e.g. Edge). For those browsers, learner needs to copy link and paste into browser instead of clicking link in spreadsheet. If anyone is using VPN, please switch it off for this workshop. #282
- Make the “save your script” instructions more prominent - maybe a warning box? #261
- Remove section in 01-introduction that uses ??geom_point as an example of what happens when you search for help on a package that isn’t installed on your system. This was unnecessarily confusing. #288
- In the exercise “Searching for R functions”, change “mixed linear model” to “linear model”. #264
- After exercise “Searching for R functions”, add note that Google search is also a good way to get help for R functions. #265
R Basics
- In R Basics - creating objects in R, add callout box about white space not being necessary but making code easier to read. Link to style guide? #300
- Should we add a callout box explaining why doing assignment operation with `=` is a problem? (Erin votes no but including for completeness) #301
- Get rid of the example in R basics that computes the golden ratio and replace with a less confusing mathematical equation (both in the code chunks and in the exercise) #298
- Add an example to the vectors section demonstrating that character vectors are case sensitive #305
- In R basics - creating and subsetting vectors, note to Instructor that they and learners should copy/paste the character vectors provided in the code chunk. There is nothing gained by the learners in having to type those out by hand. #299
- Some code chunks use single quotes for character vectors and some use double quotes. Make this consistent across all code chunks in lesson. #183
- Change the code chunks in “creating and subsetting vectors” to use the snps vector instead of snp_genes. Currently, we have the learners create three new vectors and then not use them until 9 lines of code later. #184
- Missing line break in Exercise: Examining and subsetting vectors and in solution #178
- In “adding to, removing, or replacing values”, change `snp_genes[7]<- "APOA5"` to add in 6th position and remove note about NAs. NAs are introduced later in the episode. #175
- In “a few final vector tricks”, note that NA stands for “not available” #179
- Add callout box explaining the difference between `==` and `%in%` at the end of “a few final vector tricks” #258
- Review exercise 1 - change `typeof()` to `mode()` to match what is taught earlier in lesson. #180
- Review exercise 2 - missing line break #182
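For the proposed callouts on case sensitivity (#305) and on `==` versus `%in%` (#258), a minimal sketch of what could be shown. The `genes` vector here is a hypothetical stand-in, not the lesson's `snp_genes` data, and the wording of any callout is still to be decided.

```r
# Hypothetical example vector (illustrative only, not the lesson's data)
genes <- c("OXTR", "ACTN3", "AR", "OPRM1")

# Character comparisons are case sensitive
genes == "AR"   # FALSE FALSE  TRUE FALSE
genes == "ar"   # FALSE FALSE FALSE FALSE

# == compares position by position and recycles the shorter vector,
# so "OXTR" is missed here because it is compared against "AR"
eq_recycled <- genes == c("AR", "OXTR")   # FALSE FALSE  TRUE FALSE

# %in% asks, for each element, whether it appears anywhere in the set
in_result <- genes %in% c("AR", "OXTR")   # TRUE FALSE  TRUE FALSE

genes[in_result]   # "OXTR" "AR"
```

The recycling case is probably the most useful one for learners, since `==` returns a plausible-looking answer without any warning.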
R basics continued - factors and data frames
- Add callout explaining why ?read.csv opens read.table help file #149
- Improve the explanation of the difference between “base R” and “tidyverse” in a way that limits learner cognitive overhead. This explanation may need to come earlier in the lesson. #280
- At the end of “importing tabular data into R” include a table that describes what each of the column names means / stands for #281
- Change the note “put the first three columns of variants into a new data frame called subset” so that it matches code chunk, which also includes column 6 #292
- Explain the order of columns and rows in two-dimensional subsetting #291
- `str(subset)` includes integer data, which hasn’t been introduced before. Explain this in the text. #285
- Instead of introducing snps <- c(alt_alleles . . . ), break this into smaller demos. First show alt_alleles[alt_alleles=="A"] then show a few variations before stringing them in within c(). #284
- Might be good to show str(factor_snps) and summary(factor_snps) on a character vector to reinforce the difference between factors vs character vectors. #283
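For the items on two-dimensional subsetting order (#291), integer data in `str()` output (#285), and factors versus character vectors (#283), a rough sketch of the kind of demo that could work. The `toy` data frame and `alleles` vector are hypothetical stand-ins for the lesson's `variants` data; only the column names are borrowed from it.

```r
# Hypothetical toy data frame (illustrative only)
toy <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  POS = c(9972L, 263235L, 281923L),   # the L suffix makes these integers
  QUAL = c(91, 85, 217),              # plain numbers are stored as doubles
  stringsAsFactors = FALSE
)

# Two-dimensional subsetting is always [rows, columns]
first_two_rows <- toy[1:2, ]        # rows 1-2, all columns
two_columns <- toy[, c(1, 3)]       # all rows, columns 1 and 3
one_cell <- toy[2, "POS"]           # row 2 of the POS column

# str() reports POS as int and QUAL as num
str(toy)

# summary() treats a character vector and a factor differently
alleles <- c("A", "G", "A", "T")
summary(alleles)                          # length, class and mode only
factor_counts <- summary(factor(alleles)) # counts per level: A 2, G 1, T 1
```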
Using packages from Bioconductor
- We haven’t talked about packages anywhere else, and only mentioned base R briefly. We may need a bit more material to introduce why packages are necessary/useful etc. #170
- Fix broken screenshot of bioconductor homepage #163
- Header level is off on “Second, install the vcfR package from Bioconductor using BiocManager” #168
- Add a warning box about the long install process while vcfR is being installed due to many dependencies. #188
- Installing packages from Bioconductor vs from CRAN seems pretty nuanced and advanced - is there a way to simplify this for the learner? I think we hadn't introduced installing from CRAN yet, before talking about Bioconductor? #173
- Header level is off and broken screenshot on “Search for Bioconductor packages based on your analysis needs” #169
- Remove the final “challenge” box on this page. Looks like an unfinished note to developers of things that are already incorporated into this episode. #185
- Add a new challenge to this episode. Tracy had learners install a package of their choice from Bioconductor, but says: “Feels risky to ask learners to install a package from the ‘wild’. I’ve spent a fair amount of time running into trouble with dependencies, OS issues, etc.” Multiple learners had problems with installing packages, including some cases where our install of R wasn’t new enough. #186
- A learner reported that the package we installed didn’t show up in the package list. Why is this, and should we add a warning / callout about it? #187
Data wrangling and analysis with tidyverse
- We used read.csv() to read in the variants data, but the lesson seems to have used read_csv in the background. This is affecting the output for example on the `select` calls, which on the lesson page is showing tibble output but for learners/instructors is printing the full output to screen. #150
- Change the exercise on creating a table so that the solution is a series of steps rather than a single command. Also may need to make hint more explicit, as we haven’t taught ‘contains’ but information about it appears in the help file for ‘ends_with’. #157
- Note in exercise solution for creating a table that ‘contains’ is not case sensitive. #158
- When demonstrating the OR operator, change the code chunk to use two numerical variables. It is currently filter(variants, sample_id == "SRR2584863", (INDEL | QUAL >= 100)), and INDEL is logical. #160
- The challenge “Select all the mutations that occurred between the positions 1e6 (one million) and 2e6 (included) that are not indels and have QUAL greater than 200.” needs to be worded more clearly. #162
- Move explanation of QUAL from the mutate section to the first time QUAL is used (in challenge above). #166
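- This challenge solution includes negation (!INDEL), which hasn’t been taught yet. Can instead do INDEL == FALSE (note the double equals sign; a single = inside filter() is treated as a named argument and raises an error). #165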
- Exercise for pipe and filter has a typo in question (“shwoing”). Filtered columns should be included in select so learners can confirm result. #164
- In “group by and summarize” section, take out the section using n() and go directly to tally(). Can then introduce summarize later with more complex examples. #172
- When installing packages at the start of the lesson, include tidyr. Otherwise learners run into problems when trying to use pivot_wider or pivot_longer in this episode. #176
- Make sure when learners are installing packages that we don’t have them install tidyverse - the server will hang. Just have them install the packages they need for the lesson - dplyr, tidyr, ggplot2. #177
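For the items on the OR operator (#160), the position/QUAL challenge (#162), and negation (#165), a small sketch of the conditions involved. It uses base R logical indexing on a hypothetical `toy` data frame so that it runs without dplyr. This is a swapped-in technique, not the lesson's code, although the same conditions can be passed to `filter()`.

```r
# Hypothetical toy data; column names follow the lesson's variants table
toy <- data.frame(
  POS   = c(500000, 1200000, 1800000, 2000000, 2500000),
  QUAL  = c(250, 150, 300, 220, 400),
  INDEL = c(FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Two ways to keep rows that are not indels; both give the same rows
not_indel_a <- toy[toy$INDEL == FALSE, ]
not_indel_b <- toy[!toy$INDEL, ]
identical(not_indel_a, not_indel_b)   # TRUE

# OR with two numerical variables: low position or high quality
either <- toy[toy$POS < 1e6 | toy$QUAL >= 300, ]   # rows 1, 3 and 5

# Challenge conditions: position from 1e6 to 2e6 (both included),
# not an indel, and QUAL greater than 200
in_range <- toy$POS >= 1e6 & toy$POS <= 2e6
challenge <- toy[in_range & toy$INDEL == FALSE & toy$QUAL > 200, ]   # row 4 only
```

Spelling out the range as two separate comparisons may also help when rewording the challenge, since "between" leaves the endpoints ambiguous.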
Data visualization with ggplot2
- Remove the period in the facet_grid example, as it hasn’t been explained (but see PR #138, “Minor edits”).