differences for PR #8

carpentries-incubator · Jul 2, 2024 · 6d5b0bc · 6d5b0bc
1 parent c1d903b
commit 6d5b0bc

Showing 6 changed files with 135 additions and 127 deletions.
diff --git a/episode2.md b/episode2.md
@@ -111,7 +111,7 @@ download.file(url = "https://zenodo.org/record/8125141/files/E-MTAB-11349.counts
 In R studio, open your project workbook and read the raw counts data. Then check the dimensions of the matrix to confirm we have the expected number of samples and transcript IDs.
 
 
-```r
+``` r
 raw.counts.ibd <- read.table(file="data/E-MTAB-11349.counts.matrix.csv",
                              sep=",",
                              header=T,
@@ -121,7 +121,7 @@ raw.counts.ibd <- read.table(file="data/E-MTAB-11349.counts.matrix.csv",
 writeLines(sprintf("%i %s", c(dim(raw.counts.ibd)[1], dim(raw.counts.ibd)[2]), c("rows corresponding to transcript IDs", "columns corresponding to samples")))
 ```
 
-```{.output}
+``` output
 22751 rows corresponding to transcript IDs
 592 columns corresponding to samples
 ```
@@ -136,11 +136,11 @@ View a small subset of the data, (e.g. first ten rows and 8 columns) to see how
 :::::::::::::::::::::::: solution 
 
 
-```r
+``` r
 raw.counts.ibd[1:10,1:8]
 ```
 
-```{.output}
+``` output
             read Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
 1   1          *    13961    16595    20722    17696    25703    20848
 2   2 ERCC-00002        0        0        0        0        0        0
@@ -163,34 +163,34 @@ raw.counts.ibd[1:10,1:8]
 Now let's read the sdrf file (a plain text file) into R and check the dimensions of the file.
 
 
-```r
+``` r
 # read in the sdrf file
 
 samp.info.ibd <- read.table(file="data/E-MTAB-11349.sdrf.txt", sep="\t", header=T, fill=T, check.names=F)
 
 sprintf("There are %i rows, corresponding to the samples", dim(samp.info.ibd)[1])
 ```
 
-```{.output}
+``` output
 [1] "There are 590 rows, corresponding to the samples"
 ```
 
-```r
+``` r
 sprintf("There are %i columns, corresponding to the available variables for each sample", dim(samp.info.ibd)[2])
 ```
 
-```{.output}
+``` output
 [1] "There are 32 columns, corresponding to the available variables for each sample"
 ```
 
 If we view the column names, we can see that the file does indeed contain a set of variables describing both phenotypical and experimental protocol information relating to each sample.
 
 
-```r
+``` r
 colnames(samp.info.ibd)
 ```
 
-```{.output}
+``` output
  [1] "Source Name"                            
  [2] "Characteristics[organism]"              
  [3] "Characteristics[age]"                   

diff --git a/episode3.md b/episode3.md
@@ -112,36 +112,36 @@ The function `getGEO()` from the `GEOquery` library provides a convenient way to
 If there is more than one SOFT file for a GEO Series, `getGEO()` will return a list of datasets. Let's download GSE212041.
 
 
-```r
+``` r
 gse212041 <- GEOquery::getGEO("GSE212041")
 ```
 
-```{.output}
+``` output
 Setting options('download.file.method.GEOquery'='auto')
 ```
 
-```{.output}
+``` output
 Setting options('GEOquery.inmemory.gpl'=FALSE)
 ```
 
-```{.output}
+``` output
 Found 2 file(s)
 ```
 
-```{.output}
+``` output
 GSE212041-GPL18573_series_matrix.txt.gz
 ```
 
-```{.output}
+``` output
 GSE212041-GPL24676_series_matrix.txt.gz
 ```
 
 
-```r
+``` r
 sprintf("Number of files downloaded: %i", length(gse212041))
 ```
 
-```{.output}
+``` output
 [1] "Number of files downloaded: 2"
 ```
 
@@ -154,11 +154,11 @@ Write the code to check that the number of samples in each file gives us the tot
 :::::::::::::::::::::::: solution 
 
 
-```r
+``` r
 writeLines(sprintf("file %i: %i samples", 1:2, c(dim(gse212041[[1]])[2], dim(gse212041[[2]])[2])))
 ```
 
-```{.output}
+``` output
 file 1: 16 samples
 file 2: 765 samples
 ```
@@ -175,13 +175,13 @@ file 2: 765 samples
 We'll extract the metadata for the larger dataset (765 samples) and examine the column names to verify that the file contains the expected metadata about the experiment.
 
 
-```r
+``` r
 samp.info.cov19 <- Biobase::pData(gse212041[[2]])
 
 colnames(samp.info.cov19)
 ```
 
-```{.output}
+``` output
  [1] "title"                   "geo_accession"          
  [3] "status"                  "submission_date"        
  [5] "last_update_date"        "type"                   
@@ -215,11 +215,11 @@ colnames(samp.info.cov19)
 If we use the `exprs()` function to extract the counts data from the expression slot of the downloaded dataset, we'll see that in this data series, the counts matrix is not there. The expression set only contains a list of the accession numbers of the samples included in the expression set, but not the actual count data. (The code below attempts to view a sample of the data).
 
 
-```r
+``` r
 Biobase::exprs(gse212041[[2]])[,1:10]
 ```
 
-```{.output}
+``` output
      GSM6507615 GSM6507616 GSM6507617 GSM6507618 GSM6507619 GSM6507620
      GSM6507621 GSM6507622 GSM6507623 GSM6507624
 ```
@@ -228,11 +228,11 @@ Biobase::exprs(gse212041[[2]])[,1:10]
 We can verify this by looking at the dimensions of the object in the exprs slot.
 
 
-```r
+``` r
 dim(Biobase::exprs(gse212041[[2]]))
 ```
 
-```{.output}
+``` output
 [1]   0 765
 ```