Commit

differences for PR #8
actions-user committed Sep 3, 2024
1 parent 714adc1 commit 788b437
Showing 9 changed files with 135 additions and 127 deletions.
Binary file removed .DS_Store
Binary file not shown.
Binary file removed data/.DS_Store
Binary file not shown.
20 changes: 10 additions & 10 deletions episode2.md
@@ -111,7 +111,7 @@ download.file(url = "https://zenodo.org/record/8125141/files/E-MTAB-11349.counts
In R studio, open your project workbook and read the raw counts data. Then check the dimensions of the matrix to confirm we have the expected number of samples and transcript IDs.


```r
``` r
raw.counts.ibd <- read.table(file="data/E-MTAB-11349.counts.matrix.csv",
sep=",",
header=T,
@@ -121,7 +121,7 @@ raw.counts.ibd <- read.table(file="data/E-MTAB-11349.counts.matrix.csv",
writeLines(sprintf("%i %s", c(dim(raw.counts.ibd)[1], dim(raw.counts.ibd)[2]), c("rows corresponding to transcript IDs", "columns corresponding to samples")))
```

```{.output}
``` output
22751 rows corresponding to transcript IDs
592 columns corresponding to samples
```
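
As a minimal sketch of the same dimension check, `nrow()` and `ncol()` return the two values of `dim()` directly. The example reads a tiny made-up table through the `text` argument of `read.table()` rather than the real file, so it runs without the download; `toy.csv`, `toy.counts`, and the values in them are illustrative assumptions, not part of the lesson data.

```r
# Toy stand-in for the counts file (values are invented for illustration)
toy.csv <- "read,Sample 1,Sample 2
ERCC-00002,0,0
ERCC-00003,5,7
ERCC-00004,1,3"

toy.counts <- read.table(text = toy.csv,
                         sep = ",",
                         header = TRUE,
                         check.names = FALSE)

# nrow() and ncol() are equivalent to dim(x)[1] and dim(x)[2]
writeLines(sprintf("%i %s",
                   c(nrow(toy.counts), ncol(toy.counts)),
                   c("rows", "columns")))
```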
@@ -136,11 +136,11 @@ View a small subset of the data, (e.g. first ten rows and 8 columns) to see how
:::::::::::::::::::::::: solution


```r
``` r
raw.counts.ibd[1:10,1:8]
```

```{.output}
``` output
read Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
1 1 * 13961 16595 20722 17696 25703 20848
2 2 ERCC-00002 0 0 0 0 0 0
@@ -163,34 +163,34 @@ raw.counts.ibd[1:10,1:8]
Now let's read the sdrf file (a plain text file) into R and check the dimensions of the file.


```r
``` r
# read in the sdrf file

samp.info.ibd <- read.table(file="data/E-MTAB-11349.sdrf.txt", sep="\t", header=T, fill=T, check.names=F)

sprintf("There are %i rows, corresponding to the samples", dim(samp.info.ibd)[1])
```

```{.output}
``` output
[1] "There are 590 rows, corresponding to the samples"
```

```r
``` r
sprintf("There are %i columns, corresponding to the available variables for each sample", dim(samp.info.ibd)[2])
```

```{.output}
``` output
[1] "There are 32 columns, corresponding to the available variables for each sample"
```
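
The counts matrix was reported with 592 columns, while the sdrf file has 590 rows. Judging by the preview of the counts table, its first two columns look like an index column and a `read` column rather than samples, which would leave 590 sample columns. That is an inference from the printed output, not something the source states. A minimal sketch of the arithmetic, using the reported dimensions as plain numbers (all variable names here are hypothetical):

```r
n.counts.cols <- 592L   # columns reported for the counts matrix
n.non.sample  <- 2L     # assumed: index column plus the "read" column
n.sdrf.rows   <- 590L   # rows reported for the sdrf file

# Sample columns remaining once the assumed non-sample columns are excluded
n.sample.cols <- n.counts.cols - n.non.sample

sprintf("Sample columns: %i; sdrf rows: %i; match: %s",
        n.sample.cols, n.sdrf.rows, n.sample.cols == n.sdrf.rows)
```

With the real objects, the same comparison could be made from `ncol(raw.counts.ibd)` and `nrow(samp.info.ibd)` once the non-sample columns have been confirmed.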

If we view the column names, we can see that the file does indeed contain a set of variables describing both phenotypical and experimental protocol information relating to each sample.


```r
``` r
colnames(samp.info.ibd)
```

```{.output}
``` output
[1] "Source Name"
[2] "Characteristics[organism]"
[3] "Characteristics[age]"
32 changes: 16 additions & 16 deletions episode3.md
@@ -112,36 +112,36 @@ The function `getGEO()` from the `GEOquery` library provides a convenient way to
If there is more than one SOFT file for a GEO Series, `getGEO()` will return a list of datasets. Let's download GSE212041.


```r
``` r
gse212041 <- GEOquery::getGEO("GSE212041")
```

```{.output}
``` output
Setting options('download.file.method.GEOquery'='auto')
```

```{.output}
``` output
Setting options('GEOquery.inmemory.gpl'=FALSE)
```

```{.output}
``` output
Found 2 file(s)
```

```{.output}
``` output
GSE212041-GPL18573_series_matrix.txt.gz
```

```{.output}
``` output
GSE212041-GPL24676_series_matrix.txt.gz
```


```r
``` r
sprintf("Number of files downloaded: %i", length(gse212041))
```

```{.output}
``` output
[1] "Number of files downloaded: 2"
```

@@ -154,11 +154,11 @@ Write the code to check that the number of samples in each file gives us the total
:::::::::::::::::::::::: solution


```r
``` r
writeLines(sprintf("file %i: %i samples", 1:2, c(dim(gse212041[[1]])[2], dim(gse212041[[2]])[2])))
```

```{.output}
``` output
file 1: 16 samples
file 2: 765 samples
```
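
A variation that does not hard-code the number of files is sketched below. It uses a toy list of plain matrices with the same column counts as the two downloaded files, so it runs without `GEOquery`; `toy.gse` and `n.samples` are hypothetical names, and the real list elements are expression sets rather than matrices.

```r
# Toy stand-ins with the same column counts as the two downloaded files
toy.gse <- list(matrix(0, nrow = 1, ncol = 16),
                matrix(0, nrow = 1, ncol = 765))

# Count samples (columns) in every element, however many files there are
n.samples <- vapply(toy.gse, ncol, integer(1))

writeLines(sprintf("file %i: %i samples", seq_along(n.samples), n.samples))
sprintf("Total samples: %i", sum(n.samples))
```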
@@ -175,13 +175,13 @@ file 2: 765 samples
We'll extract the metadata for the larger dataset (765 samples) and examine the column names to verify that the file contains the expected metadata about the experiment.


```r
``` r
samp.info.cov19 <- Biobase::pData(gse212041[[2]])

colnames(samp.info.cov19)
```

```{.output}
``` output
[1] "title" "geo_accession"
[3] "status" "submission_date"
[5] "last_update_date" "type"
@@ -215,11 +215,11 @@ colnames(samp.info.cov19)
If we use the `exprs()` function to extract the counts data from the expression slot of the downloaded dataset, we'll see that in this data series, the counts matrix is not there. The expression set only contains a list of the accession numbers of the samples included in the expression set, but not the actual count data. (The code below attempts to view a sample of the data).


```r
``` r
Biobase::exprs(gse212041[[2]])[,1:10]
```

```{.output}
``` output
GSM6507615 GSM6507616 GSM6507617 GSM6507618 GSM6507619 GSM6507620
GSM6507621 GSM6507622 GSM6507623 GSM6507624
```
@@ -228,11 +228,11 @@ Biobase::exprs(gse212041[[2]])[,1:10]
We can verify this by looking at the dimensions of the object in the exprs slot.


```r
``` r
dim(Biobase::exprs(gse212041[[2]]))
```

```{.output}
``` output
[1] 0 765
```
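
The zero in the first position of `dim()` is what signals the missing counts. A minimal sketch of turning that into an explicit check follows; `has.counts` is a hypothetical helper, and `empty.exprs` is a toy matrix standing in for the empty expression slot rather than the real object.

```r
# Hypothetical helper: TRUE when a matrix has at least one row and one column
has.counts <- function(m) nrow(m) > 0 && ncol(m) > 0

# Toy stand-in: 0 rows and 3 sample columns, like the empty exprs slot
empty.exprs <- matrix(numeric(0), nrow = 0, ncol = 3,
                      dimnames = list(NULL,
                                      c("GSM6507615", "GSM6507616", "GSM6507617")))

dim(empty.exprs)
has.counts(empty.exprs)
```

A check like this could guard later steps so that they fail early when a series provides sample accessions but no count data.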
