Commit

Add SBC logo
Add link to nextflow on stanage and the SBC docs

Mark alignment section as optional
markdunning committed Jul 13, 2024
1 parent afed1c0 commit 149ac9a
Showing 6 changed files with 158 additions and 290 deletions.
31 changes: 23 additions & 8 deletions part00.Rmd
@@ -1,6 +1,6 @@
---
title: "Course Overview"
-author: "Mark Dunning"
+author: "Mark Dunning - Sheffield Bioinformatics Core"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output:
html_notebook:
@@ -13,7 +13,7 @@ output:
knitr::opts_chunk$set(echo = TRUE)
```


![](images/logo-sm.png)

# Overview

@@ -30,9 +30,12 @@ The workshop is designed to give you an introduction to using the command-line t
## Links to Individual Sections

- [Transfering files and Assessing Read Quality](part01.nb.html)
-- [Alignment and Quantification](part02.nb.html)
+- [Alignment and Quantification - OPTIONAL](part02.nb.html)
- [Workflows, Pipelines and Workflow Managers](part03.nb.html)




# Objectives: After this course you should be able to:

- Run Bioinformatics programs from the command-line
@@ -47,20 +50,22 @@ The workshop is designed to give you an introduction to using the command-line t
- A basic workflow for processing RNA-seq data
- Workflow management systems (such as nextflow), and why they are a recommended approach for running Bioinformatics pipelines

# Background

-We will assume that you already have a working knowledge of using a command-line environment to perform tasks such as:-
+We will also review some command-line basics to perform tasks such as:-

- logging into a command-line environment
- listing the contents of a directory using `ls`
- navigating through a file system using `cd`
- moving and copying files using `mv` and `cp`
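As a quick illustration of the commands above, here is a short, self-contained sketch you can try at the prompt (the `demo` directory and file names are invented for illustration):

```bash
# make a scratch directory and move into it
mkdir -p demo
cd demo
# create a small file, copy it, then rename (move) the copy
echo "hello" > sample.txt
cp sample.txt backup.txt
mv backup.txt archive.txt
# list the directory contents in long, human-readable format
ls -lh
# move back up one level
cd ..
```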

-Please see below for an introduction / refresher
+Please see below for a fuller introduction

- [Introducing the Shell](https://datacarpentry.org/shell-genomics/01-introduction)
- [Navigating Files and Directories](https://datacarpentry.org/shell-genomics/02-the-filesystem.html)
-- [Working with Files and Directories](https://datacarpentry.org/shell-genomics/02-the-filesystem.html)
+- [Working with Files and Directories](https://datacarpentry.org/shell-genomics/03-working-with-files.html)

There is also a reference guide available for some common commands

- [Unix Cheatsheet](https://upload.wikimedia.org/wikipedia/commons/7/79/Unix_command_cheatsheet.pdf)


@@ -103,10 +108,20 @@ HOME=/home/dcuser

![](images/set_home.png)
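A quick way to check this variable from the prompt:

```bash
# print the value of the HOME environment variable
echo "$HOME"
# cd with no arguments changes to $HOME; pwd confirms where we are
cd
pwd
```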


## Command-line review

We will now review the following sections of the Data Carpentry materials to (re-)familiarise ourselves with the Unix environment

- [Introduction](https://datacarpentry.org/shell-genomics/01-introduction.html#navigating-your-file-system)
- [Moving around the file system](https://datacarpentry.org/shell-genomics/02-the-filesystem.html#moving-around-the-file-system)
- [Working with files](https://datacarpentry.org/shell-genomics/03-working-with-files.html#working-with-files)
- [Creating, moving, copying and removing](https://datacarpentry.org/shell-genomics/03-working-with-files.html#creating-moving-copying-and-removing)

<a href="part01.nb.html" style="font-size: 50px; text-decoration: none">Click Here for next part</a>


-## Running the environment on your own machine after the workshop
+# Running the environment on your own machine after the workshop

Both macOS and Windows 10 can run some of the commands presented in this course to navigate around a file system, copy files and list directories. However, you may prefer to practise in a "safe" environment, such as the one used during the workshop. Furthermore, the NGS tools presented may be difficult to install.

115 changes: 34 additions & 81 deletions part00.nb.html

Large diffs are not rendered by default.

14 changes: 10 additions & 4 deletions part01.Rmd
@@ -1,6 +1,6 @@
---
title: "Assessing Read Quality"
-author: "Mark Dunning"
+author: "Mark Dunning - Sheffield Bioinformatics Core"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output:
html_notebook:
@@ -13,6 +13,8 @@ output:
knitr::opts_chunk$set(echo = TRUE,eval=FALSE)
```

![](images/logo-sm.png)

(Adapted from the Data Carpentry Genomics wrangling materials at:- https://datacarpentry.org/wrangling-genomics/02-quality-control/index.html)

# Overview
@@ -149,8 +151,6 @@ Since this file is so small, we needn't worry about the file size. However, in a
gzip example_reads.fastq
## list the directory to see what changed...
ls -lh
-## Now have to use zcat to print...
-zcat example_reads.fastq | head -n4
```
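A self-contained round trip with a tiny, invented FASTQ record (note that `gzip -cd` is a portable alternative to `zcat` on systems, such as macOS, where `zcat` behaves differently):

```bash
# create a minimal one-record FASTQ file
printf "@read1\nACGT\n+\nFFFF\n" > example_reads.fastq
# compress in place: example_reads.fastq becomes example_reads.fastq.gz
gzip example_reads.fastq
# view the compressed contents without decompressing on disk
gzip -cd example_reads.fastq.gz | head -n 4
# restore the original file
gunzip example_reads.fastq.gz
```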


@@ -185,6 +185,8 @@ md5sum fastq/Sample1/Sample1.fastq
md5sum --help
```
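The check can be automated with `md5sum -c`, which re-computes checksums from a saved list; a small sketch with an invented file name:

```bash
# create a file and record its checksum
echo "ACGTACGT" > reads.fastq
md5sum reads.fastq > reads.md5
# after transferring both files, verify the copy is intact
md5sum -c reads.md5   # prints "reads.fastq: OK" when the checksum matches
```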

(Optional)

To see what would happen if a file didn't transfer properly, we can create a copy of `Sample1.fastq` containing fewer lines. The resulting md5sum will be different

```{bash}
@@ -387,4 +389,8 @@ firefox qc/ERR732901_sub.fastqc.html

If this doesn't work, there is a File Explorer tool available in the desktop environment that you can use to navigate to, and view the files.

-<a href="part02.nb.html" style="font-size: 50px; text-decoration: none">Click Here for next part</a>
+<a href="part02.nb.html" style="font-size: 50px; text-decoration: none">Click Here for an optional section about alignment and quantification</a>
+
+or
+
+<a href="part03.nb.html" style="font-size: 50px; text-decoration: none">Click Here to learn about workflows and pipelines</a>
87 changes: 13 additions & 74 deletions part01.nb.html

Large diffs are not rendered by default.

33 changes: 26 additions & 7 deletions part03.Rmd
@@ -1,6 +1,6 @@
---
title: "Workflows, Pipeline and Workflow Managers"
-author: "Mark Dunning"
+author: "Mark Dunning - Sheffield Bioinformatics Core"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output:
html_notebook:
@@ -13,6 +13,8 @@ output:
knitr::opts_chunk$set(echo = TRUE,eval=FALSE)
```

![](images/logo-sm.png)

# Why do we need a pipeline? How might we create one?

We have now learnt a few commands that, when joined together, can form the basis for a minimal analysis pipeline
@@ -84,6 +86,20 @@ None of these issues are impossible to solve, but this isn't intended to be a wo
- [nextflow](https://www.nextflow.io/)
- [snakemake](https://snakemake.readthedocs.io/en/stable/)

<div class="information">

nextflow is available as a *module* on the University of Sheffield Stanage HPC.

```
module load Nextflow/23.10.0
```

- [https://docs.hpc.shef.ac.uk/en/latest/stanage/software/apps/nextflow.html](https://docs.hpc.shef.ac.uk/en/latest/stanage/software/apps/nextflow.html)
- [Sheffield Bioinformatics Core documentation on nextflow and nf.core](https://sbc.shef.ac.uk/nf-core-user-docs/)

</div>


## Running a nf.core pipeline

In our opinion, nextflow is particularly appealing as many popular Bioinformatics pipelines have already been written using nextflow and distributed as part of the nf.core project
@@ -97,10 +113,14 @@ We will be showing the RNA-seq pipeline in particular
- [nf.core RNA-seq pipeline](https://nf-co.re/rnaseq)


-The minimum number of options required to run an nf.core pipeline such as RNA-seq are:-
+A minimal set of options required to run an nf.core pipeline such as RNA-seq is:-

```{bash}
-nextflow run nf-core/rnaseq --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile docker
+nextflow run nf-core/rnaseq \
+    --input samplesheet.csv \
+    --outdir <OUTDIR> \
+    --genome GRCh37 \
+    -profile docker
```

where:-
@@ -114,7 +134,7 @@ where:-
We have customised some of the options of the pipeline so that a reduced number of steps are run for the workshop, using a custom genome containing a single chromosome.

```{bash}
-cat scripts/run_nextflow.sh
+cat run_nextflow.sh
```

The particular steps that we have modified are as follows:-
@@ -144,10 +164,9 @@ cat nf_samplesheet.csv
```


-Before we can run the pipeline we need to move (or copy) the script to the working folder
+We can run the pipeline as follows

```{bash}
-cp scripts/run_nextflow.sh /data
bash run_nextflow.sh
```

@@ -225,7 +244,7 @@ ERR732905,fastq/ERR732905_sub.fastq.gz,,unstranded
ERR732906,fastq/ERR732906_sub.fastq.gz,,unstranded
ERR732907,fastq/ERR732907_sub.fastq.gz,,unstranded
ERR732908,fastq/ERR732908_sub.fastq.gz,,unstranded
-ERR732907,fastq/ERR732909_sub.fastq.gz,,unstranded
+ERR732909,fastq/ERR732909_sub.fastq.gz,,unstranded
```
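The error fixed above (a duplicated sample ID) is easy to make when copy-pasting rows; a quick check, assuming the samplesheet has the sample ID in its first column and a single header row (the file written here is a cut-down example, not the workshop's actual samplesheet):

```bash
# write a small samplesheet with a deliberately duplicated sample ID
cat > nf_samplesheet.csv <<EOF
sample,fastq_1,fastq_2,strandedness
ERR732901,fastq/ERR732901_sub.fastq.gz,,unstranded
ERR732907,fastq/ERR732908_sub.fastq.gz,,unstranded
ERR732907,fastq/ERR732909_sub.fastq.gz,,unstranded
EOF
# print any sample IDs that appear more than once (no output means no duplicates)
cut -d, -f1 nf_samplesheet.csv | tail -n +2 | sort | uniq -d
```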


168 changes: 52 additions & 116 deletions part03.nb.html

Large diffs are not rendered by default.
