Skip to content

Commit

Permalink
Merge pull request #17 from rformassspectrometry/phili
Browse files Browse the repository at this point in the history
addition of MetaboLightsParam
  • Loading branch information
jorainer authored Sep 19, 2024
2 parents 1bddd73 + 0a27d69 commit 86f7090
Show file tree
Hide file tree
Showing 8 changed files with 357 additions and 85 deletions.
1 change: 1 addition & 0 deletions .github/workflows/check-bioc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,7 @@ jobs:
BiocManager::install('rhdf5', dependencies = TRUE, ask = FALSE, update = FALSE, INSTALL_opts = '--force-biarch')
BiocManager::install("mzR", dependencies = TRUE, ask = FALSE, update = FALSE)
BiocManager::install("msdata")
BiocManager::install("RforMassSpectrometry/MsBackendMetaboLights")
message(paste('****', Sys.time(), 'force installation of selected packages ****'))
BiocManager::install(c("BiocStyle", "rmarkdown", "magick", "Spectra"))
Expand Down
5 changes: 3 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: MsIO
Title: Serializing and restoring/importing mass spectrometry data objects
Version: 0.0.4
Version: 0.0.5
Authors@R:
c(person(given = "Johannes", family = "Rainer",
email = "[email protected]",
Expand Down Expand Up @@ -44,7 +44,8 @@ Suggests:
testthat,
xcms,
alabaster.se,
alabaster.matrix
alabaster.matrix,
MsBackendMetaboLights
License: Artistic-2.0
Encoding: UTF-8
VignetteBuilder: knitr
Expand Down
6 changes: 5 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
# Version 0.0
# Version 0.0.5

## Changes in 0.0.5

- Add *MetaboLights* `readMsObject()` method for `MsExpriment()` objects.

## Changes in 0.0.4

Expand Down
87 changes: 69 additions & 18 deletions R/MetaboLightsParam.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,26 +11,64 @@
#' The `MetaboLightsParam` class and the associated `readMsObject()` method
#' allow users to load an `MsExperiment` object from a study in the
#' MetaboLights database (https://www.ebi.ac.uk/metabolights/index) by
#' providing its unique study `studyId`. This function is particularly useful
#' providing its unique study `mtblsId`. This function is particularly useful
#' for importing metabolomics data into an `MsExperiment` object for further
#' analysis within the R environment. It's important to note that this method
#' can *only* be used for import into an R environement using `readMsObject()`.
#' It cannot be used with the `saveMsObject()` method.
#' analysis in the R environment.
#' It is important to note that at present it is only possible to *read*
#' (import) data from MetaboLights, but not to *save* data to MetaboLights.
#'
#' If the study contains multiple assays, the user will be prompted to select
#' which assay to load. The resulting `MsExperiment` object will include a
#' `sampleData` slot populated with data extracted from the selected assay.
#' Columns in the `sampleData` that contain only `NA` values are automatically
#' removed, and an additional column is added to track the injection index.
#'
#' @param studyId `character(1)` The MetaboLights study studyId, which should
#' Users can define how to filter this `sampleData` table by specifying a few
#' parameters. The `keepOntology` parameter is set to `TRUE` by default, meaning
#' that all ontology-related columns are retained. If set to `FALSE`, they are
#' removed. If ontology columns are kept, some column names may be duplicated and
#' therefore numbered. The order of these columns is important, as it reflects the
#' assay and sample information available in MetaboLights.
#'
#' The `keepProtocol` parameter is also set to `TRUE` by default, meaning that
#' all columns related to protocols are kept. If set to `FALSE`, they are removed.
#' The `simplify` parameter (default `simplify = TRUE`) allows to define
#' whether duplicated columns or columns containing only missing values should
#' be removed. In the case of duplicated content, only the first occurring
#' column will be retained.
#'
#' Further filtering can be performed using the `filePattern` parameter of the
#' `MetaboLightsParam` object. The default for this parameter is
#' `"mzML$|CDF$|cdf$|mzXML$"`, which corresponds to the supported raw data file
#' types.
#'
#' @param mtblsId `character(1)` The MetaboLights study ID, which should
#' start with "MTBL". This identifier uniquely specifies the study within the
#' MetaboLights database.
#'
#' @param assayName `character(1)` The name of the assay to load. If the study
#' contains multiple assays and this parameter is not specified, the user will
#' be prompted to select which assay to load.
#'
#' @param filePattern `character(1)` A regular expression pattern to filter the
#' raw data files associated with the selected assay. The default value is
#' `"mzML$|CDF$|cdf$|mzXML$"`, corresponding to the supported raw data file
#' types.
#'
#' @param keepOntology `logical(1)` Whether to keep columns related to ontology
#' in the `sampleData` parameter. Default is `TRUE`.
#'
#' @param keepProtocol `logical(1)` Whether to keep columns related to protocols
#' information in the `sampleData` parameter. Default is `TRUE`.
#'
#' @param simplify `logical(1)` Whether to simplify the `sampleData` table by
#' removing columns filled with NAs or duplicated content. Default is `TRUE`.
#'
#' @inheritParams saveMsObject
#'
#' @returns (for now ?) A `MsExperiment` object with only the sampleData slots
#' filled (will be updated when MetaboLightsBackend available ?).
#' @returns An `MsExperiment` object with the `sampleData` parameter populated using
#' MetaboLights sample and assay information. The spectra data is represented
#' as a `MsBackendMetabolights` object, generated from the raw data files
#' associated with the selected assay of the specified MetaboLights ID
#' (`mtblsId`).
#'
#' @author Philippine Louail
#'
Expand All @@ -40,35 +78,48 @@
#'
#' @examples
#' library(MsExperiment)
#' # Load a study with the studyId "MTBLS10035"
#' param <- MetaboLightsParam(studyId = "MTBLS10035")
#' ms_experiment <- readMsObject(MsExperiment(), param)
#' # Load a study with the mtblsId "MTBLS39" and selecting specific file pattern
#' # as well as removing ontology and protocol information in the metadata.
#' param <- MetaboLightsParam(mtblsId = "MTBLS39", filePattern = "63A.cdf")
#' ms_experiment <- readMsObject(MsExperiment(), param , keepOntology = FALSE,
#' keepProtocol = FALSE)
#'
#' @seealso
#' - `MsExperiment` object, defined in the
#' ([MsExperiment](https://bioconductor.org/packages/MsExperiment)) package.
#'
#' - `MsBackendMetaboLights` object, defined in the
#' ([MsBackendMetaboLights](https://github.com/rformassspectrometry/MsBackendMetaboLights))
#' repository.
#'
#' - [MetaboLights](https://www.ebi.ac.uk/metabolights/index) for accessing
#' the MetaboLights database.
#'
NULL

#' @noRd
setClass("MetaboLightsParam",
slots = c(studyId = "character"),
slots = c(mtblsId = "character",
assayName = "character",
filePattern = "character"),
contains = "Param",
prototype = list(
studyId = character(1)
mtblsId = character(1),
assayName = character(1),
filePattern = character(1)
),
validity = function(object) {
msg <- NULL
if (!grepl("^MTBLS", object@studyId))
msg <- c("'studyId' must start with 'MTBLS'")
if (!grepl("^MTBLS", object@mtblsId))
msg <- c("'mtblsId' must start with 'MTBLS'")
msg
})

#' @rdname MetaboLightsParam
#'
#' @export
MetaboLightsParam <- function(studyId = character(1)) {
new("MetaboLightsParam", studyId = studyId)
MetaboLightsParam <- function(mtblsId = character(), assayName = character(),
filePattern = "mzML$|CDF$|cdf$|mzXML$"){
new("MetaboLightsParam", mtblsId = mtblsId, assayName = assayName,
filePattern = filePattern)
}
129 changes: 92 additions & 37 deletions R/MsExperiment.R
Original file line number Diff line number Diff line change
Expand Up @@ -214,48 +214,103 @@ setMethod("readMsObject", signature(object = "MsExperiment",
setMethod("readMsObject",
signature(object = "MsExperiment",
param = "MetaboLightsParam"),
function(object, param, ...) {
url <- file.path(
"https://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/",
param@studyId, "/")
## Retrieve the HTML content using curl system command
html_content <- system(paste("curl -s", url), intern = TRUE) ##idk if that works for all OS
if (all(grepl("The requested URL was not found on this server",
html_content)))
stop("Study not found. Please check the study ID and try again.")
files <- regmatches(html_content,
gregexpr("href=\"([^\"]+\\.txt|[^\"]+\\.tsv)\"",
html_content))
files <- unlist(files)
files <- gsub("href=\"|\"", "", files)
function(object, param, keepOntology = TRUE, keepProtocol = TRUE,
simplify = TRUE, ...) {
if (!requireNamespace("MsBackendMetaboLights", quietly = TRUE)) {
stop("Required package 'MsBackendMetaboLights' is missing. ",
"Please install it and try again.", call. = FALSE)
}
pth <- MsBackendMetaboLights::mtbls_ftp_path(param@mtblsId)
all_fls <- MsBackendMetaboLights::mtbls_list_files(param@mtblsId)

## check assay files
assays <- grep("a_", files, value = TRUE)
if (length(assays) > 1) {
cat("Multiple assay files found:\n")
selection <- menu(assays,
title = paste("Please choose the assay",
"file you want to use:"))
selected_assay <- assays[selection]
} else if (length(assays) == 1) {
selected_assay <- assays[1]
cat("Only one assay file found:", selected_assay, "\n")
} else stop("No assay files found.") ## i don't think that would happen...
## Extract and read assay files
assays <- all_fls[grepl("^a_", all_fls)]
if (length(param@assayName) > 0)
selected_assay <- param@assayName
else {
if (length(assays) == 1) {
selected_assay <- assays
message("Only one assay file found:", selected_assay, "\n")
} else {
message("Multiple assay files found:\n")
selection <- menu(assays,
title = paste("Please choose the assay",
"file you want to use:"))
selected_assay <- assays[selection]
}
}

assay_data <- read.table(file.path(url, selected_assay),
header = TRUE, sep = "\t")
assay_data$injection_index <- seq_len(nrow(assay_data))
assay_data <- read.table(paste0(pth, selected_assay),
header = TRUE, sep = "\t",
check.names = FALSE)

## Extract and read sample info files
sample_info_files <- grep("s_", files, value = TRUE)
sample_info <- read.table(file.path(url, sample_info_files),
header = TRUE, sep = "\t")
merged_data <- merge(assay_data, sample_info, by = "Sample.Name")
merged_data <- merged_data[order(merged_data$injection_index), ]
merged_data <- merged_data[, colSums(is.na(merged_data)) <
nrow(merged_data)]
s_files <- all_fls[grepl("^s_", all_fls)]
sample_info <- read.table(paste0(pth, s_files),
header = TRUE, sep = "\t",
check.names = FALSE)

# merging
ord <- match(assay_data$`Sample Name`, sample_info$`Sample Name`)
merged_data <- cbind(assay_data, sample_info[ord, ])
names(merged_data) <- gsub(" ", "_", names(merged_data))
if (keepProtocol || keepOntology || simplify)
merged_data <- .clean_merged(x = merged_data,
keepProtocol = keepProtocol,
keepOntology = keepOntology,
simplify = simplify)

object@sampleData <- DataFrame(merged_data)
## Assemble object
object@spectra <- Spectra::Spectra(mtblsId = param@mtblsId,
source = MsBackendMetaboLights::MsBackendMetaboLights(),
assayName = selected_assay,
filePattern = param@filePattern)

## sample to spectra link
fl <- object@spectra@backend@spectraData[1, "derived_spectral_data_file"]
nme <- colnames(merged_data)[which(merged_data[1, ] == fl)]
merged_data <- merged_data[grepl(param@filePattern,
merged_data[, nme]), ]
nme <- gsub(" ", "_", nme) #use concatenate instead ?
object@sampleData <- DataFrame(merged_data, check.names = FALSE)
object <- MsExperiment::linkSampleData(object,
with = paste0("sampleData.",
nme,
"= spectra.derived_spectral_data_file"))
validObject(object)
object
})


#####HELPERS

#' function that takes the extra parameters and clean the metadata if asked by
#' the user.
#'
#' Note: the subsetting of the merged data is done here, which WILL rename the
#' duplicated columns. I could fix that by first transforming the data.frame into
#' a list but I am not sure that it is useful.. The user might do some
#' subsetting too later and then the same thing will happen. Might as well have
#' it from the beginning.
#'
#' Note2: I would move that function later, keeping it her for review to help
#' clarity
#'
#' @noRd
.clean_merged <- function(x, keepProtocol, keepOntology, simplify) {
# remove ontology
if (!keepOntology)
x <- x[, -which(grepl("Term", names(x))), drop = FALSE]

# remove protocol
if (!keepProtocol)
x <- x[, -which(grepl("Protocol|Parameter", names(x))), drop = FALSE]

# remove duplicated columns contents and NAs
if (simplify) {
x <- x[, !duplicated(as.list(x)), drop = FALSE]
x <- x[, colSums(is.na(x)) != nrow(x), drop = FALSE]
}
return(x)
}

Loading

0 comments on commit 86f7090

Please sign in to comment.