Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

addition of MetaboLightsParam #17

Merged
merged 8 commits into from
Sep 19, 2024
Merged

addition of MetaboLightsParam #17

merged 8 commits into from
Sep 19, 2024

Conversation

philouail
Copy link
Collaborator

Here is the code in order to get a full MsExperiment object with spectra and sample data from the database.

I changed from merged() to cbind(), it is more stable i found and does not show warnings because of duplicated columns.

Tell me what you think about the extra parameters I implemented in order to simplify the sampleData !

Also see my note in the code (in the helper function description) about the duplicated ontology column automatic numbering so we can discuss it.

Lastly, unfortunately the GHA beaks because of MsBackendMetaboLights, is there a workaround ? For me the unit tests work locally.

Any other things I'm missing also ? Should there be an offline parameter similar to MsBackendMetaboLights ?

R/MsExperiment.R Outdated Show resolved Hide resolved
@philouail
Copy link
Collaborator Author

@jorainer the tests pass, but not for linux aha, do you know what is happening ?

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice @philouail ! Thanks!

I have some change requests - one is actually open for discussion: maybe it would help if we in addition add a parameter check.names = FALSE that is passed to all read.table() functions as well as the conversion of the sample data data.frame to a DataFrame (which will again start messing with column names). So, with check.names = FALSE users would get what they see online in the MetaboLights database, and with check.names = TRUE the columns would be formatted/fixed in the standard R fashion.

NEWS.md Show resolved Hide resolved
R/MetaboLightsParam.R Outdated Show resolved Hide resolved
R/MetaboLightsParam.R Outdated Show resolved Hide resolved
R/MetaboLightsParam.R Outdated Show resolved Hide resolved
R/MetaboLightsParam.R Outdated Show resolved Hide resolved
R/MsExperiment.R Show resolved Hide resolved
R/MsExperiment.R Outdated
fl <- object@spectra@backend@spectraData[1, "derived_spectral_data_file"]
nme <- colnames(merged_data)[which(merged_data[1, ] == fl)]
merged_data <- merged_data[grepl(param@filePattern,
merged_data[, nme]), ]

object@sampleData <- DataFrame(merged_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would suggest to also call DataFrame(merged_data, check.names = FALSE) to avoid renaming of column names. IMHO it would be best if the content (column names and content) of the sampleData() is exactly the same as what the user sees in MetaboLights online/web. Thus, also no replacing of with _ etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing the DataFrame thing but see my other comment regarding the rest :)

R/MsExperiment.R Show resolved Hide resolved
tests/testthat/test_MetaboLightsParam.R Show resolved Hide resolved
files <- unlist(files)
files <- gsub("href=\"|\"", "", files)
function(object, param, keepOntology = TRUE, keepProtocol = TRUE,
simplify = TRUE, ...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a check.names = FALSE parameter too? that one could be passed to both the read.table() calls as well as the object@sampleData <- DataFrame call. To me that would be a nice parameter that allows users to configure how they want the sample data column names to be processed - and saves us from defining how to handle white spaces in column names etc.

@philouail philouail changed the title addition of MsBackendMetaboLights addition of MetaboLightsParam Sep 19, 2024
R/MsExperiment.R Outdated
.clean_merged <- function(x, keepProtocol, keepOntology, simplify) {
# remove ontology
if (!keepOntology)
x <- x[, -which(grepl("Term", names(x)))]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorainer continuing the discussion about checking column names or not here. I believe we cannot give the choice to the user whether or not to have numbered column. because as soon as this (or the one below) subsetting happen, there is an automatic renaming of duplicated columns. I could not find a way to prevent that..

Also I don't really want to allow the read.table(check.names = TRUE) when importing the assay and sample info, because in each there is already duplication of column, so after merging AND subsetting there would be double numbering...

Also the tables will be different compared to MetaboLights anyway because a lot of columns are not even present on the MetaboLights website...(they derive from the ontology term that the user input)

Lastly, even If i find a way to have as an output a table with duplicated column names with NO numbering, if the user subset the MsExperiment object by calling [] the numbering will get automatically added..

So I think it is a bit more complicated than expected this numbering thing....

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! thanks Phili!

@jorainer jorainer merged commit 86f7090 into main Sep 19, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants