Releases: openpipelines-bio/openpipeline
OpenPipelines.bio v1.0.3
OpenPipelines.bio v0.12.7
OpenPipelines.bio v1.0.2
BUG FIXES
dataflow/concatenate_h5mu: fix writing out multidimensional annotation dataframes (e.g. .varm) that had their data type (dtype) changed as a result of adding more observations after concatenation, causing TypeError. One notable example of this happening is when one of the samples does not have a multimodal annotation dataframe which is present in another sample, causing the values to be filled with NA (PR #842, backported from PR #837).
OpenPipelines.bio v1.0.1
OpenPipelines.bio v1.0.0
BREAKING CHANGES
- query/cellxgene_census: Refactored the interface, documentation and internal workings of this component (PR #621).
  - Renamed arguments to align with standard OpenPipelines notations and cellxgene census API:
    - --input_database became --input_uri
    - --cellxgene_release became --census_version
    - --cell_query became --obs_value_filter
    - --cells_filter_columns became --cell_filter_grouping
    - --min_cells_filter_columns became --cell_filter_minimum_count
    - --modality became --output_modality
  - Removed --dataset_id since it was no longer being used.
  - Added --add_dataset_meta to add metadata to the output MuData object.
  - Documentation of the component and its arguments was improved.
- Docker image names now use / instead of _ between the name of the component and the namespace (PR #712).
- Change separator for arguments with multiple inputs from : to ; (PR #700 and #707). Now, all arguments with multiple: true will use ; as the separator. This change was made to be able to deal with file paths that contain :, e.g. s3://my-bucket/my:file.txt. Furthermore, the ; separator will become the default separator for all arguments with multiple: true in Viash >= 0.9.0.
- This project now uses viash version 0.8.4 to build components and workflows. Changes related to this version update should be mostly backwards compatible with respect to the results and execution of the pipelines. From a development perspective, drastic updates have been made to the development workflow.

  Development related changes:
  1. Bump viash version to 0.8.4 (PR #598, PR #638, #697 and #706) in the project configuration.
  2. All pipelines no longer use the anonymous workflow. Instead, these workflows were given a name which was added to the viash config as the entrypoint to the pipeline (PR #598).
  3. Removed the workflows folder and moved its contents to new locations:
     - The resources_test_scripts folder now resides in the root of the project (PR #605).
     - All workflows have been moved to the src/workflows folder (PR #605). This implies that workflows must now be built using viash (ns) build, just like with components.
     - Adjust GitHub Actions to account for new workflow paths (PR #605).
  4. In order to be backwards compatible, the workflows folder now contains symbolic links to the built workflows in target. This is not a problem when using the repository for pipeline execution. However, if a developer wishes to contribute to the project, symlink support should be enabled in git using git config core.symlinks true. Alternatively, use git clone -c core.symlinks=true git@github.com:openpipelines-bio/openpipeline.git when cloning the repository. This avoids the symlinks being resolved (PR #628).

  4bis. With PR #668, the workflows have been renamed. This does not hamper the backwards compatibility of the symlinks that have been described in 4, because they still use the original location which includes the original name.
     * multiomics/rna_singlesample has been renamed to rna/process_single_sample,
     * multiomics/rna_multisample has been renamed to rna/rna_multisample,
     * multiomics/prot_multisample became prot/prot_multisample,
     * multiomics/prot_singlesample became prot/prot_singlesample,
     * multiomics/full_pipeline was moved to multiomics/process_samples,
     * multiomics/multisample has been renamed to multiomics/process_batches,
     * multiomics/integration/initialize_integration changed to multiomics/dimensionality_reduction,
     * finally, all workflows at multiomics/integration/* were moved to integration/*
  5. Removed the workflows/utils folder. Functionality that was provided by the DataflowHelper and WorkflowHelper is now being provided by viash when the workflow is being built (PR #605).

  End-user facing changes:
- The concat component has been deprecated and will be removed in a future release. Its functionality has been copied to the concatenate_h5mu component because the name is in conflict with the concat operator from nextflow (PR #598).
- prot_singlesample, rna_singlesample, prot_multisample and rna_multisample: QC statistics are now only calculated once where needed. This means that the mitochondrial gene detection is performed in the rna_singlesample pipeline and the other count based statistics are calculated during the prot_multisample and rna_multisample pipelines. In both cases, the qc pipeline is being used, but only parts of that workflow are activated by parametrization. Previously the count based statistics were calculated in both the singlesample and multisample pipelines, with the results from the multisample pipelines overwriting the previous results. What is breaking here is that the qc statistics are not being added to the results of the singlesample workflows. This is not an issue when using the full_pipeline because in this case the singlesample and multisample workflows are executed in tandem. If you wish to execute the singlesample workflows separately and still include count based statistics, please run the qc pipeline on the result of the singlesample workflow (PR #604).
- filter/filter_with_hvg has been renamed to feature_annotation/highly_variable_features_scanpy, along with the following changes (PR #667).
  - --do_filter was removed
  - --n_top_genes has been renamed to --n_top_features
- full_pipeline, multisample and rna_multisample: Renamed arguments (PR #667).
  - --filter_with_hvg_var_output became --highly_variable_features_var_output
  - --filter_with_hvg_obs_batch_key became --highly_variable_features_obs_batch_key
- rna_multisample: Renamed arguments (PR #667).
  - --filter_with_hvg_n_top_genes became --highly_variable_features_n_top_features
  - --filter_with_hvg_flavor became --highly_variable_features_flavor
- Renamed obsm_metrics to uns_metrics for the cellranger_mapping workflow because the cellranger metrics are stored in .uns and not .obsm (PR #610).
- mapping/cellranger_mkfastq: update from cellranger 6.0.2 to 7.0.1 (PR #675).
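
The motivation for the separator change from : to ; can be illustrated with a small sketch. This is not how Viash parses arguments; the helper split_multiple and the second path are hypothetical and only show why a colon is a poor separator for values that may themselves contain colons.

```python
def split_multiple(value: str, sep: str = ";") -> list:
    """Split a multi-value argument string into its individual values."""
    return [item for item in value.split(sep) if item]


# Hypothetical argument value: two paths, each containing ":" after the scheme,
# the first one also inside the file name.
raw = "s3://my-bucket/my:file.txt;s3://my-bucket/other.txt"

# New behaviour: ";" separates the values and the paths stay intact.
new_style = split_multiple(raw, sep=";")

# Old behaviour: ":" also splits inside each path, so the values are mangled.
old_style = split_multiple(raw, sep=":")

print(new_style)
print(old_style)
```

With ; as the separator the two paths come back unchanged, whereas splitting on : breaks them into fragments that are no longer valid paths.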
BUG FIXES
- mapping/cellranger_multi: Fix the regex for the fastq input files to allow dropping the lane from the input file names (e.g. _L001) (PR #778).
- workflows/rna/rna_singlesample: Fix passing the top_n_vars and obs_name_mitochondrial_fraction arguments to the qc subworkflow (PR #779).
- rna_singlesample: fixed a bug where selecting the column for the filtering with mitochondrial fractions using obs_name_mitochondrial_fraction was done with the wrong column name, causing ValueError (PR #743).
- Fix publishing in process_samples and process_batches (PR #759).
- Cellranger multi: Fix using a relative input path for --vdj_inner_enrichment_primers (PR #717).
- dataflow/split_modalities: remove unused compression argument. Use output_compression instead (PR #714).
- metadata/grep_annotation_column: fix calculating fraction when an input observation has no counts, which caused the result to be out of bounds.
- Fix --output argument not working for several workflows (PR #740).
- transform/log1p: fix --input_layer argument not functioning (PR #678).
- dataflow/concat and dataflow/concatenate_h5mu: Fix an issue where using --mode move on samples with non-overlapping features would cause var_names to become unaligned to the data (PR #653).
- filter/filter_with_scrublet: (Testing) Fix duplicate test function names (PR #641).
- dataflow/concatenate_h5mu and dataflow/concat: Fix TypeError when using mode 'move' and a column with conflicting metadata does not exist across all samples (PR #631).
- dataflow/concatenate_h5mu and dataflow/concat: Fix an issue where joining columns with different datatypes caused TypeError (PR #619).
- qc/calculate_qc_metrics: Resolved an issue where statistics based on the input columns selected with --var_qc_metrics were incorrect when these input columns were encoded in pd.BooleanDtype() (PR #685).
- move_obsm_to_obs: fix setting output columns when they already exist (PR #690).
- workflows/dimensionality_reduction workflow: nearest neighbour calculations no longer recalculate the PCA when obm_input --obsm_pca is not set to X_pca.
- feature_annotation/highly_variable_scanpy: fix .X being used to remove observations with 0 counts when --layer has been specified.
- filter/filter_with_counts: fix --layer argument not being used.
- transform/normalize_total: fix incorrect layer being written to the output when the input layer is not .X.
- src/workflows/qc: fix input layer not being taken into account when calculating the fraction of mitochondrial genes (always used .X).
- convert/from_cellranger_multi_to_h5mu: fix metric values not represented as percentages being divided by 100 (#704).
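
To make the mapping/cellranger_multi fix more concrete, the following is a rough sketch of a file-name pattern in which the lane part is optional. It is not the regex used by the component: FASTQ_PATTERN, parse_fastq_name and the sample file names are assumptions, modelled on the common Illumina scheme <sample>_S<number>_L<lane>_<read>_001.fastq.gz.

```python
import re
from typing import Optional

# Assumed pattern: the "_L<lane>" part sits in an optional non-capturing group,
# so names with and without a lane are both accepted.
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def parse_fastq_name(name: str) -> Optional[dict]:
    """Return the named parts of a fastq file name, or None if it does not match."""
    match = FASTQ_PATTERN.match(name)
    return match.groupdict() if match else None


# Hypothetical file names, one with and one without the lane.
with_lane = parse_fastq_name("sampleA_S1_L001_R1_001.fastq.gz")
without_lane = parse_fastq_name("sampleA_S1_R1_001.fastq.gz")

print(with_lane)
print(without_lane)
```

Both names match; the only difference is that the lane group is None when the lane has been dropped from the file name.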
NEW FUNCTIONALITY
- dimred/tsne component: Added a tSNE dimensionality reduction component (PR #742).
- multisample pipeline: This workflow now works when provided multiple unimodal files or multiple multimodal files, in addition to the previously supported single multimodal file (PR #606). The modalities are processed independently from each other:
  - As before, a single multimodal file is split into several unimodal MuData objects, e...
OpenPipelines.bio v1.0.0-rc6
BUG FIXES
dataflow/concatenate_h5mu: fix regression bug where observations are no longer linked to the correct metadata after concatenation (PR #807).
OpenPipelines.bio v1.0.0-rc5
BUG FIXES
cluster/leiden: prevent leiden component from hanging when a child process is killed (e.g. when there is not enough memory available) (PR #805).
OpenPipelines.bio v1.0.0-rc4
BREAKING CHANGES
query/cellxgene_census: Refactored the interface, documentation and internal workings of this component (PR #621).
- Renamed arguments to align with standard OpenPipelines notations and cellxgene census API:
  - --input_database became --input_uri
  - --cellxgene_release became --census_version
  - --cell_query became --obs_value_filter
  - --cells_filter_columns became --cell_filter_grouping
  - --min_cells_filter_columns became --cell_filter_minimum_count
  - --modality became --output_modality
- Removed --dataset_id since it was no longer being used.
- Added --add_dataset_meta to add metadata to the output MuData object.
- Documentation of the component and its arguments was improved.
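
For scripts that still call query/cellxgene_census with the old argument names, the renames listed above can be applied mechanically. The sketch below is only an illustration: migrate_args is a hypothetical helper, the example values are made up, and it assumes a flat list of flag/value pairs in which the removed --dataset_id flag took exactly one value.

```python
# Old-to-new argument names, copied from the list of renames above.
RENAMED = {
    "--input_database": "--input_uri",
    "--cellxgene_release": "--census_version",
    "--cell_query": "--obs_value_filter",
    "--cells_filter_columns": "--cell_filter_grouping",
    "--min_cells_filter_columns": "--cell_filter_minimum_count",
    "--modality": "--output_modality",
}
# Arguments that no longer exist.
REMOVED = {"--dataset_id"}


def migrate_args(args: list) -> list:
    """Rewrite a flat [flag, value, flag, value, ...] list to the new names."""
    migrated = []
    skip_value = False
    for token in args:
        if skip_value:
            # Drop the value that belonged to a removed flag.
            skip_value = False
            continue
        if token in REMOVED:
            # Assumption: the removed flag was followed by exactly one value.
            skip_value = True
            continue
        migrated.append(RENAMED.get(token, token))
    return migrated


# Hypothetical old-style invocation.
old_args = [
    "--input_database", "CellxGene",
    "--modality", "rna",
    "--dataset_id", "abc",
    "--cell_query", "is_primary_data == True",
]
new_args = migrate_args(old_args)
print(new_args)
```

Flags that were not renamed pass through unchanged, so the helper can be run over an argument list that already mixes old and new names.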
BUG FIXES
OpenPipelines.bio v1.0.0-rc3
BREAKING CHANGES
- Docker image names now use / instead of _ between the name of the component and the namespace (PR #712).
BUG FIXES
- rna_singlesample: fixed a bug where selecting the column for the filtering with mitochondrial fractions using obs_name_mitochondrial_fraction was done with the wrong column name, causing ValueError (PR #743).
- Fix publishing in process_samples and process_batches (PR #759).
NEW FUNCTIONALITY
dimred/tsne component: Added a tSNE dimensionality reduction component (PR #742).
OpenPipelines.bio v1.0.0-rc2
BUG FIXES
- Cellranger multi: Fix using a relative input path for --vdj_inner_enrichment_primers (PR #717).
- dataflow/split_modalities: remove unused compression argument. Use output_compression instead (PR #714).
- metadata/grep_annotation_column: fix calculating fraction when an input observation has no counts, which caused the result to be out of bounds.
- Fix --output argument not working for several workflows (PR #740).
MINOR CHANGES
- metadata/grep_annotation_column: Added more logging output (PR #697).
- metadata/add_id and metadata/grep_annotation_column: Bump python to 3.11 (PR #697).
- Bump viash to 0.8.5 (PR #697).
- dataflow/split_modalities: add more logging output and bump python to 3.12 (PR #714).
- correction/cellbender: Update nextflow resource labels from singlecpu and lowmem to midcpu and midmem (PR #736).