Background:
The MiCall pipeline currently processes reads on a per-real-sample basis and outputs an assembled consensus sequence for each sample. Each run relies on a SampleSheet.csv file for input and output details. A feature to merge samples, ideally across different runs, would simplify downstream analysis.
Feature Description:
Introduce a merger tool that takes a .csv mapping file and generates a merged SampleSheet.csv, RunInfo.xml, and a duplicate of the input .csv for traceability. The mapping file correlates sample_name and run_folder with output_name, specifying the merging plan.
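A minimal sketch of how the mapping file could be read into a merge plan. Only the column names sample_name, run_folder, and output_name come from this issue; the column order, the example paths, and the function name `read_merge_plan` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def read_merge_plan(handle):
    """Group (run_folder, sample_name) pairs by output_name, keeping row order."""
    plan = defaultdict(list)
    for row in csv.DictReader(handle):
        plan[row["output_name"]].append((row["run_folder"], row["sample_name"]))
    return dict(plan)


# Hypothetical mapping file: two samples from different runs merge into one output.
example = io.StringIO(
    "sample_name,run_folder,output_name\n"
    "S1,/runs/run_A,merged_1\n"
    "S7,/runs/run_B,merged_1\n"
    "S2,/runs/run_A,merged_2\n"
)
plan = read_merge_plan(example)
```

Keeping the rows in file order matters here, because the rules below rely on the "first" run_folder and the "first observed value".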
Feature Objectives:
Facilitate efficient sample mergers across different run folders.
Ensure consistency and traceability for merged samples.
Handle default values and conflicts in input .csv files.
Functional Requirements:
Input to the tool:
Path to the mapping .csv file.
Path to the output folder.
Outputs of the tool:
SampleSheet.csv with merged output_name records.
RunInfo.xml copied from the first associated run_folder.
A copy of the input .csv file, to trace the origins of merged data.
Conflict resolution strategy, with a strict mode option (--strict flag).
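The inputs above could map onto a command-line interface along these lines. This is a sketch only: the program name `merge_samples` and the argument names `mapping_csv` and `output_folder` are hypothetical, and only the --strict flag is named in this issue.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(
        prog="merge_samples",  # hypothetical name
        description="Merge samples across run folders from a mapping .csv file.",
    )
    parser.add_argument("mapping_csv", type=Path,
                        help="path to the mapping .csv file")
    parser.add_argument("output_folder", type=Path,
                        help="folder that receives the merged outputs")
    parser.add_argument("--strict", action="store_true",
                        help="treat conflicting field values as errors")
    return parser


# Parsing an explicit argument list, as a test would do.
args = build_parser().parse_args(["plan.csv", "merged_run", "--strict"])
default_args = build_parser().parse_args(["plan.csv", "merged_run"])
```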
Conflict Resolution Rules:
project_name header field to follow the $current_date.merged pattern.
date header field to reflect the actual merge date.
All other fields should use the first observed value unless --strict is enabled.
Fields index and index2 should default to XXXXX.
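The rules above can be sketched as follows. Several details are assumptions and not part of this issue: the ISO date format, the `MergeConflict` exception, the function names, the example field `assay`, and the reading that index and index2 are filled with XXXXX only when they are missing or empty.

```python
import datetime

INDEX_PLACEHOLDER = "XXXXX"


class MergeConflict(Exception):
    """Raised in strict mode when source values disagree (hypothetical)."""


def resolve_field(name, values, strict=False):
    """Return the first observed value; report or raise on disagreement."""
    distinct = list(dict.fromkeys(values))  # unique values, original order
    if len(distinct) > 1:
        message = f"conflict in {name}: {distinct}"
        if strict:
            raise MergeConflict(message)
        print(message)  # conflicts are reported on stdout
    return distinct[0]


def merge_headers(headers, merge_date, strict=False):
    """Merge header dicts from several sample sheets into one."""
    merged = {}
    for header in headers:
        for name in header:
            # project_name and date are always overwritten, so never conflict.
            if name in merged or name in ("project_name", "date"):
                continue
            values = [h[name] for h in headers if name in h]
            merged[name] = resolve_field(name, values, strict)
    stamp = merge_date.isoformat()  # assumed date format
    merged["project_name"] = f"{stamp}.merged"
    merged["date"] = stamp
    return merged


def apply_index_defaults(row):
    """Fill missing or empty index fields with the placeholder."""
    filled = dict(row)
    for name in ("index", "index2"):
        if not filled.get(name):
            filled[name] = INDEX_PLACEHOLDER
    return filled


headers = [
    {"project_name": "run_A", "assay": "Nextera"},
    {"project_name": "run_B", "assay": "TruSeq"},
]
merged = merge_headers(headers, datetime.date(2024, 1, 15))
```

In the default mode the `assay` disagreement is printed and the first value wins; passing `strict=True` raises `MergeConflict` instead.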
Implementation Tasks:
Develop a merging script for the underlying sample files.
Develop logic to parse the input .csv and handle row defaults.
Implement conflict detection logic with stdout reporting.
Create file generation procedures for SampleSheet.csv and RunInfo.xml.
Build merging algorithm to create a consolidated .csv from the mapping file.
Make non-strict conflict resolution the default mode, with the --strict flag switching to strict handling.
Write unit tests to validate merging logic and conflict handling.
Add documentation for the merger tool usage and features.