
Commit

Updated documentation
JRandy77 committed Jul 28, 2023
1 parent 9cc258d commit 35e00c5
Showing 7 changed files with 355 additions and 7 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@
/graph_image.png
/results/*
__pycache__
Scripts

results

113 changes: 113 additions & 0 deletions docs/source/file_tree.md
@@ -0,0 +1,113 @@
# File_Tree

File trees are a way to specify conventions for data structures and naming schemes.
If you are interested, you can find the file_tree documentation [here](https://pypi.org/project/file-tree/).
However, I recommend reading this page first: not all file_tree
features are used in this utility, and some things are handled differently
here.
There are further ways to make this tool more robust using file_tree functionality, so
expect this section to evolve in the future.

## Placeholders

Placeholders are a way to denote portions of directory and file names that may vary. A common
instance of this is subject IDs, e.g. sub-01, sub-02. Here the '01' and '02' portions are the
values that fill the placeholder.
There are two kinds of placeholders: required and optional.

### Required Placeholders

Required placeholders, as the name implies, are segments that must exist in order to make
a match between a file/directory and a template name.
Required placeholders are denoted by curly braces '{}'.
Example:
```
identity-{required identity}
```

### Optional Placeholders

Optional placeholders, conversely, are segments that may or may not be present when
matching a file/directory against a template name.
Optional placeholders are denoted by square brackets '[]'.
They are useful for parts of a name that might exist but don't always have to.
Example:
```
[optional-{placeholder}]
```
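To make the two placeholder kinds concrete, here is a minimal sketch of how a template could be turned into a regular expression. It is not how file_tree or file_tree_check implement matching. The function name `template_to_regex` is hypothetical, and so is the rule that a placeholder matches any run of characters other than `/` and `_`. In this sketch, placeholder names must be valid Python identifiers and may appear only once per template.

```python
import re


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Translate {required} and [optional] template parts into a regex.

    Illustration only: the matching rule for placeholder values is an
    assumption, not the behaviour of the file_tree library.
    """
    pattern = ""
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "{":
            # Required placeholder: becomes a named group that must match.
            end = template.index("}", i)
            name = template[i + 1:end]
            pattern += f"(?P<{name}>[^/_]+)"
            i = end + 1
        elif ch == "[":
            # Start of an optional segment: open a non-capturing group.
            pattern += "(?:"
            i += 1
        elif ch == "]":
            # End of an optional segment: the whole group may be absent.
            pattern += ")?"
            i += 1
        else:
            # Literal text is matched exactly.
            pattern += re.escape(ch)
            i += 1
    return re.compile(pattern)


regex = template_to_regex("sub-{participant}[_ses-{session}]_T1w.nii.gz")
with_session = regex.fullmatch("sub-01_ses-02_T1w.nii.gz")
without_session = regex.fullmatch("sub-01_T1w.nii.gz")
print(with_session.groupdict())
print(without_session.groupdict())
```

Both file names match the template. In the second case the optional segment is skipped, so the `session` group is `None`, while a name with no participant value does not match at all.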
## Subtrees

With file trees it is possible to reference one file tree from within another. This is useful for
maintaining clarity in large data structures. A subtree must be located in the same folder
as the parent file tree. To reference a subtree, use the following expression: '-> {subtree name}'
Example:
```
parent-{parent_id}
[optional-{optional sub directory}]
-> subtree
```

## Identifiers

Identifiers are the "nicknames" used to reference a file/directory template name. These can
be specified within the file tree using '()'. If none is given, one is generated automatically
by removing the extension from the template name. The specifics of this can be found [here](https://git.fmrib.ox.ac.uk/ndcn0236/file-tree/-/blob/master/src/file_tree/template.py#:~:text=def%20guess_key(,)%5B0%5D) in the guess_key function.
It is highly recommended to supply these "nicknames" explicitly for clarity.
Example:
```
subject-{subject}_important-processed-data.data (processed_data)
```
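The following rough sketch shows why explicit identifiers are worth the effort. It assumes that "removing an extension" means taking the last path component and dropping everything from the first dot onward. The helper name `guess_short_name` is hypothetical, and the real `guess_key` function may behave differently.

```python
from pathlib import PurePosixPath


def guess_short_name(template: str) -> str:
    """Guess an identifier for a template that has no explicit one.

    Assumption for illustration: keep the final path component and cut
    it at the first dot. This is not the library's implementation.
    """
    basename = PurePosixPath(template).name
    return basename.split(".")[0]


print(guess_short_name("dataset_description.json"))
print(guess_short_name("subject-{subject}_important-processed-data.data"))
```

Under this assumption the second template would get the unwieldy identifier `subject-{subject}_important-processed-data`, which is why supplying `(processed_data)` explicitly is easier to work with.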

## Example Trees
Several example trees are included with the file_tree_check package.
Example of the bids_raw tree:
```
ext=.nii.gz
participant = 1, 2, 3, 4, 5, 6, 7, 8
dataset_description.json
participants.tsv
README (readme)
CHANGES (changes)
LICENSE (license)
genetic_info.json
sub-{participant}
    [ses-{session}]
        sub-{participant}_sessions.tsv (sessions_tsv)
        anat (anat_dir)
            sub-{participant}[_ses-{session}][_acq-{acq}][_ce-{ce}][_rec-{rec}][_run-{run_index}]_{modality}{ext} (anat_image)
            sub-{participant}[_ses-{session}][_acq-{acq}][_ce-{ce}][_rec-{rec}][_run-{run_index}][_mod-{modality}]_defacemask{ext} (anat_deface)
        func (func_dir)
            sub-{participant}[_ses-{session}]_task-{task}[_acq-{acq}][_ce-{ce}][_dir-{dir}][_rec-{rec}][_run-{run_index}][_echo-{echo}]_bold.nii.gz (task_image)
            sub-{participant}[_ses-{session}]_task-{task}[_acq-{acq}][_ce-{ce}][_dir-{dir}][_rec-{rec}][_run-{run_index}][_echo-{echo}]_bold.json (task_image_json)
            sub-{participant}[_ses-{session}]_task-{task}[_acq-{acq}][_ce-{ce}][_dir-{dir}][_rec-{rec}][_run-{run_index}][_echo-{echo}]_sbref{ext} (task_sbref)
            sub-{participant}[_ses-{session}]_task-{task}[_acq-{acq}][_ce-{ce}][_dir-{dir}][_rec-{rec}][_run-{run_index}][_echo-{echo}]_events.tsv (task_events)
            sub-{participant}[_ses-{session}]_task-{task}[_acq-{acq}][_ce-{ce}][_dir-{dir}][_rec-{rec}][_run-{run_index}][_echo-{echo}][_recording-{recording}]_physio.tsv.gz (task_physio)
            sub-{participant}[_ses-{session}]_task-{task}[_acq-{acq}][_ce-{ce}][_dir-{dir}][_rec-{rec}][_run-{run_index}][_echo-{echo}][_recording-{recording}]_stim.tsv.gz (task_stim)
            sub-{participant}[_ses-{session}]_task-{task}[_acq-{acq}][_ce-{ce}][_dir-{dir}][_rec-{rec}][_run-{run_index}][_echo-{echo}]_desc-confounds_{confounds}.tsv (desc-confounds)
        dwi (dwi_dir)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_dwi{ext} (dwi_image)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_dwi.bval (bval)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_dwi.bvec (bvec)
        fmap (fmap_dir)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_phasediff{ext} (fmap_phasediff)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_magnitude{ext} (fmap_mag)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_magnitude1{ext} (fmap_mag1)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_magnitude2{ext} (fmap_mag2)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_phase1{ext} (fmap_phase1)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_phase2{ext} (fmap_phase2)
            sub-{participant}[_ses-{session}][_acq-{acq}][_run-{run_index}]_fieldmap{ext} (fmap)
            sub-{participant}[_ses-{session}][_acq-{acq}]_dir-{dir}[_run-{run_index}]_epi{ext} (fmap_epi)
        meg (meg_dir)
            sub-{participant}[_ses-{session}]_task-{task}[_run-{run}][_proc-{proc}]_meg.{meg_ext} (meg)
        eeg (eeg_dir)
            sub-{participant}[_ses-{session}]_task-{task}[_run-{run}][_proc-{proc}]_eeg.{eeg_ext} (eeg)
        ieeg (ieeg_dir)
            sub-{participant}[_ses-{session}]_task-{task}[_run-{run}][_proc-{proc}]_ieeg.{ieeg_ext} (ieeg)
        beh (behavioral_dir)
            sub-{participant}[_ses-{session}]_task-{task}_events.tsv (behavioural_events)
            sub-{participant}[_ses-{session}]_task-{task}_beh.tsv (behavioural)
            sub-{participant}[_ses-{session}]_task-{task}_physio.tsv.gz (behavioural_physio)
            sub-{participant}[_ses-{session}]_task-{task}_stim.tsv.gz (behavioral_stim)
```
More examples can be found [here](https://git.fmrib.ox.ac.uk/fsl/fslpy/-/tree/master/fsl/utils/filetree/trees).
24 changes: 24 additions & 0 deletions docs/source/general_information.md
@@ -0,0 +1,24 @@
# file_tree_check

File checking project for McGill NeuroDataScience - ORIGAMI lab

The file_tree_check package takes a repeating file organization (a large number of
folders with the same name, or files with similar names) and compares
every occurrence to highlight missing or unusual files/folders.

It was written initially for neuroimaging data structures like
[BIDS](https://bids.neuroimaging.io/) but is compatible with any data structure
in which folders and files with similar names repeat.

It uses a [file_tree](https://pypi.org/project/file-tree/) as a template for the data structure.
This allows easy and convenient configuration of the tool. With a properly written
file_tree, any data structure can be analyzed.

## Requirements

Requires Python >=3.8.

The required Python libraries are listed in the
[pyproject.toml](pyproject.toml) file.

## Demo
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -5,8 +5,9 @@ Welcome to file_tree_check's documentation!
:maxdepth: 2
:caption: Contents:

general_information
usage
examples
file_tree
config
faq
glossary
149 changes: 146 additions & 3 deletions docs/source/usage.md
@@ -4,10 +4,153 @@ There are two ways to use file_tree_check. It can either be installed and used a
or you can fork the repository and use it as a python script. The instructions for each are
different, but the underlying principles are the same.

-## How to install as CLI.
## Inputs

Two inputs are always required: the root directory and a File_Tree template. There are a couple of ways to pass this information, which are described later.
Optionally, you can also supply a config file that the program will read for your configuration options.

### Root Directory

This is the directory where your tree is rooted. The contents of this directory are what will be analyzed.

### File_Tree template

This is the template that tells the program what you expect the structure of the dataset to be. It is described in detail in its own section of the documentation.

### Config file

This is an optional input that saves you from setting each parameter individually on the command line. The options are explained in detail under Usage as Python script/Configure config file.

## Outputs

There are several output options, but at the moment only two are recommended for use. Unless specified otherwise, all outputs are written to a folder called 'results' in your current working directory.
The primary output is a summary.txt file. Its contents vary depending on your configuration.
The other output you may find useful is the tree, a visual rendering of the tree structure itself. It is useful for checking that tree discovery ran as intended.

## Usage as CLI

### Installation as CLI

```
git clone https://github.com/neurodatascience/file_tree_check.git
cd file_tree_check
pip install .
```

### Usage as CLI

To run with default settings, use the following command:
```
file_tree_check -r {root directory} -f {file_tree}
```
When passing a file_tree, you can give either a path to a file_tree file or the short name* of one of the built-in file_trees**.
Example:
```
file_tree_check -r {root directory} -f bids_raw
```
*The short name is the name of the file_tree without the '.tree' extension.
**The current built-in trees are bids_raw, fMRIPrep, and freesurfer. More are on the way; if you need another, feel free to open an issue on the GitHub repository requesting one.
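As a small illustration of the short-name rule, the helper below derives a short name from a tree file path. The function name `short_name` is hypothetical and is not part of file_tree_check. It only mirrors the rule stated above: drop the '.tree' extension.

```python
from pathlib import Path


def short_name(file_tree: str) -> str:
    """Return the short name of a tree file: its file name without '.tree'.

    Hypothetical helper for illustration; other extensions are left alone.
    """
    path = Path(file_tree)
    return path.stem if path.suffix == ".tree" else path.name


print(short_name("trees/bids_raw.tree"))
print(short_name("freesurfer"))
```

A name that already has no '.tree' extension is returned unchanged.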

### Commands

`-c` or `--config`: Specifies the path to the configuration file. Usage: `-c path/to/configfile`

`-r` or `--root`: Specifies the path to the root directory to be explored. Usage: `-r path/to/root/directory`

`-f` or `--file_tree`: Specifies the path to the file tree to be used. Usage: `-f path/to/file/tree` or `-f name_of_std_file_tree`

`-ff` or `--filter_files`: If this flag is present, files will be filtered. Usage: `-ff`

`-fd` or `--filter_directories`: If this flag is present, directories will be filtered. Usage:
`-fd`

`-fh` or `--filter_hidden`: If this flag is present, hidden files and directories will be filtered. Usage: `-fh`

`-fc` or `--filter_custom`: Specifies a list of files and directories to be ignored by the program. Usage: `-fc name1,name2,...`

`-mfc` or `--file_count`: If this flag is present, the file_count measure will be on. Usage: `-mfc`

`-mdc` or `--dir_count`: If this flag is present, the dir_count measure will be on. Usage: `-mdc`

`-ms` or `--file_size`: If this flag is present, the file_size measure will be on. Usage: `-ms`

`-mt` or `--modified_time`: If this flag is present, the modified_time measure will be on. Usage: `-mt`

`-mtr` or `--time_round`: Specifies the rounding margin for the modified time measurement (in seconds). Default is 500 seconds. Usage: `-mtr integer_value`

`-msr` or `--size_rounding`: Specifies the rounding percentage for the file size measurement, based on a percentage of the mean. Default is 0.01. Usage: `-msr float_value`

`-o` or `--output`: Specifies the path to the output directory. Relevant output files will be overwritten/created. Usage: `-o path/to/output/directory`

`-os` or `--summary`: If this flag is present, a summary file will be created. Usage: `-os`

`-ot` or `--tree`: If this flag is present, a text tree file will be created. Usage: `-ot`

`-oc` or `--csv`: If this flag is present, a csv file will be created. Usage: `-oc`

`-p` or `--pipe_data`: If this flag is present, data will be piped to stdout. Usage: `-p`

`-gc` or `--get_configurations`: If this flag is present, directory content configurations
will be compared. Usage: `-gc`

`-td` or `--target_depth`: Specifies the target depth for directory content configurations. Usage: `-td integer_value`

`-dr` or `--depth_range`: Specifies a range of depths for directory content configurations. Usage: `-dr integer_value1 integer_value2`.

`-dl` or `--depth_limit`: Specifies the depth limit of exploration. Usage: `-dl integer_value`

`-l` or `--log`: Specifies the path to the log file. Usage: `-l path/to/logfile`

`-ll` or `--log_level`: Specifies the log level. Usage: `-ll DEBUG`

`-v` or `--verbose`: If toggled then verbose mode will be on. Usage: `-v`

`-d` or `--debug`: If toggled then debug mode will be on. Usage: `-d`
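The options above can be combined freely. As a sketch, the helper below assembles a command line from a handful of them so it can be handed to `subprocess.run`. The function `build_command` and its parameter names are hypothetical; only the flags themselves (`-r`, `-f`, `-mfc`, `-ms`, `-msr`, `-o`, `-os`, `-ot`) come from the list above.

```python
from typing import List, Optional


def build_command(
    root: str,
    file_tree: str,
    output_dir: Optional[str] = None,
    file_count: bool = False,
    file_size: bool = False,
    size_rounding: Optional[float] = None,
    summary: bool = False,
    tree: bool = False,
) -> List[str]:
    """Assemble a file_tree_check command line as a list of arguments."""
    # The root directory and the file_tree are always required.
    cmd = ["file_tree_check", "-r", root, "-f", file_tree]
    # Measures are simple on/off flags.
    if file_count:
        cmd.append("-mfc")
    if file_size:
        cmd.append("-ms")
    # Valued options are followed by their value.
    if size_rounding is not None:
        cmd += ["-msr", str(size_rounding)]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    # Output selection flags.
    if summary:
        cmd.append("-os")
    if tree:
        cmd.append("-ot")
    return cmd


cmd = build_command("/data/study", "bids_raw", output_dir="out",
                    file_count=True, summary=True)
print(" ".join(cmd))
```

With the package installed, the resulting list could be executed with `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.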

## Usage as Python script

You can also use file_tree_check as a Python script. This may be more convenient if you prefer using a custom config file to specify parameters.

### Installation as Python script

#### Clone the repository

First fork the repository so that you can save your modifications of the config file
on GitHub, then clone the forked repository to your machine.

You can then run the following command in a terminal, from the root of the cloned repository, to install the package and
its dependencies.

```bash
pip install .
```

### Configure config file

Inside the _src/file_tree_check_ folder in your local installation, open the
`src/file_tree_check/config.ini` file and change the options to suit your needs
and use case.

Be sure to set the output paths to the locations where you want the outputs to be
saved. By default, the root directory and file tree paths are left empty. This causes an error unless you pass these arguments on the command line, so it is recommended to set them for your use case.

By default, the config.ini file in the file_tree_check folder is used. To change this, edit line ~17 of main.py to add the path to another config.ini file.

The config options are detailed in the
["Config File" section of the documentation](https://file-tree-check.readthedocs.io/en/latest/config.html).

### Running the script

-## How to install as python script.
Once the repository and config file are set up, running the script is fairly simple. From the command line, run Python and give it the path to main.py to launch the script. Remember that file_tree_check requires a root directory and a file_tree. If you haven't configured these paths in the config.ini file, pass them as arguments.
Example with root directory and file_tree paths in the config file:
```
python main.py
```
Example with root directory and file_tree paths passed as arguments:
```
python main.py -r {root directory} -f {file_tree}
```
Notes:
- Depending on your operating system, you may need to type python or py3 to run a Python script.
- To run main.py, make sure you call it with the correct relative path (if your current working directory is not file_tree_check).
- The same options for passing a file_tree apply as in the [Usage as CLI](#usage-as-cli-1) section.
7 changes: 4 additions & 3 deletions file_tree_check/_parser.py
@@ -11,7 +11,7 @@
 class Parser:
     """Class to parse command line arguments and configuration file."""

-    DEFAULT_CONFIG_PATH = os.path.join(Path(__file__).parent, "config.ini")
+    DEFAULT_CONFIG_PATH = os.path.join(Path(__file__).parent, "config_default.ini")

     def __init__(self) -> None:
         self.logger = logging.getLogger(f"file_tree_check.{__name__}")
@@ -197,10 +197,11 @@ def pars_args(
         # Logging
         parser.add_argument("-l", "--log", type=Path, help="Path to log file.")
         parser.add_argument("-ll", "--log_level", type=int, help="Specify log level.")
-        parser.add_argument(
+        group = parser.add_mutually_exclusive_group()
+        group.add_argument(
             "-v", "--verbose", help="If toggled then verbose mode will be on.", action="store_true"
         )
-        parser.add_argument(
+        group.add_argument(
             "-d", "--debug", help="If toggled then debug mode will be on.", action="store_true"
         )
         return parser
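The effect of moving `-v` and `-d` into a mutually exclusive group can be checked with a standalone stand-in for the parser. This sketch reproduces only those two flags and is not the project's `Parser` class. The names `make_parser` and `accepts` are hypothetical.

```python
import argparse
from typing import List


def make_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with only the two flags touched by this change."""
    parser = argparse.ArgumentParser(prog="file_tree_check")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-v", "--verbose", action="store_true")
    group.add_argument("-d", "--debug", action="store_true")
    return parser


def accepts(argv: List[str]) -> bool:
    """Return True if argparse accepts argv, False if it exits with an error."""
    try:
        make_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the conflict on stderr and exits.
        return False
    return True


print(accepts(["-v"]))
print(accepts(["-v", "-d"]))
```

Either flag alone, or neither, is accepted. Passing both makes argparse report that `-d/--debug` is not allowed with `-v/--verbose`, which the old independent `add_argument` calls would have let through silently.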