support named capture groups
yaa110 authored Jan 18, 2025
1 parent 0dc779e commit 3bd3259
Showing 9 changed files with 147 additions and 60 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Generated file; diff not rendered.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -8,7 +8,7 @@ license = "MIT OR Apache-2.0"
name = "nomino"
readme = "README.md"
repository = "https://github.com/yaa110/nomino"
version = "1.5.2"
version = "1.6.0"

[dependencies]
anyhow = "1.0"
47 changes: 29 additions & 18 deletions README.md
@@ -56,9 +56,37 @@ Options:
-V, --version Print version
-w, --overwrite Overwrites output files, otherwise, a '_' is prepended to filename

OUTPUT pattern accepts placeholders that have the format of '{I:P}' where 'I' is the index of captured group and 'P' is the padding of digits with `0`. Please refer to https://github.com/yaa110/nomino for more information.
OUTPUT pattern accepts placeholders that have the format of '{G:P}' where 'G' is the captured group and 'P' is the padding of digits with `0`. Please refer to https://github.com/yaa110/nomino for more information.
```
### Placeholders
1. Placeholders have the format of `{G:P}` where `G` is the captured group and `P` is the padding of digits with `0`. For example, `{2:3}` means the third captured group with a padding of 3, i.e. `1` is formatted as `001`.
1. Indices start from `0`, and `{0}` means the filename.
1. The capture group `G` could be dropped, i.e. `{}` or `{:3}`. In this case an auto incremental index is used which starts from `1`. For example, `{} {}` equals `{1} {2}`.
1. `{` and `}` characters could be escaped using `\` character, i.e. `\\{` and `\\}` in cli.
1. Padding is only used for positive numbers, e.g. the formatted result of `{:3}` for `1` is `001`, for `-1` is `-1` and for `a` is `a`.
1. If `--sort` option is used, the first index `{0}` is the filename and the second index `{1}` or first occurrence of `{}` is the enumerator index.
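
The padding rule in the list above can be sketched as follows. This is a minimal illustration, not nomino's actual implementation: `pad_digits` is a hypothetical helper name, and the sketch assumes that "positive number" means a non-empty string made up only of ASCII digits.

```rust
/// Hypothetical helper (not part of nomino): left-pads `value` with `0`
/// up to `width` characters, but only when `value` consists solely of
/// ASCII digits. Any other value is returned unchanged.
fn pad_digits(value: &str, width: usize) -> String {
    // Assumption: a "positive number" is a non-empty run of ASCII digits.
    let is_number = !value.is_empty() && value.chars().all(|c| c.is_ascii_digit());
    if is_number {
        // `0` is the fill character, `>` right-aligns, `width$` takes the width argument.
        format!("{:0>width$}", value, width = width)
    } else {
        value.to_string()
    }
}

fn main() {
    println!("{}", pad_digits("1", 3)); // 001
    println!("{}", pad_digits("-1", 3)); // -1
    println!("{}", pad_digits("a", 3)); // a
}
```

A value that is already at least `width` characters long is left as it is, so padding never truncates.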
### Capture Groups
The accepted syntax of regex pattern is [Rust Regex](https://docs.rs/regex/latest/regex/).
Consider this example:
```regex
(?<first>\w)(\w)\w(?<last>\w)
```
This regular expression defines 4 capture groups:
- The group at index `0` corresponds to the overall match. It is always present in every match and never has a name: `{0}`.
- The group at index `1` with name `first` corresponding to the first letter: `{1}`, `{first}` or the first occurrence of `{}`.
- The group at index `2` with no name corresponding to the second letter: `{2}` or the second occurrence of `{}`.
- The group at index `3` with name `last` corresponding to the fourth and last letter: `{3}`, `{last}` or the third occurrence of `{}`.
`?<first>` and `?<last>` are named capture groups.
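
The lookup rules above can be sketched without a regex engine. In this illustration the `Group` struct, the `resolve` function, and the hard-coded groups are all hypothetical; the groups are written out by hand as the result of matching the pattern above against the input `abcd`.

```rust
/// Hypothetical stand-in for one capture group of a match:
/// its optional name and the text it matched.
struct Group<'a> {
    name: Option<&'a str>,
    text: &'a str,
}

/// Resolves a placeholder key: a numeric key is looked up by position,
/// any other key is looked up by group name.
fn resolve<'a>(groups: &[Group<'a>], key: &str) -> Option<&'a str> {
    match key.parse::<usize>() {
        Ok(index) => groups.get(index).map(|g| g.text),
        Err(_) => groups.iter().find(|g| g.name == Some(key)).map(|g| g.text),
    }
}

/// Groups of `(?<first>\w)(\w)\w(?<last>\w)` matched against "abcd",
/// written out by hand for illustration.
fn example_groups() -> Vec<Group<'static>> {
    vec![
        Group { name: None, text: "abcd" },      // index 0: overall match
        Group { name: Some("first"), text: "a" }, // index 1
        Group { name: None, text: "b" },          // index 2
        Group { name: Some("last"), text: "d" },  // index 3
    ]
}

fn main() {
    let groups = example_groups();
    println!("{:?}", resolve(&groups, "first")); // Some("a")
    println!("{:?}", resolve(&groups, "1")); // Some("a")
    println!("{:?}", resolve(&groups, "last")); // Some("d")
    println!("{:?}", resolve(&groups, "unknown")); // None
}
```

The third `\w` in the pattern is not wrapped in parentheses, so it matches `c` without producing a group of its own.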
### Windows
On Windows, `\\` must be used to separate path components in file paths because `\` is a special character in regular expressions.
@@ -73,23 +101,6 @@ On Windows, `\\` must be used to separate path components in file paths because
}
```
## Output

The output is necessary when using `--sort` or `--regex` options.

### Regex

The accepted syntax of regex pattern is [Rust Regex](https://docs.rs/regex/latest/regex/).

### Placeholders

1. Placeholders have the format of `{I:P}` where `I` is the index of captured group and `P` is the padding of digits with `0`. For example, `{2:3}` means the third captured group with a padding of 3, i.e. `1` is formatted as `001`.
1. Indices start from `0`, and `{0}` means the filename.
1. The index `I` could be dropped, i.e. `{}` or `{:3}`. In this case an auto incremental index is used which starts from `1`. For example, `{} {}` equals `{1} {2}`.
1. `{` and `}` characters could be escaped using `\` character, i.e. `\\{` and `\\}` in cli.
1. Padding is only used for positive numbers, e.g. the formatted result of `{:3}` for `1` is `001`, for `-1` is `-1` and for `a` is `a`.
1. If `--sort` option is used, the first index `{0}` is the filename and the second index `{1}` or first occurrence of `{}` is the enumerator index.
## Wiki
- **[Examples](https://github.com/yaa110/nomino/wiki/Examples)** learn nomino by examples
4 changes: 2 additions & 2 deletions src/cli.rs
@@ -6,8 +6,8 @@ use std::path::PathBuf;
about,
author,
version,
after_help = "OUTPUT pattern accepts placeholders that have the format of '{I:P}' where 'I' \
is the index of captured group and 'P' is the padding of digits with `0`. Please refer to \
after_help = "OUTPUT pattern accepts placeholders that have the format of '{G:P}' where 'G' \
is the captured group and 'P' is the padding of digits with `0`. Please refer to \
https://github.com/yaa110/nomino for more information.",
next_display_order = None,
)]
55 changes: 24 additions & 31 deletions src/input/formatter.rs
@@ -1,10 +1,11 @@
use super::{provider::Capture, Provider};
use crate::errors::FormatError;

#[derive(Debug, PartialEq)]
enum Segment {
PlaceHolder {
padding: Option<usize>,
index: usize,
capture: Capture,
},
String(String),
}
@@ -16,12 +17,12 @@ impl Formatter {
pub fn new(format: &str) -> Result<Self, FormatError> {
let mut segments = Vec::new();
let mut should_escape = false;
let mut is_parsing_index = false;
let mut is_parsing_capture = false;
let mut is_parsing_padding = false;
let mut current_segment = String::new();
let mut current_index: usize = 0;
let mut current_capture = Capture::Index(0);
let mut current_padding: Option<usize> = None;
let mut incremental_index = 1;
let mut incremental_index: usize = 1;
for (i, ch) in format.chars().enumerate() {
if !should_escape && ch == '\\' {
should_escape = true;
@@ -31,29 +32,26 @@ impl Formatter {
return Err(FormatError::InvalidEscapeCharacter(i, ch));
}
match ch {
'{' if !should_escape && !is_parsing_index && !is_parsing_padding => {
'{' if !should_escape && !is_parsing_capture && !is_parsing_padding => {
if !current_segment.is_empty() {
segments.push(Segment::String(current_segment));
current_segment = String::new();
}
is_parsing_index = true;
is_parsing_capture = true;
}
'}' if !should_escape => {
if !is_parsing_index && !is_parsing_padding {
if !is_parsing_capture && !is_parsing_padding {
return Err(FormatError::UnopenedPlaceholder);
}
if current_segment.is_empty() {
if is_parsing_index {
current_index = incremental_index;
if is_parsing_capture {
current_capture = Capture::Index(incremental_index);
incremental_index += 1;
} else if is_parsing_padding {
current_padding = None;
}
} else if is_parsing_index {
current_index = current_segment
.as_str()
.parse()
.map_err(|_| FormatError::InvalidIndex(current_segment.clone()))?;
} else if is_parsing_capture {
current_capture = current_segment.as_str().into();
current_padding = None;
} else if is_parsing_padding {
current_padding =
@@ -63,25 +61,22 @@ impl Formatter {
}
segments.push(Segment::PlaceHolder {
padding: current_padding,
index: current_index,
capture: current_capture,
});
current_segment.clear();
current_padding = None;
current_index = 0;
is_parsing_index = false;
current_capture = Capture::Index(0);
is_parsing_capture = false;
is_parsing_padding = false;
}
':' if is_parsing_index => {
is_parsing_index = false;
':' if is_parsing_capture => {
is_parsing_capture = false;
is_parsing_padding = true;
if current_segment.is_empty() {
current_index = incremental_index;
current_capture = Capture::Index(incremental_index);
incremental_index += 1;
} else {
current_index = current_segment
.as_str()
.parse()
.map_err(|_| FormatError::InvalidIndex(current_segment.clone()))?;
current_capture = current_segment.as_str().into();
current_segment.clear();
}
}
@@ -91,7 +86,7 @@ impl Formatter {
}
}
}
if is_parsing_index || is_parsing_padding {
if is_parsing_capture || is_parsing_padding {
return Err(FormatError::UnclosedPlaceholder);
}
if !current_segment.is_empty() {
@@ -100,12 +95,12 @@ impl Formatter {
Ok(Self(segments))
}

pub fn format(&self, vars: &[&str]) -> String {
pub fn format(&self, provider: impl Provider) -> String {
let mut formatted = String::new();
for segment in self.0.as_slice() {
match segment {
Segment::PlaceHolder { padding, index } => {
let Some(var) = vars.get(*index) else {
Segment::PlaceHolder { padding, capture } => {
let Some(var) = provider.provide(capture) else {
continue;
};
if let Some((padding, digits)) =
@@ -187,7 +182,7 @@ mod tests {
while let Some((format, vars, expected)) = format_vars_expected.pop() {
let output = Formatter::new(format)
.expect(format!("unable to parse format '{}'", format).as_str());
let actual = output.format(vars.as_slice());
let actual = output.format(vars);
assert_eq!(actual, expected);
}
}
@@ -200,8 +195,6 @@ mod tests {
("2:5}", FormatError::UnopenedPlaceholder),
(r"\{2:5}", FormatError::UnopenedPlaceholder),
(r"{2:5\}", FormatError::UnclosedPlaceholder),
("{{2:5}}", FormatError::InvalidIndex("{2".to_string())),
("{a}", FormatError::InvalidIndex("a".to_string())),
("{2:5a}", FormatError::InvalidPadding("5a".to_string())),
("init {2:5", FormatError::UnclosedPlaceholder),
("init {2:5 end", FormatError::UnclosedPlaceholder),
10 changes: 3 additions & 7 deletions src/input/iterator.rs
@@ -47,7 +47,7 @@ impl InputIterator {
});
for (i, input) in inputs.into_iter().enumerate() {
let index = (i + 1).to_string();
let mut output = formatter.format(vec![input.as_str(), index.as_str()].as_slice());
let mut output = formatter.format(vec![input.as_str(), index.as_str()]);
if preserve_extension {
if let Some(extension) = Path::new(input.as_str()).extension() {
output.push('.');
@@ -94,14 +94,10 @@ impl Iterator for InputIterator {
};
let path = entry.path();
let input = path.strip_prefix("./").unwrap_or(path).to_string_lossy();
let Some(cap) = re.captures(input.as_ref()) else {
let Some(captures) = re.captures(input.as_ref()) else {
continue;
};
let vars: Vec<&str> = cap
.iter()
.map(|c| c.map(|c| c.as_str()).unwrap_or_default())
.collect();
let mut output = formatter.format(vars.as_slice());
let mut output = formatter.format(captures);
if *preserve_extension {
if let Some(extension) = Path::new(input.as_ref()).extension() {
output.push('.');
Expand Down
40 changes: 40 additions & 0 deletions src/input/provider.rs
@@ -0,0 +1,40 @@
use regex::Captures;

#[derive(Debug, PartialEq)]
pub enum Capture {
    Index(usize),
    Name(String),
}

pub trait Provider {
    fn provide(&self, cap: &Capture) -> Option<&str>;
}

impl Provider for Captures<'_> {
    fn provide(&self, cap: &Capture) -> Option<&str> {
        match cap {
            Capture::Index(index) => self.get(*index),
            Capture::Name(name) => self.name(name.as_str()),
        }
        .map(|m| m.as_str())
    }
}

impl Provider for Vec<&'_ str> {
    fn provide(&self, cap: &Capture) -> Option<&str> {
        match cap {
            Capture::Index(index) => self.get(*index).copied(),
            _ => None,
        }
    }
}

impl From<&str> for Capture {
    fn from(value: &str) -> Self {
        if let Ok(index) = value.parse() {
            Capture::Index(index)
        } else {
            Capture::Name(value.into())
        }
    }
}
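
The effect of the `From<&str>` conversion above can be checked in isolation. The sketch below copies the `Capture` enum and its conversion from this file into a standalone program (visibility modifiers dropped); only the `main` function is new, and the sample inputs are chosen for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Capture {
    Index(usize),
    Name(String),
}

impl From<&str> for Capture {
    fn from(value: &str) -> Self {
        // Anything that parses as `usize` is treated as a positional index;
        // every other string is treated as a group name.
        if let Ok(index) = value.parse() {
            Capture::Index(index)
        } else {
            Capture::Name(value.into())
        }
    }
}

fn main() {
    println!("{:?}", Capture::from("2")); // Index(2)
    println!("{:?}", Capture::from("02")); // Index(2)
    println!("{:?}", Capture::from("episode")); // Name("episode")
    println!("{:?}", Capture::from("-1")); // Name("-1")
}
```

Because the conversion is infallible, a placeholder such as `{a}` is now parsed as a named capture rather than rejected, which is consistent with the `FormatError::InvalidIndex` test cases being removed from `src/input/formatter.rs` in this commit.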
2 changes: 2 additions & 0 deletions src/lib.rs
@@ -3,10 +3,12 @@ pub mod cli;
pub mod input {
    mod formatter;
    mod iterator;
    mod provider;
    mod separator;
    mod source;
    pub use self::formatter::*;
    pub use self::iterator::*;
    pub use self::provider::*;
    pub use self::separator::*;
    pub use self::source::*;
}
45 changes: 45 additions & 0 deletions tests/regex_test.rs
@@ -49,6 +49,51 @@ fn test_regex() {
    dir.close().unwrap();
}

#[test]
fn test_named_regex() {
    let dir = tempfile::tempdir().unwrap();

    let inputs = vec![
        "Nomino (2020) S1.E1.1080p.mkv",
        "Nomino (2020) S1.E2.1080p.mkv",
        "Nomino (2020) S1.E3.1080p.mkv",
        "Nomino (2020) S1.E4.1080p.mkv",
        "Nomino (2020) S1.E5.1080p.mkv",
    ];

    let mut outputs = vec!["01.mkv", "02.mkv", "03.mkv", "04.mkv", "05.mkv"];

    for input in inputs {
        let _ = File::create(dir.path().join(input)).unwrap();
    }

    let cmd = Command::cargo_bin(env!("CARGO_PKG_NAME"))
        .unwrap()
        .args(&[
            "-E",
            "-d",
            dir.path().to_str().unwrap(),
            "-r",
            r".*E(?<episode>\d+).*",
            "{episode:2}.mkv",
        ])
        .unwrap();

    let mut files: Vec<String> = read_dir(dir.path())
        .unwrap()
        .map(|entry| entry.unwrap().file_name().to_str().unwrap().to_string())
        .collect();

    files.sort();
    outputs.sort();

    assert!(cmd.status.success());
    assert_eq!(files.len(), outputs.len());
    assert!(outputs.iter().zip(files.iter()).all(|(a, b)| a == b));

    dir.close().unwrap();
}

#[test]
fn test_regex_not_overwrite() {
    let dir = tempfile::tempdir().unwrap();