Finder#

class Finder(root, pattern, use_regex=False, scan_everything=False)#

Bases: object

Find files using a filename pattern.

The Finder object is the main entry point to this library. Given a root directory and a filename pattern, it can search for all matching files.

Parameters:
  • root (str) – The root directory of the filetree where all files can be found.

  • pattern (str) – The filename pattern. See Pattern for details.

  • use_regex (bool) – If True, characters outside of groups are considered as valid regex (and not escaped). Default is False.

  • scan_everything (bool) – If True, look into all sub-directories up to a depth of max_scan_depth. This is appropriate if the pattern contains optional sub-directories. If False (default), check that every sub-directory matches its part of the regular expression, thus avoiding some work.

_add_file(filename, pattern)#

Add file to cache if it matches the pattern and passes the filters.

Parameters:
_find_files_scan_everything()#

Find files in all sub-directories.

Checking whether each sub-directory matches its part of the pattern is difficult in general, so this method skips that check. This allows for more exotic patterns where a folder separator can appear in a capturing group, for example for optional sub-directories.

This will scan the whole filetree under root and check every file found, which can be significant work in some cases.

Return type:

None

_find_files_subdirectories()#

Find files checking sub-directories along the way.

Each sub-directory must match its corresponding part of the generated regular expression. This is ill-suited if any group contains a folder separator, but it limits the number of sub-directories to explore and thus the number of files to check.

Return type:

None
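The two scanning strategies can be contrasted with a minimal standard-library sketch. This is not the library's implementation: find_pruned, find_everything, and the regexes used below are hypothetical names and values chosen for illustration only.

```python
import os
import re
import tempfile


def find_pruned(root, dir_regexes, file_regex):
    """Descend only into sub-directories that match their level's regex."""
    found = []

    def visit(directory, depth):
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if depth < len(dir_regexes):
                    # Still at a directory level: prune non-matching folders.
                    if entry.is_dir() and re.fullmatch(dir_regexes[depth], entry.name):
                        visit(entry.path, depth + 1)
                elif entry.is_file() and re.fullmatch(file_regex, entry.name):
                    found.append(os.path.relpath(entry.path, root))

    visit(root, 0)
    return found


def find_everything(root, full_regex):
    """Walk the whole tree and test every relative path against one regex."""
    found = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            relative = os.path.relpath(os.path.join(directory, name), root)
            if re.fullmatch(full_regex, relative.replace(os.sep, "/")):
                found.append(relative)
    return sorted(found)


# Build a small throwaway tree to compare both strategies.
with tempfile.TemporaryDirectory() as root:
    for relative in ["2014/sst_01.nc", "2015/sst_02.nc", "misc/sst_03.nc"]:
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    pruned = find_pruned(root, [r"\d{4}"], r"sst_\d{2}\.nc")
    scanned = find_everything(root, r"\d{4}/sst_\d{2}\.nc")
```

Both functions return the same files here, but find_pruned never opens the misc folder, whereas find_everything visits every file under the root.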

_find_groups(pattern)#

Find the groups within the pattern and their corresponding string indices.

  • The returned indices should be sorted in order of appearance in the pattern.

  • The indices should correspond to the first and last character of the group, including the delimiter characters.

  • On the contrary, the string specification of the group should not include them.

This implementation finds the matching pair of delimiters defined by the attribute _group_delimiters. A group start that has no matching end raises an error.

Parameters:

pattern (str)

Return type:

list[tuple[str, int, int]]
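The contract described above (sorted indices, indices that include the delimiters, specification that excludes them) can be illustrated with a standalone sketch. It is an assumption-laden approximation, not the library's code: the function name find_groups, the choice of ValueError, and the example patterns are all hypothetical, and the start and end delimiters are assumed to be single characters.

```python
def find_groups(pattern, delimiters=("%", "(", ")")):
    """Return (specification, first index, last index) for each group."""
    prefix, start, end = delimiters
    opener = prefix + start
    groups = []
    position = 0
    while True:
        begin = pattern.find(opener, position)
        if begin == -1:
            break
        depth = 0
        # Scan from the start delimiter until the nesting depth returns to 0.
        for last in range(begin + len(prefix), len(pattern)):
            if pattern[last] == start:
                depth += 1
            elif pattern[last] == end:
                depth -= 1
                if depth == 0:
                    break
        else:
            # Exception type is an assumption made for this sketch.
            raise ValueError(f"Unbalanced group starting at index {begin}")
        # The specification excludes delimiters; the indices include them.
        groups.append((pattern[begin + len(opener):last], begin, last))
        position = last + 1
    return groups


simple = find_groups("%(Y)/SST_%(m:fmt=02d).nc")
nested = find_groups("%(x:rgx=(a|b))")
```

Counting nesting depth is what lets a group specification contain its own balanced parentheses, as in the second example.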

_void_cache()#

Clear the cache.

Return type:

None

add_filter(func, **kwargs)#

Add a filter with which to select scanned files.

The filter will be applied to files already in the cache.

See Filtering for details.

Parameters:
  • func (Callable[[Finder, str, Matches, ...], bool]) – Callable that returns True if the file is to be kept, False otherwise.

  • kwargs (Any) – Will be passed to the function when executed.
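A filter is any callable following the documented signature (finder, filename, matches, plus keyword arguments). The sketch below defines a hypothetical has_suffix filter; the suffix keyword is an illustrative choice, and the function is exercised with None stand-ins so that it runs without the library.

```python
def has_suffix(finder, filename, matches, suffix=".nc"):
    """Keep a file only if its name ends with the requested suffix.

    The finder and matches arguments are part of the documented signature
    but are not needed by this particular filter.
    """
    return filename.endswith(suffix)


# Stand-in calls: a real Finder and Matches object would be passed instead.
kept = has_suffix(None, "2014/sst_01.nc", None, suffix=".nc")
dropped = has_suffix(None, "2014/readme.txt", None, suffix=".nc")

# With a Finder instance this would be registered as:
# finder.add_filter(has_suffix, suffix=".nc")
```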

clear_filters()#

Remove all filters.

Return type:

None

find_files()#

Find files to scan and store them in cache.

Called automatically when accessing files or calling get_files(). Applies all filters and sorts files alphabetically.

Return type:

None

find_matches(filename, relative=True)#

Alias for get_matches().

Parameters:
Return type:

Matches | None

fix_by_filter(key, func, fix_discard=False, default_date=None, pass_unparsed=False, **kwargs)#

Fix a group value by using a filter function.

When a file is scanned and matches the pattern, it is kept only if func returns True when called with the parsed value of the group. If the group cannot parse the value and pass_unparsed is True, the unparsed string is passed to the predicate function anyway; otherwise the file is not kept (default).

This adds a filter (see add_filter()) with a name consisting of the key and a unique id (this allows multiple filters for a single group).

Parameters:
  • key (int | str) – Can be the index of a group in the pattern (starts at 0), or the name of a group. If multiple groups share the same name, they are all fixed.

  • func (Callable[[...], bool]) – A function that takes the parsed value of the group and returns True if the corresponding file should be kept, or False otherwise. If multiple groups correspond to the key, all values will be tested successively.

  • fix_discard (bool) – If True, also use group values with the discard flag. Default is False.

  • pass_unparsed (bool) – In case the group cannot parse the string, if True pass the unparsed string to the predicate function func anyway. If False (default) the file will not be kept.

  • default_date (datetime | Mapping[str, int] | None) – Passed to library.get_date() if key is “date”.

  • kwargs – Will be passed to the function.
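The decision described above, including the pass_unparsed fallback, can be sketched as a small standalone function. The name keep_file, the parse callable, and the use of ValueError to signal a parsing failure are assumptions made for this example, not the library's internals.

```python
def keep_file(raw, parse, func, pass_unparsed=False, **kwargs):
    """Decide whether a file is kept, given the raw string of one group."""
    try:
        value = parse(raw)
    except ValueError:
        # Parsing failed: either discard the file or hand over the raw string.
        if not pass_unparsed:
            return False
        value = raw
    return func(value, **kwargs)


def first_semester(month):
    """Predicate on the parsed value (here, a month number)."""
    return month <= 6


march = keep_file("03", int, first_semester)
july = keep_file("07", int, first_semester)
unparsed_default = keep_file("xx", int, lambda value: True)
unparsed_passed = keep_file("xx", int, lambda value: True, pass_unparsed=True)
```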

fix_group(key, value, fix_discard=False)#

Fix a group to a string.

This will void the cache.

Parameters:
  • key (int | str) – Can be the index of a group in the pattern (starts at 0), or the name of a group. If multiple groups share the same name, they are all fixed to the same value.

  • value (str | Any) – Can be a string, or a value that will be formatted using the group format string. A string will be interpreted as a regular expression, so all special characters should be properly escaped. A list of values will be joined by the regex ‘|’ OR.

  • fix_discard (bool) – If True, groups with the ‘discard’ option will still be fixed. Default is False.
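The rules for value can be illustrated with a hedged sketch of how a fixed value might be turned into a regular expression. The helper fixed_value_to_regex and its fmt argument are hypothetical; the sketch only shows why strings need escaping and how a list becomes an alternation.

```python
import re


def fixed_value_to_regex(value, fmt="d"):
    """Turn a fixed value (or list of values) into a regex fragment."""
    values = value if isinstance(value, (list, tuple)) else [value]
    # Strings are used verbatim as regex; other values are formatted first.
    parts = [v if isinstance(v, str) else format(v, fmt) for v in values]
    return "|".join(parts)


months = fixed_value_to_regex([1, 3], fmt="02d")
unescaped = fixed_value_to_regex("v1.0")
escaped = fixed_value_to_regex(re.escape("v1.0"))

# An unescaped '.' matches any character, which is rarely intended.
loose_match = re.fullmatch(unescaped, "v1x0") is not None
strict_match = re.fullmatch(escaped, "v1x0") is not None
```

Passing a string through re.escape before fixing it avoids the accidental match shown by loose_match.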

fix_groups(fixes=None, fix_discard=False, **fixes_kw)#

Fix multiple groups at once.

Parameters:
  • fixes (dict[Any, str | Any] | None) – Dictionary of {group key: value}. See fix_group() for details.

  • fix_discard (bool) – If True, groups with the ‘discard’ option will still be fixed. Default is False.

  • fixes_kw (str | Any) – Same as fixes. Takes precedence.

get_absolute(filename)#

Concatenate the finder root directory and a filename.

Parameters:

filename (str)

Return type:

str

get_files(relative=False, nested=None)#

Return files that match the regex.

Lazily scan files: if files were already scanned, just return the stored list of files.

Parameters:
  • relative (bool) – If True, filenames are returned relative to the finder root directory. If not, paths are absolute (default).

  • nested (Sequence[str | Sequence[str]] | None) – If not None, return a nested list of filenames with each level corresponding to a group, or set of groups. The last set in the list is at the innermost level.

Raises:

KeyError – A group name in nested is not found in the pattern.

Return type:

list
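One plausible reading of the nested option, for a single level, is a list of lists where each inner list holds the filenames sharing the same group value. The sketch below shows that idea on stand-in data: plain dictionaries replace Matches objects, and nest_by is a hypothetical helper, so the exact structure returned by the library may differ.

```python
from itertools import groupby

# Stand-in for scanned files: (filename, group values) pairs.
scanned_files = [
    ("2014/sst_01.nc", {"Y": 2014, "m": 1}),
    ("2014/sst_02.nc", {"Y": 2014, "m": 2}),
    ("2015/sst_01.nc", {"Y": 2015, "m": 1}),
]


def nest_by(files, key):
    """Group filenames into one inner list per distinct value of a group."""
    def value_of(item):
        return item[1][key]

    ordered = sorted(files, key=value_of)
    return [
        [filename for filename, _values in group]
        for _value, group in groupby(ordered, key=value_of)
    ]


by_year = nest_by(scanned_files, "Y")
by_month = nest_by(scanned_files, "m")
```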

get_group_names(fixed=None)#

Get the names of groups in the pattern.

Parameters:

fixed (bool | None) – If True, only return names of groups with a fixed value. If False, return only those without a fixed value. If None (default), return for all groups.

Return type:

set[str]

get_groups(key)#

Return list of groups corresponding to key.

If date_is_first_class is True, for the key ‘date’ return all time related groups.

Parameters:

key (int, str, or list of int) – Can be group index or name.

Returns:

List of groups corresponding to key.

Raises:
Return type:

list[Group]

get_matches(filename, relative=True)#

Find matches for a given filename.

Apply regex to filename and return the results as a Matches object. Fixed values are applied as normal.

Parameters:
  • filename (str) – Filename to retrieve matches from.

  • relative (bool) – True if the filename is relative to the finder root directory (default). If False, the filename is made relative before being matched.

Returns:

matches – A Matches object, or None if the filename did not match.

Return type:

Matches | None

get_pattern()#

Get filename pattern.

Return type:

str

get_regex()#

Return regex.

Return type:

str

get_regex_subdirs()#

Return regexes for each sub-directory.

Return type:

list[str]

get_relative(filename)#

Get filename path relative to root.

Parameters:

filename (str)

Return type:

str
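The relationship between get_absolute() and get_relative() can be sketched with os.path. The two functions below are standalone approximations taking the root explicitly; they are assumptions about the behaviour described above, not the library's code.

```python
import os


def get_absolute(root, filename):
    """Join the root directory and a filename relative to it."""
    return os.path.join(root, filename)


def get_relative(root, filename):
    """Express a path relative to the root directory."""
    return os.path.relpath(filename, root)


root = os.path.join(os.sep, "data", "sst")
relative = os.path.join("2014", "sst_01.nc")

absolute = get_absolute(root, relative)
# Going back to a relative path recovers the original filename.
roundtrip = get_relative(root, absolute)
```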

make_filename(fixes=None, relative=False, **kw_fixes)#

Return a filename.

Replace groups with provided values. All groups must be fixed beforehand, or through the fixes argument.

Only works if use_regex is set to False (default).

Parameters:
  • fixes (dict | None) – Dictionary of fixes (group name or index: value). For details, see fix_group(). Will (temporarily) override values fixed previously. If a prior fix is a list, its first item will be used.

  • relative (bool) – If the filename should be relative to the finder root directory. Default is False.

  • kw_fixes (Any) – Same as fixes. Takes precedence.

Raises:

ValueError – use_regex is activated.

Return type:

str

set_pattern(pattern)#

Set pattern and parse for group objects.

Parameters:

pattern (str)

set_scan_everything(scan_everything, /)#

Set value for attribute scan_everything.

Void cache if necessary.

Parameters:

scan_everything (bool)

Return type:

None

set_use_regex(use_regex, /)#

Set value for attribute use_regex.

Parameters:

use_regex (bool)

Return type:

None

unfix_groups(*keys)#

Unfix groups, and remove group related filters.

This will void the cache.

Parameters:

keys (int | str) – Keys to find groups to unfix. See get_groups(). If no key is provided, all groups will be unfixed.

_group_delimiters: tuple[str, str, str] = ('%', '(', ')')#

Delimiter characters of groups in the pattern.

Tuple of (prefix, start characters, end characters). Start and end characters must be balanced within the group. Prefix can be empty.

_segments: list[str]#

Segments of the pattern. Used to replace specific groups. [‘text before group 1’, ‘group 1’, ‘text before group 2’, ‘group 2’, …, ‘text after last group’]
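Because text and groups alternate in the segments list, groups sit at odd indices and can be replaced in place before joining. The sketch below illustrates this layout; fill_segments and the example segments are hypothetical and only approximate how a filename could be rebuilt.

```python
def fill_segments(segments, values):
    """Replace each group segment (odd indices) and join the result."""
    filled = list(segments)
    for index, value in enumerate(values):
        # Group number `index` is stored at position 2 * index + 1.
        filled[2 * index + 1] = value
    return "".join(filled)


# Illustrative segments for a pattern with two groups.
segments = ["", "Y", "/SST_", "m", ".nc"]
filename = fill_segments(segments, ["2014", "03"])
```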

date_is_first_class: bool = True#

If True, the group name ‘date’ is considered special.

property files: list[tuple[str, Matches]]#

List of filenames and their matches.

Will scan files when accessed and cache the result, if it has not already been done.

filters: FilterList#

List of filters to apply to found files.

max_scan_depth: int = 32#

Maximum sub-directory depth to scan when scan_everything is True.

property n_groups: int#

Number of groups in pre-regex.

root: str#

The root directory of the finder.

scan_everything: bool#

Whether to scan all subdirectories.

scanned: bool#

True if files have been scanned with current parameters.

Reset to False if the cache (of scanned files) is voided, for instance by operations such as changing the fixed values of groups.

use_regex: bool#

If True, characters outside of groups are considered as valid regex (and not escaped). Default is False.