PyTA Project: Editing pre-commit hook configuration

·

3 min read

Pre-commit hooks are checkers that automatically fix code style issues before commiting the code. Instead of reporting the style errors to users (like PythonTA), they normally do not display the specific style rules being used and just directly refactor the code without confirmation from users. However, in PythonTA project, the unit tests are codes that are intended to exhibit certain style errors, and should not be "fixed" by pre-commit hooks. Thus, I need to modify the pre-commit configuration to exclude the test files from pre-commit checking.

The pre-commit configuration

.pre-commit-config.yaml is the configuration file for pre-commit hooks. It specifies the hooks being used and their versions, as well as files to be included and excluded from the checking. A full list of configurations can be found in the documentation (https://pre-commit.com/#plugins)

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
      - id: isort
  - repo: https://github.com/psf/black-pre-commit-mirror
    rev: 24.3.0
    hooks:
      - id: black
        args: [--safe, --quiet]
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v4.0.0-alpha.8
    hooks:
      - id: prettier
exclude: examples

ci:
  autoupdate_schedule: quarterly

In the exisitng configuration, examples , the directory that stores the test cases, has been added to exclude . However, in my latest modification, files under the directory tests/fixtures/sample_dir also need to be ignored by pre-commit hooks. Thus, I need to add this path to exclude as well.

However, when I try to assign a list to exclude , both examples and sample_dir are then being not excluded from the checks. Refering to the documentation (https://pre-commit.com/#plugins), it turns out exclude can only accept a single string, and I can use regular expression to match multiple directories.

Regular Expression

Regular expression is a sequence of characters that form a search pattern. It contains these metacharacters used to match a specific pattern:

  • [] : match a set of characters, for example [a-m] matches any character from "a" to "m"

  • \ : escape character, or matching for specific sequences. Commonly used are \d (match a digit), \w (match a letter, digit, or _), \s (match for white space)

  • . : match any character

  • ^ : match the beginning of string

  • $ : match the end of string

  • * : match zero or more occurances

  • + : match one or more occurances

  • ? : match zero or one occurances

  • {} : indicate the specific number of occurances

  • | : or

  • () : grouping an expression

In the pre-commit documentation, there is an example that shows how to match multiple files with regular expression:

# ...
    -   id: my-hook
        exclude: |
            (?x)^(
                path/to/file1.py|
                path/to/file2.py|
                path/to/file3.py
            )$

The beginning pipe in exclude: | indicates a multi-line string. (?x) is a verbose flag that allows inline comments in the regular expression. ^ and $ before and after the brackets matches beginning and end of the string. Finally, inside the brackets different file names are separated with or operator | . In summary, this pattern matches files whose full paths that matches one of the paths listed.

In our case, we need to make a slight modification on the regular expression, since we are not matching for specific files but all files under given directories. Thus, the $ in the end need to be omited so that we are not just matching the specific directory but sub-directories and files within.

exclude: |
  (?x)^(
      examples|
      tests/fixtures/sample_dir
  )

Did you find this article valuable?

Support Raine's blog by becoming a sponsor. Any amount is appreciated!