GH-126363: Speed up pattern parsing in pathlib.Path.glob() #126364

Merged: 2 commits merged on Nov 4, 2024

@barneygale (Contributor) commented Nov 3, 2024

The implementation of `Path.glob()` does rather a hacky thing: it calls `self.with_segments()` to convert the given pattern to a `Path` object, and then peeks at the private `_raw_path` attribute to see if pathlib removed a trailing slash from the pattern.

In this patch, we make `glob()` use a new `_parse_pattern()` classmethod that splits the pattern into parts while preserving information about any trailing slash. This skips the cost of creating a `Path` object, and avoids some path anchor normalization, which makes `Path.glob()` slightly faster. But mostly it's about making the code less naughty.
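To illustrate the idea, here is a minimal sketch of splitting a pattern into parts while keeping a marker for a trailing slash. It is not the code from this patch: the function name `parse_pattern`, the POSIX-only separator handling, and the choice of an empty final part as the trailing-slash marker are all assumptions made for the example.

```python
from pathlib import PurePosixPath


def parse_pattern(pattern: str, sep: str = "/") -> list[str]:
    """Hypothetical helper: split a relative glob pattern into parts.

    A final empty string is appended when the pattern ends with a separator,
    so the caller can still tell that a trailing slash was present.
    """
    if pattern.startswith(sep):
        # Assumption for this sketch: anchored (absolute) patterns are rejected.
        raise NotImplementedError("non-relative patterns are not handled in this sketch")
    # Drop empty segments (from doubled separators) and '.' segments.
    parts = [part for part in pattern.split(sep) if part and part != "."]
    if not parts:
        raise ValueError(f"unacceptable pattern: {pattern!r}")
    if pattern.endswith(sep):
        parts.append("")
    return parts


# Converting the pattern to a path object loses the trailing slash...
print(str(PurePosixPath("docs/")))  # docs
# ...whereas splitting the string directly keeps that information.
print(parse_pattern("docs/"))       # ['docs', '']
print(parse_pattern("src//./*.py")) # ['src', '*.py']
```

Working on the string directly means no path object has to be constructed just to parse the pattern, and nothing needs to inspect a private attribute afterwards to recover the trailing slash.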

This makes a no-match glob roughly 1.5x faster (about 36% less time per call):

$ ./python -m timeit -s "import pathlib; p = pathlib.Path()" "list(p.glob('nope'))" 
50000 loops, best of 5: 8.3 usec per loop  # before
50000 loops, best of 5: 5.3 usec per loop  # after
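The same measurement can be approximated from Python with the `timeit` module. This is only a sketch: the helper name `time_no_match_glob` is hypothetical, the loop and repeat counts mirror the command-line output above, and absolute numbers will differ by machine and Python build. It assumes no entry named `nope` exists in the current directory.

```python
import pathlib
import timeit


def time_no_match_glob(number: int = 50_000, repeat: int = 5) -> float:
    """Return the best time, in microseconds per loop, for a glob with no matches."""
    p = pathlib.Path()
    # Each timing is the total number of seconds for `number` calls.
    timings = timeit.repeat(lambda: list(p.glob("nope")), number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    print(f"{time_no_match_glob():.1f} usec per loop")
```

Running this once on a build without the patch and once on a build with it gives a before/after comparison in the same units as the figures quoted above.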

@barneygale barneygale merged commit 9b7294c into python:main Nov 4, 2024
36 checks passed
picnixz pushed a commit to picnixz/cpython that referenced this pull request Dec 8, 2024
…ython#126364)

Co-authored-by: Tomas R. <[email protected]>
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
…ython#126364)
Labels: performance, topic-pathlib