Speed up `pathlib.Path.glob()` by removing redundant regex matching #115060

In #104512 we made `pathlib.Path.glob()` use a "walk-and-filter" strategy for expanding `**` wildcards in patterns: when we encounter a `**` segment, we immediately consume subsequent segments and use them to build a regex that is used to filter results. This saves a bunch of `scandir()` calls.
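A minimal sketch of the walk-and-filter idea, for illustration only: it is not the pathlib implementation. The helper names `walk_and_filter` and `demo`, the hand-written regex for `**/file*`, and the temporary directory layout are all assumptions made for this example. The tree is walked once with `os.walk()`, and every candidate path is filtered by a regex built from the segments after `**`.

```python
import os
import re
import tempfile


def walk_and_filter(root, tail_regex):
    """Yield root-relative POSIX paths under `root` that fully match `tail_regex`.

    Hypothetical helper: one recursive walk, then a regex filter per candidate.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if tail_regex.fullmatch(rel):
                yield rel


def demo():
    # Hand-written regex standing in for the segments `**/file*`:
    # zero or more directory components, then a name starting with "file".
    tail_regex = re.compile(r"(?:[^/]+/)*file[^/]*")
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        for parts in (("a", "b", "file1.txt"), ("a", "other.txt"), ("file2.txt",)):
            open(os.path.join(root, *parts), "w").close()
        return sorted(walk_and_filter(root, tail_regex))


print(demo())
```

The point of the sketch is that the segments after `**` never trigger their own directory scans; they only take part in the regex filter.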

However! We actually build a regex for the _entire_ pattern given to `glob()`, rather than just the segments following `**` wildcards. So when evaluating a pattern like `dir*/**/file*`, the `dir*` part is needlessly matched twice against each path: once when the `dir*` segment itself is expanded, and again by the whole-pattern regex used to filter the `**` results. @zooba noted this in a [review comment](https://github.com/python/cpython/pull/104512#discussion_r1212825322) at the time.

We should be able to improve performance by building an `re.Pattern` only for the segments following `**` wildcards, rather than for the entire `glob()` pattern.
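To make the proposal concrete, here is a rough sketch under simplifying assumptions. The helpers `translate_segment`, `compile_segments` and `split_at_first_recursive` are hypothetical names invented for this example, not pathlib internals; only `*` and `?` are handled, `/` is assumed as the separator, and hidden files, character classes and case sensitivity are ignored.

```python
import re


def translate_segment(segment):
    """Translate one glob segment to a regex fragment (only * and ? handled)."""
    parts = []
    for ch in segment:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def compile_segments(segments):
    """Compile a list of segments into one re.Pattern meant for fullmatch()."""
    fragments = []
    for segment in segments:
        if segment == "**":
            # Zero or more whole directory components.
            fragments.append("(?:[^/]+/)*")
        else:
            fragments.append(translate_segment(segment) + "/")
    regex = "".join(fragments)
    if regex.endswith("/"):
        regex = regex[:-1]  # the final segment has no trailing separator
    return re.compile(regex)


def split_at_first_recursive(pattern):
    """Split a pattern into the segments before and from the first `**`."""
    segments = pattern.split("/")
    index = segments.index("**")  # raises ValueError if there is no `**`
    return segments[:index], segments[index:]


prefix, suffix = split_at_first_recursive("dir*/**/file*")

# Whole-pattern regex: re-checks `dir*` against every candidate path.
whole_regex = compile_segments(prefix + suffix)

# Tail-only regex: applied to paths relative to a directory that already
# matched `dir*`, so the prefix is not matched a second time.
tail_regex = compile_segments(suffix)

print(whole_regex.pattern)
print(tail_regex.pattern)
print(bool(whole_regex.fullmatch("dir1/a/b/file.txt")))
print(bool(tail_regex.fullmatch("a/b/file.txt")))
```

In this sketch the tail-only regex is shorter and is matched against a shorter string (the path relative to the already-selected directory), which is where the saving would come from.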


### Linked PRs
* gh-115061
* gh-116152
* gh-117732
* gh-117831
