In #104512 we made pathlib.Path.glob() use a "walk-and-filter" strategy for expanding ** wildcards in patterns: when we encounter a ** segment, we immediately consume subsequent segments and use them to build a regex that is used to filter results. This saves a bunch of scandir() calls.
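For illustration, here is a minimal sketch of the walk-and-filter idea for a pattern of the form `**/<tail>`. It is not the pathlib implementation: the helper names `segments_to_regex` and `walk_and_filter` are hypothetical, only the `*` wildcard is handled, and `os.walk()` stands in for pathlib's own scandir-based walking.

```python
import os
import re


def segments_to_regex(segments):
    """Translate glob segments to a regex (hypothetical helper; '*' only)."""
    parts = []
    for seg in segments:
        # '*' matches any run of characters inside one path segment.
        parts.append("[^/]*".join(re.escape(piece) for piece in seg.split("*")))
    return re.compile("/".join(parts))


def walk_and_filter(root, tail_segments):
    """Expand '**/<tail_segments>' below root: one tree walk, then a regex filter."""
    tail = segments_to_regex(tail_segments)
    depth = len(tail_segments)
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
        for name in dirnames + filenames:
            rel = name if rel_dir == "." else f"{rel_dir}/{name}"
            # '**' may stand for zero or more directories, so only the
            # last `depth` segments have to match the tail regex.
            if rel.count("/") + 1 >= depth:
                candidate = "/".join(rel.split("/")[-depth:])
                if tail.fullmatch(candidate):
                    results.append(rel)
    return sorted(results)
```

The tree is walked once and every entry is tested against one compiled regex, instead of each segment after `**` triggering its own round of directory scans.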
However! We actually build a regex for the entire pattern given to glob(), rather than just the segments following ** wildcards. And so when evaluating a pattern like dir*/**/file*, the dir* part is needlessly matched twice against each path. @zooba noted this in a review comment at the time.
We should be able to improve performance by building an re.Pattern only for segments following ** wildcards, and not the entire glob() pattern.
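A rough string-level sketch of why the narrower regex should be enough, assuming the `dir*` prefix has already been matched while scanning the parent directory. The `translate` helper is hypothetical and far simpler than pathlib's real translation (only `*` and `**` are handled), and whether the `**` segment itself belongs in the compiled pattern is an assumption made here for simplicity.

```python
import re


def translate(segments):
    """Hypothetical glob-to-regex translation supporting only '*' and '**'."""
    regex = ""
    for i, seg in enumerate(segments):
        if seg == "**":
            # Zero or more intermediate directories.
            regex += "(?:[^/]+/)*"
        else:
            regex += "[^/]*".join(re.escape(piece) for piece in seg.split("*"))
            if i < len(segments) - 1:
                regex += "/"
    return re.compile(regex)


# Current behaviour: the regex covers the whole pattern, including 'dir*'.
full = translate(["dir*", "**", "file*"])
# Proposed behaviour: the regex covers only the segments from '**' onward.
tail = translate(["**", "file*"])

# 'dir1' was already selected by matching 'dir*' during directory scanning,
# so only the part of each path below it still needs to be checked.
prefix = "dir1"
for rel in ["file_a", "x/y/file_b", "x/other"]:
    full_hit = full.fullmatch(f"{prefix}/{rel}") is not None
    tail_hit = tail.fullmatch(rel) is not None
    assert full_hit == tail_hit
```

Both regexes give the same verdict for every path below a matching directory, but the tail-only one never re-examines the `dir*` part.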
Linked PRs
pathlib.Path.glob() by removing redundant regex matching #115061
pathlib.Path.glob() by skipping directory scanning #116152
pathlib.Path.glob() by not scanning literal parts #117732
pathlib.Path.glob() by omitting initial stat() #117831