
ripgrep fails to match pattern including digit character class #1203

@ravron


What version of ripgrep are you using?

ripgrep 0.10.0 (rev 8a7db1a918)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

$ brew tap burntsushi/ripgrep https://github.com/BurntSushi/ripgrep.git
$ brew install ripgrep-bin

What operating system are you using ripgrep on?

macOS 10.14.3 (18D109)

Describe your question, feature request, or bug.

rg appears to fail to find a certain pattern in a one-line file that definitely contains that pattern.

I must be missing something — this seems very unlikely to be a legitimate bug — but I can't figure out what.

If this is a bug, what are the steps to reproduce the behavior?

  1. echo 153.230000 >| test.txt
  2. rg '\d\d\d00' test.txt successfully finds a match of 23000.
  3. rg '\d\d\d000' test.txt fails to find any match, when it should match 230000.

Note that grep '\d\d\d000' test.txt correctly matches 230000 (grep --version reports grep (BSD grep) 2.5.1-FreeBSD).
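Because \d is not part of POSIX regular expressions, how a given grep treats it may vary between implementations. As a more portable cross-check of the expected match, the same pattern can be written with an explicit [0-9] class. This is only a sketch: expected_match is a hypothetical helper name, and it assumes a grep that supports -E and -o.

```shell
# Recreate the one-line corpus from the report.
printf '153.230000\n' > test.txt

# expected_match is a hypothetical helper (not part of ripgrep or grep).
# It prints only the matched text, using [0-9] in place of \d.
expected_match() {
    grep -Eo '[0-9]{3}000' "$1"
}

expected_match test.txt
```

If this prints 230000, the corpus does contain a match for three digits followed by 000, independent of how any tool handles \d.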

If this is a bug, what is the actual behavior?

$ echo 153.230000 >| test.txt
$ rg --debug '\d\d\d000' test.txt
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:110: required literal found: "000"
DEBUG|globset|globset/src/lib.rs:429: built glob set; 0 literals, 0 basenames, 8 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:424: glob converted to regex: Glob { glob: "**/.*.s[a-w][a-z]", re: "(?-u)^(?:/?|.*/)\\..*\\.s[a-w][a-z]$", opts: GlobOptions { case_insensitive: false, literal_separator: false, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('.'), ZeroOrMore, Literal('.'), Literal('s'), Class { negated: false, ranges: [('a', 'w')] }, Class { negated: false, ranges: [('a', 'z')] }]) }
DEBUG|globset|globset/src/lib.rs:429: built glob set; 0 literals, 3 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 1 regexes
$

If this is a bug, what is the expected behavior?

rg '\d\d\d000' test.txt should identify the single match in the file, as grep does. Specifically:

$ rg '\d\d\d000' test.txt
1:153.230000

Other

Note that changing the corpus in seemingly irrelevant ways can cause the bug to change or disappear. For example, the \d\d\d000 pattern matches if three 0 characters are prepended to the contents of the file (that is, the file contains 000153.230000).
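The two corpora mentioned above can be compared side by side with a short loop. This is a rough sketch, not a definitive test: compare_corpora is a hypothetical helper name, the loop overwrites test.txt, and it reports "rg not installed" rather than failing when rg is missing from PATH.

```shell
# compare_corpora is a hypothetical helper (not part of ripgrep).
# It runs the failing pattern against both corpora from the report.
compare_corpora() {
    pattern='\d\d\d000'
    for corpus in '153.230000' '000153.230000'; do
        # Each corpus is written as a one-line file, as in the report.
        printf '%s\n' "$corpus" > test.txt
        if command -v rg >/dev/null 2>&1; then
            # -q suppresses output; the exit status says whether a match was found.
            if rg -q "$pattern" test.txt; then
                result='match'
            else
                result='no match'
            fi
        else
            result='rg not installed'
        fi
        printf '%s: %s\n' "$corpus" "$result"
    done
}

compare_corpora
```

On the ripgrep version described in this report, the first corpus would be expected to show "no match" and the second "match"; a build without the bug should report "match" for both.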

Metadata

Labels: bug