match bug #1319
Closed
Labels: bug
Description
$ rg --version
ripgrep 11.0.1 (rev 7bf7ceb5d3)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)
This matches:
$ echo 'CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTC' | egrep 'CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAG[ATCG]{2}C'
CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTC
But this doesn't:
$ echo 'CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTC' | rg 'CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAG[ATCG]{2}C'
To minimize, this doesn't match:
$ rg 'TTGAGTCCAGGAG[ATCG]{2}C' /tmp/subject
But this does:
$ rg 'TGAGTCCAGGAG[ATCG]{2}C' /tmp/subject
1:CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTC
The only difference between these two commands is that the second drops the
leading T from the regex.
Inspecting the --trace output for the former regex (the one that fails) shows
this:
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:105: required literals found: [Complete(TTGAGTCCAGGAGAC), Complete(TTGAGTCCAGGAGCC), Complete(TTGAGTCCAGGAGGC), Complete(TTGAGTCCAGGAGTC)]
TRACE|grep_regex::matcher|grep-regex/src/matcher.rs:52: extracted fast line regex: (?-u:TTGAGTCCAGGAGAC|TTGAGTCCAGGAGCC|TTGAGTCCAGGAGGC|TTGAGTCCAGGAGTC)
But in the latter regex (the one that works), we have this:
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:59: literal prefixes detected: Literals { lits: [Complete(TGAGTCCAGGAGAAC), Complete(TGAGTCCAGGAGCAC), Complete(TGAGTCCAGGAGGAC), Complete(TGAGTCCAGGAGTAC), Complete(TGAGTCCAGGAGACC), Complete(TGAGTCCAGGAGCCC), Complete(TGAGTCCAGGAGGCC), Complete(TGAGTCCAGGAGTCC), Complete(TGAGTCCAGGAGAGC), Complete(TGAGTCCAGGAGCGC), Complete(TGAGTCCAGGAGGGC), Complete(TGAGTCCAGGAGTGC), Complete(TGAGTCCAGGAGATC), Complete(TGAGTCCAGGAGCTC), Complete(TGAGTCCAGGAGGTC), Complete(TGAGTCCAGGAGTTC)], limit_size: 250, limit_class: 10 }
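Note that the required literals in the failing case have only one character
between GGAG and the trailing C, whereas [ATCG]{2} requires two, so none of
them can occur in the haystack. Therefore, this is almost certainly a bug in
literal extraction. Moreover, this Rust program correctly prints true: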
fn main() {
    let pattern = r"CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAG[ATCG]{2}C";
    let haystack = "CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTC";
    let re = regex::Regex::new(pattern).unwrap();
    println!("{:?}", re.is_match(haystack));
}
Which points the finger at grep-regex's inner literal extraction. Sigh.
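To make the mismatch concrete, here is a minimal std-only sketch that checks the four "required literals" from the --trace output against the haystack, and compares them with a hand-written expansion of the [ATCG]{2}C suffix. The helper names (expected_literals, any_literal_in_haystack) are hypothetical and only illustrate the check; this is not how grep-regex performs extraction.

```rust
// Literals copied verbatim from the --trace output for the failing regex.
const EXTRACTED: [&str; 4] = [
    "TTGAGTCCAGGAGAC",
    "TTGAGTCCAGGAGCC",
    "TTGAGTCCAGGAGGC",
    "TTGAGTCCAGGAGTC",
];

const HAYSTACK: &str = "CCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTC";

/// Hypothetical helper: expand `<prefix>[ATCG]{2}C` by hand into its
/// 16 literal alternatives (4 choices for each of the two class positions).
fn expected_literals(prefix: &str) -> Vec<String> {
    let bases = ['A', 'T', 'C', 'G'];
    let mut out = Vec::with_capacity(bases.len() * bases.len());
    for &a in &bases {
        for &b in &bases {
            out.push(format!("{}{}{}C", prefix, a, b));
        }
    }
    out
}

/// Hypothetical helper: true if at least one literal occurs in the haystack.
/// A literal prefilter that returns false here would skip the line entirely.
fn any_literal_in_haystack<S: AsRef<str>>(lits: &[S], haystack: &str) -> bool {
    lits.iter().any(|lit| haystack.contains(lit.as_ref()))
}

fn main() {
    let expected = expected_literals("TTGAGTCCAGGAG");
    println!(
        "extracted literals hit: {}",
        any_literal_in_haystack(&EXTRACTED[..], HAYSTACK)
    );
    println!(
        "expected literals hit:  {}",
        any_literal_in_haystack(&expected[..], HAYSTACK)
    );
}
```

The extracted literals are each 15 characters long, one short of the 16 characters the regex actually matches, so a prefilter built from them rejects the line before the full regex ever runs.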