Description
While experimenting with Extended Bracketed Character Classes (EBCCs), I got an unexpected compilation error when using the /x (or /xx) modifier on a regex containing an EBCC into which other regexes containing EBCCs are interpolated, e.g. to perform set operations. It is probably easiest to demonstrate with a reproducer:
use v5.36;
my $RE_CLASS_LOWER = qr/(?[ [a-z] ])/;
my $RE_CLASS_UPPER = qr/(?[ [A-Z] ])/;
my $RE_CLASS_DIGIT = qr/(?[ [0-9] ])/;
my $RE_CLASS_ALPHANUM = qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/x;
This fails to compile (using Perl v5.40.1 shipped with Debian):
$ perl -w character-classes.pl
Operand with no preceding operator in regex; marked by <-- HERE in m/(?[ (?^u:(?[ [a-z] ])) + (?^u:(?[ [A-Z] ])) <-- HERE + (?^u:(?[ [0-9] ])) ])/ at character-classes.pl line 9.
Dropping the /x modifier makes it work:
my $RE_CLASS_ALPHANUM = qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/;
Possibly useful extra information is that the /x and /xx modifiers work fine on the regexes being interpolated:
use v5.36;
my $RE_CLASS_LOWER = qr/(?[ [a-z] ])/;
my $RE_CLASS_UPPER = qr/
(?[ [A-Z] ]) # This works just fine
/x;
my $RE_CLASS_DIGIT = qr/
(?[ [0-9] ]) # This here as well
/xx;
my $RE_CLASS_ALPHANUM = qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/;
This regex is printed as:
(?^u:(?[ (?^u:(?[ [a-z] ])) + (?^ux:
(?[ [A-Z] ]) # This works just fine
) + (?^uxx:
(?[ [0-9] ]) # This here as well
) ]))
With the /x modifier added to the outer regex ($RE_CLASS_ALPHANUM), this fails to compile:
Operand with no preceding operator in regex; marked by <-- HERE in m/(?[ (?^u:(?[ [a-z] ])) + (?^ux:
(?[ [A-Z] ]) # This works just fine
) <-- HERE + (?^uxx:
(?[ [0-9] ]) # This here as well
) ])/ at character-classes.pl line 13.
There's a subtle difference here: in the compilation error, the expression doesn't seem to be enclosed in an outer (?^u: ... ) wrapper, from what I can tell. Perhaps that's what's causing it to fail?
Anyhow, I figured I'd report this since I haven't seen this behavior mentioned anywhere in the docs, only that the /xx modifier is automatically turned on within an EBCC construct.
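
For anyone wanting to check a particular perl build, the three cases can be compared in one run. The following is only a minimal sketch: the shortened variable names, the `%variants` hash, and the labels are mine and not part of the original script. It relies on the fact that a `qr//` containing interpolated variables is compiled at run time, so a block `eval` can trap the failure.

```perl
use v5.36;

# Component classes, same patterns as in the reproducer (names shortened).
my $lower = qr/(?[ [a-z] ])/;
my $upper = qr/(?[ [A-Z] ])/;
my $digit = qr/(?[ [0-9] ])/;

# Hypothetical harness: each variant is wrapped in a sub so the outer regex
# is only compiled when the sub is called, inside the eval below.
my %variants = (
    'interpolated, no flags' => sub { qr/(?[ $lower + $upper + $digit ])/ },
    'interpolated, /x'       => sub { qr/(?[ $lower + $upper + $digit ])/x },
    'interpolated, /xx'      => sub { qr/(?[ $lower + $upper + $digit ])/xx },
);

for my $name (sort keys %variants) {
    # A regex compilation error dies at run time; eval turns it into undef + $@.
    my $re = eval { $variants{$name}->() };
    if (defined $re) {
        say "$name: compiled";
    }
    else {
        chomp(my $err = $@);
        say "$name: FAILED: $err";
    }
}
```

On the Perl v5.40.1 described above, I would expect the variant without flags to compile and the /x variant to report the "Operand with no preceding operator" error; results on other versions may differ.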