Regexes using Extended Bracketed Character Classes fail to compile when interpolated and using the /x modifier #24238

@Aequitosh

While experimenting with Extended Bracketed Character Classes (EBCCs) [1], I got an unexpected compilation error when using the /x (or /xx) modifier on a regex containing an EBCC into which other regexes containing EBCCs are interpolated, e.g. to perform set operations. It's probably easiest to demonstrate this with a reproducer:

use v5.36;

my $RE_CLASS_LOWER = qr/(?[ [a-z] ])/;

my $RE_CLASS_UPPER = qr/(?[ [A-Z] ])/;

my $RE_CLASS_DIGIT = qr/(?[ [0-9] ])/;

my $RE_CLASS_ALPHANUM = qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/x;

This fails to compile (using Perl v5.40.1 shipped with Debian):

$ perl -w character-classes.pl
Operand with no preceding operator in regex; marked by <-- HERE in m/(?[ (?^u:(?[ [a-z] ])) + (?^u:(?[ [A-Z] ]))  <-- HERE + (?^u:(?[ [0-9] ])) ])/ at character-classes.pl line 9.

Dropping the /x modifier makes it work:

my $RE_CLASS_ALPHANUM = qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/;
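To compare the variants in a single run, the combined pattern can be built inside eval, since a pattern that interpolates variables is compiled at run time and a failure can be trapped there. This is only a sketch: the labels, the @attempts list and the %results hash are names made up for illustration, and the outcome of the /x and /xx attempts may differ between Perl versions.

```perl
use v5.36;

my $RE_CLASS_LOWER = qr/(?[ [a-z] ])/;
my $RE_CLASS_UPPER = qr/(?[ [A-Z] ])/;
my $RE_CLASS_DIGIT = qr/(?[ [0-9] ])/;

# Each entry pairs a label with a closure that builds the combined class
# under a different set of modifiers.
my @attempts = (
    [ 'no modifier' => sub { qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/ } ],
    [ '/x'          => sub { qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/x } ],
    [ '/xx'         => sub { qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/xx } ],
);

my %results;
for my $attempt (@attempts) {
    my ($label, $build) = @$attempt;

    # A run-time compilation failure dies, so eval returns undef and
    # leaves the error message in $@.
    my $re = eval { $build->() };

    # Keep only the first line of the error message for the summary.
    $results{$label} = defined $re
        ? 'compiled'
        : 'failed: ' . ($@ =~ s/\n.*//sr);

    say "$label: $results{$label}";
}
```

On the Perl v5.40.1 mentioned above, the /x attempt would be expected to report the "Operand with no preceding operator" error, while the attempt without modifiers compiles.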

Some extra info that might be useful: using the /x or /xx modifiers on the regexes being interpolated works fine:

use v5.36;

my $RE_CLASS_LOWER = qr/(?[ [a-z] ])/;

my $RE_CLASS_UPPER = qr/
    (?[ [A-Z] ]) # This works just fine
/x;

my $RE_CLASS_DIGIT = qr/
    (?[ [0-9] ]) # This here as well
/xx;

my $RE_CLASS_ALPHANUM = qr/(?[ $RE_CLASS_LOWER + $RE_CLASS_UPPER + $RE_CLASS_DIGIT ])/;

This regex is printed as:

(?^u:(?[ (?^u:(?[ [a-z] ])) + (?^ux:
    (?[ [A-Z] ]) # This works just fine
) + (?^uxx:
    (?[ [0-9] ]) # This here as well
) ]))
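The printed form is presumably the stringification of the qr// object, which wraps the pattern source in a (?^FLAGS: ... ) group. A minimal sketch of how to look at that form for the individual pieces (the $lower_str and $upper_str variables are illustrative names only):

```perl
use v5.36;

my $RE_CLASS_LOWER = qr/(?[ [a-z] ])/;

my $RE_CLASS_UPPER = qr/
    (?[ [A-Z] ]) # This works just fine
/x;

# Interpolating a qr// object into a string yields its stringified form,
# i.e. the original source wrapped in (?^FLAGS: ... ).
my $lower_str = "$RE_CLASS_LOWER";
my $upper_str = "$RE_CLASS_UPPER";

say $lower_str;
say $upper_str;
```

The flags shown after the caret reflect the modifiers in effect for each regex, which is why the /x and /xx pieces show up as (?^ux: and (?^uxx: in the output above.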

Adding the /x modifier to $RE_CLASS_ALPHANUM makes this fail to compile as well:

Operand with no preceding operator in regex; marked by <-- HERE in m/(?[ (?^u:(?[ [a-z] ])) + (?^ux:
    (?[ [A-Z] ]) # This works just fine
)  <-- HERE + (?^uxx:
    (?[ [0-9] ]) # This here as well
) ])/ at character-classes.pl line 13.

There's a subtle difference here: in the compilation error, the expression doesn't seem to be enclosed in (?^u: ... ), from what I can tell. Perhaps that's what's causing it to fail?

Anyhow, I figured I'd report this since I haven't seen this behavior mentioned anywhere in the docs, only that the /xx modifier is automatically turned on within an EBCC construct.

Footnotes

  1. https://perldoc.perl.org/perlrecharclass#Extended-Bracketed-Character-Classes
