
COMPARISON BETWEEN PERL AND PLOKI REGEXES


General:

ploki doesn't support perl's /cgimosx regex switches:
/c and /g don't make sense in ploki (strings don't have an associated pos());
/i can be emulated by writing \L str ~ "lowercaseregex";
/m and /s aren't needed (^, A!, $, Z! match at beginning-of-string,
beginning-of-line, end-of-string, end-of-line respectively; . matches any
character (use '^\n' if you want any character except newline));
/o is mostly equivalent to ploki's ?o~ operator;
/x is the default in ploki: comments and whitespace are mostly ignored
(except when escaped, in a character class, in a quantifier (like "*?"
or ":2,:") or inside "^]", "&]", "?]" and "[number]").


[Perl]                              [ploki]


Escaping:

\x                                  x!
ploki's escape character "!" follows the character it escapes. No unescaped
word character is special. No escaped non-word character is special.


Alternation:

foo|bar                             foo|bar


Repetition:

x*                                  x*
x+                                  x+
x?                                  x?
x*?                                 x*?
x+?                                 x+?
x??                                 x??
x{n}                                x:n:
x{n,}                               x:n,:
x{n,m}                              x:n,m:
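
The brace-to-colon rewriting in the last three rows is mechanical. A minimal
sketch in Python (not ploki), assuming a hypothetical helper name
braces_to_colons and a Perl fragment that contains no escaped braces and no
character classes:

```python
import re

# Matches Perl's counted quantifiers {n}, {n,} and {n,m}.
_COUNTED = re.compile(r"\{(\d+)(?:(,)(\d*))?\}")


def braces_to_colons(fragment):
    """Rewrite Perl counted quantifiers into the :n: / :n,: / :n,m: form.

    Hypothetical helper for illustration only. It does not understand
    escapes or character classes, so a literal "\\{3\\}" in the input
    would be handled incorrectly.
    """
    def repl(match):
        lower = match.group(1)
        comma = match.group(2) or ""   # "" for the exact-count form {n}
        upper = match.group(3) or ""   # "" for {n} and the open form {n,}
        return ":" + lower + comma + upper + ":"

    return _COUNTED.sub(repl, fragment)
```

With this sketch, braces_to_colons("x{2,5}") gives "x:2,5:"; everything
outside a counted quantifier is passed through unchanged.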


Grouping:

(?:foo)                             (foo)


Character classes:

[ab0-9\-\]']                        'ab0-9-!]'!'
[^a-z]                              '^a-z'
.                                   '^\n'
.  (with /s)                        .
[[:alnum:]]                         '[:alnum:]'

Ploki provides the following POSIXish character classes (inside an ordinary
character class only):
[:alnum:]    alphanumeric char
[:alpha:]    alphabetic char
[:cntrl:]    control char
[:digit:]    digit
[:graph:]    printable char (except space)
[:lower:]    lowercase char
[:print:]    printable char (including space)
[:punct:]    punctuation char ([:graph:] without [:alnum:])
[:space:]    whitespace char
[:upper:]    uppercase char
[:xdigit:]   hex digit
Every POSIXish subclass [:foo:] can be negated by writing [:^foo:] (this is
compatible with Perl).

In addition there are the following built-in character classes (inside and
outside of user-defined character classes):

                                     q!   (equivalent to '[:alpha:]')
                                     c!   (equivalent to '[:cntrl:]')
\d                                   d!   (equivalent to '[:digit:]')
                                     l!   (equivalent to '[:lower:]')
                                     p!   (equivalent to '[:print:]')
\s                                   s!   (equivalent to '[:space:]') [1]
                                     u!   (equivalent to '[:upper:]')
                                     x!   (equivalent to '[:xdigit:]')
\w                                   w!   (equivalent to '_[:alnum:]')

They can be negated by using the corresponding uppercase letter, e.g. D!
matches a non-digit character.

[1] Before Perl 5.18, \s did not include \v (vertical tab); later versions
do.


Independent groups:

(?>foo)                              <foo>


Capturing:

(foo)                                {foo}


Backreferences:

\1, \2, ...                          0!, 1!, ...

Note that ploki's backreferences start with 0 *counting from the right*,
i.e. after a successful match against "{{f}o{o}}", \0 is "foo", \1 is "o"
and \2 is "f".
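
The numbering in this example is consistent with ordering the groups by the
position of their closing brace, rightmost first. That rule is an assumption
inferred from the single example above; it is not confirmed elsewhere in this
document. A minimal Python sketch of the assumed rule, with a hypothetical
helper name, for patterns without escaped braces or character classes:

```python
def groups_from_the_right(pattern):
    """List the bodies of {...} groups, index 0 first.

    Assumption: a group's index is determined by its closing brace,
    counting closing braces from the right end of the pattern.
    Escapes ("{!") and character classes are not handled.
    """
    open_positions = []   # stack of indices of unmatched "{"
    spans = []            # (open, close) pairs in order of closing
    for index, char in enumerate(pattern):
        if char == "{":
            open_positions.append(index)
        elif char == "}":
            spans.append((open_positions.pop(), index))
    spans.reverse()       # rightmost closing brace comes first
    return [pattern[start + 1:end] for start, end in spans]
```

For "{{f}o{o}}" this returns the group bodies "{f}o{o}", "o" and "f", in
that order, which lines up with the captures "foo", "o" and "f" listed above.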


Assertions:

^  (without /m)                      ^
\A                                   ^
^  (with /m)                         A!
\z                                   $
$  (with /m)                         Z!  (roughly)
\b                                   b!
\B                                   B!
(?=foo)                              [foo&]
(?!foo)                              [foo^]
(?<=foo)                             NOT IMPLEMENTED
(?<!foo)                             NOT IMPLEMENTED

There's a special assertion that doesn't exist in Perl: [N] succeeds if
N! is set, i.e. if the Nth capturing group succeeded.
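
The first five rows of the table separate anchors that match only at the ends
of the string from anchors that also match at line boundaries. The sketch
below illustrates the Perl column only, using Python's re module, which
shares this syntax except that Python spells Perl's \z as \Z. The ploki
forms appear in comments as quoted from the table and are not executed; the
function names are made up for this example.

```python
import re

TEXT = "one\ntwo\nthree"


def first_words(text, multiline):
    """Words at a start anchor.

    multiline=False: Perl ^ without /m (ploki ^), start of string only.
    multiline=True:  Perl ^ with /m (ploki A!), start of every line.
    """
    flags = re.MULTILINE if multiline else 0
    return re.findall(r"^\w+", text, flags)


def last_words(text, multiline):
    """Words at an end anchor.

    multiline=False: Perl \\z (Python \\Z, ploki $), end of string only.
    multiline=True:  Perl $ with /m (roughly ploki Z!), end of every line.
    """
    if multiline:
        return re.findall(r"\w+$", text, re.MULTILINE)
    return re.findall(r"\w+\Z", text)
```

On TEXT, the string-only variants return ["one"] and ["three"], while the
line-aware variants return all three words.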


Selection:

(?(cond)yes-pattern|no-pattern)      [no-pattern|yes-patterncond?]
(?(cond)yes-pattern)                 [yes-patterncond?]
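
To make the Perl column concrete, here is a small sketch of the conditional
form using the Perl pattern m{(\()?[^()]+(?(1)\))} that serves as the example
in this section. It is written in Python, whose re module accepts the same
(?(N)yes) syntax; the ploki form is not executed, and the helper name is
hypothetical.

```python
import re

# Group 1 optionally matches "(". The conditional (?(1)\)) then requires a
# closing ")" only if group 1 took part in the match.
OPTIONALLY_PARENTHESIZED = re.compile(r"(\()?[^()]+(?(1)\))")


def is_optionally_parenthesized(text):
    """True if text is a parenthesis-free run, bare or wrapped in ( )."""
    return OPTIONALLY_PARENTHESIZED.fullmatch(text) is not None
```

The helper accepts "abc" and "(abc)" and rejects "(abc" and "abc)", since in
the last two cases the closing parenthesis does not agree with whether the
opening one was matched.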

Note that in ploki "cond" can be any piece: [oof?] matches "foo" if the next
character is "f". Perl's special case of (?(N)yes|no) can be written as
[no|yes[N]?] in ploki, with N renumbered as described under Backreferences;
Perl's m{(\()?[^()]+(?(1)\))} corresponds to ploki's
"{(!}?'^()'+[)![0]?]".


Comments:

(?#foo)                              [foo#]


Debugging:

perl has a very nice module/command-line option: perl -Mre=debug (or perl
-Mre=debugcolor) will show how perl compiles and matches your regexes. ploki
offers a far less powerful feature: ploki -dr displays a human-readable form
of every regex being compiled. It doesn't show how it matches, however. (On
the other hand I think the regex output of ploki -dr is much more readable
than what perl -Mre=debug produces.)


# vi: set tw=76 et:
author	HackBot
date	Sun, 15 May 2016 13:14:57 +0000
parents	ac0403686959