view src/ploki/doc/perl-ploki-regex.txt @ 11340:77399ae45cb1

author HackBot
date Tue, 06 Feb 2018 23:37:00 +0000
parents ac0403686959

COMPARISON BETWEEN PERL AND PLOKI REGEXES


General:

ploki doesn't support Perl's /cgimosx regex switches:
/c and /g don't make sense in ploki (strings don't have an associated pos());
/i can be emulated by writing \L str ~ "lowercaseregex";
/m and /s aren't needed (^, A!, $, Z! match at beginning-of-string,
beginning-of-line, end-of-string, end-of-line respectively; . matches any
character (use '^\n' if you want any character except newline));
/o is mostly equivalent to ploki's ?o~ operator;
/x is the default in ploki: comments and whitespace are mostly ignored
(whitespace is not ignored when it's escaped, in a character class, in a
quantifier (like "*?" or ":2,:") or inside "^]", "&]", "?]" and
"[number]").
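
For readers without a ploki interpreter at hand, here is a rough sketch of
two of the points above in Python, whose re module is close enough to Perl
for this purpose. re.VERBOSE stands in for /x; the helper name
match_ignoring_case is made up for illustration, and I'm assuming that
ploki's \L lowercases its operand the way the /i workaround implies.

```python
import re


def match_ignoring_case(lowercase_regex, text):
    """Emulate /i by lowercasing the subject; the regex must be lowercase."""
    return re.search(lowercase_regex, text.lower()) is not None


# With re.VERBOSE (the analogue of /x), plain whitespace and comments vanish.
spaced = re.compile(r"f o o  # trailing comment", re.VERBOSE)

# Escaped whitespace and whitespace in a character class stay significant.
escaped = re.compile(r"f\ o", re.VERBOSE)
in_class = re.compile(r"f[ ]o", re.VERBOSE)

if __name__ == "__main__":
    print(match_ignoring_case(r"hello", "Say HELLO"))
    print(spaced.fullmatch("foo") is not None)
    print(escaped.fullmatch("f o") is not None)
    print(in_class.fullmatch("f o") is not None)
```

Note that the lowercasing trick only works if the regex itself contains no
uppercase literals.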



[Perl]                              [ploki]


Escaping:

\x                                  x!
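Here x stands for any character: ploki writes the escape as a trailing "!"
instead of a leading "\".
No unescaped word character is special. No escaped non-word character is
special.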


Alternation:

foo|bar                             foo|bar


Repetition:

x*                                  x*
x+                                  x+
x?                                  x?
x*?                                 x*?
x+?                                 x+?
x??                                 x??
x{n}                                x:n:
x{n,}                               x:n,:
x{n,m}                              x:n,m:
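
The counted quantifiers are the only ones that differ, so translating them
is mechanical. A minimal sketch in Python (the function name
ploki_quantifiers_to_perl is hypothetical, and the approach is purely
textual: it doesn't know about escaped colons or character classes, so
treat it as an illustration of the table, not a converter):

```python
import re

# Matches ploki's counted quantifiers ":n:", ":n,:" and ":n,m:".
_COUNTED = re.compile(r":(\d+)(,(\d*))?:")


def ploki_quantifiers_to_perl(pattern):
    """Rewrite :n: / :n,: / :n,m: as {n} / {n,} / {n,m}."""

    def to_braces(match):
        low, comma, high = match.group(1), match.group(2), match.group(3)
        if comma is None:
            return "{%s}" % low
        # high is "" for the open-ended form ":n,:".
        return "{%s,%s}" % (low, high)

    return _COUNTED.sub(to_braces, pattern)


if __name__ == "__main__":
    for ploki in ("x:3:", "x:2,:", "x:2,5:", "x*?"):
        print(ploki, "->", ploki_quantifiers_to_perl(ploki))
```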


Grouping:

(?:foo)                             (foo)


Character classes:

[ab0-9\-\]']                        'ab0-9-!]'!'
[^a-z]                              '^a-z'
.                                   '^\n'
.  (with /s)                        .
[[:alnum:]]                         '[:alnum:]'

Ploki provides the following POSIXish character classes (inside an ordinary
character class only):
[:alnum:]    alphanumeric char
[:alpha:]    alphabetic char
[:cntrl:]    control char
[:digit:]    digit
[:graph:]    printable char (except space)
[:lower:]    lowercase char
[:print:]    printable char (including space)
[:punct:]    punctuation char ([:graph:] without [:alnum:])
[:space:]    whitespace char
[:upper:]    uppercase char
[:xdigit:]   hex digit
Every POSIXish subclass [:foo:] can be negated by writing [:^foo:] (this is
compatible with Perl).

In addition there are the following built-in character classes (inside and
outside of user-defined character classes):

                                     q!   (equivalent to '[:alpha:]')
                                     c!   (equivalent to '[:cntrl:]')
\d                                   d!   (equivalent to '[:digit:]')
                                     l!   (equivalent to '[:lower:]')
                                     p!   (equivalent to '[:print:]')
\s                                   s!   (equivalent to '[:space:]') [1]
                                     u!   (equivalent to '[:upper:]')
                                     x!   (equivalent to '[:xdigit:]')
\w                                   w!   (equivalent to '_[:alnum:]')

They can be negated by using the corresponding uppercase letter, e.g. D!
matches a non-digit character.

[1] Before Perl 5.18, Perl's \s did not include the vertical tab (\x0B).


Independent groups:

(?>foo)                              <foo>


Capturing:

(foo)                                {foo}


Backreferences:

\1, \2, ...                          0!, 1!, ...

Note that ploki's backreferences start with 0 *counting from the right*,
i.e. after a successful match against "{{f}o{o}}", \0 is "foo", \1 is "o"
and \2 is "f".
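
To see the difference from Perl's numbering, the sketch below matches the
Perl equivalent "((f)o(o))" in Python (which numbers groups like Perl, by
opening parenthesis from the left) and then reorders the groups. The
reordering rule, "sort by closing parenthesis, rightmost first", is my
assumption: it reproduces the example above, but the text doesn't spell
the rule out. The helper only understands plain capturing parentheses.

```python
import re


def groups_by_closing_paren_from_right(perl_pattern):
    """Return Perl group numbers ordered by closing paren, rightmost first.

    Assumes only plain capturing parens: no escapes, classes or (?...).
    """
    open_stack = []
    closes = []  # (position of ")", Perl group number)
    next_number = 1
    for pos, ch in enumerate(perl_pattern):
        if ch == "(":
            open_stack.append(next_number)
            next_number += 1
        elif ch == ")":
            closes.append((pos, open_stack.pop()))
    return [number for _, number in sorted(closes, reverse=True)]


pattern = "((f)o(o))"
match = re.fullmatch(pattern, "foo")
order = groups_by_closing_paren_from_right(pattern)
ploki_style = [match.group(n) for n in order]

if __name__ == "__main__":
    print("Perl groups 1..3:", [match.group(n) for n in (1, 2, 3)])
    print("reordered (index 0 first):", ploki_style)
```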


Assertions:

^  (without /m)                      ^
\A                                   ^
^  (with /m)                         A!
\z                                   $
$  (with /m)                         Z!  (roughly)
\b                                   b!
\B                                   B!
(?=foo)                              [foo&]
(?!foo)                              [foo^]
(?<=foo)                             NOT IMPLEMENTED
(?<!foo)                             NOT IMPLEMENTED

There's a special assertion that doesn't exist in Perl: [N] succeeds if
N! is set, i.e. if the Nth capturing group succeeded.


Selection:

(?(cond)yes-pattern|no-pattern)      [no-pattern|yes-patterncond?]
(?(cond)yes-pattern)                 [yes-patterncond?]

Note that in ploki "cond" can be any piece: [oof?] matches "foo" if the next
character is "f". Perl's special case of (?(N)yes|no) can be written as
[no|yes[N]?] in ploki; Perl's m{(\()?[^()]+(?(1)\))} corresponds to ploki's
"{(!}?'^()'+[)![0]?]".
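
The Perl side of that last example can be tried out directly in Python,
whose re module supports the same (?(1)...) conditional. This is only a
sketch of what the pattern accepts; the helper name accepts is made up,
and nothing here runs ploki itself.

```python
import re

# Perl's m{(\()?[^()]+(?(1)\))}: an optional "(", some non-paren text, and
# a ")" that is required only if the opening "(" was captured.
balanced = re.compile(r"(\()?[^()]+(?(1)\))")


def accepts(text):
    """True if the whole string matches the pattern."""
    return balanced.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("(abc)", "abc", "(abc", "abc)"):
        print(repr(sample), accepts(sample))
```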


Comments:

(?#foo)                              [foo#]


Debugging:

perl has a very nice module/command-line option: perl -Mre=debug (or perl
-Mre=debugcolor) will show how perl compiles and matches your regexes. ploki
offers a far less powerful feature: ploki -dr displays a human-readable form
of every regex being compiled. It doesn't show how it matches, however. (On
the other hand I think the regex output of ploki -dr is much more readable
than what perl -Mre=debug produces.)


# vi: set tw=76 et: