view src/ploki/doc/perl-ploki-regex.txt @ 11559:467d5b4a5976 draft
<oerjan> revert
author | HackEso <hackeso@esolangs.org> |
---|---|
date | Tue, 15 May 2018 02:37:49 +0100 |
parents | ac0403686959 |
children |
COMPARISON BETWEEN PERL AND PLOKI REGEXES

General:
  ploki doesn't support perl's /cgimosx regex switches:
    /c and /g don't make sense in ploki (strings don't have an associated
      pos());
    /i can be emulated by writing \L str ~ "lowercaseregex";
    /m and /s aren't needed (^, A!, $, Z! match at beginning-of-string,
      beginning-of-line, end-of-string, end-of-line respectively; .
      matches any character (use '^\n' if you want any character except
      newline));
    /o is mostly equivalent to ploki's ?o~ operator;
    /x is the default in ploki: comments and whitespace are mostly
      ignored (except when escaped, in a character class, in a quantifier
      (like "*?" or ":2,:") or inside "^]", "&]", "?]" and "[number]").

    [Perl]                              [ploki]

Escaping:
    \x                                  x!

  No word character is special. No escaped non-word character is special.

Alternation:
    foo|bar                             foo|bar

Repetition:
    x*                                  x*
    x+                                  x+
    x?                                  x?
    x*?                                 x*?
    x+?                                 x+?
    x??                                 x??
    x{n}                                x:n:
    x{n,}                               x:n,:
    x{n,m}                              x:n,m:

Grouping:
    (?:foo)                             (foo)

Character classes:
    [ab0-9\-\]']                        'ab0-9-!]'!'
    [^a-z]                              '^a-z'
    .                                   '^\n'
    . (with /s)                         .
    [[:alnum:]]                         '[:alnum:]'

  Ploki provides the following POSIXish character classes (inside an
  ordinary character class only):
    [:alnum:]   alphanumeric char
    [:alpha:]   alphabetic char
    [:cntrl:]   control char
    [:digit:]   digit
    [:graph:]   printable char (except space)
    [:lower:]   lowercase char
    [:print:]   printable char (including space)
    [:punct:]   punctuation char ([:graph:] without [:alnum:])
    [:space:]   whitespace char
    [:upper:]   uppercase char
    [:xdigit:]  hex digit
  Every POSIXish subclass [:foo:] can be negated by writing [:^foo:] (this
  is compatible with Perl).

  In addition there are the following built-in character classes (inside
  and outside of user-defined character classes):
                                        q! (equivalent to '[:alpha:]')
                                        c! (equivalent to '[:cntrl:]')
    \d                                  d! (equivalent to '[:digit:]')
                                        l! (equivalent to '[:lower:]')
                                        p! (equivalent to '[:print:]')
    \s                                  s! (equivalent to '[:space:]') [1]
                                        u! (equivalent to '[:upper:]')
                                        x! (equivalent to '[:xdigit:]')
    \w                                  w! (equivalent to '_[:alnum:]')
  They can be negated by using the corresponding uppercase letter, e.g. D!
  matches a non-digit character.

  [1] Before perl 5.18, \s did not include \v (vertical tab).

Independent groups:
    (?>foo)                             <foo>

Capturing:
    (foo)                               {foo}

Backreferences:
    \1, \2, ...                         0!, 1!, ...

  Note that ploki's backreferences start with 0 *counting from the right*,
  i.e. after a successful match against "{{f}o{o}}", \0 is "foo", \1 is
  "o" and \2 is "f".

Assertions:
    ^ (without /m)                      ^
    \A                                  ^
    ^ (with /m)                         A!
    \z                                  $
    $ (with /m)                         Z! (roughly)
    \b                                  b!
    \B                                  B!
    (?=foo)                             [foo&]
    (?!foo)                             [foo^]
    (?<=foo)                            NOT IMPLEMENTED
    (?<!foo)                            NOT IMPLEMENTED

  There's a special assertion that doesn't exist in Perl: [N] succeeds if
  N! is set, i.e. if the Nth capturing group succeeded.

Selection:
    (?(cond)yes-pattern|no-pattern)     [no-pattern|yes-patterncond?]
    (?(cond)yes-pattern)                [yes-patterncond?]

  Note that in ploki "cond" can be any piece: [oof?] matches "foo" if the
  next character is "f". Perl's special case of (?(N)yes|no) can be
  written as [no|yes[N]?] in ploki; Perl's m{(\()?[^()]+(?(1)\))}
  corresponds to ploki's "{(!}?'^()'+[)![0]?]".

Comments:
    (?#foo)                             [foo#]

Debugging:
  perl has a very nice module/command-line option: perl -Mre=debug (or
  perl -Mre=debugcolor) will show how perl compiles and matches your
  regexes. ploki offers a far less powerful feature: ploki -dr displays a
  human-readable form of every regex being compiled. It doesn't show how
  it matches, however. (On the other hand I think the regex output of
  ploki -dr is much more readable than what perl -Mre=debug produces.)

# vi: set tw=76 et:
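
Worked example (backreference numbering):
  The "counting from the right" rule is easy to get wrong, so here is a
  small Python sketch that assigns backreference numbers to the capturing
  groups of a ploki pattern. It is not part of ploki. The function name
  ploki_group_numbers is made up for illustration, and the rule it
  implements (the group whose "}" is furthest to the right gets 0, the
  next one to its left gets 1, and so on) is an assumption inferred from
  the single "{{f}o{o}}" example above. The scanner is also simplified:
  it only skips "x!" pieces and quoted character classes, and ignores
  comments and every other construct.

```python
def ploki_group_numbers(pattern: str) -> dict[int, int]:
    """Map the offset of each capturing '{' to its assumed ploki number.

    Assumption (inferred from the "{{f}o{o}}" example, not from ploki's
    source): groups are numbered by the position of their closing '}',
    starting with 0 at the rightmost one.
    """
    opens_in_close_order = []  # offset of '{' for each '}' seen, left to right
    stack = []                 # offsets of currently open '{'
    in_class = False           # inside a '...' character class?
    i = 0
    while i < len(pattern):
        # "x!" is an escaped or built-in piece; skip both characters.
        if i + 1 < len(pattern) and pattern[i + 1] == "!":
            i += 2
            continue
        ch = pattern[i]
        if ch == "'":
            in_class = not in_class
        elif not in_class:
            if ch == "{":
                stack.append(i)
            elif ch == "}":
                opens_in_close_order.append(stack.pop())
        i += 1
    total = len(opens_in_close_order)
    # The last '}' seen is the rightmost one, so it gets number 0.
    return {pos: total - 1 - k for k, pos in enumerate(opens_in_close_order)}


if __name__ == "__main__":
    # Outer group (offset 0) is 0, {o} (offset 5) is 1, {f} (offset 1) is 2.
    print(ploki_group_numbers("{{f}o{o}}"))  # {1: 2, 5: 1, 0: 0}
```

  For the pattern "{(!}?'^()'+[)![0]?]" the sketch reports a single group
  numbered 0, which agrees with the [0] used in that pattern's condition.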