view src/ploki/doc/perl-ploki-regex.txt @ 8065:591b1467ccdf
<int-e> le/rn paste/"Paste" is a short story by Henry James. Its contents has been cut into pieces and distributed over numerous tin boxes on the World Wide Web, little pearls of wisdom buried among ordinary pastes.
author | HackBot |
---|---|
date | Sun, 15 May 2016 13:14:57 +0000 |
parents | ac0403686959 |
children |
COMPARISON BETWEEN PERL AND PLOKI REGEXES

General:
  ploki doesn't support perl's /cgimosx regex switches:
  /c and /g don't make sense in ploki (strings don't have an associated
     pos());
  /i can be emulated by writing \L str ~ "lowercaseregex";
  /m and /s aren't needed (^, A!, $, Z! match at beginning-of-string,
     beginning-of-line, end-of-string, end-of-line respectively; . matches
     any character (use '^\n' if you want any character except newline));
  /o is mostly equivalent to ploki's ?o~ operator;
  /x is the default in ploki: comments and whitespace are mostly ignored
     (except when escaped, in a character class, in a quantifier (like
     "*?" or ":2,:") or inside "^]", "&]", "?]" and "[number]").

  [Perl]                              [ploki]

Escaping:
  \x                                  x!
  No word character is special. No escaped non-word character is special.

Alternation:
  foo|bar                             foo|bar

Repetition:
  x*                                  x*
  x+                                  x+
  x?                                  x?
  x*?                                 x*?
  x+?                                 x+?
  x??                                 x??
  x{n}                                x:n:
  x{n,}                               x:n,:
  x{n,m}                              x:n,m:

Grouping:
  (?:foo)                             (foo)

Character classes:
  [ab0-9\-\]']                        'ab0-9-!]'!'
  [^a-z]                              '^a-z'
  .                                   '^\n'
  . (with /s)                         .
  [[:alnum:]]                         '[:alnum:]'

  Ploki provides the following POSIXish character classes (inside an
  ordinary character class only):
    [:alnum:]   alphanumeric char
    [:alpha:]   alphabetic char
    [:cntrl:]   control char
    [:digit:]   digit
    [:graph:]   printable char (except space)
    [:lower:]   lowercase char
    [:print:]   printable char (including space)
    [:punct:]   punctuation char ([:graph:] without [:alnum:])
    [:space:]   whitespace char
    [:upper:]   uppercase char
    [:xdigit:]  hex digit
  Every POSIXish subclass [:foo:] can be negated by writing [:^foo:] (this
  is compatible with Perl).

  In addition there are the following built-in character classes (inside
  and outside of user-defined character classes):
                                      q!  (equivalent to '[:alpha:]')
                                      c!  (equivalent to '[:cntrl:]')
  \d                                  d!  (equivalent to '[:digit:]')
                                      l!  (equivalent to '[:lower:]')
                                      p!  (equivalent to '[:print:]')
  \s                                  s!  (equivalent to '[:space:]') [1]
                                      u!  (equivalent to '[:upper:]')
                                      x!  (equivalent to '[:xdigit:]')
  \w                                  w!  (equivalent to '_[:alnum:]')
  They can be negated by using the corresponding uppercase letter, e.g. D!
  matches a non-digit character.

  [1] Before version 5.18, Perl's \s did not include \v (vertical tab).

Independent groups:
  (?>foo)                             <foo>

Capturing:
  (foo)                               {foo}

Backreferences:
  \1, \2, ...                         0!, 1!, ...
  Note that ploki's backreferences start with 0 *counting from the right*,
  i.e. after a successful match against "{{f}o{o}}", 0! is "foo", 1! is "o"
  and 2! is "f".

Assertions:
  ^ (without /m)                      ^
  \A                                  ^
  ^ (with /m)                         A!
  \z                                  $
  $ (with /m)                         Z! (roughly)
  \b                                  b!
  \B                                  B!
  (?=foo)                             [foo&]
  (?!foo)                             [foo^]
  (?<=foo)                            NOT IMPLEMENTED
  (?<!foo)                            NOT IMPLEMENTED
  There's a special assertion that doesn't exist in Perl: [N] succeeds if
  N! is set, i.e. if the Nth capturing group succeeded.

Selection:
  (?(cond)yes-pattern|no-pattern)     [no-pattern|yes-patterncond?]
  (?(cond)yes-pattern)                [yes-patterncond?]
  Note that in ploki "cond" can be any piece: [oof?] matches "foo" if the
  next character is "f". Perl's special case of (?(N)yes|no) can be written
  as [no|yes[N]?] in ploki; Perl's m{(\()?[^()]+(?(1)\))} corresponds to
  ploki's "{(!}?'^()'+[)![0]?]".

Comments:
  (?#foo)                             [foo#]

Debugging:
  perl has a very nice module/command-line option: perl -Mre=debug (or
  perl -Mre=debugcolor) will show how perl compiles and matches your
  regexes. ploki offers a far less powerful feature: ploki -dr displays a
  human-readable form of every regex being compiled. It doesn't show how it
  matches, however. (On the other hand I think the regex output of ploki
  -dr is much more readable than what perl -Mre=debug produces.)

# vi: set tw=76 et:
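
Trying out the Selection example:
  The Perl pattern m{(\()?[^()]+(?(1)\))} quoted in the Selection section
  can be checked without a ploki interpreter. The sketch below is only an
  illustration and not part of ploki: it assumes Python's re module, which
  accepts the same (?(1)...) conditional syntax as Perl. The names
  PAREN_OPTIONAL and matches_whole are made up for this example.

```python
import re

# Perl's (\()?[^()]+(?(1)\)) copied verbatim from the Selection section.
# Group 1 captures an optional "("; (?(1)\)) then demands a ")" only if
# group 1 took part in the match. (Assumption: Python's re is used here
# as a stand-in for Perl; ploki itself is not involved.)
PAREN_OPTIONAL = re.compile(r"(\()?[^()]+(?(1)\))")


def matches_whole(text):
    """Return True if the whole string matches the pattern."""
    return PAREN_OPTIONAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["abc", "(abc)", "(abc", "abc)"]:
        print(repr(sample), matches_whole(sample))
```

  Run as a script, this reports that "abc" and "(abc)" match as a whole
  while "(abc" and "abc)" do not. According to the Selection section, the
  ploki form "{(!}?'^()'+[)![0]?]" corresponds to the same pattern.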