COMPARISON BETWEEN PERL AND PLOKI REGEXES


General:

ploki doesn't support perl's /cgimosx regex switches:
  /c and /g don't make sense in ploki (strings don't have an associated
    pos());
  /i can be emulated by writing \L str ~ "lowercaseregex";
  /m and /s aren't needed (^, A!, $, Z! match at beginning-of-string,
    beginning-of-line, end-of-string, end-of-line respectively; . matches
    any character (use '^\n' if you want any character except newline));
  /o is mostly equivalent to ploki's ?o~ operator;
  /x is the default in ploki: comments and whitespace are mostly ignored
    (except when escaped, in a character class, in a quantifier (like "*?"
    or ":2,:") or inside "^]", "&]", "?]" and "[number]").

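The /i emulation mentioned above amounts to lowercasing the subject string
and matching it against a regex written entirely in lowercase. A minimal
sketch of that idea, using Python's re module as a stand-in (ploki itself
would use \L str ~ "..."; the function name is made up for illustration):

```python
import re

def match_ignoring_case(subject, lowercase_regex):
    """Emulate /i by lowercasing the subject before matching.

    The regex must already be written in lowercase; uppercase escapes
    such as \\S or \\W would change meaning if lowercased, so this trick
    only suits patterns that do not need them.
    """
    return re.search(lowercase_regex, subject.lower()) is not None

print(match_ignoring_case("Hello WORLD", r"hello\s+world"))
print(match_ignoring_case("Hello WORLD", r"goodbye"))
```

For plain ASCII text this gives the same answer as a real case-insensitive
match, at the cost of copying the subject string.
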
    [Perl]                              [ploki]


Escaping:

    \x                                  x!
    No word character is special. No escaped non-word character is special.


Alternation:

    foo|bar                             foo|bar


Repetition:

    x*                                  x*
    x+                                  x+
    x?                                  x?
    x*?                                 x*?
    x+?                                 x+?
    x??                                 x??
    x{n}                                x:n:
    x{n,}                               x:n,:
    x{n,m}                              x:n,m:

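Only the counted quantifiers differ between the two columns above. As a
rough sketch, the rewrite from Perl's {n}, {n,}, {n,m} to ploki's :n:, :n,:,
:n,m: can be done mechanically. The helper below is hypothetical and
deliberately naive: it does not skip escaped braces or braces inside
character classes, and it translates nothing else.

```python
import re

# Matches Perl counted quantifiers: {n}, {n,} and {n,m}.
_COUNTED = re.compile(r"\{(\d+)(?:(,)(\d*))?\}")

def perl_counts_to_ploki(pattern):
    """Rewrite {n}, {n,}, {n,m} as :n:, :n,:, :n,m: (illustrative only)."""
    def rewrite(m):
        comma = m.group(2) or ""   # "," if present
        upper = m.group(3) or ""   # upper bound, may be empty
        return ":" + m.group(1) + comma + upper + ":"
    return _COUNTED.sub(rewrite, pattern)

for perl in ("x{3}", "x{3,}", "x{3,5}"):
    print(perl, "->", perl_counts_to_ploki(perl))
```
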
Grouping:

    (?:foo)                             (foo)


Character classes:

    [ab0-9\-\]']                        'ab0-9-!]'!'
    [^a-z]                              '^a-z'
    .                                   '^\n'
    . (with /s)                         .
    [[:alnum:]]                         '[:alnum:]'

Ploki provides the following POSIXish character classes (inside an ordinary
character class only):
    [:alnum:]   alphanumeric char
    [:alpha:]   alphabetic char
    [:cntrl:]   control char
    [:digit:]   digit
    [:graph:]   printable char (except space)
    [:lower:]   lowercase char
    [:print:]   printable char (including space)
    [:punct:]   punctuation char ([:graph:] without [:alnum:])
    [:space:]   whitespace char
    [:upper:]   uppercase char
    [:xdigit:]  hex digit
Every POSIXish subclass [:foo:] can be negated by writing [:^foo:] (this is
compatible with Perl).

In addition there are the following built-in character classes (inside and
outside of user-defined character classes):

                                        q!  (equivalent to '[:alpha:]')
                                        c!  (equivalent to '[:cntrl:]')
    \d                                  d!  (equivalent to '[:digit:]')
                                        l!  (equivalent to '[:lower:]')
                                        p!  (equivalent to '[:print:]')
    \s                                  s!  (equivalent to '[:space:]') [1]
                                        u!  (equivalent to '[:upper:]')
                                        x!  (equivalent to '[:xdigit:]')
    \w                                  w!  (equivalent to '_[:alnum:]')

They can be negated by using the corresponding uppercase letter, e.g. D!
matches a non-digit character.

[1] Perl's \s does not include \v (vertical tab).

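The relationships stated above ([:punct:] is [:graph:] without [:alnum:],
and w! is '_[:alnum:]') can be checked with plain sets. This sketch assumes
an ASCII reading of the classes; the text does not say which locale or
character set ploki actually uses.

```python
import string

# ASCII interpretation of the classes (an assumption, see above).
graph = {chr(c) for c in range(0x21, 0x7F)}   # printable, excluding space
alnum = set(string.ascii_letters + string.digits)

punct = graph - alnum      # [:punct:] == [:graph:] without [:alnum:]
word = alnum | {"_"}       # w!        == '_[:alnum:]'

# Under ASCII this agrees with Python's own notion of punctuation.
print(punct == set(string.punctuation))
print("_" in punct, "_" in word)
```

Note that "_" is both a punctuation character and a word character under
this reading, which is why w! has to add it to [:alnum:] explicitly.
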
Independent groups:

    (?>foo)                             <foo>


Capturing:

    (foo)                               {foo}


Backreferences:

    \1, \2, ...                         0!, 1!, ...

Note that ploki's backreferences start with 0 *counting from the right*,
i.e. after a successful match against "{{f}o{o}}", \0 is "foo", \1 is "o"
and \2 is "f".

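One reading that is consistent with the "{{f}o{o}}" example is that groups
are numbered by the position of their closing brace, from right to left.
That rule is an inference from the single example, not something the text
states. A small sketch under that assumption, for patterns made only of
literal characters and unescaped braces (the function name is invented):

```python
def ploki_group_numbers(pattern):
    """Map backreference number -> literal text captured by that group.

    Assumption: the group whose '}' is rightmost gets number 0, the next
    one to its left gets 1, and so on.
    """
    stack = []    # indices of currently open '{'
    closed = []   # group bodies in the order their '}' appears
    for i, ch in enumerate(pattern):
        if ch == "{":
            stack.append(i)
        elif ch == "}":
            start = stack.pop()
            body = pattern[start + 1:i]
            # Drop nested braces so only the literal text remains.
            closed.append(body.replace("{", "").replace("}", ""))
    return dict(enumerate(reversed(closed)))

print(ploki_group_numbers("{{f}o{o}}"))
```

For the example pattern this yields "foo" for 0, "o" for 1 and "f" for 2,
matching the values given above.
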
Assertions:

    ^ (without /m)                      ^
    \A                                  ^
    ^ (with /m)                         A!
    \z                                  $
    $ (with /m)                         Z! (roughly)
    \b                                  b!
    \B                                  B!
    (?=foo)                             [foo&]
    (?!foo)                             [foo^]
    (?<=foo)                            NOT IMPLEMENTED
    (?<!foo)                            NOT IMPLEMENTED

There's a special assertion that doesn't exist in Perl: [N] succeeds if
N! is set, i.e. if the Nth capturing group succeeded.


Selection:

    (?(cond)yes-pattern|no-pattern)     [no-pattern|yes-patterncond?]
    (?(cond)yes-pattern)                [yes-patterncond?]

Note that in ploki "cond" can be any piece: [oof?] matches "foo" if the next
character is "f". Perl's special case of (?(N)yes|no) can be written as
[no|yes[N]?] in ploki; Perl's m{(\()?[^()]+(?(1)\))} corresponds to ploki's
"{(!}?'^()'+[)![0]?]".

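The Perl pattern quoted above requires a closing parenthesis only when an
opening one was captured. Python's re module accepts the same (?(1)...)
syntax, so the Perl side can be tried out as follows. This illustrates only
the Perl column; fullmatch is used here to anchor the pattern, which the
original unanchored m{...} does not do.

```python
import re

# Perl: m{(\()?[^()]+(?(1)\))}
balanced = re.compile(r"(\()?[^()]+(?(1)\))")

for text in ("(abc)", "abc", "(abc", "abc)"):
    # A string is accepted if it is either bare or fully parenthesized.
    print(repr(text), bool(balanced.fullmatch(text)))
```

The first two strings are accepted and the last two are rejected, because
group 1 decides whether the trailing ")" is demanded.
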
Comments:

    (?#foo)                             [foo#]


Debugging:

perl has a very nice module/command-line option: perl -Mre=debug (or perl
-Mre=debugcolor) will show how perl compiles and matches your regexes. ploki
offers a far less powerful feature: ploki -dr displays a human-readable form
of every regex being compiled. It doesn't show how it matches, however. (On
the other hand I think the regex output of ploki -dr is much more readable
than what perl -Mre=debug produces.)


# vi: set tw=76 et: