annotate src/ploki/doc/perl-ploki-regex.txt @ 12292:d51f2100210c draft

<kspalaiologos> `` cat <<<"asmbf && bfi output.b" > /hackenv/ibin/asmbf
author HackEso <hackeso@esolangs.org>
date Thu, 02 Jan 2020 15:38:21 +0000
parents ac0403686959
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
4223
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
1 COMPARISON BETWEEN PERL AND PLOKI REGEXES
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
2
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
3
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
4 General:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
5
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
6 ploki doesn't support perl's /cgimosx regex switches:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
7 /c and /g don't make sense in ploki (strings don't have an associated pos());
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
8 /i can be emulated by writing \L str ~ "lowercaseregex";
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
9 /m and /s aren't needed (^, A!, $, Z! match at beginning-of-string,
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
10 beginning-of-line, end-of-string, end-of-line respectively; . matches any
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
11 character (use '^\n' if you want any character except newline));
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
12 /o is mostly equivalent to ploki's ?o~ operator;
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
13 /x is the default in ploki: comments and whitespace are mostly ignored
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
14 (except when it's escaped, in a character class, in a quantifier (like "*?"
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
15 or ":2,:") or inside "^]", "&]", "?]" and "[number]").
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
16
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
17
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
18
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
19 [Perl] [ploki]
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
20
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
21
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
22 Escaping:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
23
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
24 \x x!
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
25 No word character is special. No escaped non-word character is special.
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
26
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
27
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
28 Alternation:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
29
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
30 foo|bar foo|bar
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
31
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
32
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
33 Repetition:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
34
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
35 x* x*
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
36 x+ x+
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
37 x? x?
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
38 x*? x*?
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
39 x+? x+?
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
40 x?? x??
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
41 x{n} x:n:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
42 x{n,} x:n,:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
43 x{n,m} x:n,m:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
44
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
45
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
46 Grouping:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
47
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
48 (?:foo) (foo)
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
49
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
50
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
51 Character classes:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
52
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
53 [ab0-9\-\]'] 'ab0-9-!]'!'
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
54 [^a-z] '^a-z'
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
55 . '^\n'
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
56 . (with /s) .
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
57 [[:alnum:]] '[:alnum:]'
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
58
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
59 Ploki provides the following POSIXish character classes (inside an ordinary
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
60 character class only):
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
61 [:alnum:] alphanumeric char
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
62 [:alpha:] alphabetic char
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
63 [:cntrl:] control char
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
64 [:digit:] digit
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
65 [:graph:] printable char (except space)
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
66 [:lower:] lowercase char
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
67 [:print:] printable char (including space)
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
68 [:punct:] punctuation char ([:graph:] without [:alnum:])
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
69 [:space:] whitespace char
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
70 [:upper:] uppercase char
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
71 [:xdigit:] hex digit
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
72 Every POSIXish subclass [:foo:] can be negated by writing [:^foo:] (this is
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
73 compatible with Perl).
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
74
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
75 In addition there are the following built-in character classes (inside and
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
76 outside of user-defined character classes):
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
77
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
78 q! (equivalent to '[:alpha:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
79 c! (equivalent to '[:cntrl:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
80 \d d! (equivalent to '[:digit:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
81 l! (equivalent to '[:lower:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
82 p! (equivalent to '[:print:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
83 \s s! (equivalent to '[:space:]') [1]
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
84 u! (equivalent to '[:upper:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
85 x! (equivalent to '[:xdigit:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
86 \w w! (equivalent to '_[:alnum:]')
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
87
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
88 They can be negated by using the corresponding uppercase letter, e.g. D!
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
89 matches a non-digit character.
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
90
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
91 [1] Perl's \s does not include \v (vertical tab).
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
92
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
93
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
94 Independent groups:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
95
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
96 (?>foo) <foo>
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
97
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
98
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
99 Capturing:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
100
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
101 (foo) {foo}
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
102
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
103
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
104 Backreferences:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
105
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
106 \1, \2, ... 0!, 1!, ...
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
107
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
108 Note that ploki's backreferences start with 0 *counting from the right*,
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
109 i.e. after a successful match against "{{f}o{o}}", \0 is "foo", \1 is "o"
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
110 and \2 is "f".
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
111
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
112
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
113 Assertions:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
114
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
115 ^ (without /m) ^
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
116 \A ^
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
117 ^ (with /m) A!
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
118 \z $
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
119 $ (with /m) Z! (roughly)
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
120 \b b!
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
121 \B B!
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
122 (?=foo) [foo&]
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
123 (?!foo) [foo^]
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
124 (?<=foo) NOT IMPLEMENTED
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
125 (?<!foo) NOT IMPLEMENTED
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
126
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
127 There's a special assertion that doesn't exist in Perl: [N] succeeds if
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
128 N! is set, i.e. if the Nth capturing group succeeded.
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
129
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
130
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
131 Selection:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
132
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
133 (?(cond)yes-pattern|no-pattern) [no-pattern|yes-patterncond?]
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
134 (?(cond)yes-pattern) [yes-patterncond?]
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
135
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
136 Note that in ploki "cond" can be any piece: [oof?] matches "foo" if the next
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
137 character is "f". Perl's special case of (?(N)yes|no) can be written as
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
138 [no|yes[N]?] in ploki; Perl's m{(\()?[^()]+(?(1)\))} corresponds to ploki's
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
139 "{(!}?'^()'+[)![0]?]".
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
140
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
141
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
142 Comments:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
143
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
144 (?#foo) [foo#]
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
145
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
146
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
147 Debugging:
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
148
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
149 perl has a very nice module/command-line option: perl -Mre=debug (or perl
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
150 -Mre=debugcolor) will show how perl compiles and matches your regexes. ploki
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
151 offers a far less powerful feature: ploki -dr displays a human-readable form
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
152 of every regex being compiled. It doesn't show how it matches, however. (On
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
153 the other hand I think the regex output of ploki -dr is much more readable
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
154 than what perl -Mre=debug produces.)
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
155
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
156
ac0403686959 <oerjan> rm -rf src/ploki; mv ploki src
HackBot
parents:
diff changeset
157 # vi: set tw=76 et: