7267
|
1 <html>
|
|
2 <head>
|
|
3 <title>PLY Internals</title>
|
|
4 </head>
|
|
5 <body bgcolor="#ffffff">
|
|
6
|
|
7 <h1>PLY Internals</h1>
|
|
8
|
|
9 <b>
|
|
10 David M. Beazley <br>
|
|
11 dave@dabeaz.com<br>
|
|
12 </b>
|
|
13
|
|
14 <p>
|
|
15 <b>PLY Version: 3.0</b>
|
|
16 <p>
|
|
17
|
|
18 <!-- INDEX -->
|
|
19 <div class="sectiontoc">
|
|
20 <ul>
|
|
21 <li><a href="#internal_nn1">Introduction</a>
|
|
22 <li><a href="#internal_nn2">Grammar Class</a>
|
|
23 <li><a href="#internal_nn3">Productions</a>
|
|
24 <li><a href="#internal_nn4">LRItems</a>
|
|
25 <li><a href="#internal_nn5">LRTable</a>
|
|
26 <li><a href="#internal_nn6">LRGeneratedTable</a>
|
|
27 <li><a href="#internal_nn7">LRParser</a>
|
|
28 <li><a href="#internal_nn8">ParserReflect</a>
|
|
29 <li><a href="#internal_nn9">High-level operation</a>
|
|
30 </ul>
|
|
31 </div>
|
|
32 <!-- INDEX -->
|
|
33
|
|
34
|
|
35 <H2><a name="internal_nn1"></a>1. Introduction</H2>
|
|
36
|
|
37
|
|
38 This document describes classes and functions that make up the internal
|
|
39 operation of PLY. Using this programming interface, it is possible to
|
|
40 manually build an parser using a different interface specification
|
|
41 than what PLY normally uses. For example, you could build a gramar
|
|
42 from information parsed in a completely different input format. Some of
|
|
43 these objects may be useful for building more advanced parsing engines
|
|
44 such as GLR.
|
|
45
|
|
46 <p>
|
|
47 It should be stressed that using PLY at this level is not for the
|
|
48 faint of heart. Generally, it's assumed that you know a bit of
|
|
49 the underlying compiler theory and how an LR parser is put together.
|
|
50
|
|
51 <H2><a name="internal_nn2"></a>2. Grammar Class</H2>
|
|
52
|
|
53
|
|
54 The file <tt>ply.yacc</tt> defines a class <tt>Grammar</tt> that
|
|
55 is used to hold and manipulate information about a grammar
|
|
56 specification. It encapsulates the same basic information
|
|
57 about a grammar that is put into a YACC file including
|
|
58 the list of tokens, precedence rules, and grammar rules.
|
|
59 Various operations are provided to perform different validations
|
|
60 on the grammar. In addition, there are operations to compute
|
|
61 the first and follow sets that are needed by the various table
|
|
62 generation algorithms.
|
|
63
|
|
64 <p>
|
|
65 <tt><b>Grammar(terminals)</b></tt>
|
|
66
|
|
67 <blockquote>
|
|
68 Creates a new grammar object. <tt>terminals</tt> is a list of strings
|
|
69 specifying the terminals for the grammar. An instance <tt>g</tt> of
|
|
70 <tt>Grammar</tt> has the following methods:
|
|
71 </blockquote>
|
|
72
|
|
73 <p>
|
|
74 <b><tt>g.set_precedence(term,assoc,level)</tt></b>
|
|
75 <blockquote>
|
|
76 Sets the precedence level and associativity for a given terminal <tt>term</tt>.
|
|
77 <tt>assoc</tt> is one of <tt>'right'</tt>,
|
|
78 <tt>'left'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is a positive integer. The higher
|
|
79 the value of <tt>level</tt>, the higher the precedence. Here is an example of typical
|
|
80 precedence settings:
|
|
81
|
|
82 <pre>
|
|
83 g.set_precedence('PLUS', 'left',1)
|
|
84 g.set_precedence('MINUS', 'left',1)
|
|
85 g.set_precedence('TIMES', 'left',2)
|
|
86 g.set_precedence('DIVIDE','left',2)
|
|
87 g.set_precedence('UMINUS','left',3)
|
|
88 </pre>
|
|
89
|
|
90 This method must be called prior to adding any productions to the
|
|
91 grammar with <tt>g.add_production()</tt>. The precedence of individual grammar
|
|
92 rules is determined by the precedence of the right-most terminal.
|
|
93
|
|
94 </blockquote>
|
|
95 <p>
|
|
96 <b><tt>g.add_production(name,syms,func=None,file='',line=0)</tt></b>
|
|
97 <blockquote>
|
|
98 Adds a new grammar rule. <tt>name</tt> is the name of the rule,
|
|
99 <tt>syms</tt> is a list of symbols making up the right hand
|
|
100 side of the rule, <tt>func</tt> is the function to call when
|
|
101 reducing the rule. <tt>file</tt> and <tt>line</tt> specify
|
|
102 the filename and line number of the rule and are used for
|
|
103 generating error messages.
|
|
104
|
|
105 <p>
|
|
106 The list of symbols in <tt>syms</tt> may include character
|
|
107 literals and <tt>%prec</tt> specifiers. Here are some
|
|
108 examples:
|
|
109
|
|
110 <pre>
|
|
111 g.add_production('expr',['expr','PLUS','term'],func,file,line)
|
|
112 g.add_production('expr',['expr','"+"','term'],func,file,line)
|
|
113 g.add_production('expr',['MINUS','expr','%prec','UMINUS'],func,file,line)
|
|
114 </pre>
|
|
115
|
|
116 <p>
|
|
117 If any kind of error is detected, a <tt>GrammarError</tt> exception
|
|
118 is raised with a message indicating the reason for the failure.
|
|
119 </blockquote>
|
|
120
|
|
121 <p>
|
|
122 <b><tt>g.set_start(start=None)</tt></b>
|
|
123 <blockquote>
|
|
124 Sets the starting rule for the grammar. <tt>start</tt> is a string
|
|
125 specifying the name of the start rule. If <tt>start</tt> is omitted,
|
|
126 the first grammar rule added with <tt>add_production()</tt> is taken to be
|
|
127 the starting rule. This method must always be called after all
|
|
128 productions have been added.
|
|
129 </blockquote>
|
|
130
|
|
131 <p>
|
|
132 <b><tt>g.find_unreachable()</tt></b>
|
|
133 <blockquote>
|
|
134 Diagnostic function. Returns a list of all unreachable non-terminals
|
|
135 defined in the grammar. This is used to identify inactive parts of
|
|
136 the grammar specification.
|
|
137 </blockquote>
|
|
138
|
|
139 <p>
|
|
140 <b><tt>g.infinite_cycle()</tt></b>
|
|
141 <blockquote>
|
|
142 Diagnostic function. Returns a list of all non-terminals in the
|
|
143 grammar that result in an infinite cycle. This condition occurs if
|
|
144 there is no way for a grammar rule to expand to a string containing
|
|
145 only terminal symbols.
|
|
146 </blockquote>
|
|
147
|
|
148 <p>
|
|
149 <b><tt>g.undefined_symbols()</tt></b>
|
|
150 <blockquote>
|
|
151 Diagnostic function. Returns a list of tuples <tt>(name, prod)</tt>
|
|
152 corresponding to undefined symbols in the grammar. <tt>name</tt> is the
|
|
153 name of the undefined symbol and <tt>prod</tt> is an instance of
|
|
154 <tt>Production</tt> which has information about the production rule
|
|
155 where the undefined symbol was used.
|
|
156 </blockquote>
|
|
157
|
|
158 <p>
|
|
159 <b><tt>g.unused_terminals()</tt></b>
|
|
160 <blockquote>
|
|
161 Diagnostic function. Returns a list of terminals that were defined,
|
|
162 but never used in the grammar.
|
|
163 </blockquote>
|
|
164
|
|
165 <p>
|
|
166 <b><tt>g.unused_rules()</tt></b>
|
|
167 <blockquote>
|
|
168 Diagnostic function. Returns a list of <tt>Production</tt> instances
|
|
169 corresponding to production rules that were defined in the grammar,
|
|
170 but never used anywhere. This is slightly different
|
|
171 than <tt>find_unreachable()</tt>.
|
|
172 </blockquote>
|
|
173
|
|
174 <p>
|
|
175 <b><tt>g.unused_precedence()</tt></b>
|
|
176 <blockquote>
|
|
177 Diagnostic function. Returns a list of tuples <tt>(term, assoc)</tt>
|
|
178 corresponding to precedence rules that were set, but never used the
|
|
179 grammar. <tt>term</tt> is the terminal name and <tt>assoc</tt> is the
|
|
180 precedence associativity (e.g., <tt>'left'</tt>, <tt>'right'</tt>,
|
|
181 or <tt>'nonassoc'</tt>.
|
|
182 </blockquote>
|
|
183
|
|
184 <p>
|
|
185 <b><tt>g.compute_first()</tt></b>
|
|
186 <blockquote>
|
|
187 Compute all of the first sets for all symbols in the grammar. Returns a dictionary
|
|
188 mapping symbol names to a list of all first symbols.
|
|
189 </blockquote>
|
|
190
|
|
191 <p>
|
|
192 <b><tt>g.compute_follow()</tt></b>
|
|
193 <blockquote>
|
|
194 Compute all of the follow sets for all non-terminals in the grammar.
|
|
195 The follow set is the set of all possible symbols that might follow a
|
|
196 given non-terminal. Returns a dictionary mapping non-terminal names
|
|
197 to a list of symbols.
|
|
198 </blockquote>
|
|
199
|
|
200 <p>
|
|
201 <b><tt>g.build_lritems()</tt></b>
|
|
202 <blockquote>
|
|
203 Calculates all of the LR items for all productions in the grammar. This
|
|
204 step is required before using the grammar for any kind of table generation.
|
|
205 See the section on LR items below.
|
|
206 </blockquote>
|
|
207
|
|
208 <p>
|
|
209 The following attributes are set by the above methods and may be useful
|
|
210 in code that works with the grammar. All of these attributes should be
|
|
211 assumed to be read-only. Changing their values directly will likely
|
|
212 break the grammar.
|
|
213
|
|
214 <p>
|
|
215 <b><tt>g.Productions</tt></b>
|
|
216 <blockquote>
|
|
217 A list of all productions added. The first entry is reserved for
|
|
218 a production representing the starting rule. The objects in this list
|
|
219 are instances of the <tt>Production</tt> class, described shortly.
|
|
220 </blockquote>
|
|
221
|
|
222 <p>
|
|
223 <b><tt>g.Prodnames</tt></b>
|
|
224 <blockquote>
|
|
225 A dictionary mapping the names of nonterminals to a list of all
|
|
226 productions of that nonterminal.
|
|
227 </blockquote>
|
|
228
|
|
229 <p>
|
|
230 <b><tt>g.Terminals</tt></b>
|
|
231 <blockquote>
|
|
232 A dictionary mapping the names of terminals to a list of the
|
|
233 production numbers where they are used.
|
|
234 </blockquote>
|
|
235
|
|
236 <p>
|
|
237 <b><tt>g.Nonterminals</tt></b>
|
|
238 <blockquote>
|
|
239 A dictionary mapping the names of nonterminals to a list of the
|
|
240 production numbers where they are used.
|
|
241 </blockquote>
|
|
242
|
|
243 <p>
|
|
244 <b><tt>g.First</tt></b>
|
|
245 <blockquote>
|
|
246 A dictionary representing the first sets for all grammar symbols. This is
|
|
247 computed and returned by the <tt>compute_first()</tt> method.
|
|
248 </blockquote>
|
|
249
|
|
250 <p>
|
|
251 <b><tt>g.Follow</tt></b>
|
|
252 <blockquote>
|
|
253 A dictionary representing the follow sets for all grammar rules. This is
|
|
254 computed and returned by the <tt>compute_follow()</tt> method.
|
|
255 </blockquote>
|
|
256
|
|
257 <p>
|
|
258 <b><tt>g.Start</tt></b>
|
|
259 <blockquote>
|
|
260 Starting symbol for the grammar. Set by the <tt>set_start()</tt> method.
|
|
261 </blockquote>
|
|
262
|
|
263 For the purposes of debugging, a <tt>Grammar</tt> object supports the <tt>__len__()</tt> and
|
|
264 <tt>__getitem__()</tt> special methods. Accessing <tt>g[n]</tt> returns the nth production
|
|
265 from the grammar.
|
|
266
|
|
267
|
|
268 <H2><a name="internal_nn3"></a>3. Productions</H2>
|
|
269
|
|
270
|
|
271 <tt>Grammar</tt> objects store grammar rules as instances of a <tt>Production</tt> class. This
|
|
272 class has no public constructor--you should only create productions by calling <tt>Grammar.add_production()</tt>.
|
|
273 The following attributes are available on a <tt>Production</tt> instance <tt>p</tt>.
|
|
274
|
|
275 <p>
|
|
276 <b><tt>p.name</tt></b>
|
|
277 <blockquote>
|
|
278 The name of the production. For a grammar rule such as <tt>A : B C D</tt>, this is <tt>'A'</tt>.
|
|
279 </blockquote>
|
|
280
|
|
281 <p>
|
|
282 <b><tt>p.prod</tt></b>
|
|
283 <blockquote>
|
|
284 A tuple of symbols making up the right-hand side of the production. For a grammar rule such as <tt>A : B C D</tt>, this is <tt>('B','C','D')</tt>.
|
|
285 </blockquote>
|
|
286
|
|
287 <p>
|
|
288 <b><tt>p.number</tt></b>
|
|
289 <blockquote>
|
|
290 Production number. An integer containing the index of the production in the grammar's <tt>Productions</tt> list.
|
|
291 </blockquote>
|
|
292
|
|
293 <p>
|
|
294 <b><tt>p.func</tt></b>
|
|
295 <blockquote>
|
|
296 The name of the reduction function associated with the production.
|
|
297 This is the function that will execute when reducing the entire
|
|
298 grammar rule during parsing.
|
|
299 </blockquote>
|
|
300
|
|
301 <p>
|
|
302 <b><tt>p.callable</tt></b>
|
|
303 <blockquote>
|
|
304 The callable object associated with the name in <tt>p.func</tt>. This is <tt>None</tt>
|
|
305 unless the production has been bound using <tt>bind()</tt>.
|
|
306 </blockquote>
|
|
307
|
|
308 <p>
|
|
309 <b><tt>p.file</tt></b>
|
|
310 <blockquote>
|
|
311 Filename associated with the production. Typically this is the file where the production was defined. Used for error messages.
|
|
312 </blockquote>
|
|
313
|
|
314 <p>
|
|
315 <b><tt>p.lineno</tt></b>
|
|
316 <blockquote>
|
|
317 Line number associated with the production. Typically this is the line number in <tt>p.file</tt> where the production was defined. Used for error messages.
|
|
318 </blockquote>
|
|
319
|
|
320 <p>
|
|
321 <b><tt>p.prec</tt></b>
|
|
322 <blockquote>
|
|
323 Precedence and associativity associated with the production. This is a tuple <tt>(assoc,level)</tt> where
|
|
324 <tt>assoc</tt> is one of <tt>'left'</tt>,<tt>'right'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is
|
|
325 an integer. This value is determined by the precedence of the right-most terminal symbol in the production
|
|
326 or by use of the <tt>%prec</tt> specifier when adding the production.
|
|
327 </blockquote>
|
|
328
|
|
329 <p>
|
|
330 <b><tt>p.usyms</tt></b>
|
|
331 <blockquote>
|
|
332 A list of all unique symbols found in the production.
|
|
333 </blockquote>
|
|
334
|
|
335 <p>
|
|
336 <b><tt>p.lr_items</tt></b>
|
|
337 <blockquote>
|
|
338 A list of all LR items for this production. This attribute only has a meaningful value if the
|
|
339 <tt>Grammar.build_lritems()</tt> method has been called. The items in this list are
|
|
340 instances of <tt>LRItem</tt> described below.
|
|
341 </blockquote>
|
|
342
|
|
343 <p>
|
|
344 <b><tt>p.lr_next</tt></b>
|
|
345 <blockquote>
|
|
346 The head of a linked-list representation of the LR items in <tt>p.lr_items</tt>.
|
|
347 This attribute only has a meaningful value if the <tt>Grammar.build_lritems()</tt>
|
|
348 method has been called. Each <tt>LRItem</tt> instance has a <tt>lr_next</tt> attribute
|
|
349 to move to the next item. The list is terminated by <tt>None</tt>.
|
|
350 </blockquote>
|
|
351
|
|
352 <p>
|
|
353 <b><tt>p.bind(dict)</tt></b>
|
|
354 <blockquote>
|
|
355 Binds the production function name in <tt>p.func</tt> to a callable object in
|
|
356 <tt>dict</tt>. This operation is typically carried out in the last step
|
|
357 prior to running the parsing engine and is needed since parsing tables are typically
|
|
358 read from files which only include the function names, not the functions themselves.
|
|
359 </blockquote>
|
|
360
|
|
361 <P>
|
|
362 <tt>Production</tt> objects support
|
|
363 the <tt>__len__()</tt>, <tt>__getitem__()</tt>, and <tt>__str__()</tt>
|
|
364 special methods.
|
|
365 <tt>len(p)</tt> returns the number of symbols in <tt>p.prod</tt>
|
|
366 and <tt>p[n]</tt> is the same as <tt>p.prod[n]</tt>.
|
|
367
|
|
368 <H2><a name="internal_nn4"></a>4. LRItems</H2>
|
|
369
|
|
370
|
|
371 The construction of parsing tables in an LR-based parser generator is primarily
|
|
372 done over a set of "LR Items". An LR item represents a stage of parsing one
|
|
373 of the grammar rules. To compute the LR items, it is first necessary to
|
|
374 call <tt>Grammar.build_lritems()</tt>. Once this step, all of the productions
|
|
375 in the grammar will have their LR items attached to them.
|
|
376
|
|
377 <p>
|
|
378 Here is an interactive example that shows what LR items look like if you
|
|
379 interactively experiment. In this example, <tt>g</tt> is a <tt>Grammar</tt>
|
|
380 object.
|
|
381
|
|
382 <blockquote>
|
|
383 <pre>
|
|
384 >>> <b>g.build_lritems()</b>
|
|
385 >>> <b>p = g[1]</b>
|
|
386 >>> <b>p</b>
|
|
387 Production(statement -> ID = expr)
|
|
388 >>>
|
|
389 </pre>
|
|
390 </blockquote>
|
|
391
|
|
392 In the above code, <tt>p</tt> represents the first grammar rule. In
|
|
393 this case, a rule <tt>'statement -> ID = expr'</tt>.
|
|
394
|
|
395 <p>
|
|
396 Now, let's look at the LR items for <tt>p</tt>.
|
|
397
|
|
398 <blockquote>
|
|
399 <pre>
|
|
400 >>> <b>p.lr_items</b>
|
|
401 [LRItem(statement -> . ID = expr),
|
|
402 LRItem(statement -> ID . = expr),
|
|
403 LRItem(statement -> ID = . expr),
|
|
404 LRItem(statement -> ID = expr .)]
|
|
405 >>>
|
|
406 </pre>
|
|
407 </blockquote>
|
|
408
|
|
409 In each LR item, the dot (.) represents a specific stage of parsing. In each LR item, the dot
|
|
410 is advanced by one symbol. It is only when the dot reaches the very end that a production
|
|
411 is successfully parsed.
|
|
412
|
|
413 <p>
|
|
414 An instance <tt>lr</tt> of <tt>LRItem</tt> has the following
|
|
415 attributes that hold information related to that specific stage of
|
|
416 parsing.
|
|
417
|
|
418 <p>
|
|
419 <b><tt>lr.name</tt></b>
|
|
420 <blockquote>
|
|
421 The name of the grammar rule. For example, <tt>'statement'</tt> in the above example.
|
|
422 </blockquote>
|
|
423
|
|
424 <p>
|
|
425 <b><tt>lr.prod</tt></b>
|
|
426 <blockquote>
|
|
427 A tuple of symbols representing the right-hand side of the production, including the
|
|
428 special <tt>'.'</tt> character. For example, <tt>('ID','.','=','expr')</tt>.
|
|
429 </blockquote>
|
|
430
|
|
431 <p>
|
|
432 <b><tt>lr.number</tt></b>
|
|
433 <blockquote>
|
|
434 An integer representing the production number in the grammar.
|
|
435 </blockquote>
|
|
436
|
|
437 <p>
|
|
438 <b><tt>lr.usyms</tt></b>
|
|
439 <blockquote>
|
|
440 A set of unique symbols in the production. Inherited from the original <tt>Production</tt> instance.
|
|
441 </blockquote>
|
|
442
|
|
443 <p>
|
|
444 <b><tt>lr.lr_index</tt></b>
|
|
445 <blockquote>
|
|
446 An integer representing the position of the dot (.). You should never use <tt>lr.prod.index()</tt>
|
|
447 to search for it--the result will be wrong if the grammar happens to also use (.) as a character
|
|
448 literal.
|
|
449 </blockquote>
|
|
450
|
|
451 <p>
|
|
452 <b><tt>lr.lr_after</tt></b>
|
|
453 <blockquote>
|
|
454 A list of all productions that can legally appear immediately to the right of the
|
|
455 dot (.). This list contains <tt>Production</tt> instances. This attribute
|
|
456 represents all of the possible branches a parse can take from the current position.
|
|
457 For example, suppose that <tt>lr</tt> represents a stage immediately before
|
|
458 an expression like this:
|
|
459
|
|
460 <pre>
|
|
461 >>> <b>lr</b>
|
|
462 LRItem(statement -> ID = . expr)
|
|
463 >>>
|
|
464 </pre>
|
|
465
|
|
466 Then, the value of <tt>lr.lr_after</tt> might look like this, showing all productions that
|
|
467 can legally appear next:
|
|
468
|
|
469 <pre>
|
|
470 >>> <b>lr.lr_after</b>
|
|
471 [Production(expr -> expr PLUS expr),
|
|
472 Production(expr -> expr MINUS expr),
|
|
473 Production(expr -> expr TIMES expr),
|
|
474 Production(expr -> expr DIVIDE expr),
|
|
475 Production(expr -> MINUS expr),
|
|
476 Production(expr -> LPAREN expr RPAREN),
|
|
477 Production(expr -> NUMBER),
|
|
478 Production(expr -> ID)]
|
|
479 >>>
|
|
480 </pre>
|
|
481
|
|
482 </blockquote>
|
|
483
|
|
484 <p>
|
|
485 <b><tt>lr.lr_before</tt></b>
|
|
486 <blockquote>
|
|
487 The grammar symbol that appears immediately before the dot (.) or <tt>None</tt> if
|
|
488 at the beginning of the parse.
|
|
489 </blockquote>
|
|
490
|
|
491 <p>
|
|
492 <b><tt>lr.lr_next</tt></b>
|
|
493 <blockquote>
|
|
494 A link to the next LR item, representing the next stage of the parse. <tt>None</tt> if <tt>lr</tt>
|
|
495 is the last LR item.
|
|
496 </blockquote>
|
|
497
|
|
498 <tt>LRItem</tt> instances also support the <tt>__len__()</tt> and <tt>__getitem__()</tt> special methods.
|
|
499 <tt>len(lr)</tt> returns the number of items in <tt>lr.prod</tt> including the dot (.). <tt>lr[n]</tt>
|
|
500 returns <tt>lr.prod[n]</tt>.
|
|
501
|
|
502 <p>
|
|
503 It goes without saying that all of the attributes associated with LR
|
|
504 items should be assumed to be read-only. Modifications will very
|
|
505 likely create a small black-hole that will consume you and your code.
|
|
506
|
|
507 <H2><a name="internal_nn5"></a>5. LRTable</H2>
|
|
508
|
|
509
|
|
510 The <tt>LRTable</tt> class is used to represent LR parsing table data. This
|
|
511 minimally includes the production list, action table, and goto table.
|
|
512
|
|
513 <p>
|
|
514 <b><tt>LRTable()</tt></b>
|
|
515 <blockquote>
|
|
516 Create an empty LRTable object. This object contains only the information needed to
|
|
517 run an LR parser.
|
|
518 </blockquote>
|
|
519
|
|
520 An instance <tt>lrtab</tt> of <tt>LRTable</tt> has the following methods:
|
|
521
|
|
522 <p>
|
|
523 <b><tt>lrtab.read_table(module)</tt></b>
|
|
524 <blockquote>
|
|
525 Populates the LR table with information from the module specified in <tt>module</tt>.
|
|
526 <tt>module</tt> is either a module object already loaded with <tt>import</tt> or
|
|
527 the name of a Python module. If it's a string containing a module name, it is
|
|
528 loaded and parsing data is extracted. Returns the signature value that was used
|
|
529 when initially writing the tables. Raises a <tt>VersionError</tt> exception if
|
|
530 the module was created using an incompatible version of PLY.
|
|
531 </blockquote>
|
|
532
|
|
533 <p>
|
|
534 <b><tt>lrtab.bind_callables(dict)</tt></b>
|
|
535 <blockquote>
|
|
536 This binds all of the function names used in productions to callable objects
|
|
537 found in the dictionary <tt>dict</tt>. During table generation and when reading
|
|
538 LR tables from files, PLY only uses the names of action functions such as <tt>'p_expr'</tt>,
|
|
539 <tt>'p_statement'</tt>, etc. In order to actually run the parser, these names
|
|
540 have to be bound to callable objects. This method is always called prior to
|
|
541 running a parser.
|
|
542 </blockquote>
|
|
543
|
|
544 After <tt>lrtab</tt> has been populated, the following attributes are defined.
|
|
545
|
|
546 <p>
|
|
547 <b><tt>lrtab.lr_method</tt></b>
|
|
548 <blockquote>
|
|
549 The LR parsing method used (e.g., <tt>'LALR'</tt>)
|
|
550 </blockquote>
|
|
551
|
|
552
|
|
553 <p>
|
|
554 <b><tt>lrtab.lr_productions</tt></b>
|
|
555 <blockquote>
|
|
556 The production list. If the parsing tables have been newly
|
|
557 constructed, this will be a list of <tt>Production</tt> instances. If
|
|
558 the parsing tables have been read from a file, it's a list
|
|
559 of <tt>MiniProduction</tt> instances. This, together
|
|
560 with <tt>lr_action</tt> and <tt>lr_goto</tt> contain all of the
|
|
561 information needed by the LR parsing engine.
|
|
562 </blockquote>
|
|
563
|
|
564 <p>
|
|
565 <b><tt>lrtab.lr_action</tt></b>
|
|
566 <blockquote>
|
|
567 The LR action dictionary that implements the underlying state machine.
|
|
568 The keys of this dictionary are the LR states.
|
|
569 </blockquote>
|
|
570
|
|
571 <p>
|
|
572 <b><tt>lrtab.lr_goto</tt></b>
|
|
573 <blockquote>
|
|
574 The LR goto table that contains information about grammar rule reductions.
|
|
575 </blockquote>
|
|
576
|
|
577
|
|
578 <H2><a name="internal_nn6"></a>6. LRGeneratedTable</H2>
|
|
579
|
|
580
|
|
581 The <tt>LRGeneratedTable</tt> class represents constructed LR parsing tables on a
|
|
582 grammar. It is a subclass of <tt>LRTable</tt>.
|
|
583
|
|
584 <p>
|
|
585 <b><tt>LRGeneratedTable(grammar, method='LALR',log=None)</tt></b>
|
|
586 <blockquote>
|
|
587 Create the LR parsing tables on a grammar. <tt>grammar</tt> is an instance of <tt>Grammar</tt>,
|
|
588 <tt>method</tt> is a string with the parsing method (<tt>'SLR'</tt> or <tt>'LALR'</tt>), and
|
|
589 <tt>log</tt> is a logger object used to write debugging information. The debugging information
|
|
590 written to <tt>log</tt> is the same as what appears in the <tt>parser.out</tt> file created
|
|
591 by yacc. By supplying a custom logger with a different message format, it is possible to get
|
|
592 more information (e.g., the line number in <tt>yacc.py</tt> used for issuing each line of
|
|
593 output in the log). The result is an instance of <tt>LRGeneratedTable</tt>.
|
|
594 </blockquote>
|
|
595
|
|
596 <p>
|
|
597 An instance <tt>lr</tt> of <tt>LRGeneratedTable</tt> has the following attributes.
|
|
598
|
|
599 <p>
|
|
600 <b><tt>lr.grammar</tt></b>
|
|
601 <blockquote>
|
|
602 A link to the Grammar object used to construct the parsing tables.
|
|
603 </blockquote>
|
|
604
|
|
605 <p>
|
|
606 <b><tt>lr.lr_method</tt></b>
|
|
607 <blockquote>
|
|
608 The LR parsing method used (e.g., <tt>'LALR'</tt>)
|
|
609 </blockquote>
|
|
610
|
|
611
|
|
612 <p>
|
|
613 <b><tt>lr.lr_productions</tt></b>
|
|
614 <blockquote>
|
|
615 A reference to <tt>grammar.Productions</tt>. This, together with <tt>lr_action</tt> and <tt>lr_goto</tt>
|
|
616 contain all of the information needed by the LR parsing engine.
|
|
617 </blockquote>
|
|
618
|
|
619 <p>
|
|
620 <b><tt>lr.lr_action</tt></b>
|
|
621 <blockquote>
|
|
622 The LR action dictionary that implements the underlying state machine. The keys of this dictionary are
|
|
623 the LR states.
|
|
624 </blockquote>
|
|
625
|
|
626 <p>
|
|
627 <b><tt>lr.lr_goto</tt></b>
|
|
628 <blockquote>
|
|
629 The LR goto table that contains information about grammar rule reductions.
|
|
630 </blockquote>
|
|
631
|
|
632 <p>
|
|
633 <b><tt>lr.sr_conflicts</tt></b>
|
|
634 <blockquote>
|
|
635 A list of tuples <tt>(state,token,resolution)</tt> identifying all shift/reduce conflicts. <tt>state</tt> is the LR state
|
|
636 number where the conflict occurred, <tt>token</tt> is the token causing the conflict, and <tt>resolution</tt> is
|
|
637 a string describing the resolution taken. <tt>resolution</tt> is either <tt>'shift'</tt> or <tt>'reduce'</tt>.
|
|
638 </blockquote>
|
|
639
|
|
640 <p>
|
|
641 <b><tt>lr.rr_conflicts</tt></b>
|
|
642 <blockquote>
|
|
643 A list of tuples <tt>(state,rule,rejected)</tt> identifying all reduce/reduce conflicts. <tt>state</tt> is the
|
|
644 LR state number where the conflict occurred, <tt>rule</tt> is the production rule that was selected
|
|
645 and <tt>rejected</tt> is the production rule that was rejected. Both <tt>rule</tt> and </tt>rejected</tt> are
|
|
646 instances of <tt>Production</tt>. They can be inspected to provide the user with more information.
|
|
647 </blockquote>
|
|
648
|
|
649 <p>
|
|
650 There are two public methods of <tt>LRGeneratedTable</tt>.
|
|
651
|
|
652 <p>
|
|
653 <b><tt>lr.write_table(modulename,outputdir="",signature="")</tt></b>
|
|
654 <blockquote>
|
|
655 Writes the LR parsing table information to a Python module. <tt>modulename</tt> is a string
|
|
656 specifying the name of a module such as <tt>"parsetab"</tt>. <tt>outputdir</tt> is the name of a
|
|
657 directory where the module should be created. <tt>signature</tt> is a string representing a
|
|
658 grammar signature that's written into the output file. This can be used to detect when
|
|
659 the data stored in a module file is out-of-sync with the the grammar specification (and that
|
|
660 the tables need to be regenerated). If <tt>modulename</tt> is a string <tt>"parsetab"</tt>,
|
|
661 this function creates a file called <tt>parsetab.py</tt>. If the module name represents a
|
|
662 package such as <tt>"foo.bar.parsetab"</tt>, then only the last component, <tt>"parsetab"</tt> is
|
|
663 used.
|
|
664 </blockquote>
|
|
665
|
|
666
|
|
667 <H2><a name="internal_nn7"></a>7. LRParser</H2>
|
|
668
|
|
669
|
|
670 The <tt>LRParser</tt> class implements the low-level LR parsing engine.
|
|
671
|
|
672
|
|
673 <p>
|
|
674 <b><tt>LRParser(lrtab, error_func)</tt></b>
|
|
675 <blockquote>
|
|
676 Create an LRParser. <tt>lrtab</tt> is an instance of <tt>LRTable</tt>
|
|
677 containing the LR production and state tables. <tt>error_func</tt> is the
|
|
678 error function to invoke in the event of a parsing error.
|
|
679 </blockquote>
|
|
680
|
|
681 An instance <tt>p</tt> of <tt>LRParser</tt> has the following methods:
|
|
682
|
|
683 <p>
|
|
684 <b><tt>p.parse(input=None,lexer=None,debug=0,tracking=0,tokenfunc=None)</tt></b>
|
|
685 <blockquote>
|
|
686 Run the parser. <tt>input</tt> is a string, which if supplied is fed into the
|
|
687 lexer using its <tt>input()</tt> method. <tt>lexer</tt> is an instance of the
|
|
688 <tt>Lexer</tt> class to use for tokenizing. If not supplied, the last lexer
|
|
689 created with the <tt>lex</tt> module is used. <tt>debug</tt> is a boolean flag
|
|
690 that enables debugging. <tt>tracking</tt> is a boolean flag that tells the
|
|
691 parser to perform additional line number tracking. <tt>tokenfunc</tt> is a callable
|
|
692 function that returns the next token. If supplied, the parser will use it to get
|
|
693 all tokens.
|
|
694 </blockquote>
|
|
695
|
|
696 <p>
|
|
697 <b><tt>p.restart()</tt></b>
|
|
698 <blockquote>
|
|
699 Resets the parser state for a parse already in progress.
|
|
700 </blockquote>
|
|
701
|
|
702 <H2><a name="internal_nn8"></a>8. ParserReflect</H2>
|
|
703
|
|
704
|
|
705 <p>
|
|
706 The <tt>ParserReflect</tt> class is used to collect parser specification data
|
|
707 from a Python module or object. This class is what collects all of the
|
|
708 <tt>p_rule()</tt> functions in a PLY file, performs basic error checking,
|
|
709 and collects all of the needed information to build a grammar. Most of the
|
|
710 high-level PLY interface as used by the <tt>yacc()</tt> function is actually
|
|
711 implemented by this class.
|
|
712
|
|
713 <p>
|
|
714 <b><tt>ParserReflect(pdict, log=None)</tt></b>
|
|
715 <blockquote>
|
|
716 Creates a <tt>ParserReflect</tt> instance. <tt>pdict</tt> is a dictionary
|
|
717 containing parser specification data. This dictionary typically corresponds
|
|
718 to the module or class dictionary of code that implements a PLY parser.
|
|
719 <tt>log</tt> is a logger instance that will be used to report error
|
|
720 messages.
|
|
721 </blockquote>
|
|
722
|
|
723 An instance <tt>p</tt> of <tt>ParserReflect</tt> has the following methods:
|
|
724
|
|
725 <p>
|
|
726 <b><tt>p.get_all()</tt></b>
|
|
727 <blockquote>
|
|
728 Collect and store all required parsing information.
|
|
729 </blockquote>
|
|
730
|
|
731 <p>
|
|
732 <b><tt>p.validate_all()</tt></b>
|
|
733 <blockquote>
|
|
734 Validate all of the collected parsing information. This is a seprate step
|
|
735 from <tt>p.get_all()</tt> as a performance optimization. In order to
|
|
736 increase parser start-up time, a parser can elect to only validate the
|
|
737 parsing data when regenerating the parsing tables. The validation
|
|
738 step tries to collect as much information as possible rather than
|
|
739 raising an exception at the first sign of trouble. The attribute
|
|
740 <tt>p.error</tt> is set if there are any validation errors. The
|
|
741 value of this attribute is also returned.
|
|
742 </blockquote>
|
|
743
|
|
744 <p>
|
|
745 <b><tt>p.signature()</tt></b>
|
|
746 <blockquote>
|
|
747 Compute a signature representing the contents of the collected parsing
|
|
748 data. The signature value should change if anything in the parser
|
|
749 specification has changed in a way that would justify parser table
|
|
750 regeneration. This method can be called after <tt>p.get_all()</tt>,
|
|
751 but before <tt>p.validate_all()</tt>.
|
|
752 </blockquote>
|
|
753
|
|
754 The following attributes are set in the process of collecting data:
|
|
755
|
|
756 <p>
|
|
757 <b><tt>p.start</tt></b>
|
|
758 <blockquote>
|
|
759 The grammar start symbol, if any. Taken from <tt>pdict['start']</tt>.
|
|
760 </blockquote>
|
|
761
|
|
762 <p>
|
|
763 <b><tt>p.error_func</tt></b>
|
|
764 <blockquote>
|
|
765 The error handling function or <tt>None</tt>. Taken from <tt>pdict['p_error']</tt>.
|
|
766 </blockquote>
|
|
767
|
|
768 <p>
|
|
769 <b><tt>p.tokens</tt></b>
|
|
770 <blockquote>
|
|
771 The token list. Taken from <tt>pdict['tokens']</tt>.
|
|
772 </blockquote>
|
|
773
|
|
774 <p>
|
|
775 <b><tt>p.prec</tt></b>
|
|
776 <blockquote>
|
|
777 The precedence specifier. Taken from <tt>pdict['precedence']</tt>.
|
|
778 </blockquote>
|
|
779
|
|
780 <p>
|
|
781 <b><tt>p.preclist</tt></b>
|
|
782 <blockquote>
|
|
783 A parsed version of the precedence specified. A list of tuples of the form
|
|
784 <tt>(token,assoc,level)</tt> where <tt>token</tt> is the terminal symbol,
|
|
785 <tt>assoc</tt> is the associativity (e.g., <tt>'left'</tt>) and <tt>level</tt>
|
|
786 is a numeric precedence level.
|
|
787 </blockquote>
|
|
788
|
|
789 <p>
|
|
790 <b><tt>p.grammar</tt></b>
|
|
791 <blockquote>
|
|
792 A list of tuples <tt>(name, rules)</tt> representing the grammar rules. <tt>name</tt> is the
|
|
793 name of a Python function or method in <tt>pdict</tt> that starts with <tt>"p_"</tt>.
|
|
794 <tt>rules</tt> is a list of tuples <tt>(filename,line,prodname,syms)</tt> representing
|
|
795 the grammar rules found in the documentation string of that function. <tt>filename</tt> and <tt>line</tt> contain location
|
|
796 information that can be used for debugging. <tt>prodname</tt> is the name of the
|
|
797 production. <tt>syms</tt> is the right-hand side of the production. If you have a
|
|
798 function like this
|
|
799
|
|
800 <pre>
|
|
801 def p_expr(p):
|
|
802 '''expr : expr PLUS expr
|
|
803 | expr MINUS expr
|
|
804 | expr TIMES expr
|
|
805 | expr DIVIDE expr'''
|
|
806 </pre>
|
|
807
|
|
808 then the corresponding entry in <tt>p.grammar</tt> might look like this:
|
|
809
|
|
810 <pre>
|
|
811 ('p_expr', [ ('calc.py',10,'expr', ['expr','PLUS','expr']),
|
|
812 ('calc.py',11,'expr', ['expr','MINUS','expr']),
|
|
813 ('calc.py',12,'expr', ['expr','TIMES','expr']),
|
|
814 ('calc.py',13,'expr', ['expr','DIVIDE','expr'])
|
|
815 ])
|
|
816 </pre>
|
|
817 </blockquote>
|
|
818
|
|
819 <p>
|
|
820 <b><tt>p.pfuncs</tt></b>
|
|
821 <blockquote>
|
|
822 A sorted list of tuples <tt>(line, file, name, doc)</tt> representing all of
|
|
823 the <tt>p_</tt> functions found. <tt>line</tt> and <tt>file</tt> give location
|
|
824 information. <tt>name</tt> is the name of the function. <tt>doc</tt> is the
|
|
825 documentation string. This list is sorted in ascending order by line number.
|
|
826 </blockquote>
|
|
827
|
|
828 <p>
|
|
829 <b><tt>p.files</tt></b>
|
|
830 <blockquote>
|
|
831 A dictionary holding all of the source filenames that were encountered
|
|
832 while collecting parser information. Only the keys of this dictionary have
|
|
833 any meaning.
|
|
834 </blockquote>
|
|
835
|
|
836 <p>
|
|
837 <b><tt>p.error</tt></b>
|
|
838 <blockquote>
|
|
839 An attribute that indicates whether or not any critical errors
|
|
840 occurred in validation. If this is set, it means that that some kind
|
|
841 of problem was detected and that no further processing should be
|
|
842 performed.
|
|
843 </blockquote>
|
|
844
|
|
845
|
|
846 <H2><a name="internal_nn9"></a>9. High-level operation</H2>
|
|
847
|
|
848
|
|
849 Using all of the above classes requires some attention to detail. The <tt>yacc()</tt>
|
|
850 function carries out a very specific sequence of operations to create a grammar.
|
|
851 This same sequence should be emulated if you build an alternative PLY interface.
|
|
852
|
|
853 <ol>
|
|
854 <li>A <tt>ParserReflect</tt> object is created and raw grammar specification data is
|
|
855 collected.
|
|
856 <li>A <tt>Grammar</tt> object is created and populated with information
|
|
857 from the specification data.
|
|
858 <li>A <tt>LRGenerator</tt> object is created to run the LALR algorithm over
|
|
859 the <tt>Grammar</tt> object.
|
|
860 <li>Productions in the LRGenerator and bound to callables using the <tt>bind_callables()</tt>
|
|
861 method.
|
|
862 <li>A <tt>LRParser</tt> object is created from from the information in the
|
|
863 <tt>LRGenerator</tt> object.
|
|
864 </ol>
|
|
865
|
|
866 </body>
|
|
867 </html>
|
|
868
|
|
869
|
|
870
|
|
871
|
|
872
|
|
873
|
|
874
|