Mercurial > repo
comparison ply-3.8/doc/internal.html @ 7267:343ff337a19b
<ais523> ` tar -xf ply-3.8.tar.gz
author | HackBot |
---|---|
date | Wed, 23 Mar 2016 02:40:16 +0000 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
7266:61a39a120dee | 7267:343ff337a19b |
---|---|
1 <html> | |
2 <head> | |
3 <title>PLY Internals</title> | |
4 </head> | |
5 <body bgcolor="#ffffff"> | |
6 | |
7 <h1>PLY Internals</h1> | |
8 | |
9 <b> | |
10 David M. Beazley <br> | |
11 dave@dabeaz.com<br> | |
12 </b> | |
13 | |
14 <p> | |
15 <b>PLY Version: 3.0</b> | |
16 <p> | |
17 | |
18 <!-- INDEX --> | |
19 <div class="sectiontoc"> | |
20 <ul> | |
21 <li><a href="#internal_nn1">Introduction</a> | |
22 <li><a href="#internal_nn2">Grammar Class</a> | |
23 <li><a href="#internal_nn3">Productions</a> | |
24 <li><a href="#internal_nn4">LRItems</a> | |
25 <li><a href="#internal_nn5">LRTable</a> | |
26 <li><a href="#internal_nn6">LRGeneratedTable</a> | |
27 <li><a href="#internal_nn7">LRParser</a> | |
28 <li><a href="#internal_nn8">ParserReflect</a> | |
29 <li><a href="#internal_nn9">High-level operation</a> | |
30 </ul> | |
31 </div> | |
32 <!-- INDEX --> | |
33 | |
34 | |
35 <H2><a name="internal_nn1"></a>1. Introduction</H2> | |
36 | |
37 | |
38 This document describes classes and functions that make up the internal | |
39 operation of PLY. Using this programming interface, it is possible to | |
40 manually build an parser using a different interface specification | |
41 than what PLY normally uses. For example, you could build a gramar | |
42 from information parsed in a completely different input format. Some of | |
43 these objects may be useful for building more advanced parsing engines | |
44 such as GLR. | |
45 | |
46 <p> | |
47 It should be stressed that using PLY at this level is not for the | |
48 faint of heart. Generally, it's assumed that you know a bit of | |
49 the underlying compiler theory and how an LR parser is put together. | |
50 | |
51 <H2><a name="internal_nn2"></a>2. Grammar Class</H2> | |
52 | |
53 | |
54 The file <tt>ply.yacc</tt> defines a class <tt>Grammar</tt> that | |
55 is used to hold and manipulate information about a grammar | |
56 specification. It encapsulates the same basic information | |
57 about a grammar that is put into a YACC file including | |
58 the list of tokens, precedence rules, and grammar rules. | |
59 Various operations are provided to perform different validations | |
60 on the grammar. In addition, there are operations to compute | |
61 the first and follow sets that are needed by the various table | |
62 generation algorithms. | |
63 | |
64 <p> | |
65 <tt><b>Grammar(terminals)</b></tt> | |
66 | |
67 <blockquote> | |
68 Creates a new grammar object. <tt>terminals</tt> is a list of strings | |
69 specifying the terminals for the grammar. An instance <tt>g</tt> of | |
70 <tt>Grammar</tt> has the following methods: | |
71 </blockquote> | |
72 | |
73 <p> | |
74 <b><tt>g.set_precedence(term,assoc,level)</tt></b> | |
75 <blockquote> | |
76 Sets the precedence level and associativity for a given terminal <tt>term</tt>. | |
77 <tt>assoc</tt> is one of <tt>'right'</tt>, | |
78 <tt>'left'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is a positive integer. The higher | |
79 the value of <tt>level</tt>, the higher the precedence. Here is an example of typical | |
80 precedence settings: | |
81 | |
82 <pre> | |
83 g.set_precedence('PLUS', 'left',1) | |
84 g.set_precedence('MINUS', 'left',1) | |
85 g.set_precedence('TIMES', 'left',2) | |
86 g.set_precedence('DIVIDE','left',2) | |
87 g.set_precedence('UMINUS','left',3) | |
88 </pre> | |
89 | |
90 This method must be called prior to adding any productions to the | |
91 grammar with <tt>g.add_production()</tt>. The precedence of individual grammar | |
92 rules is determined by the precedence of the right-most terminal. | |
93 | |
94 </blockquote> | |
95 <p> | |
96 <b><tt>g.add_production(name,syms,func=None,file='',line=0)</tt></b> | |
97 <blockquote> | |
98 Adds a new grammar rule. <tt>name</tt> is the name of the rule, | |
99 <tt>syms</tt> is a list of symbols making up the right hand | |
100 side of the rule, <tt>func</tt> is the function to call when | |
101 reducing the rule. <tt>file</tt> and <tt>line</tt> specify | |
102 the filename and line number of the rule and are used for | |
103 generating error messages. | |
104 | |
105 <p> | |
106 The list of symbols in <tt>syms</tt> may include character | |
107 literals and <tt>%prec</tt> specifiers. Here are some | |
108 examples: | |
109 | |
110 <pre> | |
111 g.add_production('expr',['expr','PLUS','term'],func,file,line) | |
112 g.add_production('expr',['expr','"+"','term'],func,file,line) | |
113 g.add_production('expr',['MINUS','expr','%prec','UMINUS'],func,file,line) | |
114 </pre> | |
115 | |
116 <p> | |
117 If any kind of error is detected, a <tt>GrammarError</tt> exception | |
118 is raised with a message indicating the reason for the failure. | |
119 </blockquote> | |
120 | |
121 <p> | |
122 <b><tt>g.set_start(start=None)</tt></b> | |
123 <blockquote> | |
124 Sets the starting rule for the grammar. <tt>start</tt> is a string | |
125 specifying the name of the start rule. If <tt>start</tt> is omitted, | |
126 the first grammar rule added with <tt>add_production()</tt> is taken to be | |
127 the starting rule. This method must always be called after all | |
128 productions have been added. | |
129 </blockquote> | |
130 | |
131 <p> | |
132 <b><tt>g.find_unreachable()</tt></b> | |
133 <blockquote> | |
134 Diagnostic function. Returns a list of all unreachable non-terminals | |
135 defined in the grammar. This is used to identify inactive parts of | |
136 the grammar specification. | |
137 </blockquote> | |
138 | |
139 <p> | |
140 <b><tt>g.infinite_cycle()</tt></b> | |
141 <blockquote> | |
142 Diagnostic function. Returns a list of all non-terminals in the | |
143 grammar that result in an infinite cycle. This condition occurs if | |
144 there is no way for a grammar rule to expand to a string containing | |
145 only terminal symbols. | |
146 </blockquote> | |
147 | |
148 <p> | |
149 <b><tt>g.undefined_symbols()</tt></b> | |
150 <blockquote> | |
151 Diagnostic function. Returns a list of tuples <tt>(name, prod)</tt> | |
152 corresponding to undefined symbols in the grammar. <tt>name</tt> is the | |
153 name of the undefined symbol and <tt>prod</tt> is an instance of | |
154 <tt>Production</tt> which has information about the production rule | |
155 where the undefined symbol was used. | |
156 </blockquote> | |
157 | |
158 <p> | |
159 <b><tt>g.unused_terminals()</tt></b> | |
160 <blockquote> | |
161 Diagnostic function. Returns a list of terminals that were defined, | |
162 but never used in the grammar. | |
163 </blockquote> | |
164 | |
165 <p> | |
166 <b><tt>g.unused_rules()</tt></b> | |
167 <blockquote> | |
168 Diagnostic function. Returns a list of <tt>Production</tt> instances | |
169 corresponding to production rules that were defined in the grammar, | |
170 but never used anywhere. This is slightly different | |
171 than <tt>find_unreachable()</tt>. | |
172 </blockquote> | |
173 | |
174 <p> | |
175 <b><tt>g.unused_precedence()</tt></b> | |
176 <blockquote> | |
177 Diagnostic function. Returns a list of tuples <tt>(term, assoc)</tt> | |
178 corresponding to precedence rules that were set, but never used the | |
179 grammar. <tt>term</tt> is the terminal name and <tt>assoc</tt> is the | |
180 precedence associativity (e.g., <tt>'left'</tt>, <tt>'right'</tt>, | |
181 or <tt>'nonassoc'</tt>. | |
182 </blockquote> | |
183 | |
184 <p> | |
185 <b><tt>g.compute_first()</tt></b> | |
186 <blockquote> | |
187 Compute all of the first sets for all symbols in the grammar. Returns a dictionary | |
188 mapping symbol names to a list of all first symbols. | |
189 </blockquote> | |
190 | |
191 <p> | |
192 <b><tt>g.compute_follow()</tt></b> | |
193 <blockquote> | |
194 Compute all of the follow sets for all non-terminals in the grammar. | |
195 The follow set is the set of all possible symbols that might follow a | |
196 given non-terminal. Returns a dictionary mapping non-terminal names | |
197 to a list of symbols. | |
198 </blockquote> | |
199 | |
200 <p> | |
201 <b><tt>g.build_lritems()</tt></b> | |
202 <blockquote> | |
203 Calculates all of the LR items for all productions in the grammar. This | |
204 step is required before using the grammar for any kind of table generation. | |
205 See the section on LR items below. | |
206 </blockquote> | |
207 | |
208 <p> | |
209 The following attributes are set by the above methods and may be useful | |
210 in code that works with the grammar. All of these attributes should be | |
211 assumed to be read-only. Changing their values directly will likely | |
212 break the grammar. | |
213 | |
214 <p> | |
215 <b><tt>g.Productions</tt></b> | |
216 <blockquote> | |
217 A list of all productions added. The first entry is reserved for | |
218 a production representing the starting rule. The objects in this list | |
219 are instances of the <tt>Production</tt> class, described shortly. | |
220 </blockquote> | |
221 | |
222 <p> | |
223 <b><tt>g.Prodnames</tt></b> | |
224 <blockquote> | |
225 A dictionary mapping the names of nonterminals to a list of all | |
226 productions of that nonterminal. | |
227 </blockquote> | |
228 | |
229 <p> | |
230 <b><tt>g.Terminals</tt></b> | |
231 <blockquote> | |
232 A dictionary mapping the names of terminals to a list of the | |
233 production numbers where they are used. | |
234 </blockquote> | |
235 | |
236 <p> | |
237 <b><tt>g.Nonterminals</tt></b> | |
238 <blockquote> | |
239 A dictionary mapping the names of nonterminals to a list of the | |
240 production numbers where they are used. | |
241 </blockquote> | |
242 | |
243 <p> | |
244 <b><tt>g.First</tt></b> | |
245 <blockquote> | |
246 A dictionary representing the first sets for all grammar symbols. This is | |
247 computed and returned by the <tt>compute_first()</tt> method. | |
248 </blockquote> | |
249 | |
250 <p> | |
251 <b><tt>g.Follow</tt></b> | |
252 <blockquote> | |
253 A dictionary representing the follow sets for all grammar rules. This is | |
254 computed and returned by the <tt>compute_follow()</tt> method. | |
255 </blockquote> | |
256 | |
257 <p> | |
258 <b><tt>g.Start</tt></b> | |
259 <blockquote> | |
260 Starting symbol for the grammar. Set by the <tt>set_start()</tt> method. | |
261 </blockquote> | |
262 | |
263 For the purposes of debugging, a <tt>Grammar</tt> object supports the <tt>__len__()</tt> and | |
264 <tt>__getitem__()</tt> special methods. Accessing <tt>g[n]</tt> returns the nth production | |
265 from the grammar. | |
266 | |
267 | |
268 <H2><a name="internal_nn3"></a>3. Productions</H2> | |
269 | |
270 | |
271 <tt>Grammar</tt> objects store grammar rules as instances of a <tt>Production</tt> class. This | |
272 class has no public constructor--you should only create productions by calling <tt>Grammar.add_production()</tt>. | |
273 The following attributes are available on a <tt>Production</tt> instance <tt>p</tt>. | |
274 | |
275 <p> | |
276 <b><tt>p.name</tt></b> | |
277 <blockquote> | |
278 The name of the production. For a grammar rule such as <tt>A : B C D</tt>, this is <tt>'A'</tt>. | |
279 </blockquote> | |
280 | |
281 <p> | |
282 <b><tt>p.prod</tt></b> | |
283 <blockquote> | |
284 A tuple of symbols making up the right-hand side of the production. For a grammar rule such as <tt>A : B C D</tt>, this is <tt>('B','C','D')</tt>. | |
285 </blockquote> | |
286 | |
287 <p> | |
288 <b><tt>p.number</tt></b> | |
289 <blockquote> | |
290 Production number. An integer containing the index of the production in the grammar's <tt>Productions</tt> list. | |
291 </blockquote> | |
292 | |
293 <p> | |
294 <b><tt>p.func</tt></b> | |
295 <blockquote> | |
296 The name of the reduction function associated with the production. | |
297 This is the function that will execute when reducing the entire | |
298 grammar rule during parsing. | |
299 </blockquote> | |
300 | |
301 <p> | |
302 <b><tt>p.callable</tt></b> | |
303 <blockquote> | |
304 The callable object associated with the name in <tt>p.func</tt>. This is <tt>None</tt> | |
305 unless the production has been bound using <tt>bind()</tt>. | |
306 </blockquote> | |
307 | |
308 <p> | |
309 <b><tt>p.file</tt></b> | |
310 <blockquote> | |
311 Filename associated with the production. Typically this is the file where the production was defined. Used for error messages. | |
312 </blockquote> | |
313 | |
314 <p> | |
315 <b><tt>p.lineno</tt></b> | |
316 <blockquote> | |
317 Line number associated with the production. Typically this is the line number in <tt>p.file</tt> where the production was defined. Used for error messages. | |
318 </blockquote> | |
319 | |
320 <p> | |
321 <b><tt>p.prec</tt></b> | |
322 <blockquote> | |
323 Precedence and associativity associated with the production. This is a tuple <tt>(assoc,level)</tt> where | |
324 <tt>assoc</tt> is one of <tt>'left'</tt>,<tt>'right'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is | |
325 an integer. This value is determined by the precedence of the right-most terminal symbol in the production | |
326 or by use of the <tt>%prec</tt> specifier when adding the production. | |
327 </blockquote> | |
328 | |
329 <p> | |
330 <b><tt>p.usyms</tt></b> | |
331 <blockquote> | |
332 A list of all unique symbols found in the production. | |
333 </blockquote> | |
334 | |
335 <p> | |
336 <b><tt>p.lr_items</tt></b> | |
337 <blockquote> | |
338 A list of all LR items for this production. This attribute only has a meaningful value if the | |
339 <tt>Grammar.build_lritems()</tt> method has been called. The items in this list are | |
340 instances of <tt>LRItem</tt> described below. | |
341 </blockquote> | |
342 | |
343 <p> | |
344 <b><tt>p.lr_next</tt></b> | |
345 <blockquote> | |
346 The head of a linked-list representation of the LR items in <tt>p.lr_items</tt>. | |
347 This attribute only has a meaningful value if the <tt>Grammar.build_lritems()</tt> | |
348 method has been called. Each <tt>LRItem</tt> instance has a <tt>lr_next</tt> attribute | |
349 to move to the next item. The list is terminated by <tt>None</tt>. | |
350 </blockquote> | |
351 | |
352 <p> | |
353 <b><tt>p.bind(dict)</tt></b> | |
354 <blockquote> | |
355 Binds the production function name in <tt>p.func</tt> to a callable object in | |
356 <tt>dict</tt>. This operation is typically carried out in the last step | |
357 prior to running the parsing engine and is needed since parsing tables are typically | |
358 read from files which only include the function names, not the functions themselves. | |
359 </blockquote> | |
360 | |
361 <P> | |
362 <tt>Production</tt> objects support | |
363 the <tt>__len__()</tt>, <tt>__getitem__()</tt>, and <tt>__str__()</tt> | |
364 special methods. | |
365 <tt>len(p)</tt> returns the number of symbols in <tt>p.prod</tt> | |
366 and <tt>p[n]</tt> is the same as <tt>p.prod[n]</tt>. | |
367 | |
368 <H2><a name="internal_nn4"></a>4. LRItems</H2> | |
369 | |
370 | |
371 The construction of parsing tables in an LR-based parser generator is primarily | |
372 done over a set of "LR Items". An LR item represents a stage of parsing one | |
373 of the grammar rules. To compute the LR items, it is first necessary to | |
374 call <tt>Grammar.build_lritems()</tt>. Once this step, all of the productions | |
375 in the grammar will have their LR items attached to them. | |
376 | |
377 <p> | |
378 Here is an interactive example that shows what LR items look like if you | |
379 interactively experiment. In this example, <tt>g</tt> is a <tt>Grammar</tt> | |
380 object. | |
381 | |
382 <blockquote> | |
383 <pre> | |
384 >>> <b>g.build_lritems()</b> | |
385 >>> <b>p = g[1]</b> | |
386 >>> <b>p</b> | |
387 Production(statement -> ID = expr) | |
388 >>> | |
389 </pre> | |
390 </blockquote> | |
391 | |
392 In the above code, <tt>p</tt> represents the first grammar rule. In | |
393 this case, a rule <tt>'statement -> ID = expr'</tt>. | |
394 | |
395 <p> | |
396 Now, let's look at the LR items for <tt>p</tt>. | |
397 | |
398 <blockquote> | |
399 <pre> | |
400 >>> <b>p.lr_items</b> | |
401 [LRItem(statement -> . ID = expr), | |
402 LRItem(statement -> ID . = expr), | |
403 LRItem(statement -> ID = . expr), | |
404 LRItem(statement -> ID = expr .)] | |
405 >>> | |
406 </pre> | |
407 </blockquote> | |
408 | |
409 In each LR item, the dot (.) represents a specific stage of parsing. In each LR item, the dot | |
410 is advanced by one symbol. It is only when the dot reaches the very end that a production | |
411 is successfully parsed. | |
412 | |
413 <p> | |
414 An instance <tt>lr</tt> of <tt>LRItem</tt> has the following | |
415 attributes that hold information related to that specific stage of | |
416 parsing. | |
417 | |
418 <p> | |
419 <b><tt>lr.name</tt></b> | |
420 <blockquote> | |
421 The name of the grammar rule. For example, <tt>'statement'</tt> in the above example. | |
422 </blockquote> | |
423 | |
424 <p> | |
425 <b><tt>lr.prod</tt></b> | |
426 <blockquote> | |
427 A tuple of symbols representing the right-hand side of the production, including the | |
428 special <tt>'.'</tt> character. For example, <tt>('ID','.','=','expr')</tt>. | |
429 </blockquote> | |
430 | |
431 <p> | |
432 <b><tt>lr.number</tt></b> | |
433 <blockquote> | |
434 An integer representing the production number in the grammar. | |
435 </blockquote> | |
436 | |
437 <p> | |
438 <b><tt>lr.usyms</tt></b> | |
439 <blockquote> | |
440 A set of unique symbols in the production. Inherited from the original <tt>Production</tt> instance. | |
441 </blockquote> | |
442 | |
443 <p> | |
444 <b><tt>lr.lr_index</tt></b> | |
445 <blockquote> | |
446 An integer representing the position of the dot (.). You should never use <tt>lr.prod.index()</tt> | |
447 to search for it--the result will be wrong if the grammar happens to also use (.) as a character | |
448 literal. | |
449 </blockquote> | |
450 | |
451 <p> | |
452 <b><tt>lr.lr_after</tt></b> | |
453 <blockquote> | |
454 A list of all productions that can legally appear immediately to the right of the | |
455 dot (.). This list contains <tt>Production</tt> instances. This attribute | |
456 represents all of the possible branches a parse can take from the current position. | |
457 For example, suppose that <tt>lr</tt> represents a stage immediately before | |
458 an expression like this: | |
459 | |
460 <pre> | |
461 >>> <b>lr</b> | |
462 LRItem(statement -> ID = . expr) | |
463 >>> | |
464 </pre> | |
465 | |
466 Then, the value of <tt>lr.lr_after</tt> might look like this, showing all productions that | |
467 can legally appear next: | |
468 | |
469 <pre> | |
470 >>> <b>lr.lr_after</b> | |
471 [Production(expr -> expr PLUS expr), | |
472 Production(expr -> expr MINUS expr), | |
473 Production(expr -> expr TIMES expr), | |
474 Production(expr -> expr DIVIDE expr), | |
475 Production(expr -> MINUS expr), | |
476 Production(expr -> LPAREN expr RPAREN), | |
477 Production(expr -> NUMBER), | |
478 Production(expr -> ID)] | |
479 >>> | |
480 </pre> | |
481 | |
482 </blockquote> | |
483 | |
484 <p> | |
485 <b><tt>lr.lr_before</tt></b> | |
486 <blockquote> | |
487 The grammar symbol that appears immediately before the dot (.) or <tt>None</tt> if | |
488 at the beginning of the parse. | |
489 </blockquote> | |
490 | |
491 <p> | |
492 <b><tt>lr.lr_next</tt></b> | |
493 <blockquote> | |
494 A link to the next LR item, representing the next stage of the parse. <tt>None</tt> if <tt>lr</tt> | |
495 is the last LR item. | |
496 </blockquote> | |
497 | |
498 <tt>LRItem</tt> instances also support the <tt>__len__()</tt> and <tt>__getitem__()</tt> special methods. | |
499 <tt>len(lr)</tt> returns the number of items in <tt>lr.prod</tt> including the dot (.). <tt>lr[n]</tt> | |
500 returns <tt>lr.prod[n]</tt>. | |
501 | |
502 <p> | |
503 It goes without saying that all of the attributes associated with LR | |
504 items should be assumed to be read-only. Modifications will very | |
505 likely create a small black-hole that will consume you and your code. | |
506 | |
507 <H2><a name="internal_nn5"></a>5. LRTable</H2> | |
508 | |
509 | |
510 The <tt>LRTable</tt> class is used to represent LR parsing table data. This | |
511 minimally includes the production list, action table, and goto table. | |
512 | |
513 <p> | |
514 <b><tt>LRTable()</tt></b> | |
515 <blockquote> | |
516 Create an empty LRTable object. This object contains only the information needed to | |
517 run an LR parser. | |
518 </blockquote> | |
519 | |
520 An instance <tt>lrtab</tt> of <tt>LRTable</tt> has the following methods: | |
521 | |
522 <p> | |
523 <b><tt>lrtab.read_table(module)</tt></b> | |
524 <blockquote> | |
525 Populates the LR table with information from the module specified in <tt>module</tt>. | |
526 <tt>module</tt> is either a module object already loaded with <tt>import</tt> or | |
527 the name of a Python module. If it's a string containing a module name, it is | |
528 loaded and parsing data is extracted. Returns the signature value that was used | |
529 when initially writing the tables. Raises a <tt>VersionError</tt> exception if | |
530 the module was created using an incompatible version of PLY. | |
531 </blockquote> | |
532 | |
533 <p> | |
534 <b><tt>lrtab.bind_callables(dict)</tt></b> | |
535 <blockquote> | |
536 This binds all of the function names used in productions to callable objects | |
537 found in the dictionary <tt>dict</tt>. During table generation and when reading | |
538 LR tables from files, PLY only uses the names of action functions such as <tt>'p_expr'</tt>, | |
539 <tt>'p_statement'</tt>, etc. In order to actually run the parser, these names | |
540 have to be bound to callable objects. This method is always called prior to | |
541 running a parser. | |
542 </blockquote> | |
543 | |
544 After <tt>lrtab</tt> has been populated, the following attributes are defined. | |
545 | |
546 <p> | |
547 <b><tt>lrtab.lr_method</tt></b> | |
548 <blockquote> | |
549 The LR parsing method used (e.g., <tt>'LALR'</tt>) | |
550 </blockquote> | |
551 | |
552 | |
553 <p> | |
554 <b><tt>lrtab.lr_productions</tt></b> | |
555 <blockquote> | |
556 The production list. If the parsing tables have been newly | |
557 constructed, this will be a list of <tt>Production</tt> instances. If | |
558 the parsing tables have been read from a file, it's a list | |
559 of <tt>MiniProduction</tt> instances. This, together | |
560 with <tt>lr_action</tt> and <tt>lr_goto</tt> contain all of the | |
561 information needed by the LR parsing engine. | |
562 </blockquote> | |
563 | |
564 <p> | |
565 <b><tt>lrtab.lr_action</tt></b> | |
566 <blockquote> | |
567 The LR action dictionary that implements the underlying state machine. | |
568 The keys of this dictionary are the LR states. | |
569 </blockquote> | |
570 | |
571 <p> | |
572 <b><tt>lrtab.lr_goto</tt></b> | |
573 <blockquote> | |
574 The LR goto table that contains information about grammar rule reductions. | |
575 </blockquote> | |
576 | |
577 | |
578 <H2><a name="internal_nn6"></a>6. LRGeneratedTable</H2> | |
579 | |
580 | |
581 The <tt>LRGeneratedTable</tt> class represents constructed LR parsing tables on a | |
582 grammar. It is a subclass of <tt>LRTable</tt>. | |
583 | |
584 <p> | |
585 <b><tt>LRGeneratedTable(grammar, method='LALR',log=None)</tt></b> | |
586 <blockquote> | |
587 Create the LR parsing tables on a grammar. <tt>grammar</tt> is an instance of <tt>Grammar</tt>, | |
588 <tt>method</tt> is a string with the parsing method (<tt>'SLR'</tt> or <tt>'LALR'</tt>), and | |
589 <tt>log</tt> is a logger object used to write debugging information. The debugging information | |
590 written to <tt>log</tt> is the same as what appears in the <tt>parser.out</tt> file created | |
591 by yacc. By supplying a custom logger with a different message format, it is possible to get | |
592 more information (e.g., the line number in <tt>yacc.py</tt> used for issuing each line of | |
593 output in the log). The result is an instance of <tt>LRGeneratedTable</tt>. | |
594 </blockquote> | |
595 | |
596 <p> | |
597 An instance <tt>lr</tt> of <tt>LRGeneratedTable</tt> has the following attributes. | |
598 | |
599 <p> | |
600 <b><tt>lr.grammar</tt></b> | |
601 <blockquote> | |
602 A link to the Grammar object used to construct the parsing tables. | |
603 </blockquote> | |
604 | |
605 <p> | |
606 <b><tt>lr.lr_method</tt></b> | |
607 <blockquote> | |
608 The LR parsing method used (e.g., <tt>'LALR'</tt>) | |
609 </blockquote> | |
610 | |
611 | |
612 <p> | |
613 <b><tt>lr.lr_productions</tt></b> | |
614 <blockquote> | |
615 A reference to <tt>grammar.Productions</tt>. This, together with <tt>lr_action</tt> and <tt>lr_goto</tt> | |
616 contain all of the information needed by the LR parsing engine. | |
617 </blockquote> | |
618 | |
619 <p> | |
620 <b><tt>lr.lr_action</tt></b> | |
621 <blockquote> | |
622 The LR action dictionary that implements the underlying state machine. The keys of this dictionary are | |
623 the LR states. | |
624 </blockquote> | |
625 | |
626 <p> | |
627 <b><tt>lr.lr_goto</tt></b> | |
628 <blockquote> | |
629 The LR goto table that contains information about grammar rule reductions. | |
630 </blockquote> | |
631 | |
632 <p> | |
633 <b><tt>lr.sr_conflicts</tt></b> | |
634 <blockquote> | |
635 A list of tuples <tt>(state,token,resolution)</tt> identifying all shift/reduce conflicts. <tt>state</tt> is the LR state | |
636 number where the conflict occurred, <tt>token</tt> is the token causing the conflict, and <tt>resolution</tt> is | |
637 a string describing the resolution taken. <tt>resolution</tt> is either <tt>'shift'</tt> or <tt>'reduce'</tt>. | |
638 </blockquote> | |
639 | |
640 <p> | |
641 <b><tt>lr.rr_conflicts</tt></b> | |
642 <blockquote> | |
643 A list of tuples <tt>(state,rule,rejected)</tt> identifying all reduce/reduce conflicts. <tt>state</tt> is the | |
644 LR state number where the conflict occurred, <tt>rule</tt> is the production rule that was selected | |
645 and <tt>rejected</tt> is the production rule that was rejected. Both <tt>rule</tt> and </tt>rejected</tt> are | |
646 instances of <tt>Production</tt>. They can be inspected to provide the user with more information. | |
647 </blockquote> | |
648 | |
649 <p> | |
650 There are two public methods of <tt>LRGeneratedTable</tt>. | |
651 | |
652 <p> | |
653 <b><tt>lr.write_table(modulename,outputdir="",signature="")</tt></b> | |
654 <blockquote> | |
655 Writes the LR parsing table information to a Python module. <tt>modulename</tt> is a string | |
656 specifying the name of a module such as <tt>"parsetab"</tt>. <tt>outputdir</tt> is the name of a | |
657 directory where the module should be created. <tt>signature</tt> is a string representing a | |
658 grammar signature that's written into the output file. This can be used to detect when | |
659 the data stored in a module file is out-of-sync with the the grammar specification (and that | |
660 the tables need to be regenerated). If <tt>modulename</tt> is a string <tt>"parsetab"</tt>, | |
661 this function creates a file called <tt>parsetab.py</tt>. If the module name represents a | |
662 package such as <tt>"foo.bar.parsetab"</tt>, then only the last component, <tt>"parsetab"</tt> is | |
663 used. | |
664 </blockquote> | |
665 | |
666 | |
667 <H2><a name="internal_nn7"></a>7. LRParser</H2> | |
668 | |
669 | |
670 The <tt>LRParser</tt> class implements the low-level LR parsing engine. | |
671 | |
672 | |
673 <p> | |
674 <b><tt>LRParser(lrtab, error_func)</tt></b> | |
675 <blockquote> | |
676 Create an LRParser. <tt>lrtab</tt> is an instance of <tt>LRTable</tt> | |
677 containing the LR production and state tables. <tt>error_func</tt> is the | |
678 error function to invoke in the event of a parsing error. | |
679 </blockquote> | |
680 | |
681 An instance <tt>p</tt> of <tt>LRParser</tt> has the following methods: | |
682 | |
683 <p> | |
684 <b><tt>p.parse(input=None,lexer=None,debug=0,tracking=0,tokenfunc=None)</tt></b> | |
685 <blockquote> | |
686 Run the parser. <tt>input</tt> is a string, which if supplied is fed into the | |
687 lexer using its <tt>input()</tt> method. <tt>lexer</tt> is an instance of the | |
688 <tt>Lexer</tt> class to use for tokenizing. If not supplied, the last lexer | |
689 created with the <tt>lex</tt> module is used. <tt>debug</tt> is a boolean flag | |
690 that enables debugging. <tt>tracking</tt> is a boolean flag that tells the | |
691 parser to perform additional line number tracking. <tt>tokenfunc</tt> is a callable | |
692 function that returns the next token. If supplied, the parser will use it to get | |
693 all tokens. | |
694 </blockquote> | |
695 | |
696 <p> | |
697 <b><tt>p.restart()</tt></b> | |
698 <blockquote> | |
699 Resets the parser state for a parse already in progress. | |
700 </blockquote> | |
701 | |
702 <H2><a name="internal_nn8"></a>8. ParserReflect</H2> | |
703 | |
704 | |
705 <p> | |
706 The <tt>ParserReflect</tt> class is used to collect parser specification data | |
707 from a Python module or object. This class is what collects all of the | |
708 <tt>p_rule()</tt> functions in a PLY file, performs basic error checking, | |
709 and collects all of the needed information to build a grammar. Most of the | |
710 high-level PLY interface as used by the <tt>yacc()</tt> function is actually | |
711 implemented by this class. | |
712 | |
713 <p> | |
714 <b><tt>ParserReflect(pdict, log=None)</tt></b> | |
715 <blockquote> | |
716 Creates a <tt>ParserReflect</tt> instance. <tt>pdict</tt> is a dictionary | |
717 containing parser specification data. This dictionary typically corresponds | |
718 to the module or class dictionary of code that implements a PLY parser. | |
719 <tt>log</tt> is a logger instance that will be used to report error | |
720 messages. | |
721 </blockquote> | |
722 | |
723 An instance <tt>p</tt> of <tt>ParserReflect</tt> has the following methods: | |
724 | |
725 <p> | |
726 <b><tt>p.get_all()</tt></b> | |
727 <blockquote> | |
728 Collect and store all required parsing information. | |
729 </blockquote> | |
730 | |
731 <p> | |
732 <b><tt>p.validate_all()</tt></b> | |
733 <blockquote> | |
734 Validate all of the collected parsing information. This is a seprate step | |
735 from <tt>p.get_all()</tt> as a performance optimization. In order to | |
736 increase parser start-up time, a parser can elect to only validate the | |
737 parsing data when regenerating the parsing tables. The validation | |
738 step tries to collect as much information as possible rather than | |
739 raising an exception at the first sign of trouble. The attribute | |
740 <tt>p.error</tt> is set if there are any validation errors. The | |
741 value of this attribute is also returned. | |
742 </blockquote> | |
743 | |
744 <p> | |
745 <b><tt>p.signature()</tt></b> | |
746 <blockquote> | |
747 Compute a signature representing the contents of the collected parsing | |
748 data. The signature value should change if anything in the parser | |
749 specification has changed in a way that would justify parser table | |
750 regeneration. This method can be called after <tt>p.get_all()</tt>, | |
751 but before <tt>p.validate_all()</tt>. | |
752 </blockquote> | |
753 | |
754 The following attributes are set in the process of collecting data: | |
755 | |
756 <p> | |
757 <b><tt>p.start</tt></b> | |
758 <blockquote> | |
759 The grammar start symbol, if any. Taken from <tt>pdict['start']</tt>. | |
760 </blockquote> | |
761 | |
762 <p> | |
763 <b><tt>p.error_func</tt></b> | |
764 <blockquote> | |
765 The error handling function or <tt>None</tt>. Taken from <tt>pdict['p_error']</tt>. | |
766 </blockquote> | |
767 | |
768 <p> | |
769 <b><tt>p.tokens</tt></b> | |
770 <blockquote> | |
771 The token list. Taken from <tt>pdict['tokens']</tt>. | |
772 </blockquote> | |
773 | |
774 <p> | |
775 <b><tt>p.prec</tt></b> | |
776 <blockquote> | |
777 The precedence specifier. Taken from <tt>pdict['precedence']</tt>. | |
778 </blockquote> | |
779 | |
780 <p> | |
781 <b><tt>p.preclist</tt></b> | |
782 <blockquote> | |
783 A parsed version of the precedence specified. A list of tuples of the form | |
784 <tt>(token,assoc,level)</tt> where <tt>token</tt> is the terminal symbol, | |
785 <tt>assoc</tt> is the associativity (e.g., <tt>'left'</tt>) and <tt>level</tt> | |
786 is a numeric precedence level. | |
787 </blockquote> | |
788 | |
789 <p> | |
790 <b><tt>p.grammar</tt></b> | |
791 <blockquote> | |
792 A list of tuples <tt>(name, rules)</tt> representing the grammar rules. <tt>name</tt> is the | |
793 name of a Python function or method in <tt>pdict</tt> that starts with <tt>"p_"</tt>. | |
794 <tt>rules</tt> is a list of tuples <tt>(filename,line,prodname,syms)</tt> representing | |
795 the grammar rules found in the documentation string of that function. <tt>filename</tt> and <tt>line</tt> contain location | |
796 information that can be used for debugging. <tt>prodname</tt> is the name of the | |
797 production. <tt>syms</tt> is the right-hand side of the production. If you have a | |
798 function like this | |
799 | |
800 <pre> | |
801 def p_expr(p): | |
802 '''expr : expr PLUS expr | |
803 | expr MINUS expr | |
804 | expr TIMES expr | |
805 | expr DIVIDE expr''' | |
806 </pre> | |
807 | |
808 then the corresponding entry in <tt>p.grammar</tt> might look like this: | |
809 | |
810 <pre> | |
811 ('p_expr', [ ('calc.py',10,'expr', ['expr','PLUS','expr']), | |
812 ('calc.py',11,'expr', ['expr','MINUS','expr']), | |
813 ('calc.py',12,'expr', ['expr','TIMES','expr']), | |
814 ('calc.py',13,'expr', ['expr','DIVIDE','expr']) | |
815 ]) | |
816 </pre> | |
817 </blockquote> | |
818 | |
819 <p> | |
820 <b><tt>p.pfuncs</tt></b> | |
821 <blockquote> | |
822 A sorted list of tuples <tt>(line, file, name, doc)</tt> representing all of | |
823 the <tt>p_</tt> functions found. <tt>line</tt> and <tt>file</tt> give location | |
824 information. <tt>name</tt> is the name of the function. <tt>doc</tt> is the | |
825 documentation string. This list is sorted in ascending order by line number. | |
826 </blockquote> | |
827 | |
828 <p> | |
829 <b><tt>p.files</tt></b> | |
830 <blockquote> | |
831 A dictionary holding all of the source filenames that were encountered | |
832 while collecting parser information. Only the keys of this dictionary have | |
833 any meaning. | |
834 </blockquote> | |
835 | |
836 <p> | |
837 <b><tt>p.error</tt></b> | |
838 <blockquote> | |
839 An attribute that indicates whether or not any critical errors | |
840 occurred in validation. If this is set, it means that that some kind | |
841 of problem was detected and that no further processing should be | |
842 performed. | |
843 </blockquote> | |
844 | |
845 | |
846 <H2><a name="internal_nn9"></a>9. High-level operation</H2> | |
847 | |
848 | |
849 Using all of the above classes requires some attention to detail. The <tt>yacc()</tt> | |
850 function carries out a very specific sequence of operations to create a grammar. | |
851 This same sequence should be emulated if you build an alternative PLY interface. | |
852 | |
853 <ol> | |
854 <li>A <tt>ParserReflect</tt> object is created and raw grammar specification data is | |
855 collected. | |
856 <li>A <tt>Grammar</tt> object is created and populated with information | |
857 from the specification data. | |
858 <li>A <tt>LRGenerator</tt> object is created to run the LALR algorithm over | |
859 the <tt>Grammar</tt> object. | |
860 <li>Productions in the LRGenerator and bound to callables using the <tt>bind_callables()</tt> | |
861 method. | |
862 <li>A <tt>LRParser</tt> object is created from from the information in the | |
863 <tt>LRGenerator</tt> object. | |
864 </ol> | |
865 | |
866 </body> | |
867 </html> | |
868 | |
869 | |
870 | |
871 | |
872 | |
873 | |
874 |