Version 3.8
---------------------
10/02/15: beazley
          Fixed issues related to Python 3.5. Patch contributed by Barry Warsaw.

Version 3.7
---------------------
08/25/15: beazley
          Fixed problems when reading table files from pickled data.

05/07/15: beazley
          Fixed regression in handling of table modules if specified as module
          objects. See https://github.com/dabeaz/ply/issues/63

Version 3.6
---------------------
04/25/15: beazley
          If PLY is unable to create the 'parser.out' or 'parsetab.py' files due
          to permission issues, it now just issues a warning message and
          continues to operate. This could happen if a module using PLY
          is installed in a funny way where tables have to be regenerated, but
          for whatever reason, the user doesn't have write permission on
          the directory where PLY wants to put them.

04/24/15: beazley
          Fixed some issues related to use of packages and table file
          modules.  Just to emphasize, PLY now generates its special
          files such as 'parsetab.py' and 'lextab.py' in the *SAME*
          directory as the source file that uses lex() and yacc().

          If, for some reason, you want to change the name of the table
          module, use the tabmodule and lextab options:

              lexer = lex.lex(lextab='spamlextab')
              parser = yacc.yacc(tabmodule='spamparsetab')

          If you specify a simple name as shown, the module will still be
          created in the same directory as the file invoking lex() or yacc().
          If you want the table files to be placed into a different package,
          then give a fully qualified package name.  For example:

              lexer = lex.lex(lextab='pkgname.files.lextab')
              parser = yacc.yacc(tabmodule='pkgname.files.parsetab')

          For this to work, 'pkgname.files' must already exist as a valid
          Python package (i.e., the directories must already exist and be
          set up with the proper __init__.py files, etc.).

Version 3.5
---------------------
04/21/15: beazley
          Added support for defaulted_states in the parser.  A
          defaulted_state is a state where the only legal action is a
          reduction of a single grammar rule across all valid input
          tokens.  For such states, the rule is reduced and the
          reading of the next lookahead token is delayed until it is
          actually needed at a later point in time.

          This delay in consuming the next lookahead token is a
          potentially important feature in advanced parsing
          applications that require tight interaction between the
          lexer and the parser.  For example, a grammar rule can
          modify the lexer state upon reduction and have such changes
          take effect before the next input token is read.

          *** POTENTIAL INCOMPATIBILITY ***
          One potential danger of defaulted_states is that syntax
          errors might be deferred to a later point of processing
          than where they were detected in past versions of PLY.
          Thus, it's possible that your error handling could change
          slightly on the same inputs.  defaulted_states do not change
          the overall parsing of the input (i.e., the same grammar is
          accepted).

          If, for some reason, you need to disable defaulted states,
          you can do this:

              parser = yacc.yacc()
              parser.defaulted_states = {}

04/21/15: beazley
          Fixed debug logging in the parser.  It wasn't properly reporting goto
          states on grammar rule reductions.

04/20/15: beazley
          Actions can now be defined for character literals (Issue #32).  For
          example:

              literals = [ '{', '}' ]

              def t_lbrace(t):
                  r'\{'
                  # Some action
                  t.type = '{'
                  return t

              def t_rbrace(t):
                  r'\}'
                  # Some action
                  t.type = '}'
                  return t

04/19/15: beazley
          Import of the 'parsetab.py' file is now constrained to only consider
          the directory specified by the outputdir argument to yacc().  If not
          supplied, the import will only consider the directory in which the
          grammar is defined.  This should greatly reduce problems with the
          wrong parsetab.py file being imported by mistake---for example, if
          it's found somewhere else on the path by accident.

          *** POTENTIAL INCOMPATIBILITY *** It's possible that this might break
          some packaging/deployment setup if PLY was instructed to place its
          parsetab.py in a different location.  You'll have to specify a proper
          outputdir= argument to yacc() to fix this if needed.

04/19/15: beazley
          Changed default output directory to be the same as that in which the
          yacc grammar is defined.  If your grammar is in a file 'calc.py',
          then the parsetab.py and parser.out files should be generated in the
          same directory as that file.  The destination directory can be
          changed using the outputdir= argument to yacc().

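          A short, hedged sketch of the default and the override (the
          directory name 'generated' is just an illustration):

              import ply.yacc as yacc

              # Default: parsetab.py/parser.out land next to the grammar file.
              parser = yacc.yacc()

              # Override: write the generated files somewhere else instead.
              parser = yacc.yacc(outputdir='generated')
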
04/19/15: beazley
          Changed the parsetab.py file signature slightly so that the parsetab
          won't regenerate if created on a different major version of Python
          (i.e., a parsetab created on Python 2 will work with Python 3).

04/16/15: beazley
          Fixed Issue #44: call_errorfunc() should return the result of
          errorfunc().

04/16/15: beazley
          Support for versions of Python <2.7 is officially dropped.  PLY may
          work, but the unit tests require Python 2.7 or newer.

04/16/15: beazley
          Fixed bug related to calling yacc(start=...).  PLY wasn't
          regenerating the table file correctly for this case.

04/16/15: beazley
          Added skipped tests for PyPy and Java.  Related to use of Python's
          -O option.

05/29/13: beazley
          Added filter to make unit tests pass under 'python -3'.
          Reported by Neil Muller.

05/29/13: beazley
          Fixed CPP_INTEGER regex in ply/cpp.py (Issue 21).
          Reported by @vbraun.

05/29/13: beazley
          Fixed yacc validation bugs when from __future__ import
          unicode_literals is being used.  Reported by Kenn Knowles.

05/29/13: beazley
          Added support for Travis-CI.  Contributed by Kenn Knowles.

05/29/13: beazley
          Added a .gitignore file.  Suggested by Kenn Knowles.

05/29/13: beazley
          Fixed validation problems for source files that include a
          different source code encoding specifier.  Fix relies on
          the inspect module.  Should work on Python 2.6 and newer.
          Not sure about older versions of Python.
          Contributed by Michael Droettboom.

05/21/13: beazley
          Fixed unit tests for yacc to eliminate random failures due to dict
          hash value randomization in Python 3.3.
          Reported by Arfrever.

10/15/12: beazley
          Fixed comment whitespace processing bugs in ply/cpp.py.
          Reported by Alexei Pososin.

10/15/12: beazley
          Fixed token names in ply/ctokens.py to match rule names.
          Reported by Alexei Pososin.

04/26/12: beazley
          Changes to functions available in panic mode error recovery.  In
          previous versions of PLY, the following global functions were
          available for use in the p_error() rule:

              yacc.errok()       # Reset error state
              yacc.token()       # Get the next token
              yacc.restart()     # Reset the parsing stack

          The use of global variables was problematic for code involving
          multiple parsers and frankly was a poor design overall.  These
          functions have been moved to methods of the parser instance created
          by the yacc() function.  You should write code like this:

              def p_error(p):
                  ...
                  parser.errok()

              parser = yacc.yacc()

          *** POTENTIAL INCOMPATIBILITY ***  The original global functions now
          issue a DeprecationWarning.

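          As a hedged sketch, a typical panic-mode handler using the new
          instance methods might look like this (the SEMI token and the
          resynchronization policy are illustrative, not prescribed):

              def p_error(p):
                  # Discard tokens until a ';' is seen, then resume parsing.
                  while True:
                      tok = parser.token()      # get the next token
                      if not tok or tok.type == 'SEMI':
                          break
                  parser.errok()                # reset the error state

              parser = yacc.yacc()
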
04/19/12: beazley
          Fixed some problems with line and position tracking and the use of
          error symbols.  If you have a grammar rule involving an error rule
          like this:

              def p_assignment_bad(p):
                  '''assignment : location EQUALS error SEMI'''
                  ...

          You can now do line and position tracking on the error token.  For
          example:

              def p_assignment_bad(p):
                  '''assignment : location EQUALS error SEMI'''
                  start_line = p.lineno(3)
                  start_pos  = p.lexpos(3)

          If the tracking=True option is supplied to parse(), you can
          additionally get spans:

              def p_assignment_bad(p):
                  '''assignment : location EQUALS error SEMI'''
                  start_line, end_line = p.linespan(3)
                  start_pos, end_pos   = p.lexspan(3)

          Note that error handling is still a hairy thing in PLY.  This won't
          work unless your lexer is providing accurate information.  Please
          report bugs.  Prompted by a bug reported by Davis Herring.

04/18/12: beazley
          Change to doc string handling in the lex module.  Regex patterns are
          now first pulled from a function's .regex attribute.  If that doesn't
          exist, then .doc is checked as a fallback.  The @TOKEN decorator now
          sets the .regex attribute of a function instead of its doc string.
          Change suggested by Kristoffer Ellersgaard Koch.

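          A minimal sketch of the decorator in use (the identifier pattern is
          illustrative):

              from ply.lex import TOKEN

              identifier = r'[A-Za-z_][A-Za-z0-9_]*'

              @TOKEN(identifier)
              def t_ID(t):
                  return t

              # After this change the pattern is stored on the function itself,
              # i.e. t_ID.regex == identifier, rather than in t_ID.__doc__.
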
04/18/12: beazley
          Fixed issue #1: _tabversion should use __tabversion__ instead of
          __version__.  Reported by Daniele Tricoli.

04/18/12: beazley
          Fixed issue #8: Literals empty list causes IndexError.
          Reported by Walter Nissen.

04/18/12: beazley
          Fixed issue #12: Typo in code snippet in documentation.
          Reported by florianschanda.

04/18/12: beazley
          Fixed issue #10: Correctly escape t_XOREQUAL pattern.
          Reported by Andy Kittner.

Version 3.4
---------------------
02/17/11: beazley
          Minor patch to make cpp.py compatible with Python 3.  Note: This
          is an experimental file not currently used by the rest of PLY.

02/17/11: beazley
          Fixed setup.py trove classifiers to properly list PLY as
          Python 3 compatible.

01/02/11: beazley
          Migration of the repository to GitHub.

Version 3.3
-----------------------------
08/25/09: beazley
          Fixed issue 15 related to the set_lineno() method in yacc.  Reported
          by mdsherry.

08/25/09: beazley
          Fixed a bug related to regular expression compilation flags not being
          properly stored in lextab.py files created by the lexer when running
          in optimize mode.  Reported by Bruce Frederiksen.

Version 3.2
-----------------------------
03/24/09: beazley
          Added an extra check to not print duplicated warning messages
          about reduce/reduce conflicts.

03/24/09: beazley
          Switched PLY over to a BSD license.

03/23/09: beazley
          Performance optimization.  Discovered a few places to make
          speedups in LR table generation.

03/23/09: beazley
          New warning message.  PLY now warns about rules never
          reduced due to reduce/reduce conflicts.  Suggested by
          Bruce Frederiksen.

03/23/09: beazley
          Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
          Added a new picklefile option to yacc() to write the parsing
          tables to a filename using the pickle module.  Here is how
          it works:

              yacc(picklefile="parsetab.p")

          This option can be used if the normal parsetab.py file is
          extremely large.  For example, on jython, it is impossible
          to read parsing tables if the parsetab.py exceeds a certain
          threshold.

          The filename supplied to the picklefile option is opened
          relative to the current working directory of the Python
          interpreter.  If you need to refer to the file elsewhere,
          you will need to supply an absolute or relative path.

          For maximum portability, the pickle file is written
          using protocol 0.

03/13/09: beazley
          Fixed a bug in parser.out generation where the rule numbers
          were off by one.

03/13/09: beazley
          Fixed a string formatting bug with one of the error messages.
          Reported by Richard Reitmeyer.

Version 3.1
-----------------------------
02/28/09: beazley
          Fixed broken start argument to yacc().  PLY-3.0 broke this
          feature by accident.

02/28/09: beazley
          Fixed debugging output.  yacc() no longer reports shift/reduce
          or reduce/reduce conflicts if debugging is turned off.  This
          restores similar behavior in PLY-2.5.  Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
          Fixed missing lexer attribute on certain tokens when
          invoking the parser p_error() function.  Reported by
          Bart Whiteley.

02/02/09: beazley
          The lex() command now does all error-reporting and diagnostics
          using the logging module interface.  Pass in a Logger object
          using the errorlog parameter to specify a different logger.

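          For example, a minimal sketch of supplying a custom logger (the
          logger name is illustrative):

              import logging
              import ply.lex as lex

              log = logging.getLogger('mylexer')
              lexer = lex.lex(errorlog=log)
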
02/02/09: beazley
          Refactored ply.lex to use a more object-oriented and organized
          approach to collecting lexer information.

02/01/09: beazley
          Removed the nowarn option from lex().  All output is controlled
          by passing in a logger object.  Just pass in a logger with a high
          level setting to suppress output.  This argument was never
          documented to begin with so hopefully no one was relying upon it.

02/01/09: beazley
          Discovered and removed a dead if-statement in the lexer.  This
          resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
          Minor change to the procedure for signalling a syntax error in a
          production rule.  A normal SyntaxError exception should be raised
          instead of yacc.SyntaxError.

01/13/09: beazley
          Added a new method p.set_lineno(n, lineno) that can be used to set
          the line number of symbol n in grammar rules.  This simplifies
          manual tracking of line numbers.

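          A short, hedged sketch (the rule and its tokens are illustrative):

              def p_statement(p):
                  'statement : IF expr THEN stmt'
                  # Propagate the line number of the IF token to the result.
                  p.set_lineno(0, p.lineno(1))
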
01/11/09: beazley
          Vastly improved debugging support for yacc.parse().  Instead of
          passing debug as an integer, you can supply a Logger object (see the
          logging module).  Messages will be generated at the ERROR, INFO, and
          DEBUG logging levels, each level providing progressively more
          information.  The debugging trace also shows states, grammar rules,
          values passed into grammar rules, and the result of each reduction.

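          For example, a minimal sketch (the log filename is illustrative):

              import logging
              logging.basicConfig(level=logging.DEBUG, filename='parselog.txt')
              log = logging.getLogger()

              result = parser.parse(data, debug=log)
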
01/09/09: beazley
          The yacc() command now does all error-reporting and diagnostics using
          the interface of the logging module.  Use the errorlog parameter to
          specify a logging object for error messages.  Use the debuglog
          parameter to specify a logging object for the 'parser.out' output.

01/09/09: beazley
          *HUGE* refactoring of the ply.yacc() implementation.  The high-level
          user interface is backwards compatible, but the internals are
          completely reorganized into classes.  No more global variables.  The
          internals are also more extensible.  For example, you can use the
          classes to construct a LALR(1) parser in an entirely different
          manner than what is currently the case.  Documentation is
          forthcoming.

01/07/09: beazley
          Various cleanup and refactoring of yacc internals.

01/06/09: beazley
          Fixed a bug with precedence assignment.  yacc was assigning the
          precedence of each rule based on the left-most token, when in fact,
          it should have been using the right-most token.  Reported by
          Bruce Frederiksen.

11/27/08: beazley
          Numerous changes to support Python 3.0 including removal of
          deprecated statements (e.g., has_key) and the addition of
          compatibility code to emulate features from Python 2 that have been
          removed, but which are needed.  Fixed the unit testing suite to work
          with Python 3.0.  The code should be backwards compatible with
          Python 2.

11/26/08: beazley
          Loosened the rules on what kind of objects can be passed in as the
          "module" parameter to lex() and yacc().  Previously, you could only
          use a module or an instance.  Now, PLY just uses dir() to get a list
          of symbols on whatever the object is without regard for its type.

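          For instance, a minimal sketch of the instance-based style this
          enables (the class and its rules are illustrative):

              import ply.lex as lex

              class MyRules(object):
                  tokens = ('NUMBER',)
                  t_ignore = ' \t'

                  def t_NUMBER(self, t):
                      r'\d+'
                      t.value = int(t.value)
                      return t

                  def t_error(self, t):
                      t.lexer.skip(1)

              lexer = lex.lex(module=MyRules())   # rules found via dir()
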
11/26/08: beazley
          Changed all except: statements to be compatible with Python 2.x/3.x
          syntax.

11/26/08: beazley
          Changed all raise Exception, value statements to raise
          Exception(value) for forward compatibility.

11/26/08: beazley
          Removed all print statements from lex and yacc, using sys.stdout and
          sys.stderr directly.  Preparation for Python 3.0 support.

11/04/08: beazley
          Fixed a bug with referring to symbols on the parsing stack using
          negative indices.

05/29/08: beazley
          Completely revamped the testing system to use the unittest module
          for everything.  Added additional tests to cover new
          errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
          Fixed a bug with writing lex-tables in optimized mode and start
          states.  Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
          A version number is now embedded in the table file signature so that
          yacc can more gracefully accommodate changes to the output format
          in the future.

05/04/08: beazley
          Removed undocumented .pushback() method on grammar productions.  I'm
          not sure this ever worked and can't recall ever using it.  Might have
          been an abandoned idea that never really got fleshed out.  This
          feature was never described or tested so removing it is hopefully
          harmless.

05/04/08: beazley
          Added extra error checking to yacc() to detect precedence rules
          defined for undefined terminal symbols.  This allows yacc() to
          detect a potential problem that can be really tricky to debug if no
          warning message or error message is generated about it.

05/04/08: beazley
          lex() now has an outputdir option that can specify the output
          directory for tables when running in optimize mode.  For example:

              lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

          The behavior of specifying a table module and output directory is
          now more closely aligned with the behavior of yacc().

05/04/08: beazley
          [Issue 9]
          Fixed a filename bug when specifying the modulename in lex() and
          yacc().  If you specified options such as the following:

              parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")

          yacc would create a file "foo.bar.parsetab.py" in the given
          directory.  Now, it simply generates a file "parsetab.py" in that
          directory.  Bug reported by cptbinho.

05/04/08: beazley
          Slight modification to lex() and yacc() to allow their table files
          to be loaded from a previously loaded module.  This might make
          it easier to load the parsing tables from a complicated package
          structure.  For example:

              import foo.bar.spam.parsetab as parsetab
              parser = yacc.yacc(tabmodule=parsetab)

          Note: lex and yacc will never regenerate the table file if used
          in this form---you will get a warning message instead.
          This idea suggested by Brian Clapper.

04/28/08: beazley
          Fixed a bug with p_error() functions not being picked up correctly
          when running in yacc(optimize=1) mode.  Patch contributed by
          Bart Whiteley.

02/28/08: beazley
          Fixed a bug with 'nonassoc' precedence rules.  Basically the
          non-associativity was being ignored and not producing the correct
          run-time behavior in the parser.

02/16/08: beazley
          Slight relaxation of what the input() method to a lexer will
          accept as a string.  Instead of testing the input to see
          if the input is a string or unicode string, it checks to see
          if the input object looks like it contains string data.
          This change makes it possible to pass string-like objects
          in as input.  For example, the object returned by mmap:

              import mmap, os
              data = mmap.mmap(os.open(filename, os.O_RDONLY),
                               os.path.getsize(filename),
                               access=mmap.ACCESS_READ)
              lexer.input(data)

11/29/07: beazley
          Modification of ply.lex to allow token functions to be aliased.
          This is subtle, but it makes it easier to create libraries and
          to reuse token specifications.  For example, suppose you defined
          a function like this:

              def number(t):
                  r'\d+'
                  t.value = int(t.value)
                  return t

          This change would allow you to define a token rule as follows:

              t_NUMBER = number

          In this case, the token type will be set to 'NUMBER' and the
          associated number() function will be used to process tokens.

11/28/07: beazley
          Slight modification to lex and yacc to grab symbols from both
          the local and global dictionaries of the caller.  This
          modification allows lexers and parsers to be defined using
          inner functions and closures.

11/28/07: beazley
          Performance optimization:  The lexer.lexmatch and t.lexer
          attributes are no longer set for lexer tokens that are not
          defined by functions.  The only normal use of these attributes
          would be in lexer rules that need to perform some kind of
          special processing.  Thus, it doesn't make any sense to set
          them on every token.

          *** POTENTIAL INCOMPATIBILITY ***  This might break code
          that is mucking around with internal lexer state in some
          sort of magical way.

11/27/07: beazley
          Added the ability to put the parser into error-handling mode
          from within a normal production.  To do this, simply raise
          a yacc.SyntaxError exception like this:

              def p_some_production(p):
                  'some_production : prod1 prod2'
                  ...
                  raise yacc.SyntaxError      # Signal an error

          A number of things happen after this occurs:

          - The last symbol shifted onto the symbol stack is discarded
            and parser state backed up to what it was before the
            rule reduction.

          - The current lookahead symbol is saved and replaced by
            the 'error' symbol.

          - The parser enters error recovery mode where it tries
            to either reduce the 'error' rule or it starts
            discarding items off of the stack until the parser
            resets.

          When an error is manually set, the parser does *not* call
          the p_error() function (if any is defined).
          *** NEW FEATURE ***  Suggested on the mailing list.

11/27/07: beazley
          Fixed structure bug in examples/ansic.  Reported by Dion Blazakis.

11/27/07: beazley
          Fixed a bug in the lexer related to start conditions and ignored
          token rules.  If a rule was defined that changed state, but
          returned no token, the lexer could be left in an inconsistent
          state.  Reported by

11/27/07: beazley
          Modified setup.py to support Python Eggs.  Patch contributed by
          Simon Cross.

11/09/07: beazley
          Fixed a bug in error handling in yacc.  If a syntax error occurred
          and the parser rolled the entire parse stack back, the parser would
          be left in an inconsistent state that would cause it to trigger
          incorrect actions on subsequent input.  Reported by Ton
          Biegstraaten, Justin King, and others.

11/09/07: beazley
          Fixed a bug when passing empty input strings to yacc.parse().  This
          would result in an error message about "No input given".  Reported
          by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
          Fixed a bug with character literals if the literal '.' appeared as
          the last symbol of a grammar rule.  Reported by Ales Smrcka.

02/19/07: beazley
          Warning messages are now redirected to stderr instead of being
          printed to standard output.

02/19/07: beazley
          Added a warning message to lex.py if it detects a literal backslash
          character inside the t_ignore declaration.  This is to help catch
          problems that might occur if someone accidentally defines t_ignore
          as a Python raw string.  For example:

              t_ignore = r' \t'

          The idea for this is from an email I received from David Cimimi who
          reported bizarre behavior in lexing as a result of defining t_ignore
          as a raw string by accident.

02/18/07: beazley
          Performance improvements.  Made some changes to the internal
          table organization and LR parser to improve parsing performance.

02/18/07: beazley
          Automatic tracking of line number and position information must now
          be enabled by a special flag to parse().  For example:

              yacc.parse(data, tracking=True)

          In many applications, it's just not that important to have the
          parser automatically track all line numbers.  By making this an
          optional feature, it allows the parser to run significantly faster
          (more than a 20% speed increase in many cases).  Note: positional
          information is always available for raw tokens---this change only
          applies to positional information associated with nonterminal
          grammar symbols.
          *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
          Yacc no longer supports extended slices of grammar productions.
          However, it does support regular slices.  For example:

              def p_foo(p):
                  '''foo : a b c d e'''
                  p[0] = p[1:3]

          This change is a performance improvement to the parser---it
          streamlines normal access to the grammar values since slices are now
          handled in a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
          Fixed a bug in the handling of token names when combined with
          start conditions.  Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
          Added lexpos() and lexspan() methods to grammar symbols.  These
          mirror the same functionality of lineno() and linespan().  For
          example:

              def p_expr(p):
                  'expr : expr PLUS expr'
                  p.lexpos(1)                # Lexing position of left-hand-expression
                  p.lexpos(2)                # Lexing position of PLUS
                  start, end = p.lexspan(3)  # Lexing range of right hand expression

11/01/06: beazley
          Minor change to error handling.  The recommended way to skip
          characters in the input is to use t.lexer.skip() as shown here:

              def t_error(t):
                  print "Illegal character '%s'" % t.value[0]
                  t.lexer.skip(1)

          The old approach of just using t.skip(1) will still work, but won't
          be documented.

10/31/06: beazley
          Discarded tokens can now be specified as simple strings instead of
          functions.  To do this, simply include the text "ignore_" in the
          token declaration.  For example:

              t_ignore_cppcomment = r'//.*'

          Previously, this had to be done with a function.  For example:

              def t_ignore_cppcomment(t):
                  r'//.*'
                  pass

          If start conditions/states are being used, state names should appear
          before the "ignore_" text.

10/19/06: beazley
          The Lex module now provides support for flex-style start conditions
          as described at
          http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
          Please refer to this document to understand this change note.  Refer
          to the PLY documentation for a PLY-specific explanation of how this
          works.

          To use start conditions, you first need to declare a set of states
          in your lexer file:

              states = (
                  ('foo', 'exclusive'),
                  ('bar', 'inclusive')
              )

          This serves the same role as the %s and %x specifiers in flex.

          Once a state has been declared, tokens for that state can be
          declared by defining rules of the form t_state_TOK.  For example:

              t_PLUS = r'\+'          # Rule defined in INITIAL state
              t_foo_NUM = r'\d+'      # Rule defined in foo state
              t_bar_NUM = r'\d+'      # Rule defined in bar state

              t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
              t_ANY_NUM = r'\d+'      # Rule defined in all states

          In addition to defining tokens for each state, the t_ignore and
          t_error specifications can be customized for specific states.  For
          example:

              t_foo_ignore = " "      # Ignored characters for foo state

              def t_bar_error(t):
                  # Handle errors in bar state
                  pass

          Within token rules, the following methods can be used to change
          states:

              def t_TOKNAME(t):
                  t.lexer.begin('foo')        # Begin state 'foo'
                  t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                              # onto a stack
                  t.lexer.pop_state()         # Restore previous state
                  t.lexer.current_state()     # Returns name of current state

          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(),
          and yy_top_state() functions in flex.

          Start states can be used as one way to write sub-lexers.  For
          example, the lexer or parser might instruct the lexer to start
          generating a different set of tokens depending on the context.

          example/yply/ylex.py shows the use of start states to grab C/C++
          code fragments out of traditional yacc specification files.

          *** NEW FEATURE ***  Suggested by Daniel Larraz with whom I also
          discussed various aspects of the design.

10/19/06: beazley
          Minor change to the way in which yacc.py was reporting shift/reduce
          conflicts.  Although the underlying LALR(1) algorithm was correct,
          PLY was under-reporting the number of conflicts compared to
          yacc/bison when precedence rules were in effect.  This change should
          make PLY report the same number of conflicts as yacc.

10/19/06: beazley
          Modified yacc so that grammar rules could also include the '-'
          character.  For example:

              def p_expr_list(p):
                  'expression-list : expression-list expression'

          Suggested by Oldrich Jedlicka.

10/18/06: beazley
          Attribute lexer.lexmatch added so that token rules can access the re
          match object that was generated.  For example:

              def t_FOO(t):
                  r'some regex'
                  m = t.lexer.lexmatch
                  # Do something with m

          This may be useful if you want to access named groups specified
          within the regex for a specific token.  Suggested by Oldrich
          Jedlicka.

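          As a hedged illustration, a named group can be pulled out of the
          match like this (the token rule is illustrative):

              def t_QSTRING(t):
                  r'''(?P<quote>['"]).*?(?P=quote)'''
                  # lexmatch is the re match object for this token
                  t.quote = t.lexer.lexmatch.group('quote')
                  return t
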
10/16/06: beazley
          Changed the error message that results if an illegal character
          is encountered and no default error function is defined in lex.
          The exception is now more informative about the actual cause of
          the error.

Version 2.1
------------------------------
10/02/06: beazley
          The last Lexer object built by lex() can be found in lex.lexer.
          The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
          New example added:  examples/yply

          This example uses PLY to convert Unix-yacc specification files to
          PLY programs with the same grammar.  This may be useful if you
          want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
          Added support for a start symbol to be specified in the yacc
          input file itself.  Just do this:

              start = 'name'

          where 'name' matches some grammar rule.  For example:

              def p_name(p):
                  'name : A B C'
                  ...

          This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
          Some new examples added:

              examples/GardenSnake : A simple indentation based language
                                     similar to Python.  Shows how you might
                                     handle whitespace.  Contributed by
                                     Andrew Dalke.

              examples/BASIC       : An implementation of 1964 Dartmouth
                                     BASIC.  Contributed by Dave against his
                                     better judgement.

09/28/06: beazley
          Minor patch to allow named groups to be used in lex regular
          expression rules.  For example:

              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

          Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method.  To use SLR, use
          yacc.yacc(method="SLR").  Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR.  However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking.  To modify line numbers, modify
          the line number of the lexer itself.  For example:

              def t_NEWLINE(t):
                  r'\n'
                  t.lexer.lineno += 1

          This modification is both a cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number.  This extra processing is unnecessary for a vast
          majority of tokens.  Thus, this new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number.  For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos.  This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).

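          For example, a minimal sketch of a column-number helper (this
          function is illustrative, not part of PLY):

              def find_column(text, token):
                  # Find the most recent newline before the token.
                  line_start = text.rfind('\n', 0, token.lexpos) + 1
                  return (token.lexpos - line_start) + 1
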
09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone().  This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).

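          A short sketch:

              lexer = lex.lex()
              lexer2 = lexer.clone()      # independent copy with the same rules
              lexer.input("some text")
              lexer2.input("other text")
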
09/21/06: beazley
          Limitations related to the use of the re module have been
          eliminated.  Several users reported problems with regular
          expressions exceeding more than 100 named groups.  To solve this,
          lex.py is now capable of automatically splitting its master regular
          expression into smaller expressions as needed.  This should, in
          theory, make it possible to specify an arbitrarily large number of
          tokens.

09/21/06: beazley
          Improved error checking in lex.py.  Rules that match the empty
          string are now rejected (otherwise they cause the lexer to enter an
          infinite loop).  An extra check for rules containing '#' has also
          been added.  Since lex compiles regular expressions in verbose mode,
          '#' is interpreted as a regex comment; it is critical to use '\#'
          instead.

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

              digit = r'([0-9])'
              nondigit = r'([_A-Za-z])'
              identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

              from ply.lex import TOKEN

              @TOKEN(identifier)
              def t_ID(t):
                  # Do whatever
                  pass

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

              def t_ID(t):
                  # Do whatever
                  pass

              t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later.  If
          compatibility with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes.  For example:

              def p_expr(p):
                  "expr : expr '+' expr"
                  ...

              def p_expr(p):
                  'expr : expr "-" expr'
                  ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters.  This is done by defining the variable
          'literals' as a list of characters.  This should be defined in the
          module that invokes the lex.lex() function.  For example:

              literals = ['+','-','*','/','(',')','=']

          or simply

              literals = '+=*/()='

          It is important to note that literals can only be a single
          character.  When the lexer fails to match a token using its normal
          regular expression rules, it will check the current character
          against the literal list.  If found, it will be returned with a
          token type set to match the literal character.  Otherwise, an
          illegal character will be signalled.

09/14/06: beazley
          Modified PLY to install itself as a proper Python package called
          'ply'.  This will make it a little more friendly to other modules.
          This changes the usage of PLY only slightly.  Just do this to import
          the modules:

              import ply.lex as lex
              import ply.yacc as yacc

          Alternatively, you can do this:

              from ply import *

          which imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production
          rules.  A negative production index now accesses already parsed
          symbols on the parsing stack.  For example,

              def p_foo(p):
                  "foo : A B C D"
                  print p[1]       # Value of 'A' symbol
                  print p[2]       # Value of 'B' symbol
                  print p[-1]      # Value of whatever symbol appears before A
                                   # on the parsing stack.

                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within
          the parsing rules.  For example, in C-yacc, it is possible to write
          code like this:

              bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has
          been parsed.  Within the embedded action code, $1 refers to the A
          symbol on the stack.

          To perform this equivalent action in PLY, you need to write a pair
          of rules like this:

              def p_bar(p):
                  "bar : A seen_A B"
                  do_stuff

              def p_seen_A(p):
                  "seen_A :"
                  print "seen an A =", p[-1]

          The second rule "seen_A" is merely an empty production which should
          be reduced as soon as A is parsed in the "bar" rule above.  The
          negative index p[-1] is used to access whatever symbol appeared
          before the seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

              def p_decl(p):
                  "decl : scope name"

              def p_scope(p):
                  """scope : GLOBAL
                           | LOCAL"""
                  p[0] = p[1]

              def p_name(p):
                  "name : ID"
                  if p[-1] == "GLOBAL":
                      # ...
                      pass
                  elif p[-1] == "LOCAL":
                      # ...
                      pass

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar
          rules, your code will break.  This should be extremely rare, if not
          non-existent, in most cases.  The argument to various grammar rules
          is not usually processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code.  Both
          SLR and LALR(1) table generation is now performed by the same code
          base with only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use
          the DeRemer and Pennello algorithm for calculating lookahead sets.
          This significantly improves the performance of generating LALR(1)
          tables and has the added feature of actually working correctly!  If
          you experienced weird behavior with LALR(1) in prior releases, this
          should hopefully resolve all of those problems.  Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in
          LALR(1) parsing.  If you experienced subtle and/or bizarre behavior
          when trying to use the LALR(1) engine, this may correct those
          problems.  Patch contributed by Russ Cox.  Note: This patch has been
          superseded by revisions for LALR(1) parsing in PLY-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed an infinite recursion problem in the ReduceToTerminals()
          function that would sometimes come up in LALR(1) table generation.
          Reported by Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex().  For example:

              lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile()
          function used inside the lexer.  This may be necessary for special
          situations such as processing Unicode (e.g., if you want escapes
          like \w and \b to consult the Unicode character property database).
          The need for this suggested by Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated
          instantiations of parser objects when the write_tables=0 argument
          was used.  Reported by Michael Brown.

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular
          expressions for tokens and as input.  Hopefully this is the only
          change needed for Unicode support.  Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or
          old-style classes.  Patch contributed by Michael Brown (although I
          tweaked it slightly so it would work with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an
          extremely devious bug in LALR(1) parser generation.  This patch
          should fix problems numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor.  Reported by Dave Aitel,
          Aaron Lav, and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control output directory.
          Contributed by Christopher Stawarz.

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest
          module.  Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement.  LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley.  To use LALR(1) parsing do
          the following:

              yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method.  However, LALR(1) allows you to handle
          more complex grammars.  For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists.  Can
          be used in parser rules like this:

              def p_somerule(p):
                  """a : B C D
                       | E F"""
                  if len(p) == 3:
                      # Must have been the first rule
                      pass
                  elif len(p) == 2:
                      # Must be the second rule
                      pass

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

          0. Cleans up some comments so they don't wrap on an 80-column
             display.
          1. Directs compiler errors to stderr where they belong.
          2. Implements and documents automatic line counting when \n is
             ignored.
          3. Changes the way progress messages are dumped when debugging is
             on.  The new format is both less verbose and conveys more
             information than the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation.  Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          - Some output is now only shown when debugging is enabled.  This
            means that PLY will be completely silent when not in debugging
            mode.

          - An optional parameter "write_tables" can be passed to yacc() to
            control whether or not parsing tables are written.  By default,
            it is true, but it can be turned off if you don't want the yacc
            table file.  Note: disabling this will cause yacc() to regenerate
            the parsing table each time (see the sketch below).

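          A one-line, hedged sketch of turning the table file off:

              parser = yacc.yacc(write_tables=False)   # tables rebuilt on every run
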
04/23/04: beazley
          Added patches contributed by David McNab.  This patch adds two
          features:

          - The parser can be supplied as a class instead of a module.
            For an example of this, see the example/classcalc directory.

          - Debugging output can be directed to a filename of the user's
            choice.  Use

                yacc(debugfile="somefile.out")

Version 1.3
------------------------------
12/10/02: jmdyck
          Various minor adjustments to the code that Dave checked in today.
          Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
          Incorporated a variety of minor bug fixes to empty production
          handling and infinite recursion checking.  Contributed by
          Michael Dyck.

12/10/02: beazley
          Removed bogus recover() method call in yacc.restart().

Version 1.2
------------------------------
11/27/02: beazley
          Lexer and parser objects are now available as an attribute
          of tokens and slices respectively.  For example:

              def t_NUMBER(t):
                  r'\d+'
                  print t.lexer

              def p_expr_plus(t):
                  'expr : expr PLUS expr'
                  print t.lexer
                  print t.parser

          This can be used for state management (if needed).

10/31/02: beazley
          Modified yacc.py to work with Python optimize mode.  To make
          this work, you need to use

              yacc.yacc(optimize=1)

          Furthermore, you need to first run Python in normal mode
          to generate the necessary parsetab.py files.  After that,
          you can use python -O or python -OO.

          Note: optimized mode turns off a lot of error checking.
          Only use when you are sure that your grammar is working.
          Make sure parsetab.py is up to date!

10/30/02: beazley
          Added cloning of Lexer objects.  For example:

              import copy
              l = lex.lex()
              lc = copy.copy(l)

              l.input("Some text")
              lc.input("Some other text")
              ...

          This might be useful if the same "lexer" is meant to
          be used in different contexts---or if multiple lexers
          are running concurrently.

10/30/02: beazley
          Fixed subtle bug with first set computation and empty productions.
          Patch submitted by Michael Dyck.

10/30/02: beazley
          Fixed error messages to use "filename:line: message" instead
          of "filename:line. message".  This makes error reporting more
          friendly to emacs.  Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file.  Terminals and nonterminals
          are sorted instead of being printed in random order.
          Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file output.  Rules are now printed
          in a way that's easier to understand.  Contributed by Russ Cox.

10/30/02: beazley
          Added 'nonassoc' associativity support.  This can be used
          to disable the chaining of operators like a < b < c.
          To use, simply specify 'nonassoc' in the precedence table:

              precedence = (
                  ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVIDE'),
                  ('right', 'UMINUS'),                      # Unary minus operator
              )

          Patch contributed by Russ Cox.

10/30/02: beazley
          Modified the lexer to provide optional support for Python -O and
          -OO modes.  To make this work, Python *first* needs to be run in
          unoptimized mode.  This reads the lexing information and creates a
          file "lextab.py".  Then, run lex like this:

              # module foo.py
              ...
              lex.lex(optimize=1)

          Once the lextab file has been created, subsequent calls to
          lex.lex() will read data from the lextab file instead of using
          introspection.  In optimized mode (-O, -OO) everything should
          work normally despite the loss of doc strings.

          To change the name of the file 'lextab.py', use the following:

              lex.lex(lextab="footab")

          (this creates a file footab.py)

Version 1.1   October 25, 2001
------------------------------

10/25/01: beazley
          Modified the table generator to produce much more compact data.
          This should greatly reduce the size of the parsetab.py[c] file.
          Caveat: the tables still need to be constructed so a little more
          work is done in parsetab on import.

10/25/01: beazley
          There may be a possible bug in the cycle detector that reports
          errors about infinite recursion.  I'm having a little trouble
          tracking it down, but if you get this problem, you can disable the
          cycle detector as follows:

              yacc.yacc(check_recursion=0)

10/25/01: beazley
          Fixed a bug in lex.py that sometimes caused illegal characters to
          be reported incorrectly.  Reported by Sverre Jørgensen.

7/8/01:   beazley
          Added a reference to the underlying lexer object when tokens are
          handled by functions.  The lexer is available as the 'lexer'
          attribute.  This was added to provide better lexing support for
          languages such as Fortran where certain types of tokens can't be
          conveniently expressed as regular expressions (and where the
          tokenizing function may want to perform a little backtracking).
          Suggested by Pearu Peterson.

6/20/01:  beazley
          Modified the yacc() function so that an optional starting symbol
          can be specified.  For example:

              yacc.yacc(start="statement")

          Normally yacc always treats the first production rule as the
          starting symbol.  However, if you are debugging your grammar it may
          be useful to specify an alternative starting symbol.  Idea
          suggested by Rich Salz.

Version 1.0   June 18, 2001
--------------------------
Initial public offering