ply-3.8/CHANGES

Version 3.8
---------------------
10/02/15: beazley
          Fixed issues related to Python 3.5. Patch contributed by Barry Warsaw.

Version 3.7
---------------------
08/25/15: beazley
          Fixed problems when reading table files from pickled data.

05/07/15: beazley
          Fixed regression in handling of table modules if specified as module
          objects. See https://github.com/dabeaz/ply/issues/63

Version 3.6
---------------------
04/25/15: beazley
          If PLY is unable to create the 'parser.out' or 'parsetab.py' files due
          to permission issues, it now just issues a warning message and
          continues to operate. This could happen if a module using PLY
          is installed in a funny way where tables have to be regenerated, but
          for whatever reason, the user doesn't have write permission on
          the directory where PLY wants to put them.

04/24/15: beazley
          Fixed some issues related to use of packages and table file
          modules. Just to emphasize, PLY now generates its special
          files such as 'parsetab.py' and 'lextab.py' in the *SAME*
          directory as the source file that uses lex() and yacc().

          If for some reason, you want to change the name of the table
          module, use the tabmodule and lextab options:

              lexer  = lex.lex(lextab='spamlextab')
              parser = yacc.yacc(tabmodule='spamparsetab')

          If you specify a simple name as shown, the module will still be
          created in the same directory as the file invoking lex() or yacc().
          If you want the table files to be placed into a different package,
          then give a fully qualified package name. For example:

              lexer  = lex.lex(lextab='pkgname.files.lextab')
              parser = yacc.yacc(tabmodule='pkgname.files.parsetab')

          For this to work, 'pkgname.files' must already exist as a valid
          Python package (i.e., the directories must already exist and be
          set up with the proper __init__.py files, etc.).

Version 3.5
---------------------
04/21/15: beazley
          Added support for defaulted_states in the parser. A
          defaulted_state is a state where the only legal action is a
          reduction of a single grammar rule across all valid input
          tokens. For such states, the rule is reduced and the
          reading of the next lookahead token is delayed until it is
          actually needed at a later point in time.

          This delay in consuming the next lookahead token is a
          potentially important feature in advanced parsing
          applications that require tight interaction between the
          lexer and the parser. For example, a grammar rule can
          modify the lexer state upon reduction and have such changes
          take effect before the next input token is read.

          *** POTENTIAL INCOMPATIBILITY ***
          One potential danger of defaulted_states is that syntax
          errors might be deferred to a later point of processing
          than where they were detected in past versions of PLY.
          Thus, it's possible that your error handling could change
          slightly on the same inputs. defaulted_states do not change
          the overall parsing of the input (i.e., the same grammar is
          accepted).

          If for some reason you need to disable defaulted states,
          you can do this:

              parser = yacc.yacc()
              parser.defaulted_states = {}

04/21/15: beazley
          Fixed debug logging in the parser. It wasn't properly reporting goto
          states on grammar rule reductions.

04/20/15: beazley
          Added the ability for actions to be defined for character literals
          (Issue #32). For example:

              literals = [ '{', '}' ]

              def t_lbrace(t):
                  r'\{'
                  # Some action
                  t.type = '{'
                  return t

              def t_rbrace(t):
                  r'\}'
                  # Some action
                  t.type = '}'
                  return t

04/19/15: beazley
          Import of the 'parsetab.py' file is now constrained to only consider the
          directory specified by the outputdir argument to yacc(). If not supplied,
          the import will only consider the directory in which the grammar is defined.
          This should greatly reduce problems with the wrong parsetab.py file being
          imported by mistake (for example, if it's found somewhere else on the path
          by accident).

          *** POTENTIAL INCOMPATIBILITY *** It's possible that this might break some
          packaging/deployment setup if PLY was instructed to place its parsetab.py
          in a different location. You'll have to specify a proper outputdir= argument
          to yacc() to fix this if needed.
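          As a sketch of such a fix (directory name is illustrative, not from
          PLY; the yacc() call is shown commented out since it requires PLY
          itself to be installed):

```python
import os
import tempfile

# Hypothetical deployment directory where the generated tables should live.
outdir = os.path.join(tempfile.gettempdir(), "generated_tables")
os.makedirs(outdir, exist_ok=True)

# With PLY installed, pinning the table location would look like:
#   import ply.yacc as yacc
#   parser = yacc.yacc(outputdir=outdir)
```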

04/19/15: beazley
          Changed default output directory to be the same as that in which the
          yacc grammar is defined. If your grammar is in a file 'calc.py',
          then the parsetab.py and parser.out files should be generated in the
          same directory as that file. The destination directory can be changed
          using the outputdir= argument to yacc().

04/19/15: beazley
          Changed the parsetab.py file signature slightly so that the parsetab won't
          regenerate if created on a different major version of Python (i.e., a
          parsetab created on Python 2 will work with Python 3).

04/16/15: beazley
          Fixed Issue #44: call_errorfunc() should return the result of errorfunc().

04/16/15: beazley
          Support for versions of Python <2.7 is officially dropped. PLY may work, but
          the unit tests require Python 2.7 or newer.

04/16/15: beazley
          Fixed bug related to calling yacc(start=...). PLY wasn't regenerating the
          table file correctly for this case.

04/16/15: beazley
          Added skipped tests for PyPy and Java. Related to use of Python's -O option.

05/29/13: beazley
          Added filter to make unit tests pass under 'python -3'.
          Reported by Neil Muller.

05/29/13: beazley
          Fixed CPP_INTEGER regex in ply/cpp.py (Issue 21).
          Reported by @vbraun.

05/29/13: beazley
          Fixed yacc validation bugs when from __future__ import unicode_literals
          is being used. Reported by Kenn Knowles.

05/29/13: beazley
          Added support for Travis-CI. Contributed by Kenn Knowles.

05/29/13: beazley
          Added a .gitignore file. Suggested by Kenn Knowles.

05/29/13: beazley
          Fixed validation problems for source files that include a
          different source code encoding specifier. Fix relies on
          the inspect module. Should work on Python 2.6 and newer.
          Not sure about older versions of Python.
          Contributed by Michael Droettboom.

05/21/13: beazley
          Fixed unit tests for yacc to eliminate random failures due to dict hash
          value randomization in Python 3.3.
          Reported by Arfrever.

10/15/12: beazley
          Fixed comment whitespace processing bugs in ply/cpp.py.
          Reported by Alexei Pososin.

10/15/12: beazley
          Fixed token names in ply/ctokens.py to match rule names.
          Reported by Alexei Pososin.

04/26/12: beazley
          Changes to functions available in panic mode error recovery. In previous
          versions of PLY, the following global functions were available for use
          in the p_error() rule:

              yacc.errok()       # Reset error state
              yacc.token()       # Get the next token
              yacc.restart()     # Reset the parsing stack

          The use of global variables was problematic for code involving multiple parsers
          and frankly was a poor design overall. These functions have been moved to methods
          of the parser instance created by the yacc() function. You should write code like
          this:

              def p_error(p):
                  ...
                  parser.errok()

              parser = yacc.yacc()

          *** POTENTIAL INCOMPATIBILITY *** The original global functions now issue a
          DeprecationWarning.

04/19/12: beazley
          Fixed some problems with line and position tracking and the use of error
          symbols. If you have a grammar rule involving an error rule like this:

              def p_assignment_bad(p):
                  '''assignment : location EQUALS error SEMI'''
                  ...

          You can now do line and position tracking on the error token. For example:

              def p_assignment_bad(p):
                  '''assignment : location EQUALS error SEMI'''
                  start_line = p.lineno(3)
                  start_pos  = p.lexpos(3)

          If the tracking=True option is supplied to parse(), you can additionally get
          spans:

              def p_assignment_bad(p):
                  '''assignment : location EQUALS error SEMI'''
                  start_line, end_line = p.linespan(3)
                  start_pos, end_pos   = p.lexspan(3)

          Note that error handling is still a hairy thing in PLY. This won't work
          unless your lexer is providing accurate information. Please report bugs.
          Suggested by a bug reported by Davis Herring.

04/18/12: beazley
          Change to doc string handling in the lex module. Regex patterns are now
          first pulled from a function's .regex attribute. If that doesn't exist,
          the docstring is checked as a fallback. The @TOKEN decorator now sets
          the .regex attribute of a function instead of its doc string.
          Change suggested by Kristoffer Ellersgaard Koch.
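          A minimal sketch of that lookup order (a stand-in for PLY's internal
          logic, not its actual code; rule names are illustrative):

```python
def get_pattern(func):
    # Sketch of the lookup order: the .regex attribute first,
    # with the docstring as a fallback.
    return getattr(func, 'regex', None) or func.__doc__

def t_PLUS(t):
    r'\+'
    return t

def t_NUMBER(t):
    return t

t_NUMBER.regex = r'\d+'   # what the @TOKEN decorator now sets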

04/18/12: beazley
          Fixed issue #1: Fixed _tabversion. It should use __tabversion__ instead
          of __version__.
          Reported by Daniele Tricoli.

04/18/12: beazley
          Fixed issue #8: Literals empty list causes IndexError.
          Reported by Walter Nissen.

04/18/12: beazley
          Fixed issue #12: Typo in code snippet in documentation.
          Reported by florianschanda.

04/18/12: beazley
          Fixed issue #10: Correctly escape t_XOREQUAL pattern.
          Reported by Andy Kittner.

Version 3.4
---------------------
02/17/11: beazley
          Minor patch to make cpp.py compatible with Python 3. Note: This
          is an experimental file not currently used by the rest of PLY.

02/17/11: beazley
          Fixed setup.py trove classifiers to properly list PLY as
          Python 3 compatible.

01/02/11: beazley
          Migration of repository to github.

Version 3.3
-----------------------------
08/25/09: beazley
          Fixed issue 15 related to the set_lineno() method in yacc. Reported by
          mdsherry.

08/25/09: beazley
          Fixed a bug related to regular expression compilation flags not being
          properly stored in lextab.py files created by the lexer when running
          in optimize mode. Reported by Bruce Frederiksen.


Version 3.2
-----------------------------
03/24/09: beazley
          Added an extra check to not print duplicated warning messages
          about reduce/reduce conflicts.

03/24/09: beazley
          Switched PLY over to a BSD license.

03/23/09: beazley
          Performance optimization. Discovered a few places to make
          speedups in LR table generation.

03/23/09: beazley
          New warning message. PLY now warns about rules never
          reduced due to reduce/reduce conflicts. Suggested by
          Bruce Frederiksen.

03/23/09: beazley
          Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
          Added a new picklefile option to yacc() to write the parsing
          tables to a filename using the pickle module. Here is how
          it works:

              yacc(picklefile="parsetab.p")

          This option can be used if the normal parsetab.py file is
          extremely large. For example, on Jython, it is impossible
          to read parsing tables if the parsetab.py exceeds a certain
          threshold.

          The filename supplied to the picklefile option is opened
          relative to the current working directory of the Python
          interpreter. If you need to refer to the file elsewhere,
          you will need to supply an absolute or relative path.

          For maximum portability, the pickle file is written
          using protocol 0.

03/13/09: beazley
          Fixed a bug in parser.out generation where the rule numbers
          were off by one.

03/13/09: beazley
          Fixed a string formatting bug with one of the error messages.
          Reported by Richard Reitmeyer.

Version 3.1
-----------------------------
02/28/09: beazley
          Fixed broken start argument to yacc(). PLY-3.0 broke this
          feature by accident.

02/28/09: beazley
          Fixed debugging output. yacc() no longer reports shift/reduce
          or reduce/reduce conflicts if debugging is turned off. This
          restores behavior similar to PLY-2.5. Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
          Fixed missing lexer attribute on certain tokens when
          invoking the parser p_error() function. Reported by
          Bart Whiteley.

02/02/09: beazley
          The lex() command now does all error-reporting and diagnostics
          using the logging module interface. Pass in a Logger object
          using the errorlog parameter to specify a different logger.

02/02/09: beazley
          Refactored ply.lex to use a more object-oriented and organized
          approach to collecting lexer information.

02/01/09: beazley
          Removed the nowarn option from lex(). All output is controlled
          by passing in a logger object. Just pass in a logger with a high
          level setting to suppress output. This argument was never
          documented to begin with, so hopefully no one was relying upon it.
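          For instance, a logger whose threshold sits above every level PLY
          uses would swallow all output (a sketch; the logger name is
          illustrative, and the lex() call is commented out since it requires
          PLY to be installed):

```python
import logging

# A threshold above CRITICAL means no standard log record gets through.
quiet = logging.getLogger("ply.quiet")
quiet.addHandler(logging.NullHandler())
quiet.setLevel(logging.CRITICAL + 1)

# With PLY installed:
#   import ply.lex as lex
#   lexer = lex.lex(errorlog=quiet)
```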

02/01/09: beazley
          Discovered and removed a dead if-statement in the lexer. This
          resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
          Minor change to the procedure for signalling a syntax error in a
          production rule. A normal SyntaxError exception should be raised
          instead of yacc.SyntaxError.

01/13/09: beazley
          Added a new method p.set_lineno(n,lineno) that can be used to set the
          line number of symbol n in grammar rules. This simplifies manual
          tracking of line numbers.

01/11/09: beazley
          Vastly improved debugging support for yacc.parse(). Instead of passing
          debug as an integer, you can supply a Logging object (see the logging
          module). Messages will be generated at the ERROR, INFO, and DEBUG
          logging levels, each level providing progressively more information.
          The debugging trace also shows states, grammar rules, values passed
          into grammar rules, and the result of each reduction.

01/09/09: beazley
          The yacc() command now does all error-reporting and diagnostics using
          the interface of the logging module. Use the errorlog parameter to
          specify a logging object for error messages. Use the debuglog parameter
          to specify a logging object for the 'parser.out' output.

01/09/09: beazley
          *HUGE* refactoring of the ply.yacc() implementation. The high-level
          user interface is backwards compatible, but the internals are completely
          reorganized into classes. No more global variables. The internals
          are also more extensible. For example, you can use the classes to
          construct a LALR(1) parser in an entirely different manner than
          what is currently the case. Documentation is forthcoming.

01/07/09: beazley
          Various cleanup and refactoring of yacc internals.

01/06/09: beazley
          Fixed a bug with precedence assignment. yacc was assigning the precedence
          of each rule based on the left-most token, when in fact, it should have
          been using the right-most token. Reported by Bruce Frederiksen.

11/27/08: beazley
          Numerous changes to support Python 3.0 including removal of deprecated
          statements (e.g., has_key) and the addition of compatibility code
          to emulate features from Python 2 that have been removed, but which
          are needed. Fixed the unit testing suite to work with Python 3.0.
          The code should be backwards compatible with Python 2.

11/26/08: beazley
          Loosened the rules on what kind of objects can be passed in as the
          "module" parameter to lex() and yacc(). Previously, you could only use
          a module or an instance. Now, PLY just uses dir() to get a list of
          symbols on whatever the object is without regard for its type.
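          A sketch of what that relaxation enables: any object whose dir()
          exposes the rule names will do (the class name and rules here are
          illustrative, and the dict comprehension only approximates what PLY
          does internally):

```python
class RuleBag:
    # A plain class: neither a module nor an instance.
    tokens = ('NUMBER',)
    t_NUMBER = r'\d+'
    t_ignore = ' \t'

# Roughly what PLY now does, regardless of the object's type:
symbols = {name: getattr(RuleBag, name)
           for name in dir(RuleBag) if not name.startswith('__')}
```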

11/26/08: beazley
          Changed all except: statements to be compatible with Python 2.x/3.x
          syntax.

11/26/08: beazley
          Changed all raise Exception, value statements to raise Exception(value)
          for forward compatibility.

11/26/08: beazley
          Removed all print statements from lex and yacc, using sys.stdout and
          sys.stderr directly. Preparation for Python 3.0 support.

11/04/08: beazley
          Fixed a bug with referring to symbols on the parsing stack using
          negative indices.

05/29/08: beazley
          Completely revamped the testing system to use the unittest module for
          everything. Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
          Fixed a bug with writing lex-tables in optimized mode and start states.
          Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
          A version number is now embedded in the table file signature so that
          yacc can more gracefully accommodate changes to the output format
          in the future.

05/04/08: beazley
          Removed undocumented .pushback() method on grammar productions. I'm
          not sure this ever worked and can't recall ever using it. Might have
          been an abandoned idea that never really got fleshed out. This
          feature was never described or tested, so removing it is hopefully
          harmless.

05/04/08: beazley
          Added extra error checking to yacc() to detect precedence rules defined
          for undefined terminal symbols. This allows yacc() to detect a potential
          problem that can be really tricky to debug if no warning message or error
          message is generated about it.

05/04/08: beazley
          lex() now has an outputdir option that can specify the output directory
          for tables when running in optimize mode. For example:

              lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

          The behavior of specifying a table module and output directory is now
          more aligned with the behavior of yacc().

05/04/08: beazley
          [Issue 9]
          Fixed a filename bug when specifying the modulename in lex() and yacc().
          If you specified options such as the following:

              parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")

          yacc would create a file "foo.bar.parsetab.py" in the given directory.
          Now, it simply generates a file "parsetab.py" in that directory.
          Bug reported by cptbinho.

05/04/08: beazley
          Slight modification to lex() and yacc() to allow their table files
          to be loaded from a previously loaded module. This might make
          it easier to load the parsing tables from a complicated package
          structure. For example:

              import foo.bar.spam.parsetab as parsetab
              parser = yacc.yacc(tabmodule=parsetab)

          Note: lex and yacc will never regenerate the table file if used
          in this form; you will get a warning message instead.
          This idea suggested by Brian Clapper.


04/28/08: beazley
          Fixed a bug with p_error() functions not being picked up correctly
          when running in yacc(optimize=1) mode. Patch contributed by
          Bart Whiteley.

02/28/08: beazley
          Fixed a bug with 'nonassoc' precedence rules. Basically the
          'nonassoc' precedence was being ignored and not producing the correct
          run-time behavior in the parser.

02/16/08: beazley
          Slight relaxation of what the input() method to a lexer will
          accept as a string. Instead of testing the input to see
          if the input is a string or unicode string, it checks to see
          if the input object looks like it contains string data.
          This change makes it possible to pass string-like objects
          in as input. For example, the object returned by mmap:

              import mmap, os
              data = mmap.mmap(os.open(filename,os.O_RDONLY),
                               os.path.getsize(filename),
                               access=mmap.ACCESS_READ)
              lexer.input(data)


11/29/07: beazley
          Modification of ply.lex to allow token functions to be aliased.
          This is subtle, but it makes it easier to create libraries and
          to reuse token specifications. For example, suppose you defined
          a function like this:

              def number(t):
                  r'\d+'
                  t.value = int(t.value)
                  return t

          This change would allow you to define a token rule as follows:

              t_NUMBER = number

          In this case, the token type will be set to 'NUMBER' and use
          the associated number() function to process tokens.

11/28/07: beazley
          Slight modification to lex and yacc to grab symbols from both
          the local and global dictionaries of the caller. This
          modification allows lexers and parsers to be defined using
          inner functions and closures.
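          A sketch of the style this enables (the factory function and rule
          names are hypothetical, and the lex.lex() call is shown only as a
          comment since it requires PLY; returning locals() here just makes
          the collected names visible for illustration):

```python
def make_lexer_rules(scale):
    # Token rules defined as inner functions, closing over 'scale'.
    tokens = ('NUMBER',)
    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value) * scale
        return t

    # With PLY installed, lex.lex() could be called right here and would
    # now find these rules in the enclosing local namespace.
    return locals()

rules = make_lexer_rules(10)
```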

11/28/07: beazley
          Performance optimization: The lexer.lexmatch and t.lexer
          attributes are no longer set for lexer tokens that are not
          defined by functions. The only normal use of these attributes
          would be in lexer rules that need to perform some kind of
          special processing. Thus, it doesn't make any sense to set
          them on every token.

          *** POTENTIAL INCOMPATIBILITY *** This might break code
          that is mucking around with internal lexer state in some
          sort of magical way.

11/27/07: beazley
          Added the ability to put the parser into error-handling mode
          from within a normal production. To do this, simply raise
          a yacc.SyntaxError exception like this:

              def p_some_production(p):
                  'some_production : prod1 prod2'
                  ...
                  raise yacc.SyntaxError      # Signal an error

          A number of things happen after this occurs:

          - The last symbol shifted onto the symbol stack is discarded
            and parser state backed up to what it was before the
            rule reduction.

          - The current lookahead symbol is saved and replaced by
            the 'error' symbol.

          - The parser enters error recovery mode where it tries
            to either reduce the 'error' rule or it starts
            discarding items off of the stack until the parser
            resets.

          When an error is manually set, the parser does *not* call
          the p_error() function (if any is defined).
          *** NEW FEATURE *** Suggested on the mailing list.

11/27/07: beazley
          Fixed structure bug in examples/ansic. Reported by Dion Blazakis.

11/27/07: beazley
          Fixed a bug in the lexer related to start conditions and ignored
          token rules. If a rule was defined that changed state, but
          returned no token, the lexer could be left in an inconsistent
          state. Reported by

11/27/07: beazley
          Modified setup.py to support Python Eggs. Patch contributed by
          Simon Cross.

11/09/07: beazley
          Fixed a bug in error handling in yacc. If a syntax error occurred and
          the parser rolled the entire parse stack back, the parser would be left
          in an inconsistent state that would cause it to trigger incorrect
          actions on subsequent input. Reported by Ton Biegstraaten, Justin King,
          and others.

11/09/07: beazley
          Fixed a bug when passing empty input strings to yacc.parse(). This
          would result in an error message about "No input given". Reported
          by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
          Fixed a bug with character literals if the literal '.' appeared as the
          last symbol of a grammar rule. Reported by Ales Smrcka.

02/19/07: beazley
          Warning messages are now redirected to stderr instead of being printed
          to standard output.

02/19/07: beazley
          Added a warning message to lex.py if it detects a literal backslash
          character inside the t_ignore declaration. This is to help avoid
          problems that might occur if someone accidentally defines t_ignore
          as a Python raw string. For example:

              t_ignore = r' \t'

          The idea for this is from an email I received from David Cimimi who
          reported bizarre behavior in lexing as a result of defining t_ignore
          as a raw string by accident.

02/18/07: beazley
          Performance improvements. Made some changes to the internal
          table organization and LR parser to improve parsing performance.

02/18/07: beazley
          Automatic tracking of line number and position information must now be
          enabled by a special flag to parse(). For example:

              yacc.parse(data, tracking=True)

          In many applications, it's just not that important to have the
          parser automatically track all line numbers. By making this an
          optional feature, it allows the parser to run significantly faster
          (more than a 20% speed increase in many cases). Note: positional
          information is always available for raw tokens; this change only
          applies to positional information associated with nonterminal
          grammar symbols.
          *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
          Yacc no longer supports extended slices of grammar productions.
          However, it does support regular slices. For example:

              def p_foo(p):
                  '''foo: a b c d e'''
                  p[0] = p[1:3]

          This change is a performance improvement to the parser; it streamlines
          normal access to the grammar values since slices are now handled in
          a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
          Fixed a bug in the handling of token names when combined with
          start conditions. Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
          Added lexpos() and lexspan() methods to grammar symbols. These
          mirror the same functionality of lineno() and linespan(). For
          example:

              def p_expr(p):
                  'expr : expr PLUS expr'
                  p.lexpos(1)     # Lexing position of left-hand expression
                  p.lexpos(2)     # Lexing position of PLUS
                  start, end = p.lexspan(3)  # Lexing range of right-hand expression

11/01/06: beazley
          Minor change to error handling. The recommended way to skip characters
          in the input is to use t.lexer.skip() as shown here:

              def t_error(t):
                  print "Illegal character '%s'" % t.value[0]
                  t.lexer.skip(1)

          The old approach of just using t.skip(1) will still work, but won't
          be documented.

10/31/06: beazley
          Discarded tokens can now be specified as simple strings instead of
          functions. To do this, simply include the text "ignore_" in the
          token declaration. For example:

              t_ignore_cppcomment = r'//.*'

          Previously, this had to be done with a function. For example:

              def t_ignore_cppcomment(t):
                  r'//.*'
                  pass

          If start conditions/states are being used, state names should appear
          before the "ignore_" text.
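          For instance, with a hypothetical 'foo' state declared, the state
          name prefixes the rule name (a sketch; the re calls below merely
          demonstrate what the patterns match, since the actual discarding
          is done by PLY's lexer):

```python
import re

t_ignore_cppcomment = r'//.*'        # discarded in the default INITIAL state
t_foo_ignore_cppcomment = r'//.*'    # discarded only in the hypothetical 'foo' state

m = re.match(t_ignore_cppcomment, '// a C++-style comment')
```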

10/19/06: beazley
          The Lex module now provides support for flex-style start conditions
          as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
          Please refer to this document to understand this change note. Refer to
          the PLY documentation for a PLY-specific explanation of how this works.

          To use start conditions, you first need to declare a set of states in
          your lexer file:

              states = (
                  ('foo','exclusive'),
                  ('bar','inclusive')
              )

          This serves the same role as the %s and %x specifiers in flex.

          Once a state has been declared, tokens for that state can be
          declared by defining rules of the form t_state_TOK. For example:

              t_PLUS = r'\+'          # Rule defined in INITIAL state
              t_foo_NUM = r'\d+'      # Rule defined in foo state
              t_bar_NUM = r'\d+'      # Rule defined in bar state

              t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
              t_ANY_NUM = r'\d+'      # Rule defined in all states

          In addition to defining tokens for each state, the t_ignore and t_error
          specifications can be customized for specific states. For example:

              t_foo_ignore = " "      # Ignored characters for foo state

              def t_bar_error(t):
                  # Handle errors in bar state
                  ...

          Within token rules, the following methods can be used to change states:

              def t_TOKNAME(t):
                  t.lexer.begin('foo')        # Begin state 'foo'
                  t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                              # onto a stack
                  t.lexer.pop_state()         # Restore previous state
                  t.lexer.current_state()     # Returns name of current state

          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
          yy_top_state() functions in flex.

          Start states can be used as one way to write sub-lexers.
          For example, the lexer or parser might instruct the lexer to start
          generating a different set of tokens depending on the context.

          examples/yply/ylex.py shows the use of start states to grab C/C++
          code fragments out of traditional yacc specification files.

          *** NEW FEATURE *** Suggested by Daniel Larraz, with whom I also
          discussed various aspects of the design.

10/19/06: beazley
          Minor change to the way in which yacc.py was reporting shift/reduce
          conflicts. Although the underlying LALR(1) algorithm was correct,
          PLY was under-reporting the number of conflicts compared to yacc/bison
          when precedence rules were in effect. This change should make PLY
          report the same number of conflicts as yacc.

10/19/06: beazley
          Modified yacc so that grammar rules could also include the '-'
          character. For example:

              def p_expr_list(p):
                  'expression-list : expression-list expression'

          Suggested by Oldrich Jedlicka.

10/18/06: beazley
          Attribute lexer.lexmatch added so that token rules can access the re
          match object that was generated. For example:

              def t_FOO(t):
                  r'some regex'
                  m = t.lexer.lexmatch
                  # Do something with m

          This may be useful if you want to access named groups specified within
          the regex for a specific token. Suggested by Oldrich Jedlicka.
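          A pure re sketch of the kind of named-group access lexmatch makes
          possible (the pattern and names are illustrative, not a real PLY
          rule):

```python
import re

# A quoted-string pattern in the style of a t_QSTRING rule; 'quote' is
# the named group recording which quote character opened the string.
pattern = r"(?P<quote>['\"]).*?(?P=quote)"
m = re.match(pattern, "'hello' world")
quote_char = m.group('quote')
```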
787
788 10/16/06: beazley
789 Changed the error message that results if an illegal character
790 is encountered and no default error function is defined in lex.
791 The exception is now more informative about the actual cause of
792 the error.
793
794 Version 2.1
795 ------------------------------
796 10/02/06: beazley
797 The last Lexer object built by lex() can be found in lex.lexer.
798 The last Parser object built by yacc() can be found in yacc.parser.
799
800 10/02/06: beazley
801 New example added: examples/yply
802
803 This example uses PLY to convert Unix-yacc specification files to
804 PLY programs with the same grammar. This may be useful if you
805 want to convert a grammar from bison/yacc to use with PLY.
806
807 10/02/06: beazley
808 Added support for a start symbol to be specified in the yacc
809 input file itself. Just do this:
810
811 start = 'name'
812
813 where 'name' matches some grammar rule. For example:
814
815 def p_name(p):
816 'name : A B C'
817 ...
818
819 This mirrors the functionality of the yacc %start specifier.
820
821 09/30/06: beazley
822 Some new examples added.:
823
824 examples/GardenSnake : A simple indentation based language similar
825 to Python. Shows how you might handle
826 whitespace. Contributed by Andrew Dalke.
827
828 examples/BASIC : An implementation of 1964 Dartmouth BASIC.
829 Contributed by Dave against his better
830 judgement.
831
832 09/28/06: beazley
833 Minor patch to allow named groups to be used in lex regular
834 expression rules. For example:
835
836 t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''
837
838 Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method. To use SLR, use
          yacc.yacc(method="SLR"). Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR. However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking. To modify line numbers, modify
          the line number of the lexer itself. For example:

              def t_NEWLINE(t):
                  r'\n'
                  t.lexer.lineno += 1

          This modification is both a cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number. This extra processing is unnecessary for the vast
          majority of tokens. Thus, the new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos. This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).
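
          Such a column computation is not part of PLY itself, but it can
          be written in a few lines of plain Python (a sketch; the name
          find_column() is our own):

```python
def find_column(text, lexpos):
    """Return the 1-based column of the character at index lexpos."""
    # Index of the first character on the line containing lexpos;
    # rfind() returns -1 when there is no preceding newline, so +1
    # correctly yields 0 for the first line.
    line_start = text.rfind('\n', 0, lexpos) + 1
    return (lexpos - line_start) + 1
```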

09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone(). This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).

09/21/06: beazley
          Limitations related to the use of the re module have been eliminated.
          Several users reported problems with regular expressions containing
          more than 100 named groups. To solve this, lex.py is now capable
          of automatically splitting its master regular expression into
          smaller expressions as needed. This should, in theory, make it
          possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
          Improved error checking in lex.py. Rules that match the empty string
          are now rejected (otherwise they cause the lexer to enter an infinite
          loop). An extra check for rules containing '#' has also been added.
          Since lex compiles regular expressions in verbose mode, '#' is
          interpreted as a regex comment, so it is critical to use '\#' instead.
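
          The effect of an unescaped '#' can be seen with nothing more than
          the standard re module (a quick illustration, not part of the
          original patch):

```python
import re

# In verbose mode an unescaped '#' starts a comment, so the first
# pattern below is silently truncated to just 'a'
truncated = re.compile(r'a#b', re.VERBOSE)
escaped = re.compile(r'a\#b', re.VERBOSE)

assert truncated.fullmatch('a#b') is None    # pattern is really just 'a'
assert truncated.fullmatch('a') is not None
assert escaped.fullmatch('a#b') is not None  # '\#' matches a literal '#'
```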

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

              digit      = r'([0-9])'
              nondigit   = r'([_A-Za-z])'
              identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

              from ply.lex import TOKEN

              @TOKEN(identifier)
              def t_ID(t):
                  # Do whatever

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

              def t_ID(t):
                  # Do whatever

              t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later. If compatibility
          with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes. For example:

              def p_expr(p):
                  "expr : expr '+' expr"
                  ...

              def p_expr(p):
                  'expr : expr "-" expr'
                  ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters. This is done by defining the variable 'literals'
          as a list of characters. This should be defined in the module that
          invokes the lex.lex() function. For example:

              literals = ['+','-','*','/','(',')','=']

          or simply

              literals = '+=*/()='

          It is important to note that literals can only be a single character.
          When the lexer fails to match a token using its normal regular expression
          rules, it will check the current character against the literal list.
          If found, it will be returned with a token type set to match the literal
          character. Otherwise, an illegal character will be signalled.
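          The fallback behavior described above can be pictured with a few
          lines of ordinary Python (a hypothetical sketch of the behavior,
          not PLY's actual code; classify() is our own name):

```python
literals = '+-*/()='

def classify(ch):
    # When no regular-expression rule matches, the lexer falls back
    # to the literals list; the token type is the character itself.
    if ch in literals:
        return ('literal', ch)
    return ('illegal', ch)
```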
09/14/06: beazley
          Modified PLY to install itself as a proper Python package called 'ply'.
          This will make it a little more friendly to other modules. This
          changes the usage of PLY only slightly. Just do this to import the
          modules:

              import ply.lex as lex
              import ply.yacc as yacc

          Alternatively, you can do this:

              from ply import *

          which imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production rules.
          A negative production index now accesses already parsed symbols on the
          parsing stack. For example,

              def p_foo(p):
                  "foo : A B C D"
                  print p[1]       # Value of 'A' symbol
                  print p[2]       # Value of 'B' symbol
                  print p[-1]      # Value of whatever symbol appears before A
                                   # on the parsing stack.

                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within the
          parsing rules. For example, in C-yacc, it is possible to write code like
          this:

              bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has been
          parsed. Within the embedded action code, $1 refers to the A symbol on
          the stack.

          To perform the equivalent action in PLY, you need to write a pair
          of rules like this:

              def p_bar(p):
                  "bar : A seen_A B"
                  do_stuff

              def p_seen_A(p):
                  "seen_A :"
                  print "seen an A =", p[-1]

          The second rule "seen_A" is merely an empty production which should be
          reduced as soon as A is parsed in the "bar" rule above. The negative
          index p[-1] is used to access whatever symbol appeared before the
          seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

              def p_decl(p):
                  "decl : scope name"

              def p_scope(p):
                  """scope : GLOBAL
                           | LOCAL"""
                  p[0] = p[1]

              def p_name(p):
                  "name : ID"
                  if p[-1] == "GLOBAL":
                      # ...
                  elif p[-1] == "LOCAL":
                      # ...

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar rules,
          your code will break. This should be extremely rare, if not non-existent,
          in most cases. The argument to various grammar rules is not usually
          processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code. Both SLR
          and LALR(1) table generation is now performed by the same code base with
          only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use the
          DeRemer and Pennello algorithm for calculating lookahead sets. This
          significantly improves the performance of generating LALR(1) tables
          and has the added feature of actually working correctly! If you
          experienced weird behavior with LALR(1) in prior releases, this should
          hopefully resolve all of those problems. Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in LALR(1)
          parsing. If you experienced subtle and/or bizarre behavior when trying
          to use the LALR(1) engine, this may correct those problems. Patch
          contributed by Russ Cox. Note: This patch has been superseded by
          revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed an infinite recursion problem in the ReduceToTerminals() function
          that would sometimes come up in LALR(1) table generation. Reported by
          Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex(). For example:

              lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile() function
          used inside the lexer. This may be necessary for special situations such
          as processing Unicode (e.g., if you want escapes like \w and \b to consult
          the Unicode character property database). The need for this was suggested
          by Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated instantiations of parser
          objects when the write_tables=0 argument was used. Reported by Michael Brown.

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular expressions for
          tokens and as input. Hopefully this is the only change needed for Unicode support.
          Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or old-style classes.
          Patch contributed by Michael Brown (although I tweaked it slightly so it would work
          with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an extremely
          devious bug in LALR(1) parser generation. This patch should fix problems
          numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor. Reported by Dave Aitel, Aaron Lav,
          and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control output directory. Contributed
          by Christopher Stawarz.

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest module.
          Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement. LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley. To use LALR(1) parsing do
          the following:

              yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method. However, LALR(1) allows you to handle
          more complex grammars. For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists. Can
          be used in parser rules like this:

              def p_somerule(p):
                  """a : B C D
                       | E F"""
                  if (len(p) == 4):
                      # Must have been first rule
                  elif (len(p) == 3):
                      # Must be second rule

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

          0. Cleans up some comments so they don't wrap on an 80-column display.
          1. Directs compiler errors to stderr where they belong.
          2. Implements and documents automatic line counting when \n is ignored.
          3. Changes the way progress messages are dumped when debugging is on.
             The new format is both less verbose and conveys more information than
             the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation. Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          - Some output is now only shown when debugging is enabled. This
            means that PLY will be completely silent when not in debugging mode.

          - An optional parameter "write_tables" can be passed to yacc() to
            control whether or not parsing tables are written. By default,
            it is true, but it can be turned off if you don't want the yacc
            table file. Note: disabling this will cause yacc() to regenerate
            the parsing table each time.

04/23/04: beazley
          Added patches contributed by David McNab. This patch adds two
          features:

          - The parser can be supplied as a class instead of a module.
            For an example of this, see the example/classcalc directory.

          - Debugging output can be directed to a filename of the user's
            choice. Use

                yacc(debugfile="somefile.out")

Version 1.3
------------------------------
12/10/02: jmdyck
          Various minor adjustments to the code that Dave checked in today.
          Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
          Incorporated a variety of minor bug fixes to empty production
          handling and infinite recursion checking. Contributed by
          Michael Dyck.

12/10/02: beazley
          Removed bogus recover() method call in yacc.restart()

Version 1.2
------------------------------
11/27/02: beazley
          Lexer and parser objects are now available as an attribute
          of tokens and slices respectively. For example:

              def t_NUMBER(t):
                  r'\d+'
                  print t.lexer

              def p_expr_plus(t):
                  'expr : expr PLUS expr'
                  print t.lexer
                  print t.parser

          This can be used for state management (if needed).

10/31/02: beazley
          Modified yacc.py to work with Python optimize mode. To make
          this work, you need to use

              yacc.yacc(optimize=1)

          Furthermore, you need to first run Python in normal mode
          to generate the necessary parsetab.py files. After that,
          you can use python -O or python -OO.

          Note: optimized mode turns off a lot of error checking.
          Only use when you are sure that your grammar is working.
          Make sure parsetab.py is up to date!

10/30/02: beazley
          Added cloning of Lexer objects. For example:

              import copy
              l = lex.lex()
              lc = copy.copy(l)

              l.input("Some text")
              lc.input("Some other text")
              ...

          This might be useful if the same "lexer" is meant to
          be used in different contexts---or if multiple lexers
          are running concurrently.

10/30/02: beazley
          Fixed subtle bug with first set computation and empty productions.
          Patch submitted by Michael Dyck.

10/30/02: beazley
          Fixed error messages to use "filename:line: message" instead
          of "filename:line. message". This makes error reporting more
          friendly to emacs. Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file. Terminals and nonterminals
          are sorted instead of being printed in random order.
          Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file output. Rules are now printed
          in a way that's easier to understand. Contributed by Russ Cox.

10/30/02: beazley
          Added 'nonassoc' associativity support. This can be used
          to disable the chaining of operators like a < b < c.
          To use, simply specify 'nonassoc' in the precedence table:

              precedence = (
                  ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVIDE'),
                  ('right', 'UMINUS'),                      # Unary minus operator
              )

          Patch contributed by Russ Cox.

10/30/02: beazley
          Modified the lexer to provide optional support for Python -O and -OO
          modes. To make this work, Python *first* needs to be run in
          unoptimized mode. This reads the lexing information and creates a
          file "lextab.py". Then, run lex like this:

              # module foo.py
              ...
              ...
              lex.lex(optimize=1)

          Once the lextab file has been created, subsequent calls to
          lex.lex() will read data from the lextab file instead of using
          introspection. In optimized mode (-O, -OO) everything should
          work normally despite the loss of doc strings.

          To change the name of the file 'lextab.py' use the following:

              lex.lex(lextab="footab")

          (this creates a file footab.py)

Version 1.1 October 25, 2001
------------------------------

10/25/01: beazley
          Modified the table generator to produce much more compact data.
          This should greatly reduce the size of the parsetab.py[c] file.
          Caveat: the tables still need to be constructed so a little more
          work is done in parsetab on import.

10/25/01: beazley
          There may be a possible bug in the cycle detector that reports errors
          about infinite recursion. I'm having a little trouble tracking it
          down, but if you get this problem, you can disable the cycle
          detector as follows:

              yacc.yacc(check_recursion=0)

10/25/01: beazley
          Fixed a bug in lex.py that sometimes caused illegal characters to be
          reported incorrectly. Reported by Sverre Jørgensen.

7/8/01:   beazley
          Added a reference to the underlying lexer object when tokens are handled by
          functions. The lexer is available as the 'lexer' attribute. This
          was added to provide better lexing support for languages such as Fortran
          where certain types of tokens can't be conveniently expressed as regular
          expressions (and where the tokenizing function may want to perform a
          little backtracking). Suggested by Pearu Peterson.

6/20/01:  beazley
          Modified yacc() function so that an optional starting symbol can be specified.
          For example:

              yacc.yacc(start="statement")

          Normally yacc always treats the first production rule as the starting symbol.
          However, if you are debugging your grammar it may be useful to specify
          an alternative starting symbol. Idea suggested by Rich Salz.


Version 1.0 June 18, 2001
--------------------------
Initial public offering
