996
|
1 arrtest.i is a program I wrote to test my correction to a nasty
|
|
2 ambiguity in grammar that was rendering many programs incompilable.
|
|
3
|
|
4 The problem is:
|
|
5
|
|
6 sublist1 : unambig _
|
|
7 sublist : unambig _ sublist
|
|
8 sublist : unambig _ sublist1
|
|
9
|
|
10 Token received: SPARK or EARS.
|
|
11
|
|
12 Shift/reduce conflict.
|
|
13
|
|
14 This is an example from numio.i:
|
|
15
|
|
16 DO .5 <- '?"'#65535~"'?.7$",2SUB.1"'~#21845"'~#1"$#1'~#3
|
|
17 ^
|
|
18 ambiguity
|
|
19
|
|
20 The problem is that it's unclear whether the spark/ears is the end of
|
|
21 an expression around the array reference (reduce), or whether it's the
|
|
22 start of a new subscript (shift).
|
|
23
|
|
24 In this case, the reduction is wanted. With the shift taking priority,
|
|
25 there seems to be no way to write this statement without splitting
|
|
26 it. The grammar as of INTERCAL 0.24 uses the shift by default. This is
|
|
27 because multidimensional arrays may do this sort of thing:
|
|
28
|
|
29 DO .1 <- ',2SUB.1".2~.3"'~#1
|
|
30
|
|
31 where the shift is needed to parse the ".2~.3" construct. In the first
|
|
32 case, there is no information within the next two characters that
|
|
33 indicates that it was the shift that was wanted; in the second case,
|
|
34 the fact that it's an ears, not a spark, indicates to humans that the
|
|
35 reduce is needed, but yacc/bison cannot determine this as they don't
|
|
36 have information about whether the current spark/ears level was
|
|
37 introduced with sparks or ears.
|
|
38
|
|
39 At first I thought that scanning past all spark/ears to find the first
|
|
40 non-sparkears character could solve the problem, even if it could not
|
|
41 be coded in yacc, but the following code shows the problem:
|
|
42
|
|
43 DO .1 <- ,3SUB",2SUB.1".2 (reduce required)
|
|
44
|
|
45 (It could be the start of this statement:
|
|
46
|
|
47 DO .1 <- ,3SUB",2SUB.1".2~.3"".4 (shift required))
|
|
48
|
|
49 The two statements above are the same for several tokens after the
|
|
50 ambiguity, and are even the same for 2 tokens after the forward
|
|
51 scan. This sort of problem is probably why the original manual
|
|
52 suggested alternating sparks and ears in various levels of expression
|
|
53 grouping. The only solution I can think of is:
|
|
54
|
|
55 Determine somehow whether we are on a spark level or an ears level
|
|
56
|
|
57 Force alternation of sparks and ears (constraint on the programmer)
|
|
58 at different levels of expression grouping when dealing with
|
|
59 near-ambiguous expressions involving spark/ears and arrays, like the
|
|
60 ones above (so the fourth statement above would be illegal)
|
|
61
|
|
62 Look at whether we see a spark or an ears to determine whether to
|
|
63 shift or reduce
|
|
64
|
|
65 In other words, reduce whenever it would be a legal way to close the
|
|
66 current spark/ears level, otherwise shift.
|
|
67
|
|
68 So how to determine whether we're on a spark level or an ears level?
|
|
69 The solution is for the lexer to place all the sparks and ears it
|
|
70 encounters on a spark/ears stack, and for the parser to pop two
|
|
71 sparkears off the stack whenever it reduces a SPARK [unary] <expr>
|
|
72 SPARK or EARS [unary] <expr> EARS.
|
|
73
|
|
74 Once the determination is in place, all that is required is a way of
|
|
75 determining whether to shift or reduce based on a variable in the
|
|
76 program, rather than just a priority (which wouldn't work). This
|
|
77 doesn't seem to fit into yacc's syntax, so it's possible that after
|
|
78 all, yacc can't work with the INTERCAL syntax. But there is a solution
|
|
79 (one that I used to determine COMPUCOME from COME_FROM): syntax
|
|
80 analysis in the lexer! The idea is that legal closing spark/ears have
|
|
81 different tokens from illegal closing spark/ears, so that the reduce
|
|
82 is impossible in situations where it would be illegal (the production
|
|
83 looks something like eitherspark <expr> CLOSESPARK), and the default
|
|
84 in all other situations (sort of like 'auto' in C, which noone ever
|
|
85 uses, because it's the default everywhere it isn't illegal).
|
|
86
|
|
87 The solution I've used places a limit on the maximum nesting depth of
|
|
88 an expression. Strangely, it also limits the number of sparks and ears
|
|
89 that can be written into a 'comment' to 256. As care has to be taken
|
|
90 anyway that malformed abstained statements don't look sufficiently
|
|
91 like real statements to cause errors, this should be a trivial extra
|
|
92 addition.
|
|
93
|
|
94 (Incidentally, I strongly suspect this sort of fix was used in the
|
|
95 original INTERCAL-72 compiler, because there is a very similar
|
|
96 restriction documented in the manual. Besides, it's needed to make
|
|
97 INTERCAL into an LALR(1) language.)
|
|
98
|
|
99 The program arrtest.i will not actually run, but it will compile as
|
|
100 far as degenerating C code. I don't believe every line would have come
|
|
101 out as a valid statement in any version of INTERCAL 0.24 or earlier
|
|
102 (except the one labeled as invalid, which would have come out as valid
|
|
103 but is really bad style anyway).
|
|
104
|