comparison interps/c-intercal/pit/tests/arrtest.doc @ 996:859f9b4339e6

<Gregor> tar xf egobot.tar.xz
author HackBot
date Sun, 09 Dec 2012 19:30:08 +0000
parents
children
comparison
equal deleted inserted replaced
995:6883f5911eb7 996:859f9b4339e6
1 arrtest.i is a program I wrote to test my correction to a nasty
2 ambiguity in grammar that was rendering many programs incompilable.
3
4 The problem is:
5
6 sublist1 : unambig _
7 sublist : unambig _ sublist
8 sublist : unambig _ sublist1
9
10 Token received: SPARK or EARS.
11
12 Shift/reduce conflict.
13
14 This is an example from numio.i:
15
16 DO .5 <- '?"'#65535~"'?.7$",2SUB.1"'~#21845"'~#1"$#1'~#3
17 ^
18 ambiguity
19
20 The problem is that it's unclear whether the spark/ears is the end of
21 an expression around the array reference (reduce), or whether it's the
22 start of a new subscript (shift).
23
24 In this case, the reduction is wanted. With the shift taking priority,
25 there seems to be no way to write this statement without splitting
26 it. The grammar as of INTERCAL 0.24 uses the shift by default. This is
27 because multidimensional arrays may do this sort of thing:
28
29 DO .1 <- ',2SUB.1".2~.3"'~#1
30
31 where the shift is needed to parse the ".2~.3" construct. In the first
32 case, there is no information within the next two characters that
33 indicates that it was the shift that was wanted; in the second case,
34 the fact that it's an ears, not a spark, indicates to humans that the
35 reduce is needed, but yacc/bison cannot determine this as they don't
36 have information about whether the current spark/ears level was
37 introduced with sparks or ears.
38
39 At first I thought that scanning past all spark/ears to find the first
40 non-sparkears character could solve the problem, even if it could not
41 be coded in yacc, but the following code shows the problem:
42
43 DO .1 <- ,3SUB",2SUB.1".2 (reduce required)
44
45 (It could be the start of this statement:
46
47 DO .1 <- ,3SUB",2SUB.1".2~.3"".4 (shift required))
48
49 The two statements above are the same for several tokens after the
50 ambiguity, and are even the same for 2 tokens after the forward
51 scan. This sort of problem is probably why the original manual
52 suggested alternating sparks and ears in various levels of expression
53 grouping. The only solution I can think of is:
54
55 Determine somehow whether we are on a spark level or an ears level
56
57 Force alternation of sparks and ears (constraint on the programmer)
58 at different levels of expression grouping when dealing with
59 near-ambiguous expressions involving spark/ears and arrays, like the
60 ones above (so the fourth statement above would be illegal)
61
62 Look at whether we see a spark or an ears to determine whether to
63 shift or reduce
64
65 In other words, reduce whenever it would be a legal way to close the
66 current spark/ears level, otherwise shift.
67
68 So how to determine whether we're on a spark level or an ears level?
69 The solution is for the lexer to place all the sparks and ears it
70 encounters on a spark/ears stack, and for the parser to pop two
71 sparkears off the stack whenever it reduces a SPARK [unary] <expr>
72 SPARK or EARS [unary] <expr> EARS.
73
74 Once the determination is in place, all that is required is a way of
75 determining whether to shift or reduce based on a variable in the
76 program, rather than just a priority (which wouldn't work). This
77 doesn't seem to fit into yacc's syntax, so it's possible that after
78 all, yacc can't work with the INTERCAL syntax. But there is a solution
79 (one that I used to determine COMPUCOME from COME_FROM): syntax
80 analysis in the lexer! The idea is that legal closing spark/ears have
81 different tokens from illegal closing spark/ears, so that the reduce
82 is impossible in situations where it would be illegal (the production
83 looks something like eitherspark <expr> CLOSESPARK), and the default
84 in all other situations (sort of like 'auto' in C, which noone ever
85 uses, because it's the default everywhere it isn't illegal).
86
87 The solution I've used places a limit on the maximum nesting depth of
88 an expression. Strangely, it also limits the number of sparks and ears
89 that can be written into a 'comment' to 256. As care has to be taken
90 anyway that malformed abstained statements don't look sufficiently
91 like real statements to cause errors, this should be a trivial extra
92 addition.
93
94 (Incidentally, I strongly suspect this sort of fix was used in the
95 original INTERCAL-72 compiler, because there is a very similar
96 restriction documented in the manual. Besides, it's needed to make
97 INTERCAL into an LALR(1) language.)
98
99 The program arrtest.i will not actually run, but it will compile as
100 far as degenerating C code. I don't believe every line would have come
101 out as a valid statement in any version of INTERCAL 0.24 or earlier
102 (except the one labeled as invalid, which would have come out as valid
103 but is really bad style anyway).
104