Mercurial > repo
comparison interps/c-intercal/pit/tests/arrtest.doc @ 996:859f9b4339e6
<Gregor> tar xf egobot.tar.xz
author | HackBot |
---|---|
date | Sun, 09 Dec 2012 19:30:08 +0000 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
995:6883f5911eb7 | 996:859f9b4339e6 |
---|---|
1 arrtest.i is a program I wrote to test my correction to a nasty | |
2 ambiguity in grammar that was rendering many programs incompilable. | |
3 | |
4 The problem is: | |
5 | |
6 sublist1 : unambig _ | |
7 sublist : unambig _ sublist | |
8 sublist : unambig _ sublist1 | |
9 | |
10 Token received: SPARK or EARS. | |
11 | |
12 Shift/reduce conflict. | |
13 | |
14 This is an example from numio.i: | |
15 | |
16 DO .5 <- '?"'#65535~"'?.7$",2SUB.1"'~#21845"'~#1"$#1'~#3 | |
17 ^ | |
18 ambiguity | |
19 | |
20 The problem is that it's unclear whether the spark/ears is the end of | |
21 an expression around the array reference (reduce), or whether it's the | |
22 start of a new subscript (shift). | |
23 | |
24 In this case, the reduction is wanted. With the shift taking priority, | |
25 there seems to be no way to write this statement without splitting | |
26 it. The grammar as of INTERCAL 0.24 uses the shift by default. This is | |
27 because multidimensional arrays may do this sort of thing: | |
28 | |
29 DO .1 <- ',2SUB.1".2~.3"'~#1 | |
30 | |
31 where the shift is needed to parse the ".2~.3" construct. In the first | |
32 case, there is no information within the next two characters that | |
33 indicates that it was the shift that was wanted; in the second case, | |
34 the fact that it's an ears, not a spark, indicates to humans that the | |
35 reduce is needed, but yacc/bison cannot determine this as they don't | |
36 have information about whether the current spark/ears level was | |
37 introduced with sparks or ears. | |
38 | |
39 At first I thought that scanning past all spark/ears to find the first | |
40 non-sparkears character could solve the problem, even if it could not | |
41 be coded in yacc, but the following code shows the problem: | |
42 | |
43 DO .1 <- ,3SUB",2SUB.1".2 (reduce required) | |
44 | |
45 (It could be the start of this statement: | |
46 | |
47 DO .1 <- ,3SUB",2SUB.1".2~.3"".4 (shift required)) | |
48 | |
49 The two statements above are the same for several tokens after the | |
50 ambiguity, and are even the same for 2 tokens after the forward | |
51 scan. This sort of problem is probably why the original manual | |
52 suggested alternating sparks and ears in various levels of expression | |
53 grouping. The only solution I can think of is: | |
54 | |
55 Determine somehow whether we are on a spark level or an ears level | |
56 | |
57 Force alternation of sparks and ears (constraint on the programmer) | |
58 at different levels of expression grouping when dealing with | |
59 near-ambiguous expressions involving spark/ears and arrays, like the | |
60 ones above (so the fourth statement above would be illegal) | |
61 | |
62 Look at whether we see a spark or an ears to determine whether to | |
63 shift or reduce | |
64 | |
65 In other words, reduce whenever it would be a legal way to close the | |
66 current spark/ears level, otherwise shift. | |
67 | |
68 So how to determine whether we're on a spark level or an ears level? | |
69 The solution is for the lexer to place all the sparks and ears it | |
70 encounters on a spark/ears stack, and for the parser to pop two | |
71 sparkears off the stack whenever it reduces a SPARK [unary] <expr> | |
72 SPARK or EARS [unary] <expr> EARS. | |
73 | |
74 Once the determination is in place, all that is required is a way of | |
75 determining whether to shift or reduce based on a variable in the | |
76 program, rather than just a priority (which wouldn't work). This | |
77 doesn't seem to fit into yacc's syntax, so it's possible that after | |
78 all, yacc can't work with the INTERCAL syntax. But there is a solution | |
79 (one that I used to determine COMPUCOME from COME_FROM): syntax | |
80 analysis in the lexer! The idea is that legal closing spark/ears have | |
81 different tokens from illegal closing spark/ears, so that the reduce | |
82 is impossible in situations where it would be illegal (the production | |
83 looks something like eitherspark <expr> CLOSESPARK), and the default | |
84 in all other situations (sort of like 'auto' in C, which noone ever | |
85 uses, because it's the default everywhere it isn't illegal). | |
86 | |
87 The solution I've used places a limit on the maximum nesting depth of | |
88 an expression. Strangely, it also limits the number of sparks and ears | |
89 that can be written into a 'comment' to 256. As care has to be taken | |
90 anyway that malformed abstained statements don't look sufficiently | |
91 like real statements to cause errors, this should be a trivial extra | |
92 addition. | |
93 | |
94 (Incidentally, I strongly suspect this sort of fix was used in the | |
95 original INTERCAL-72 compiler, because there is a very similar | |
96 restriction documented in the manual. Besides, it's needed to make | |
97 INTERCAL into an LALR(1) language.) | |
98 | |
99 The program arrtest.i will not actually run, but it will compile as | |
100 far as degenerating C code. I don't believe every line would have come | |
101 out as a valid statement in any version of INTERCAL 0.24 or earlier | |
102 (except the one labeled as invalid, which would have come out as valid | |
103 but is really bad style anyway). | |
104 |