diff interps/c-intercal/pit/tests/arrtest.doc @ 996:859f9b4339e6

<Gregor> tar xf egobot.tar.xz
author HackBot
date Sun, 09 Dec 2012 19:30:08 +0000
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/interps/c-intercal/pit/tests/arrtest.doc	Sun Dec 09 19:30:08 2012 +0000
@@ -0,0 +1,104 @@
+arrtest.i is a program I wrote to test my correction to a nasty
+ambiguity in grammar that was rendering many programs incompilable.
+
+The problem is:
+
+sublist1 : unambig _
+sublist : unambig _ sublist
+sublist : unambig _ sublist1
+
+Token received: SPARK or EARS.
+
+Shift/reduce conflict.
+
+This is an example from numio.i:
+
+DO .5 <- '?"'#65535~"'?.7$",2SUB.1"'~#21845"'~#1"$#1'~#3
+                                  ^
+                              ambiguity
+
+The problem is that it's unclear whether the spark/ears is the end of
+an expression around the array reference (reduce), or whether it's the
+start of a new subscript (shift).
+
+In this case, the reduction is wanted. With the shift taking priority,
+there seems to be no way to write this statement without splitting
+it. The grammar as of INTERCAL 0.24 uses the shift by default. This is
+because multidimensional arrays may do this sort of thing:
+
+DO .1 <- ',2SUB.1".2~.3"'~#1
+
+where the shift is needed to parse the ".2~.3" construct. In the first
+case, there is no information within the next two characters that
+indicates that it was the shift that was wanted; in the second case,
+the fact that it's an ears, not a spark, indicates to humans that the
+reduce is needed, but yacc/bison cannot determine this as they don't
+have information about whether the current spark/ears level was
+introduced with sparks or ears. 
+
+At first I thought that scanning past all spark/ears to find the first
+non-sparkears character could solve the problem, even if it could not
+be coded in yacc, but the following code shows the problem:
+
+DO .1 <- ,3SUB",2SUB.1".2 (reduce required)
+
+(It could be the start of this statement:
+
+DO .1 <- ,3SUB",2SUB.1".2~.3"".4 (shift required))
+
+The two statements above are the same for several tokens after the
+ambiguity, and are even the same for 2 tokens after the forward
+scan. This sort of problem is probably why the original manual
+suggested alternating sparks and ears in various levels of expression
+grouping. The only solution I can think of is:
+
+Determine somehow whether we are on a spark level or an ears level
+
+Force alternation of sparks and ears (constraint on the programmer)
+at different levels of expression grouping when dealing with
+near-ambiguous expressions involving spark/ears and arrays, like the
+ones above (so the fourth statement above would be illegal)
+
+Look at whether we see a spark or an ears to determine whether to
+shift or reduce
+
+In other words, reduce whenever it would be a legal way to close the
+current spark/ears level, otherwise shift.
+
+So how to determine whether we're on a spark level or an ears level?
+The solution is for the lexer to place all the sparks and ears it
+encounters on a spark/ears stack, and for the parser to pop two
+sparkears off the stack whenever it reduces a SPARK [unary] <expr>
+SPARK or EARS [unary] <expr> EARS.
+
+Once the determination is in place, all that is required is a way of
+determining whether to shift or reduce based on a variable in the
+program, rather than just a priority (which wouldn't work). This
+doesn't seem to fit into yacc's syntax, so it's possible that after
+all, yacc can't work with the INTERCAL syntax. But there is a solution
+(one that I used to determine COMPUCOME from COME_FROM): syntax
+analysis in the lexer! The idea is that legal closing spark/ears have
+different tokens from illegal closing spark/ears, so that the reduce
+is impossible in situations where it would be illegal (the production
+looks something like eitherspark <expr> CLOSESPARK), and the default
+in all other situations (sort of like 'auto' in C, which noone ever
+uses, because it's the default everywhere it isn't illegal).
+
+The solution I've used places a limit on the maximum nesting depth of
+an expression. Strangely, it also limits the number of sparks and ears
+that can be written into a 'comment' to 256. As care has to be taken
+anyway that malformed abstained statements don't look sufficiently
+like real statements to cause errors, this should be a trivial extra
+addition.
+
+(Incidentally, I strongly suspect this sort of fix was used in the
+original INTERCAL-72 compiler, because there is a very similar
+restriction documented in the manual. Besides, it's needed to make
+INTERCAL into an LALR(1) language.)
+
+The program arrtest.i will not actually run, but it will compile as
+far as degenerating C code. I don't believe every line would have come
+out as a valid statement in any version of INTERCAL 0.24 or earlier
+(except the one labeled as invalid, which would have come out as valid
+but is really bad style anyway).
+