@@DATA ByteCode@@
<HTML>
<HEAD>
<TITLE>CLC-INTERCAL Reference</TITLE>
</HEAD>
<BODY>
<H1>CLC-INTERCAL Reference</H1>
<H2>... How to write a compiler for CLC-INTERCAL</H2>

<P>
<UL>
<LI><A HREF="index.html">Parent directory</A>
<LI><A HREF="#syntax">Syntax</A>
<LI><A HREF="#predefined">Predefined symbols</A>
<LI><A HREF="#special">Special Registers</A>
<LI><A HREF="#code">Code generation</A>
<LI><A HREF="#bytecode">Bytecode</A>
<LI><A HREF="#sick">Writing an extension for <I>sick</I></A>
<LI><A HREF="#examples">Examples</A>
</UL>
</P>

<P>
CLC-INTERCAL 1.-94 no longer includes a parser. Instead, it contains a
parser generator. The source language for the parser generator is a
version of CLC-INTERCAL which includes the <CODE>CREATE</CODE> statement and the ability to assign
to special registers (to control the runtime operating mode).
This document describes the syntax of the <CODE>CREATE</CODE>
statement, and shows some examples.
</P>

<H2><A NAME="syntax">Syntax</A></H2>

<P>
The CREATE and DESTROY statements have the form:
<PRE>
DO CREATE <I>grammar</I> <I>class</I> <I>template</I> AS <I>code</I>
DO DESTROY <I>grammar</I> <I>class</I> <I>template</I>
</PRE>
</P>

<P>
The <I>grammar</I> is one of the two crawling horrors (_1 or _2), and can be omitted.
When compiling a compiler (with <I>iacc</I>), _1 represents the compiler
compiler's grammar, and _2 represents the compiler being built; if the
grammar is omitted, it defaults to _2. When not compiling a compiler,
_1 is the current compiler and _2 is undefined. When using the CREATE
or DESTROY statements with <I>sick</I>, the grammar must be omitted,
and it defaults to _1.
</P>

<P>
The <I>class</I> specifies a syntactic class (some other languages might
call it a nonterminal). Usually, this takes the form of a what ("?")
followed by some alphanumerics, although anything which evaluates to a
number will do. Please note that in CLC-INTERCAL the what does not
introduce a unary logical operator as in C-INTERCAL; it always produces
a named constant (of sorts).
</P>

<P>
The <I>template</I> is composed of a sequence of terminals and nonterminals
which vaguely resemble the syntax you are trying to define. Nonterminals
are specified as a special type of constant, usually introduced by the
"what" discussed above. Terminals are specified as "array slices", that
is, sequences of numbers enclosed in tails or hybrids which represent a
16-bit array of ASCII character codes. Abbreviations are
available for terminals consisting of just alphanumerics, where the
characters themselves, rather than their codes, are placed between the tails.
</P>

<P>
The <I>code</I> specifies the semantics of this production. There
are many elements which can be used here, to produce chunks of code,
copy the code produced by a symbol called from the template, etc.
We defer discussion of this until <A HREF="#code">a later section</A>.
</P>

<P>
For example, consider the following production from sick.iacc:
<PRE>
DO CREATE _2 ?STMT_LABEL ,#40, ?CONSTANT ,#41, AS ?CONSTANT #1
</PRE>
This creates a production for class ?STMT_LABEL; it matches
an open parenthesis (#40), a numeric constant (which is
parsed by ?CONSTANT), and a close parenthesis (#41). In
other words, this production matches a normal statement label.
</P>

<P>
Some productions parse lists of elements, and the code generated
may need to contain the number of elements. In general, this is not the
same as the number of symbols parsed. Consider the following
productions to parse the list of register names used, for example,
in a STASH statement:
<PRE>
DO CREATE _2 ?NAMES ?RNAME AS ?RNAME #1
DO CREATE _2 ?NAMES ?RNAME ,#43, ?NAMES AS ?RNAME #1 + ?NAMES #1
</PRE>
To parse a list of two registers, the second production matches
an ?RNAME (which presumably parses a register name), an intersection
symbol (#43), and then matches itself recursively; in this recursive
call, it will use the first production to match the second register.
At the level of this production, we matched three symbols, ?RNAME,
#43 and ?NAMES. So how do we obtain a count of 2, which is required
so that the STASH knows how many registers to stash? To solve this
problem, each element in the production can have a numeric "count"
associated with it, using the syntax "=<I>number</I>". Moreover,
a nonterminal can have the special count "=*" to indicate that
the count produced by that nonterminal should be used. If a symbol
does not have a count, it is assumed to be "=0". The total count of
a production is the sum of all its counts. We rewrite the above as:
<PRE>
DO CREATE _2 ?NAMES ?RNAME=1 AS ?RNAME #1
DO CREATE _2 ?NAMES ?RNAME=1 ,#43, ?NAMES=* AS ?RNAME #1 + ?NAMES #1
</PRE>
Now, consider again parsing .1+.2: we need the second production,
which matches .1 using ?RNAME and .2 using ?NAMES; the recursive
call uses the first production to match .2 using ?RNAME. What is the
count? The inner call has count 1, because it contains just one count,
=1. The outer call has count 2: the =1 from ?RNAME, plus the =*
from ?NAMES, which uses the inner count of 1. If you work out this
example with more than two registers, you will see how it works.
</P>
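
<P>
The counting rule can be checked with a small model outside INTERCAL.
The following Python sketch is purely illustrative and is not part of
CLC-INTERCAL: it assumes a hypothetical representation in which each
production is a list of (symbol, count) pairs, with "*" standing for
"=*", and it rebuilds the two ?NAMES productions above to compute the
count for lists of registers.
</P>

```python
# Toy model (not CLC-INTERCAL code) of how production counts are summed.
# A match is a pair (production, children): production lists the template
# elements as (symbol, count) pairs, where count is an int or "*" (for
# "=*"); children maps the index of each "=*" element to the match that
# the nested nonterminal produced.

# DO CREATE _2 ?NAMES ?RNAME=1 AS ?RNAME #1
NAMES_ONE = [("?RNAME", 1)]
# DO CREATE _2 ?NAMES ?RNAME=1 ,#43, ?NAMES=* AS ?RNAME #1 + ?NAMES #1
# (the terminal ,#43, has no count, so it counts as =0)
NAMES_MANY = [("?RNAME", 1), (",#43,", 0), ("?NAMES", "*")]


def match_names(registers):
    """Return the match tree ?NAMES would build for a non-empty list of registers."""
    if not registers:
        raise ValueError("?NAMES needs at least one register")
    if len(registers) == 1:
        return (NAMES_ONE, {})
    # element 2 of NAMES_MANY is the recursive ?NAMES=*
    return (NAMES_MANY, {2: match_names(registers[1:])})


def count(match):
    """Sum the element counts of a match, taking "=*" from the nested match."""
    production, children = match
    total = 0
    for index, (_symbol, value) in enumerate(production):
        if value == "*":
            total += count(children[index])
        else:
            total += value
    return total


print(count(match_names([".1", ".2"])))        # 2
print(count(match_names([".1", ".2", ":3"])))  # 3
```

<P>
With three registers, the outermost production adds its own =1 to the
count of 2 returned by the recursive call, giving 3; each extra register
adds one more level of recursion and one more =1.
</P>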

<P>
The DESTROY statement works like a CREATE in reverse. Only the
first part, before the AS, is used. Suppose we no longer need the
two productions above; we can remove them with:
<PRE>
DO DESTROY _2 ?NAMES ?RNAME=1
DO DESTROY _2 ?NAMES ?RNAME=1 ,#43, ?NAMES=*
</PRE>
</P>

<P>
While CREATE (and maybe DESTROY) are the major components of a
compiler, it is often necessary to assign values to special registers
and to set object flags. Special registers control the way the runtime
handles the generated code; the compiler uses normal assignments to
give these the appropriate values, and these values will be saved together
with the code in the object, so that they can influence the runtime
when the object is executed. Special registers are discussed in
<A HREF="#special">a later section</A>.
</P>

<P>
By contrast, flags are a property of the compiler itself, and are used
by the command-line compiler tool (<I>sick</I>) and the calculator
to decide what to do with an object. Flags are set by using assignments,
but these assignments are executed at compile time.
At the time of writing, the only flag is ?TYPE, which describes the
compiler or other extension we are building. The possible values of
this flag are:
<TABLE>
<TR><TH>?TYPE</TH><TH>Meaning</TH></TR>
<TR><TD>?ASSEMBLER</TD><TD>Compiler used to build assembler programs</TD></TR>
<TR><TD>?BASE</TD><TD>An object which just changes the arithmetic base</TD></TR>
<TR><TD>?COMPILER</TD><TD>Compiler used to compile normal programs</TD></TR>
<TR><TD>?EXTENSION</TD><TD>Compiler extension, e.g. new syntax</TD></TR>
<TR><TD>?IACC</TD><TD>Compiler used to compile other compilers</TD></TR>
<TR><TD>?OPTIMISER</TD><TD>Object defining code optimisations</TD></TR>
<TR><TD>?OPTION</TD><TD>Compiler option, e.g. change the meaning of existing syntax</TD></TR>
<TR><TD>?POSTPRE</TD><TD>Special object loaded after all other objects and before the source</TD></TR>
</TABLE>
At present, the system does not distinguish between ?EXTENSION and ?OPTION;
the command-line compiler tool accepts them interchangeably, and the
calculator lists all these objects in the Options menu.
</P>

<P>
For example, towards the start of sick.iacc one can see:
<PRE>
DO ?TYPE <- ?COMPILER
</PRE>
On the other hand, the extensions we will develop as examples in this
document will have:
<PRE>
DO ?TYPE <- ?EXTENSION
</PRE>
</P>

<P>
One further statement is of interest: MAKE NEW OPCODE. This
is described in <A HREF="#sick">the section about writing an
extension for <I>sick</I></A>.
</P>

<H2><A NAME="predefined">Predefined symbols</A></H2>

<P>
Some nonterminals are predefined by CLC-INTERCAL. This means that
you don't need CREATE statements to make them, and you can use them
directly in your compiler or extension:
</P>

<TABLE>
<TR><TH>Symbol</TH><TH>Meaning</TH></TR>
<TR><TD>?ANYTHING</TD><TD>Any single character</TD></TR>
<TR><TD>?BLACKSPACE</TD><TD>Any non-space character</TD></TR>
<TR><TD>?CONSTANT</TD><TD>Any numeric constant between 0 and 65535</TD></TR>
<TR><TD>?JUNK</TD><TD><I>See below</I></TD></TR>
<TR><TD>?SPACE</TD><TD>Any space character</TD></TR>
<TR><TD>?SYMBOL</TD><TD>Any sequence of alphanumerics or underscores</TD></TR>
</TABLE>

<P>
Although these symbols could be defined using CREATEs, it would be
rather cumbersome to do so.
</P>

<P>
The ?JUNK symbol is used to parse comments. It matches the longest
text which does not look like the start of a statement. Special register
%JS, described in the next section, defines what constitutes the start
of a statement. Normally, the value of %JS is ?END_JUNK, and the following
productions are defined by the compiler:
<PRE>
DO CREATE _2 ?END_JUNK ?STMT_LABEL AS ,,
DO CREATE _2 ?END_JUNK ,DO, AS ,,
DO CREATE _2 ?END_JUNK ,PLEASE, AS ,,
</PRE>
In other words, the start of a statement is either a label (as defined
by symbol ?STMT_LABEL), or one of the terminals "DO" or "PLEASE". When
parsing a comment, the ?JUNK symbol will therefore find the first
label, DO or PLEASE, and match all the text up to that point.
</P>
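
<P>
A rough model of how ?JUNK finds the end of a comment under the default
?END_JUNK productions is sketched below in Python. It is a simplification,
not the CLC-INTERCAL implementation: it assumes a label is an open
parenthesis, decimal digits and a close parenthesis, it matches DO even
inside longer words, and it ignores details such as spacing rules and the
possibility of assigning a different symbol to %JS.
</P>

```python
import re

# Hypothetical stand-in for the default %JS value ?END_JUNK: a label such
# as (123), or one of the terminals DO or PLEASE.
END_JUNK = re.compile(r"\(\d+\)|DO|PLEASE")


def junk(text):
    """Return the longest prefix of text in which no statement starts."""
    found = END_JUNK.search(text)
    if found is None:
        return text  # no statement start anywhere: the whole text is junk
    return text[:found.start()]


print(repr(junk("THIS IS NOT A STATEMENT (10) PLEASE GIVE UP")))
# 'THIS IS NOT A STATEMENT '
```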

<H2><A NAME="special">Special Registers</A></H2>

<P>
A number of special registers control how the compiler or the runtime operates.
<UL>
@@MULTI DOUBLE_OH_SEVEN NAME@@
<LI>@@TYPE@@@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H2><A NAME="code">Code generation</A></H2>

<P>
The right-hand side of a CREATE statement (the part after the AS) generates
the code to be executed when the template matches part of the program.
</P>

<P>
The <I>code</I> consists of elements, separated by the intersection symbol (+); each
element can be one of the following:
<UL>
<LI>A symbol, followed by an expression. If the expression evaluates
to value <I>n</I>, this copies the code of the <I>n</I>-th occurrence
of the symbol in the <I>template</I>. For example:
<PRE>
DO CREATE ?SWAP ?EXPRESSION ?EXPRESSION AS ?EXPRESSION #2 + ?EXPRESSION #1
</PRE>
would generate the code for the second expression followed by the
code for the first one.

<LI>A bang, followed by a symbol (with or without the what),
followed by an expression. The occurrence is selected exactly as in the
previous element, but instead of copying its code this generates code
which produces the associated count value.

<LI>An opcode representing bytecode. See the next section for
the meaning of the opcodes.

<LI>A terminal, followed by an expression. If the expression evaluates
to value <I>n</I>, this copies the text matched by the <I>n</I>-th
occurrence of the terminal in the <I>template</I>, encoding it as
a string.

<LI>An empty terminal (",,"). This generates no code.

<LI>A constant (in the form #<I>number</I>). This generates the
bytecode which evaluates to that constant.

<LI>A splat. This has a special, currently undocumented, meaning
which has to do with the conversion of the generated bytecode
into an actual executable.
</UL>
</P>

<P>
As examples, consider the three CREATE statements listed in the
<A HREF="#syntax">Syntax</A> section:
<PRE>
DO CREATE _2 ?STMT_LABEL ,#40, ?CONSTANT ,#41, AS ?CONSTANT #1
DO CREATE _2 ?NAMES ?RNAME=1 AS ?RNAME #1
DO CREATE _2 ?NAMES ?RNAME=1 ,#43, ?NAMES=* AS ?RNAME #1 + ?NAMES #1
</PRE>
The first statement generates code which evaluates to the constant
provided inside the parentheses. This is obviously going to be used
in a context where it is interpreted as a label number. The second
statement just copies the code generated by ?RNAME; the third
statement produces the code generated by ?RNAME followed by the
code generated by the recursive call to itself.
</P>

<P>
Now consider:
<PRE>
DO CREATE _2 ?VERB ,STASH, ?NAMES AS STA + !NAMES #1 + ?NAMES #1
</PRE>
<I>STA</I> is a bytecode opcode (which happens to correspond to the
STASH operation). It takes an expression, representing a count,
and then that number of registers. The generated code reflects
this: !NAMES #1 is the number of registers, and ?NAMES #1 is
the code generated by all the registers, one after the other.
</P>
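
<P>
To make the element list above concrete, here is a toy Python model of how
the code after AS might be assembled for the STASH production. It is a
hypothetical sketch, not the CLC-INTERCAL code generator: generated code is
modelled as a list of placeholder strings rather than real bytecode, each
template occurrence is recorded as a (code, count) pair, and the element
kinds "opcode", "constant", "copy" and "count" are names invented for this
example.
</P>

```python
# Toy model of assembling the code after AS; not CLC-INTERCAL internals.
# Generated code is a list of placeholder strings, and every nonterminal
# matched by the template is recorded as a (code, count) pair.


def generate(elements, matched):
    """Assemble generated code from a list of hypothetical element tuples.

    matched maps each symbol to its occurrences in the template, in order.
    """
    out = []
    for element in elements:
        kind = element[0]
        if kind == "opcode":      # a bytecode opcode such as STA
            out.append(element[1])
        elif kind == "constant":  # #number
            out.append("#%d" % element[1])
        elif kind == "copy":      # ?SYMBOL #n: copy the code of occurrence n
            code, _count = matched[element[1]][element[2] - 1]
            out.extend(code)
        elif kind == "count":     # !SYMBOL #n: code producing its count
            _code, count = matched[element[1]][element[2] - 1]
            out.append("#%d" % count)
        else:
            raise ValueError("unknown element kind: %r" % (kind,))
    return out


# ?NAMES matched .1+.2: its code lists both registers and its count is 2
matched = {"?NAMES": [([".1", ".2"], 2)]}
# ... AS STA + !NAMES #1 + ?NAMES #1
stash = [("opcode", "STA"), ("count", "?NAMES", 1), ("copy", "?NAMES", 1)]
print(generate(stash, matched))  # ['STA', '#2', '.1', '.2']
```

<P>
The same function reproduces the ?SWAP example: copying occurrence 2 of
?EXPRESSION before occurrence 1 emits the two expressions in reverse order.
</P>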

<P>
If one wants to extend <I>sick</I> to allow direct access to the
base, for example using the syntax SETBASE <I>expression</I> to
change the base and GETBASE <I>expression</I> to store the current base
in <I>expression</I>, one could say:
<PRE>
DO CREATE ?VERB ,SETBASE, ?EXPRESSION AS STO + ?EXPRESSION #1 + %BA
DO CREATE ?VERB ,GETBASE, ?EXPRESSION AS STO + %BA + ?EXPRESSION #1
</PRE>
The <I>STO</I> opcode, followed by two expressions, assigns the
first expression to the second. In this case, the code generated
by SETBASE <I>expression</I> would be identical to the code
generated by %BA <- <I>expression</I>, and the code generated
by GETBASE <I>expression</I> would be identical to the code
generated by <I>expression</I> <- %BA.
</P>

<H2><A NAME="bytecode">Bytecode</A></H2>

<P>
The bytecode is an intermediate form produced by the
compilers. It consists of opcodes which are executed in sequence.
At present, a bytecode interpreter executes the program; however,
there are plans to allow direct generation of C or Perl source
from the bytecode.
</P>

<P>
Each byte in the bytecode can be part of a statement, an expression or a
register. In addition, a subset of expressions can be assigned to: these
are called assignable expressions. For example, a constant is an assignable
expression. When assigned to, it changes the value of the constant. This
is necessary to implement overloading and is also a great obfuscation
mechanism.
</P>

<H3>Constants</H3>

<P>
Constants can be specified in three ways:
<UL>
<LI>Byte larger than maximum opcode.<BR>
Any byte with value greater than the maximum opcode is interpreted as a
16 bit (spot) constant by subtracting the number of opcodes from the byte.
For example, since there are 128 opcodes, byte 131 is equivalent to #3,
and byte 255 (the maximum value) is #127.

@@MULTI CONSTANTS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>
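
<P>
The single-byte constant encoding described in the first item above can be
written as a short Python function. It is only a sketch: it hard-codes the
figure of 128 opcodes quoted in the text, which is an assumption for any
other version, and it does not model the other constant forms listed above.
</P>

```python
NUM_OPCODES = 128  # figure quoted in the text; an assumption for other versions


def decode_byte(byte):
    """Return the spot constant a single byte encodes, or None if it is an opcode."""
    if not 0 <= byte <= 255:
        raise ValueError("not a byte: %d" % byte)
    if byte < NUM_OPCODES:
        return None               # bytes 0 to 127 are opcodes
    return byte - NUM_OPCODES     # bytes 128 to 255 are the constants #0 to #127


print(decode_byte(131))  # 3
print(decode_byte(255))  # 127
```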

<H3>Registers</H3>

<P>
A register consists of any number of register prefixes, followed by a type and
a constant. Not every combination of prefixes is useful.
</P>

<P>
The register types are:
<UL>
@@MULTI REGISTERS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<P>
The prefixes which can be applied to registers are:
<UL>
@@MULTI PREFIXES NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H3>Expressions</H3>

<P>
Assignable expressions are sequences of bytecode which can be used as the
target of an assignment. Of course, all registers are assignable;
all constants are also assignable, which makes them really variables.
Instead of describing the assignable expressions separately, we describe
all expressions and mention which ones are assignable. Assigning to
an expression means assigning appropriate values to its subexpressions such
that the expression, if evaluated, would result in the value being assigned.
This is not always possible, so such assignments can generate runtime errors.
</P>

<P>
In addition to registers and constants, the following are valid expressions:
<UL>
@@MULTI EXPRESSIONS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H3>Statements</H3>

<P>
The following opcodes are valid statements:
<UL>
@@MULTI STATEMENTS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H2><A NAME="sick">Writing an extension for <I>sick</I></A></H2>

<P>
Writing an extension for the <I>sick</I> compiler (or any of the
other compilers provided) is simply a matter of putting together
the material in this chapter in a way which is consistent with
the rest of the compiler. For this reason, this section provides
a description of some of the compiler internals. Please note
that while the compiler internals could change in future versions
of CLC-INTERCAL, the parts described here are unlikely to change.
</P>

<P>
The most important grammar symbols defined by <I>sick</I>,
<I>ick</I> and <I>1972</I> are (see below for further explanations):
<UL>
<LI>?UNARY<BR>
Defines a unary operator. It matches the operator name (not the
complete subexpression) and generates code which expects an
operand right after it.
<LI>?BINARY<BR>
Defines a binary operator. It matches the operator name (not the
complete subexpression) and generates code which expects the
two operands in <I>reverse</I> order.
<LI>?VERB<BR>
Defines a new statement. It matches the whole "verb" part of the
statement (i.e. it does not match PLEASE, DO, NOT and so on)
and returns code to execute the statement. The returned code
must be self-contained, in that it just runs without assuming
any extra code being generated; if the statement is to be
considered a quantum statement, it must start with the opcode
QUA.
<LI>?GERUND<BR>
Used by ABSTAIN FROM, REINSTATE and any other statements which
take a gerund list. Matches the gerund appropriate for a
statement, and generates a list of opcodes; it must also
generate the appropriate opcode count.
<LI>?TEMPLATE<BR>
Used by the <I>sick</I> compiler, but it can be defined in extensions
to other compilers without causing problems. Matches one statement
template and generates a single opcode. There is no need to associate
a count with this.
</UL>
</P>

<P>
When extending the expression syntax, one commonly adds a unary or
a binary operator. This can be easily done by adding a production
to symbol ?UNARY or ?BINARY, respectively. In general, the operation
to add is not already present in CLC-INTERCAL (otherwise there would
already be syntax for it), so one would use the undocumented expression
opcode (UNE) and an additional Perl module to implement it.
</P>

<P>
The undocumented expression opcode takes two strings, a number, and
then a list of expressions (the number determines how many). The
first string is taken to be the name of a Perl module, with
<CODE>Language::INTERCAL::</CODE> automatically prepended to it;
the second string is taken to be the name of a function to be
called within that module. The expressions are passed as arguments
to the function using the form:
<PRE>
$result = Language::INTERCAL::<I>module</I>-><I>function</I>(<I>expr</I>, <I>expr</I>...)
</PRE>
The module is automatically loaded if necessary.
</P>

<P>
Suppose, for example, we want to add an <I>überwimp</I> extension,
which adds two new operators: a unary logical negation and a binary
logical AND operator. We use the symbols "n" and "a" for these operations.
We start by defining the Perl module to implement them:
<PRE>
package Language::INTERCAL::Ueberwimp;

use Language::INTERCAL::Splats ':SP';
use Language::INTERCAL::Numbers;

sub negate {
    @_ == 2 or faint(SP_INVALID, 'Wrong number of arguments', 'negate');
    my ($class, $arg) = @_;
    $arg = $arg->number;
    return Language::INTERCAL::Numbers::Spot->new(! $arg);
}

sub and {
    @_ == 3 or faint(SP_INVALID, 'Wrong number of arguments', 'and');
    # remember we get the arguments in reverse order (OK, so it does
    # not matter here because this operation is commutative, but in
    # general we should remember this).
    my ($class, $second, $first) = @_;
    $first = $first->number;
    $second = $second->number;
    # Perl's && returns the last operand evaluated, so normalise to 0 or 1
    return Language::INTERCAL::Numbers::Spot->new(($first && $second) ? 1 : 0);
}

1;
</PRE>
This uses the runtime internals to get a Perl number from the arguments
(this would automatically splat if the argument happens to be something
other than a number), and creates a new Spot value. Now, in order to
use this module, we need to add the syntax and code for it:
<PRE>
DO ?TYPE <- ?EXTENSION
DO CREATE ?UNARY ,n, AS
    UNE +
    MUL + #9 + #85 + #101 + #98 + #101 + #114 + #119 + #105 + #109 + #112 +
    MUL + #6 + #110 + #101 + #103 + #97 + #116 + #101 +
    #1
DO CREATE ?BINARY ,a, AS
    UNE +
    MUL + #9 + #85 + #101 + #98 + #101 + #114 + #119 + #105 + #109 + #112 +
    MUL + #3 + #97 + #110 + #100 +
    #2
PLEASE GIVE UP
</PRE>
This example shows one way of creating strings in bytecode: the MUL
opcode, followed by the number of characters in the string, followed
by the character codes. Note that the module and function names are
provided in ASCII, but if the function requires any string arguments
these are provided in Baudot for compatibility with alphanumeric I/O.
In any case, we pass the string "Ueberwimp" (the Perl module name)
and either "negate" or "and" as the first two arguments to UNE; the
third argument is the number of expressions to follow, which will
be #1 for "negate" and #2 for "and". The expressions will be automatically
provided by the rest of the compiler.
</P>
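
<P>
Writing the MUL sequences by hand is error-prone. The Python helper below
is a hypothetical convenience, not a CLC-INTERCAL tool: it prints the
MUL + #length + #code... form for an ASCII name and assembles a one-line
CREATE statement in the shape used above. Putting the whole statement on one
line assumes that the line breaks in the examples above are not significant.
</P>

```python
def mul_string(text):
    """Render an ASCII name as the MUL + #length + #code ... sequence."""
    codes = [ord(ch) for ch in text]
    if any(code > 127 for code in codes):
        raise ValueError("module and function names are expected in ASCII")
    return " + ".join(["MUL", "#%d" % len(codes)] + ["#%d" % code for code in codes])


def une_create(symbol, terminal, module, function, nargs):
    """Build a CREATE statement calling module->function through UNE (hypothetical helper)."""
    return "DO CREATE %s ,%s, AS UNE + %s + %s + #%d" % (
        symbol, terminal, mul_string(module), mul_string(function), nargs)


print(mul_string("Ueberwimp"))
# MUL + #9 + #85 + #101 + #98 + #101 + #114 + #119 + #105 + #109 + #112
print(une_create("?BINARY", "a", "Ueberwimp", "and", 2))
```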

<P>
To use this extension, save the above INTERCAL code to a file,
say <I>ueberwimp.iacc</I>, and compile it with:
<PRE>
sick ueberwimp.iacc
</PRE>
Then save the above Perl code as <I>Language/INTERCAL/Ueberwimp.pm</I> under
a directory your Perl interpreter searches (one listed in <CODE>@INC</CODE>).
To compile <I>yourprogram.i</I> with this extension, just say:
<PRE>
sick -psick -pueberwimp yourprogram.i
</PRE>
For example, if the program contains "DO .1 <- .n2" or "DO .1 <- .2 a .3",
this will automatically load your Perl module and call its <CODE>negate</CODE>
or <CODE>and</CODE> method, as required.
</P>

<P>
<I>Special note</I> - the rest of this section contains information
which may change in the future. Implementing new statements is not fully
supported yet.
</P>

<P>
The procedure to add a new statement is very similar to adding operators;
however, you use the Undocumented Statement (<I>UNS</I>) opcode, which
is almost identical to the Undocumented Expression except that it does not
return a value. It does, however, take the same arguments and expects
you to write a corresponding Perl module.
</P>

<P>
Since statements can be referred to by gerund or template, each
form of the statement must have a unique identifier; statements defined
by CLC-INTERCAL use the bytecode opcode number for that, but if you
use <I>UNS</I> you must specify your own gerund: just pick a number
between #256 and #65535 which has not been used by other extensions.
</P>

<P>
Once all this is in place, you need to define your syntax by adding
rules for the ?VERB symbol; you also create as many rules for the
?TEMPLATE symbol as there are different forms for your statement;
finally you add one rule for the ?GERUND symbol returning all
possible gerund identifiers, and setting its count to the number
of identifiers returned.
</P>

<P>
We understand it is about time to provide an example. Let's say
you want to do some form of code profiling, and you start by
adding two statements which signal the start and the end of a
profiling block. You want to be able to say:
<PRE>
DO PROFILE ON #1234
....
DO PROFILE OFF #1234
</PRE>
and see something like this on your standard error:
<PRE>
1174382975.857 ON 1234
....
1174382975.868 OFF 1234
</PRE>
(The 1174382975 is just a Unix timestamp, which happens to mean
Tue Mar 20 09:29:35 2007 UTC - guess when this was written?). The
assumption is that you'll also write a program to analyse this
output and tell you where your program is being slow.
As before, you start with a Perl module:
<PRE>
package Language::INTERCAL::Profile;

use Language::INTERCAL::Splats ':SP';
use Time::HiRes 'gettimeofday';

sub on {
    @_ == 2 or faint(SP_INVALID, 'Wrong number of arguments', 'Profile ON');
    my ($class, $arg) = @_;
    $arg = $arg->number;
    # gettimeofday returns seconds and microseconds; print milliseconds
    my ($sec, $usec) = gettimeofday;
    printf STDERR "%d.%03d ON %d\n", $sec, $usec / 1000, $arg;
}

sub off {
    @_ == 2 or faint(SP_INVALID, 'Wrong number of arguments', 'Profile OFF');
    my ($class, $arg) = @_;
    $arg = $arg->number;
    my ($sec, $usec) = gettimeofday;
    printf STDERR "%d.%03d OFF %d\n", $sec, $usec / 1000, $arg;
}

1;
</PRE>
Next, you write a compiler extension to add the required syntax and
generate the code:
<PRE>
DO ?TYPE <- ?EXTENSION
DO MAKE NEW OPCODE #666 ,E, AS
    UNS +
    MUL + #7 + #80 + #114 + #111 + #102 + #105 + #108 + #101 +
    MUL + #2 + #111 + #110 +
    #1
DO MAKE NEW OPCODE #667 ,E, AS
    UNS +
    MUL + #7 + #80 + #114 + #111 + #102 + #105 + #108 + #101 +
    MUL + #3 + #111 + #102 + #102 +
    #1
DO CREATE ?VERB ,PROFILE, ,ON, ?EXPRESSION AS
    USG + #666 + ?EXPRESSION #1
DO CREATE ?VERB ,PROFILE, ,OFF, ?EXPRESSION AS
    USG + #667 + ?EXPRESSION #1
DO CREATE ?TEMPLATE ,PROFILE, ,ON, ,EXPRESSION, AS #666
DO CREATE ?TEMPLATE ,PROFILE, ,OFF, ,EXPRESSION, AS #667
DO CREATE ?GERUND ,PROFILING,=2 AS #666 + #667
PLEASE GIVE UP
</PRE>
First we need to register new operation codes ("gerund codes")
with the runtime. This is done by the two MAKE NEW OPCODE
statements. The opcodes, #666 and #667, are very similar: they
both take one expression as argument (that's the ,E,), and
they are implemented by a call to UNS with the appropriate
parameters ("Profile" and either "on" or "off", respectively).
After that, it is just a matter of using the new opcodes in
the right place: the first two CREATE statements use <I>USG</I>
("use gerund") followed by the appropriate opcode and its
arguments.
</P>

<P>
The two CREATE ?TEMPLATE statements define the two statement
templates corresponding to the two previous definitions. They
match the strings "PROFILE ON EXPRESSION" and "PROFILE OFF EXPRESSION"
and return the corresponding gerund (as a number, without using the <I>USG</I>
opcode). Having defined these two templates, you are now
allowed to confuse your profiling system with:
<PRE>
PLEASE SWAP PROFILE ON EXPRESSION AND PROFILE OFF EXPRESSION
</PRE>
Note that the two new gerunds were defined as taking one expression
as argument: therefore they can be swapped with any other
statement which takes just one expression:
<PRE>
PLEASE SWAP PROFILE ON EXPRESSION AND RESUME EXPRESSION
PLEASE CONVERT PROFILE OFF EXPRESSION TO FORGET EXPRESSION
</PRE>
</P>

<P>
The last CREATE statement defines the gerund PROFILING, so
you can control whether this output is produced by using
DO ABSTAIN FROM PROFILING and DO REINSTATE PROFILING.
Note that you return both gerunds here, and also set the
count as appropriate (with the =2 after ,PROFILING,) so
that the rest of the compiler knows how many gerunds you
are trying to add.
</P>
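
<P>
The analysis program is left to the reader; as one possible starting point,
the Python sketch below totals the time spent in each profiling block. It is
not part of CLC-INTERCAL and rests on stated assumptions: every input line has
exactly the format printed by the Profile module above, a block number is not
re-entered before its OFF, and an OFF with no pending ON is ignored.
</P>

```python
from collections import defaultdict


def profile_totals(lines):
    """Total the seconds spent between matching ON and OFF lines, per block.

    Assumes lines of the form 'SECONDS.MILLIS ON|OFF BLOCK' as printed by
    the Profile module; blank lines are skipped and an OFF with no pending
    ON for the same block is ignored.
    """
    pending = {}                 # block number -> time of its open ON
    totals = defaultdict(float)  # block number -> accumulated seconds
    for line in lines:
        if not line.strip():
            continue
        stamp, state, block = line.split()
        if state == "ON":
            pending[block] = float(stamp)
        elif state == "OFF" and block in pending:
            totals[block] += float(stamp) - pending.pop(block)
    return dict(totals)


log = ["1174382975.857 ON 1234", "1174382975.868 OFF 1234"]
print(round(profile_totals(log)["1234"], 3))  # 0.011
```

<P>
In practice one would read the lines from the file the program's standard
error was redirected to, and sort the resulting totals to find the slowest
blocks.
</P>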

<H2><A NAME="examples">Examples</A></H2>

<P>
The code for <I>computed-labels.iacc</I> is:
<PRE>
DO ?TYPE <- ?EXTENSION
DO CREATE _2 ?STMT_LABEL ,#40, ?EXPRESSION ,#41, AS ?EXPRESSION #1
DO GIVE UP
</PRE>
The ?TYPE flag is set to extension because this program extends
the syntax of an existing compiler. The second statement extends
the grammar: we have already seen that a standard label is parsed by symbol
?STMT_LABEL and consists of an open parenthesis, a constant,
and a close parenthesis. The CREATE statement in this extension
adds a second production for ?STMT_LABEL, allowing any expression
in addition to the constant. As a result, a non-computed label
can now be written in two ways, for example (1) and (#1).
</P>

<P>
The distribution also includes six very similar programs, with
names <I>2.iacc</I>, <I>3.iacc</I> etc. We show <I>5.iacc</I>:
<PRE>
DO ?TYPE <- ?BASE
PLEASE %BA <- #5
DO GIVE UP
</PRE>
The ?TYPE flag is set to base here, because this is what this
program does: it changes the arithmetic base. The only thing
it needs to do is to assign #5 to special register %BA.
</P>

<P>
As a final example, <I>next.iacc</I> extends the <I>sick</I>
compiler with a NEXT statement.
<PRE>
DO ?TYPE <- ?EXTENSION
DO CREATE _2 ?VERB ?LABEL ,NEXT, ?Q_NEXT AS ?Q_NEXT #1 + NXT + ?LABEL #1
DO CREATE _2 ?Q_NEXT ,, AS ,,
DO CREATE _2 ?Q_NEXT ,WHILE, ,NOT, ,NEXTING, AS QUA
DO CREATE _2 ?GERUND ,NEXTING,=1 AS NXT
DO CREATE _2 ?TEMPLATE ,LABEL, ,NEXT, AS NXT
DO GIVE UP
</PRE>
Again, the ?TYPE flag is extension. This time there are several additions
to the grammar. The first CREATE statement adds the actual NEXT, using
the ?LABEL symbol already present in <I>sick</I>, as well as another
auxiliary symbol, ?Q_NEXT, which is defined in the following two statements:
it can be empty, and generate no code, or it can parse the text
WHILE NOT NEXTING, in which case it adds QUA (QUAntum) to the generated
code. Since we are adding a new statement, we also need to extend the
definition of ?GERUND (used by ABSTAIN FROM etc.) and of ?TEMPLATE
(used by CONVERT, SWAP, as well as the template form of ABSTAIN FROM etc.).
Note that, unlike the case of user-defined statements discussed in a
previous section, we can use the statement's opcode as the gerund.
</P>
</BODY>
</HTML>