view interps/clc-intercal/CLC-INTERCAL-Docs-1.-94.-2/doc/html/parsers.hgen @ 6287:2deb913a7d54
author | HackBot |
---|---|
date | Fri, 27 Nov 2015 01:38:03 +0000 |
parents | 859f9b4339e6 |
children |
@@DATA ByteCode@@
<HTML>
<HEAD>
<TITLE>CLC-INTERCAL Reference</TITLE>
</HEAD>
<BODY>
<H1>CLC-INTERCAL Reference</H1>
<H2>... How to write a compiler for CLC-INTERCAL</H2>

<P>
<UL>
<LI><A HREF="index.html">Parent directory</A>
<LI><A HREF="#syntax">Syntax</A>
<LI><A HREF="#predefined">Predefined symbols</A>
<LI><A HREF="#special">Special Registers</A>
<LI><A HREF="#code">Code generation</A>
<LI><A HREF="#bytecode">Bytecode</A>
<LI><A HREF="#sick">Writing an extension for <I>sick</I></A>
<LI><A HREF="#examples">Examples</A>
</UL>
</P>

<P>
CLC-INTERCAL 1.-94 no longer includes a parser. Instead, it contains a
parser generator. The source language for the parser generator includes the
<CODE>CREATE</CODE> statement and the ability to assign to special registers
(to control the runtime operating mode). This document describes the syntax
of a <CODE>CREATE</CODE> statement, and shows some examples.
</P>

<H2><A NAME="syntax">Syntax</A></H2>

<P>
The CREATE and DESTROY statements have the form:
<PRE>
    DO CREATE <I>grammar</I> <I>class</I> <I>template</I> AS <I>code</I>
    DO DESTROY <I>grammar</I> <I>class</I> <I>template</I>
</PRE>
</P>

<P>
The <I>grammar</I> is one of the two crawling horrors, and can be omitted.
When compiling a compiler (with <I>iacc</I>), _1 represents the compiler
compiler's grammar, and _2 represents the compiler being built; if the
grammar is omitted, it defaults to _2. When not compiling a compiler, _1 is
the current compiler and _2 is undefined. When using the CREATE or DESTROY
statements with <I>sick</I>, the grammar must be omitted, and it defaults
to _1.
</P>

<P>
The <I>class</I> specifies a syntactic class (some other languages might
call it a nonterminal). Usually, this takes the form of a what ("?") followed
by some alphanumerics, although anything which evaluates to a number will do.
Please note that in CLC-INTERCAL the what does not introduce a unary logical
operator as in C-INTERCAL, and it always produces a named constant (of sorts).
</P>

<P>
The <I>template</I> is composed of a sequence of terminals and nonterminals
which vaguely resemble the syntax you are trying to define. Nonterminals are
specified as a special type of constant, usually introduced by the "what"
discussed before. Terminals are specified as "array slices", that is,
sequences of numbers enclosed in tails or hybrids and representing a 16-bit
array containing ASCII character codes. Abbreviations are available for
terminals consisting of just alphanumerics, where the characters, rather than
their codes, are included between the tails.
</P>

<P>
The <I>code</I> specifies the semantics of this production. There are many
elements which can be used here, to produce chunks of code, copy the code
produced by a symbol called from the template, etc. We defer discussion of
this until <A HREF="#code">a later section</A>.
</P>

<P>
For example, consider the following production from sick.iacc:
<PRE>
    DO CREATE _2 ?STMT_LABEL ,#40, ?CONSTANT ,#41, AS ?CONSTANT #1
</PRE>
This creates a production for class ?STMT_LABEL; this matches an open
parenthesis (#40), a numeric constant (which is parsed by ?CONSTANT), and
the close parenthesis (#41). In other words, this production matches a normal
statement label.
</P>

<P>
Some productions parse lists of elements, and the code generated may contain
the number of elements. In general, this is not the same as the number of
symbols parsed. Consider the following productions to parse the list of
register names used, for example, in a STASH statement:
<PRE>
    DO CREATE _2 ?NAMES ?RNAME AS ?RNAME #1
    DO CREATE _2 ?NAMES ?RNAME ,#43, ?NAMES AS ?RNAME #1 + ?NAMES #1
</PRE>
To parse a list of two registers, the second production matches an ?RNAME
(which presumably parses a register name), an intersection symbol (#43), and
then matches itself recursively - in this recursive call, it will use the
first production to match the second register.
At the level of this production, we matched three symbols, ?RNAME, #43 and
?NAMES. So how do we obtain a count of 2 (which is required so that the STASH
knows how many registers to stash)? To solve this problem, each element in
the production can have a numeric "count" associated with it, using the
syntax "=<I>number</I>". Moreover, a nonterminal can have the special count
"=*" to indicate that the count produced by that nonterminal should be used.
If a symbol does not have a count, it is assumed to be "=0". The total count
of a production is the sum of all its counts. We rewrite the above as:
<PRE>
    DO CREATE _2 ?NAMES ?RNAME=1 AS ?RNAME #1
    DO CREATE _2 ?NAMES ?RNAME=1 ,#43, ?NAMES=* AS ?RNAME #1 + ?NAMES #1
</PRE>
Now, consider again parsing .1+.2 - we need the second production, which
matches .1 using ?RNAME and .2 using ?NAMES; the recursive call uses the first
production to match .2 using ?RNAME. What is the count? The inner call has
count 1, because there is just one count, =1. The outer call has count 2: the
=1 from ?RNAME, plus the =* from ?NAMES - which uses the inner count of 1.
If you work out this example with more than two registers, you see how it
works.
</P>

<P>
The DESTROY statement works like a CREATE in reverse. Only the first part,
before the AS, is used. Suppose we no longer need the two productions above;
we can remove them with:
<PRE>
    DO DESTROY _2 ?NAMES ?RNAME=1
    DO DESTROY _2 ?NAMES ?RNAME=1 ,#43, ?NAMES=*
</PRE>
</P>

<P>
While CREATE (and maybe DESTROY) are the major components of a compiler, it
is often necessary to assign values to special registers and to set object
flags. Special registers control the way the runtime handles the generated
code; the compiler uses normal assignments to give these the appropriate
values, and these values will be saved together with the code in the object,
so that they can influence the runtime when the object is executed. The
next section discusses special registers.
</P>

<P>
By contrast, flags are a property of the compiler itself, and are used by
the command-line compiler tool (<I>sick</I>) and the calculator to decide what
to do with an object. Flags are set by using assignments, but these
assignments are executed at compile time. At the time of writing, the only
flag is ?TYPE, which describes the compiler or other extension we are
building. The possible values of this flag are:
<TABLE>
<TR><TH>?TYPE</TH><TH>Meaning</TH></TR>
<TR><TD>?ASSEMBLER</TD><TD>Compiler used to build assembler programs</TD></TR>
<TR><TD>?BASE</TD><TD>An object which just changes the arithmetic base</TD></TR>
<TR><TD>?COMPILER</TD><TD>Compiler used to compile normal programs</TD></TR>
<TR><TD>?EXTENSION</TD><TD>Compiler extension, e.g. new syntax</TD></TR>
<TR><TD>?IACC</TD><TD>Compiler used to compile other compilers</TD></TR>
<TR><TD>?OPTIMISER</TD><TD>Object defining code optimisations</TD></TR>
<TR><TD>?OPTION</TD><TD>Compiler option, e.g. change the meaning of existing syntax</TD></TR>
<TR><TD>?POSTPRE</TD><TD>Special object loaded after all other objects and before the source</TD></TR>
</TABLE>
At present, the system does not distinguish between ?EXTENSION and ?OPTION;
the command-line compiler tool accepts them indifferently, and the calculator
lists all these objects in the Options menu.
</P>

<P>
For example, towards the start of sick.iacc one can see:
<PRE>
    DO ?TYPE <- ?COMPILER
</PRE>
On the other hand, the extensions we will develop as examples in this
section will have:
<PRE>
    DO ?TYPE <- ?EXTENSION
</PRE>
</P>

<P>
One further statement is of interest: MAKE NEW OPCODE. This is described in
<A HREF="#sick">the section about writing an extension for <I>sick</I></A>.
</P>

<H2><A NAME="predefined">Predefined symbols</A></H2>

<P>
Some nonterminals are predefined by CLC-INTERCAL.
This means that you do not need a CREATE statement to make them, and you can
use them directly in your compiler or extension:
</P>

<TABLE>
<TR><TH>Symbol</TH><TH>Meaning</TH></TR>
<TR><TD>?ANYTHING</TD><TD>Any single character</TD></TR>
<TR><TD>?BLACKSPACE</TD><TD>Any non-space character</TD></TR>
<TR><TD>?CONSTANT</TD><TD>Any numeric constant between 0 and 65535</TD></TR>
<TR><TD>?JUNK</TD><TD><I>See below</I></TD></TR>
<TR><TD>?SPACE</TD><TD>Any space character</TD></TR>
<TR><TD>?SYMBOL</TD><TD>Any sequence of alphanumerics or underscores</TD></TR>
</TABLE>

<P>
Although these symbols could be defined using CREATEs, it would be rather
cumbersome to do so.
</P>

<P>
The ?JUNK symbol is used to parse comments. It matches the longest text which
does not look like the start of a statement. Special register %JS, described
in the next section, defines what constitutes the start of a statement.
Normally, the value of %JS is ?END_JUNK, and the following productions are
defined by the compiler:
<PRE>
    DO CREATE _2 ?END_JUNK ?STMT_LABEL AS ,,
    DO CREATE _2 ?END_JUNK ,DO, AS ,,
    DO CREATE _2 ?END_JUNK ,PLEASE, AS ,,
</PRE>
In other words, the start of a statement is either a label (as defined in
symbol ?STMT_LABEL), or one of the terminals "DO" or "PLEASE". When parsing
a comment, the ?JUNK symbol will therefore find the first label, DO or PLEASE
and match the text in between.
</P>

<H2><A NAME="special">Special Registers</A></H2>

<P>
A number of special registers control how the compiler or the runtime operates.
<UL>
@@MULTI DOUBLE_OH_SEVEN NAME@@
<LI>@@TYPE@@@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H2><A NAME="code">Code generation</A></H2>

<P>
The right-hand side of a CREATE statement (the bit after the AS) generates
the code to be executed when the template matches a bit of the program.
</P>

<P>
The <I>code</I> consists of elements, separated by the intersection symbol
(+); each element can be one of the following:
<UL>
<LI>A symbol, followed by an expression.
If the expression evaluates to value <I>n</I>, this copies the code of the
<I>n</I>-th occurrence of the symbol in the <I>template</I>. For example:
<PRE>
    DO CREATE ?SWAP ?EXPRESSION ?EXPRESSION AS ?EXPRESSION #2 + ?EXPRESSION #1
</PRE>
would generate the code for the second expression followed by the code for
the first one.
<LI>A bang, followed by a symbol (with or without the what), followed by an
expression. This matches the same symbol as the previous element, but
generates code to produce the associated count value.
<LI>One opcode representing bytecode. See the next section for the meaning of
the opcodes.
<LI>A terminal, followed by an expression. If the expression evaluates to
value <I>n</I>, this copies the text matched by the <I>n</I>-th occurrence of
the terminal in the <I>template</I>, encoding it as a string.
<LI>An empty terminal (",,"). This generates no code.
<LI>A constant (in the form #<I>number</I>). This generates the bytecode which
evaluates to that constant.
<LI>A splat. This has a special, currently undocumented, meaning which has to
do with the conversion of the generated bytecode to an actual executable.
</UL>
</P>

<P>
As examples, consider the three CREATE statements listed in the previous
section:
<PRE>
    DO CREATE _2 ?STMT_LABEL ,#40, ?CONSTANT ,#41, AS ?CONSTANT #1
    DO CREATE _2 ?NAMES ?RNAME=1 AS ?RNAME #1
    DO CREATE _2 ?NAMES ?RNAME=1 ,#43, ?NAMES=* AS ?RNAME #1 + ?NAMES #1
</PRE>
The first statement generates code which evaluates to the constant provided
inside the parentheses. This is obviously going to be used in a context where
this is interpreted as a label number. The second statement just copies the
code generated by ?RNAME; the third statement just produces the code generated
by ?RNAME followed by the code generated by the recursive call to itself.
</P>

<P>
Now consider:
<PRE>
    DO CREATE _2 ?VERB ,STASH, ?NAMES AS STA + !NAMES #1 + ?NAMES #1
</PRE>
<I>STA</I> is a bytecode opcode (which happens to correspond to the STASH
operation).
It takes an expression, representing a count, and then that number of
registers. The generated code reflects this: !NAMES #1 is the number of
registers, and ?NAMES #1 is the code generated by all the registers, one after
the other.
</P>

<P>
If one wants to extend <I>sick</I> to allow direct access to the base, for
example using the syntax SETBASE <I>expression</I> to change the base and
GETBASE <I>expression</I> to get the base, one could say:
<PRE>
    DO CREATE ?VERB ,SETBASE, ?EXPRESSION AS STO + ?EXPRESSION #1 + %BA
    DO CREATE ?VERB ,GETBASE, ?EXPRESSION AS STO + %BA + ?EXPRESSION #1
</PRE>
The <I>STO</I> opcode, followed by two expressions, assigns the first
expression to the second. In this case, the code generated by SETBASE
<I>expression</I> would be identical to the code generated by
%BA <- <I>expression</I>, and the code generated by GETBASE <I>expression</I>
would be identical to the code generated by <I>expression</I> <- %BA.
</P>

<H2><A NAME="bytecode">Bytecode</A></H2>

<P>
The bytecode represents an intermediate form produced by the compilers. This
consists of opcodes which are executed in sequence. At present, a bytecode
interpreter executes the program; however, there are plans to allow direct
generation of C or Perl source from the bytecode.
</P>

<P>
Each byte in the bytecode can be part of a statement, an expression or a
register. In addition, a subset of expressions can be assigned to: these are
called assignable expressions. For example, a constant is an assignable
expression. When assigned to, it changes the value of the constant. This is
necessary to implement overloading and is also a great obfuscation mechanism.
</P>

<H3>Constants</H3>

<P>
Constants can be specified in three ways:
<UL>
<LI>Byte larger than maximum opcode.<BR>
Any byte with value greater than the maximum opcode is interpreted as a
16 bit (spot) constant by subtracting the number of opcodes from the byte.
For example, since there are 128 opcodes, byte 131 is equivalent to #3, and
byte 255 (the maximum value) is #127.
@@MULTI CONSTANTS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H3>Registers</H3>

<P>
Registers can be any number of register prefixes, followed by a type and a
constant. There are limitations in the useful combination of prefixes.
</P>

<P>
The register types are:
<UL>
@@MULTI REGISTERS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<P>
The prefixes which can be applied to registers are:
<UL>
@@MULTI PREFIXES NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H3>Expressions</H3>

<P>
Assignable expressions are sequences of bytecode which can be used as the
target of an assignment. Of course, all registers are assignable; all
constants are also assignable, which makes them really variables. Instead of
describing the assignable expressions separately, we describe all expressions
and mention which ones are assignable. Assigning to an expression means
assigning appropriate values to its subexpressions such that the expression,
if evaluated, would result in the value being assigned. This is not always
possible, so it can generate runtime errors.
</P>

<P>
In addition to registers and constants, the following are valid expressions:
<UL>
@@MULTI EXPRESSIONS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H3>Statements</H3>

<P>
The following opcodes are valid statements:
<UL>
@@MULTI STATEMENTS NAME@@
<LI>@@NAME@@ - @@DESCR@@<BR>
@@DOC 76 HTML@@
@@MULTI@@
</UL>
</P>

<H2><A NAME="sick">Writing an extension for <I>sick</I></A></H2>

<P>
Writing an extension for the <I>sick</I> compiler (or any of the other
compilers provided) is simply a matter of putting together the material
in this chapter in a way which is consistent with the rest of the compiler.
For this reason, this section provides a description of some of the compiler
internals.
Please note that while the compiler internals could change in future
versions of CLC-INTERCAL, the parts described here are unlikely to change.
</P>

<P>
The most important grammar symbols defined by <I>sick</I>, <I>ick</I> and
<I>1972</I> are (see below for further explanations):
<UL>
<LI>?UNARY<BR>
Defines a unary operator. It matches the operator name (not the complete
subexpression) and generates code which expects an operand right after it.
<LI>?BINARY<BR>
Defines a binary operator. It matches the operator name (not the complete
subexpression) and generates code which expects the two operands in
<I>reverse</I> order.
<LI>?VERB<BR>
Defines a new statement. It matches the whole "verb" part of the statement
(i.e. it does not match PLEASE, DO, NOT and so on) and returns code to execute
the statement; the returned code must be self-contained in that it just
runs without assuming any extra code being generated; if the statement is
to be considered a quantum statement, it must start with the opcode QUA.
<LI>?GERUND<BR>
Used by ABSTAIN FROM, REINSTATE and any other statement which takes a
gerund list. Matches the gerund appropriate for a statement, and generates
a list of opcodes; it must also generate the appropriate opcode count.
<LI>?TEMPLATE<BR>
Used by the <I>sick</I> compiler, but it can be defined in extensions to
other compilers without causing problems. Matches one statement template
and generates a single opcode. There is no need to associate a count with
this.
</UL>
</P>

<P>
When extending the expression syntax, one commonly adds a unary or a binary
operator. This can be easily done by adding a production to symbol ?UNARY or
?BINARY, respectively. In general, the operation to add is not already present
in CLC-INTERCAL (otherwise there would already be syntax for it), so one
would use the undocumented expression opcode (UNE) and an additional Perl
module to implement it.
</P>

<P>
The undocumented expression opcode takes two strings, a number, and then a
list of expressions (the number determines how many). The first string is
taken to be the name of a Perl module, with <CODE>Language::INTERCAL::</CODE>
automatically prepended to it, the second string is taken to be the name of
a function to be called within that module; the expressions are passed as
arguments to the function using the form:
<PRE>
    $result = Language::INTERCAL::<I>module</I>-><I>function</I>(<I>expr</I>, <I>expr</I>...)
</PRE>
The module is automatically loaded if necessary.
</P>

<P>
Suppose, for example, we want to add an <I>überwimp</I> extension, which
adds two new operators: a unary logical negation and a binary logical AND
operator. We use the symbols "n" and "a" for these operations. We start by
defining the Perl module to implement them:
<PRE>
    package Language::INTERCAL::Ueberwimp;

    use Language::INTERCAL::Splats ':SP';
    use Language::INTERCAL::Numbers;

    sub negate {
        @_ == 2 or faint(SP_INVALID, 'Wrong number of arguments', 'negate');
        my ($class, $arg) = @_;
        $arg = $arg->number;
        return Language::INTERCAL::Numbers::Spot->new(! $arg);
    }

    sub and {
        @_ == 3 or faint(SP_INVALID, 'Wrong number of arguments', 'and');
        # remember we get the arguments in reverse order (OK, so it does
        # not matter here because this operation is commutative, but in
        # general we should remember this).
        my ($class, $second, $first) = @_;
        $first = $first->number;
        $second = $second->number;
        return Language::INTERCAL::Numbers::Spot->new($first && $second);
    }

    1;
</PRE>
This uses the runtime internals to get a Perl number from the arguments
(this would automatically splat if the argument happens to be something
other than a number), and creates a new Spot value.
Now, in order to use this module, we need to add the syntax and code for it:
<PRE>
    DO ?TYPE <- ?EXTENSION
    DO CREATE ?UNARY ,n,
       AS UNE + MUL + #9 + #85 + #101 + #98 + #101 + #114 + #119 + #105 + #109 + #112
              + MUL + #6 + #110 + #101 + #103 + #97 + #116 + #101
              + #1
    DO CREATE ?BINARY ,a,
       AS UNE + MUL + #9 + #85 + #101 + #98 + #101 + #114 + #119 + #105 + #109 + #112
              + MUL + #3 + #97 + #110 + #100
              + #2
    PLEASE GIVE UP
</PRE>
This example shows one way of creating strings in bytecode: the MUL opcode,
followed by the number of characters in the string, followed by the character
codes. Note that the module and function name are provided in ASCII, but if
the function requires any string arguments these are provided in Baudot for
compatibility with alphanumeric I/O. In any case, we pass the string
"Ueberwimp" (the Perl module name) and either "negate" or "and" as the first
two arguments to UNE; the third argument is the number of expressions to
follow, which will be #1 for "negate" and #2 for "and". The expressions will
be automatically provided by the rest of the compiler.
</P>

<P>
To install this extension, save the above INTERCAL code to a file, say
<I>ueberwimp.iacc</I>, and compile it with:
<PRE>
    sick ueberwimp.iacc
</PRE>
Then save the above Perl code in a file <I>Ueberwimp.pm</I> somewhere your
Perl interpreter will be able to find it. To compile <I>yourprogram.i</I>
with the extension, you just say:
<PRE>
    sick -psick -pueberwimp yourprogram.i
</PRE>
For example, if the program contains "DO .1 <- .n2" or "DO .1 <- .2 a .3"
this will automatically load your Perl module and call its "negate" or "and"
method, as required.
</P>

<P>
<I>Special note</I> - the rest of this section contains information which
may change in future. Implementing new statements is not fully supported yet.
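</P>

<P>
Both the operator extension above and the statement extension described next
spell out module and function names by hand, as MUL, a length and a list of
character codes. A small script can produce that text and avoid counting
mistakes. The following Perl sketch is not part of CLC-INTERCAL: the names
<I>mul_string</I> and <I>call_args</I> are made up for this illustration, and
it assumes the names contain only ASCII characters:
<PRE>
    use strict;
    use warnings;

    # Hypothetical helper: write an ASCII string the way the examples do,
    # that is MUL, then the number of characters, then each character code.
    sub mul_string {
        my ($text) = @_;
        my @codes = map { '#' . ord($_) } split(//, $text);
        return join(' + ', 'MUL', '#' . length($text), @codes);
    }

    # Hypothetical helper: the arguments common to UNE and UNS, that is
    # module name, function name and number of expressions to follow.
    sub call_args {
        my ($module, $function, $count) = @_;
        return join(' + ', mul_string($module), mul_string($function), '#' . $count);
    }

    print 'UNE + ', call_args('Ueberwimp', 'negate', 1), "\n";
    print 'UNE + ', call_args('Ueberwimp', 'and', 2), "\n";
</PRE>
The two lines printed should reproduce the text after AS in the ?UNARY and
?BINARY productions of the überwimp extension, which makes it easy to check
a hand-written encoding against the intended names.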
</P>

<P>
The procedure to add a new statement is very similar to adding operators;
however, you use the Undocumented Statement (<I>UNS</I>) opcode, which is
almost identical to the Undocumented Expression except it does not return a
value. It does, however, take the same arguments and expects you to write a
corresponding Perl module.
</P>

<P>
Since statements can be referred to by gerund or template, each form of the
statement must have a unique identifier; statements defined by CLC-INTERCAL
use the bytecode opcode number for that, but if you use <I>UNS</I> you must
specify your own gerund - just pick a number between #256 and #65535 which
has not been used by other extensions.
</P>

<P>
Once all this is in place, you need to define your syntax by adding rules
for the ?VERB symbol; you also create as many rules for the ?TEMPLATE symbol
as there are different forms for your statement; finally you add one rule for
the ?GERUND symbol returning all possible gerund identifiers, and setting the
count value to the appropriate value.
</P>

<P>
We understand it is about time to provide an example. Let's say you want to
do some form of code profiling, and you start by adding two statements which
signal the start and the end of a profiling block. You want to be able to say:
<PRE>
    DO PROFILE ON #1234
    ....
    DO PROFILE OFF #1234
</PRE>
And see on your standard error something like:
<PRE>
    1174382975.857 ON 1234
    ....
    1174382975.868 OFF 1234
</PRE>
(The 1174382975 is just a Unix timestamp, which happens to mean
Tue Mar 20 09:29:35 2007 - guess when this was written?). The assumption is
that you'll also write a program to analyse this output and tell you where
your program is being slow.
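</P>

<P>
Such an analysis program is outside the scope of CLC-INTERCAL, but a minimal
sketch may help to fix ideas. It assumes that every line has exactly the form
shown above (a timestamp, ON or OFF, and the block number), that blocks with
the same number do not nest, and it works on two sample lines rather than on a
real capture of standard error; all variable names are made up for this
illustration:
<PRE>
    use strict;
    use warnings;

    # Sample input, copied from the expected output shown above.
    my @lines = (
        '1174382975.857 ON 1234',
        '1174382975.868 OFF 1234',
    );

    my (%started, %total);
    for my $line (@lines) {
        # Split the line into timestamp, state and block number;
        # anything which does not look like profiling output is skipped.
        my ($time, $state, $id) = $line =~ /^(\d+\.\d+) (ON|OFF) (\d+)$/
            or next;
        if ($state eq 'ON') {
            $started{$id} = $time;
        } elsif (exists $started{$id}) {
            # Add the time spent since the matching ON to the block total.
            $total{$id} += $time - delete $started{$id};
        }
    }

    printf "block %d: %.3f seconds\n", $_, $total{$_} for sort keys %total;
</PRE>
For the two sample lines this reports roughly 0.011 seconds for block 1234.
To analyse a real run, replace the sample list with the lines captured from
the standard error of your program.
</P>

<P>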
As before, you start with a Perl module:
<PRE>
    package Language::INTERCAL::Profile;

    use Language::INTERCAL::Splats ':SP';
    use Time::HiRes 'gettimeofday';

    sub on {
        @_ == 2 or faint(SP_INVALID, 'Wrong number of arguments', 'Profile ON');
        my ($class, $arg) = @_;
        $arg = $arg->number;
        # gettimeofday returns seconds and microseconds
        my ($sec, $usec) = gettimeofday;
        printf STDERR "%d.%03d ON %d\n", $sec, $usec / 1000, $arg;
    }

    sub off {
        @_ == 2 or faint(SP_INVALID, 'Wrong number of arguments', 'Profile OFF');
        my ($class, $arg) = @_;
        $arg = $arg->number;
        # gettimeofday returns seconds and microseconds
        my ($sec, $usec) = gettimeofday;
        printf STDERR "%d.%03d OFF %d\n", $sec, $usec / 1000, $arg;
    }

    1;
</PRE>
Next, you write a compiler extension to add the required syntax and generate
the code:
<PRE>
    DO ?TYPE <- ?EXTENSION
    DO MAKE NEW OPCODE #666 ,E,
       AS UNS + MUL + #7 + #80 + #114 + #111 + #102 + #105 + #108 + #101
              + MUL + #2 + #111 + #110
              + #1
    DO MAKE NEW OPCODE #667 ,E,
       AS UNS + MUL + #7 + #80 + #114 + #111 + #102 + #105 + #108 + #101
              + MUL + #3 + #111 + #102 + #102
              + #1
    DO CREATE ?VERB ,PROFILE, ,ON, ?EXPRESSION AS USG + #666 + ?EXPRESSION #1
    DO CREATE ?VERB ,PROFILE, ,OFF, ?EXPRESSION AS USG + #667 + ?EXPRESSION #1
    DO CREATE ?TEMPLATE ,PROFILE, ,ON, ,EXPRESSION, AS #666
    DO CREATE ?TEMPLATE ,PROFILE, ,OFF, ,EXPRESSION, AS #667
    DO CREATE ?GERUND ,PROFILING,=2 AS #666 + #667
    PLEASE GIVE UP
</PRE>
First we need to register new operation codes ("gerund codes") with the
runtime. This is done by the two MAKE NEW OPCODE statements. The opcodes,
#666 and #667, are very similar: they both take one expression as argument
(that's the ,E,), and they are implemented by a call to UNS with the
appropriate parameters ("Profile" and then "on" or "off", respectively).
After that, it is just a matter of using the new opcodes in the right place:
the first two CREATE statements use <I>USG</I> ("use gerund") followed by the
appropriate opcode and its arguments.
</P>

<P>
The two CREATE ?TEMPLATE statements define the two statement templates
corresponding to the two previous definitions.
They match strings "PROFILE ON" and "PROFILE OFF" and return the
corresponding gerund (as a number, without using the <I>USG</I> opcode).
Having defined these two templates, you are now allowed to confuse your
profiling system with:
<PRE>
    PLEASE SWAP PROFILE ON AND PROFILE OFF
</PRE>
Note that the two new gerunds were defined as taking one expression as
argument: therefore they can be swapped with any other statement which takes
just one expression:
<PRE>
    PLEASE SWAP PROFILE ON AND RESUME EXPRESSION
    PLEASE CONVERT PROFILE OFF TO FORGET EXPRESSION
</PRE>
</P>

<P>
The last CREATE statement defines the gerund PROFILING, so you can control
whether this output is produced by using DO ABSTAIN FROM PROFILING and
DO REINSTATE PROFILING. Note that you return both gerunds here, and also set
the count as appropriate (with the =2 after ,PROFILING,) so that the rest of
the compiler knows how many gerunds you are trying to add.
</P>

<H2><A NAME="examples">Examples</A></H2>

<P>
The code for <I>computed-labels.iacc</I> is:
<PRE>
    DO ?TYPE <- ?EXTENSION
    DO CREATE _2 ?STMT_LABEL ,#40, ?EXPRESSION ,#41, AS ?EXPRESSION #1
    DO GIVE UP
</PRE>
The ?TYPE flag is set to extension because this program extends the syntax
of an existing compiler. The second statement extends the grammar; we have
already seen that a standard label is parsed by symbol ?STMT_LABEL and
consists of an open parenthesis, a constant, and a close parenthesis. The
CREATE statement in this extension adds a second production for ?STMT_LABEL,
allowing any expression in addition to the constant. As a result, a
non-computed label can now be written in two ways, for example (1) and (#1).
</P>

<P>
The distribution also includes six very similar programs, with names
<I>2.iacc</I>, <I>3.iacc</I> etc. We show <I>5.iacc</I>:
<PRE>
    DO ?TYPE <- ?BASE
    PLEASE %BA <- #5
    DO GIVE UP
</PRE>
The ?TYPE flag is set to base here, because this is what this program does:
it changes the arithmetic base.
The only thing it needs to do is to assign #5 to special register %BA.
</P>

<P>
As a final example, <I>next.iacc</I> extends the <I>sick</I> compiler with a
NEXT statement.
<PRE>
    DO ?TYPE <- ?EXTENSION
    DO CREATE _2 ?VERB ?LABEL ,NEXT, ?Q_NEXT AS ?Q_NEXT #1 + NXT + ?LABEL #1
    DO CREATE _2 ?Q_NEXT ,, AS ,,
    DO CREATE _2 ?Q_NEXT ,WHILE, ,NOT, ,NEXTING, AS QUA
    DO CREATE _2 ?GERUND ,NEXTING,=1 AS NXT
    DO CREATE _2 ?TEMPLATE ,LABEL, ,NEXT, AS NXT
    DO GIVE UP
</PRE>
Again, the ?TYPE flag is extension. This time there are several additions to
the grammar. The first CREATE statement adds the actual NEXT, using the ?LABEL
symbol already present in <I>sick</I>, as well as another auxiliary symbol,
?Q_NEXT, which is defined in the following two statements: it can be empty,
and generate no code, or it can parse the text WHILE NOT NEXTING, in which
case it adds QUA (QUAntum) to the generated code. Since we are adding a new
statement, we also need to extend the definition of ?GERUND (used by
ABSTAIN FROM etc) and of ?TEMPLATE (used by CONVERT, SWAP, as well as the
template form of ABSTAIN FROM etc). Note that, unlike the case of
user-generated statements discussed in a previous section, we can use the
statement's opcode as gerund.
</P>

</BODY>
</HTML>