view src/ploki/doc/ploki-expr.txt @ 12292:d51f2100210c draft

author HackEso <hackeso@esolangs.org>
date Thu, 02 Jan 2020 15:38:21 +0000

                      *** THE PLOKI LANGUAGE: EXPRESSIONS ***


GENERAL

ploki is an imperative, unstructured language, similar to BASIC. Some of its
features are borrowed from Perl. ploki's data structures are fully dynamic,
i.e. they resize themselves when necessary. Strings know their length and
may contain binary data.


VALUES

There are six types of values:
1. String: A (possibly empty) sequence of characters.
   Any value can be converted to a string.
2. Number: A (double precision) floating point number.
   Any value can be converted to a number.
3. IO handle: Can be read from or written to.
4. List: A (possibly empty) sequence of values.
5. Bound expression: See @OMFG below.
6. Undefined.

Values aren't necessarily one thing or another. There's no place to declare
a variable to be of type "string", type "number", or anything else. In fact,
you can't declare variables at all. Every variable you use is initialized
with the undefined value ("undef" for short).


Truth

0, "0", "", #<#> and undef are false. Any other value is true.
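
The truth rule can be modeled in a few lines of Python. This is only a
sketch: the function name is_true is hypothetical, undef is represented by
None, lists by Python lists, and any other object stands in for IO handles
and bound expressions.

```python
def is_true(value):
    """Model of the truth rule: 0, "0", "", the empty list and undef are false."""
    if value is None:                      # undef
        return False
    if isinstance(value, (int, float)):    # number: only zero is false
        return value != 0
    if isinstance(value, str):             # string: only "" and "0" are false
        return value not in ("", "0")
    if isinstance(value, list):            # list: only the empty list is false
        return len(value) > 0
    return True                            # IO handles, bound expressions
```

Note that under this reading the string "0.0" is true, because only the
exact strings "" and "0" are listed as false.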


Conversions

Conversions happen as follows:
  * number -> string: decimal floating point, the "normal" way
  * IO handle -> string: filename
  * list -> string: concatenation of the stringified list elements
  * bound expression -> string: undefined results
  * undef -> string: "" (the empty string)

  * string -> number: initial whitespace is skipped, the rest of the string
                      is interpreted as a decimal floating point number
                      (as done by strtod())
  * IO handle -> number: undefined results
  * list -> number: sum of the numified list elements
  * bound expression -> number: undefined results
  * undef -> number: 0.0
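
The numeric conversions above can be sketched in Python as follows. The
name to_number is hypothetical, undef is modeled as None, and the regular
expression covers only the plain decimal subset of what strtod() accepts
(no hex, infinity or NaN). Returning 0.0 for a string without a numeric
prefix is an assumption based on strtod()'s behaviour; the text does not
state it.

```python
import re

# Optional leading whitespace, then a decimal number with optional exponent.
_NUMBER_PREFIX = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")

def to_number(value):
    """Model of the '-> number' conversions for undef, numbers, lists, strings."""
    if value is None:                       # undef -> 0.0
        return 0.0
    if isinstance(value, (int, float)):     # already a number
        return float(value)
    if isinstance(value, list):             # sum of the numified elements
        return float(sum(to_number(v) for v in value))
    match = _NUMBER_PREFIX.match(value)     # string: parse a numeric prefix
    return float(match.group(1)) if match else 0.0  # assumption: 0.0 if none
```

For instance, to_number("  12abc") gives 12.0 and to_number(["1", 2, None])
gives 3.0 under this model.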


Constructors

Anything starting with a digit or a period followed by a digit is taken to
be the beginning of a number (again, as parsed by strtod(), which means that
scientific notation (e.g. 1e-23) is allowed). String literals start with '"'
and end with '"' or end-of-line. The following escape sequences are
recognized:

  \OOO octal char (where OOO matches [0-7]{1,3})
  \xXX hex char (where XX matches [0-9a-fA-F]{1,2})
  \a   alarm
  \b   backspace
  \f   form feed
  \n   newline
  \r   carriage return
  \t   tab
  \v   vertical tab

  \cX  the character resulting from (toupper(X) + 64) % 128
       (this depends on the character set used)

  \V<value>
       The stringified result of evaluating <value>. In other words, this
       can be used to interpolate expressions in strings.
       Example: "2 + 2 = \V(2 + 2)" yields "2 + 2 = 4".
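
The \cX formula listed above can be checked with a small Python sketch. The
function name control_char is hypothetical, and the sketch assumes an ASCII
character set (the text notes that the result depends on the character set).

```python
def control_char(x):
    """Return the character for \\cX, computed as (toupper(X) + 64) % 128."""
    return chr((ord(x.upper()) + 64) % 128)
```

Under ASCII, control_char("a") and control_char("A") both give character 1,
control_char("[") gives character 27 (escape), and control_char("?") gives
character 127 (delete).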


Lists start with #< and end with #> or end-of-line. Example: #<2 +1 (3 +4)#>
is equivalent to #<2 1 7#>.

An empty (missing) value yields undef. For example, `-2' is parsed as a
binary minus with an empty left operand: undef is converted to 0.0, and the
result is 0.0 - 2.0, which is -2.


Special symbols

There are the following special symbols, similar to Perl's punctuation
variables:

  * \N (where N is a non-empty sequence of digits)
    Contains the substring that was matched by the Nth capturing group in
    the previous pattern match.
  * \!
    If used numerically, yields the current value of C's errno variable,
    i.e. the error code set by the most recent failed system or library
    call. If used as a string, yields the corresponding system error
    string (i.e. strerror(errno)).
  * \?
    Yields a random number in the range [0,1).
  * \_
    Contains the return value of the last command.
  * \@
    The argument the current function was called with.
  * \ARG
    Number of command line arguments (C's argc).
  * \AUSG
    The standard output IO handle (open mode: "WF").
  * \EING
    The standard input IO handle (open mode: "RZ").
  * \E
    Euler's number: 2.718281828...
  * \FEHL
    The standard error IO handle (open mode: "WF").
  * \PI
    pi: 3.14159265...
  * \
    undef


VARIABLES

There are two kinds of variables: plain variables and subscripted variables.
A plain variable has the form [A-Za-z$]+, i.e. one or more alphabetic
characters (where '$' is considered alphabetic). A subscripted variable
looks like a plain variable immediately followed by an opening paren "(", an
expression (the result of which is converted to a string) and possibly a
closing paren ")".


UNARY OPERATORS

Each unary operator takes a value and returns a value. The only exception is
@OMFG, which is more a constructor than an operator.

  * \ARG:
    Takes an integer and returns the corresponding command line argument
    (like C's argv[]).
  * \L
    Returns a lowercased version of its operand (like Perl's lc).
  * \Q
    Returns its operand with all non-word characters backslashed, i.e. any
    character not matching [a-zA-Z0-9_] will be preceded by a \ in the
    returned string (like Perl's quotemeta).
  * \R
    Returns its operand with all non-word characters regex-escaped, i.e. any
    character not matching [a-zA-Z0-9_] will be followed by a ! in the
    returned string (like s/(\W)/$1!/g in Perl).
  * \U
    Returns an uppercased version of its operand (like Perl's uc).
  * @+
    For an operand N, returns the offset of the character after the portion
    of the string matched by the Nth capturing group in the last pattern
    match.
  * @-
    For an operand N, returns the offset of the start of the portion of the
    string matched by the Nth capturing group in the last pattern match.
  * @ABS
    Returns the absolute value of its operand.
  * @ACOS
    Returns the arc cosine of its operand.
  * @APERS
    Takes a list of two elements, a filename and a mode string. Opens the
    specified file and returns an IO handle or undef on error. If the mode
    string contains "A", the file is created if necessary and opened for
    appending (i.e. every write goes to the end of the file); if the mode
    string contains "W", the file is truncated/created and opened for
    writing; otherwise the file is opened for reading, and the open fails if
    the file doesn't exist. If the mode string contains "+", reading is
    added for "A" and "W", and writing otherwise. "R+" is almost always
    preferred for read/write access; "W+" would clobber the file first. The
    resulting IO handle defaults to text mode; add "B" to open the file in
    binary mode. Adding "F" turns on autoflush mode, i.e. the stream is
    flushed after every write. If the file is opened for reading and the
    mode string contains "Z", the resulting stream can be used with the
    regex match operator (~).
  * @ASIN
    Returns the arc sine of its operand.
  * @ATAN
    Returns the arc tangent of its operand.
  * @ATAN2
    Takes a list of two elements #<y x#> and returns the arc tangent of
    y / x. It is similar to @ATAN (y / x), except that the signs of both
    arguments are used to determine the quadrant of the result.
  * @CHR
    Takes an integer and returns the character represented by that number in
    the character set.
  * @COS
    Returns the cosine of its operand.
  * @DEF-P
    Returns "" if its operand is undef, 1 otherwise.
  * @EDD-P
    Returns 1 if the end-of-file indicator for its operand (which must be an
    IO handle) is set, "" otherwise.
  * @ENV or \ENV
    Returns the value of the specified environment variable.
  * @ERR-P
    Returns 1 if the error indicator for its operand (which must be an IO
    handle) is set, "" otherwise.
  * @EVAL
    Evaluates and returns its operand. If an exception is thrown during the
    evaluation of its operand, assigns the exception value to \_ and returns
    undef.
  * @GET
    Takes an IO handle, reads one character, and returns its numeric value or
    -1 on end-of-file or error.
  * @INT
    Returns the integer portion of its operand.
  * @IO-P
    Returns 1 if its operand is an IO handle, "" otherwise.
  * @LAPERS
    Takes a filename, opens it for reading, and returns the resulting IO
    handle or undef on failure. Equivalent to @APERS #<filename "RB"#>.
  * @LEGS
    Takes an IO handle, reads a line and returns it. Returns "" on
    end-of-file and undef on error.
  * @LENGTH
    If passed a list, returns the number of elements; otherwise, returns the
    length of its operand in characters.
  * @LG
    Returns the base-10 logarithm of its operand.
  * @LN
    Returns the natural logarithm (base e) of its operand.
  * @NEG
    Performs arithmetic negation.
  * @NOT
    Returns 1 if its operand is false, "" otherwise.
  * @NUM
    Converts its operand to a number.
  * @ORD
    Returns the numeric value of the first character of its operand.
  * @OMFG
    Does not evaluate its operand but wraps it up in a bound expression by
    replacing all variables by their current value.
  * @REMOVE
    Tries to remove the specified file. Returns 1 for success, "" for
    failure.
  * @RENAEM
    Takes a list of two filenames #<OLD NEW#> and tries to rename OLD to
    NEW. Returns 1 for success, "" for failure.
  * @REVERSE
    If passed a list, returns a list consisting of the elements of the
    original list in the opposite order. Otherwise, returns a string with
    all the characters in the opposite order.
  * @SAG
    Returns a value representing the current file position of its operand
    (which must be an IO handle) suitable as the second argument of @SUCH.
    If the file was opened in binary mode, this is the number of bytes from
    the beginning of the file.
  * @SAPERS
    Takes a filename, opens it for writing, and returns the resulting IO
    handle or undef on failure. The file is created if it doesn't exist and
    truncated to length 0 otherwise. Equivalent to @APERS #<filename
    "WBF"#>.
  * @SIN
    Returns the sine of its operand.
  * @SQRT
    Returns the square root of its operand.
  * @STR
    Converts its operand to a string.
  * @SUCH
    Takes a list of two or three elements, #<IO POS WHENCE#>. WHENCE
    defaults to 0 if omitted. Sets the current file position of IO (which
    must be an IO handle) to POS (relative to WHENCE). WHENCE must be one of
    0 (beginning of the file), 1 (current position) or 2 (end of file). If
    IO is in text mode, POS must be a value returned by @SAG and WHENCE must
    be 0, or POS must be 0. If IO is in binary mode, the new position
    (measured in bytes from the beginning of the file) is obtained by adding
    POS to the position specified by WHENCE. In this case POS=0 and WHENCE=2
    may not work, depending on your C library.
  * @TAN
    Returns the tangent of its operand.
  * @TYPE OF
    Returns a string describing the type of its operand: "string" for
    strings, "number" for numbers, "stream" for IO handles, "list" for
    lists, "stuff" for bound expressions, "nothing" for undef.
  * @<label> (where <label> is a static label)
    Calls the static label <label> with its operand.
  * @
    Calls the dynamic label specified by its operand.
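
The mode string rules given for @APERS above can be modeled as a mapping to
a C fopen() mode. This Python sketch is an illustration only: the function
name fopen_mode is hypothetical, and whether the interpreter actually
builds an fopen() mode this way is an assumption. "F" and "Z" are ignored
here because they do not correspond to fopen() mode characters.

```python
def fopen_mode(mode):
    """Map an @APERS-style mode string to a C fopen() mode (a model only)."""
    if "A" in mode:        # append, creating the file if necessary
        result = "a"
    elif "W" in mode:      # truncate/create and write
        result = "w"
    else:                  # default: read, fail if the file is missing
        result = "r"
    if "+" in mode:        # add reading to A/W, or writing to read mode
        result += "+"
    if "B" in mode:        # binary instead of text mode
        result += "b"
    return result
```

With this model, the mode "RB" used by @LAPERS maps to "rb", the mode "WBF"
used by @SAPERS maps to "wb", and "R+" maps to "r+".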


BINARY OPERATORS

Some of the operators (marked with (SL)) below double as string and list
operators. They do the same thing with strings and lists by treating strings
as lists of characters. Boolean and comparison operators return 1 for true
and "" for false.

 _   (SL) If given two lists, it concatenates them. Otherwise, it converts
     its operands to strings and concatenates them.

 +   Adds two numbers.
 -   Subtracts two numbers.
 *   Multiplies two numbers.
 /   Divides two numbers.
 %   Returns the modulus (remainder from division) of two numbers.
 ^   Exponentiation.

 [   (SL) Given a string/list B and an integer N, returns the
     substring/sublist of B starting at index N. Negative indices start from
     the end of the string/list.
     Example: "foo" [ 1 yields "oo"; "foobar" [ -2 yields "ar".

 ]   (SL) Given a string/list B and an integer N, returns the
     substring/sublist of B having a length of N. A negative N means the
     result stops |N| elements before the end of the string/list.
     Example: "foo" ] 1 yields "f"; "foobar" ] -2 yields "foob".

 <   Numeric less than.
 >   Numeric greater than.
 {   (SL) String/list less than.
 }   (SL) String/list greater than.
 =   Numeric equality.
 !   Numeric inequality.
 :   (SL) String/list equality.
 ;   (SL) String/list inequality.

 &   Logical and.
 |   Logical or.

 ~   Pattern match. The left operand is the string or IO stream to be
     matched, the right operand is a string forming a regular expression
     (see PATTERNS below).  On failure, "" is returned. On success, returns
     the position of the match and sets \_ to the position after the match.

 ?o~ Similar to ~ above, except that the right operand is only evaluated the
     first time it is executed. This means that (foo ?o~ bar) won't see any
     changes to bar after the first match attempt (like /o in Perl).

 `   Given a number X and an integer N (which must be in the range 2 .. 36),
     returns a string representation of X in base N.
 '   Given a string S and an integer N (which must be in the range 2 .. 36),
     interprets S as a number in base N and returns the result.

  (the empty operator) or
 .   (SL) This one is really three different operators, depending on the
     type of its left operand:
      - IO handle: Attempts to read and return N bytes where N is the right
        operand.
      - Bound expression: Evaluates the bound expression with \@ temporarily
        set to the right operand.
      - String/list: Returns the element at the index specified by the right
        operand.

 ,   Returns its right operand.
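
The examples given for the [ and ] operators above happen to coincide with
Python slicing, which makes a compact model possible. The function names
are hypothetical, and the behaviour for indices beyond the end of the
string/list is not specified in the text, so this sketch simply inherits
Python's.

```python
def drop_before(seq, n):
    """Model of `seq [ n`: the part of seq starting at index n."""
    return seq[n:]

def take_length(seq, n):
    """Model of `seq ] n`: the first n elements, or all but the last |n|."""
    return seq[:n]
```

Both functions work on strings and lists alike, mirroring the (SL) marker:
drop_before("foobar", -2) is "ar" and take_length([2, 1, 7], 2) is [2, 1].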


PATTERNS

The ~ operator tries to match a string or IO handle against a pattern. A
match succeeds if the pattern matches a substring of the matched string.
When matching an IO handle (which must have been opened with "Z"), the
matched characters are removed from the input stream if the match succeeds.

Whitespace in the pattern is mostly ignored (that is, the pattern " a b" is
equivalent to "ab"). Exceptions are marked [S] below. There is a special
kind of "whitespace": everything between "[" and "#]" is ignored (except for
"!"s), so you can write "foo [ a comment with [! in it #] bar" instead of
"foobar".

A search pattern consists of one or more branches, separated by "|". It
succeeds if one of its branches succeeds. The branches are tried in the
order they're specified.

A branch is a (possibly empty) sequence of pieces. It succeeds if all of the
pieces match in turn.

A piece is an atom, possibly followed by a quantifier. A missing quantifier
means the atom must match exactly once.

A quantifier is one of the following (N, M are nonnegative integers):
  *       0 or more of the preceding atom (greedy)
  *?      0 or more of the preceding atom (nongreedy)   [S]
  +       1 or more of the preceding atom (greedy)
  +?      1 or more of the preceding atom (nongreedy)   [S]
  ?       0 or 1 of the preceding atom (greedy)
  ??      0 or 1 of the preceding atom (nongreedy)      [S]
  :N:     exactly N of the preceding atom               [S]
  :N:?    exactly N of the preceding atom               [S]
  :N,:    N or more of the preceding atom (greedy)      [S]
  :N,:?   N or more of the preceding atom (nongreedy)   [S]
  :N,M:   at least N but not more than M of the preceding atom (greedy)
          [S]
  :N,M:?  at least N but not more than M of the preceding atom (nongreedy)
          [S]
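
For readers used to Perl-style regexes, the quantifiers above correspond
closely to the familiar ones, with :N,M: playing the role of {N,M}. The
following Python sketch translates a single quantifier; the function name
quantifier_to_re is hypothetical and the translation is an analogy, not a
description of how ploki compiles patterns.

```python
import re

def quantifier_to_re(q):
    """Translate a quantifier such as ':2,4:?' into Python re syntax."""
    lazy = "?" if len(q) > 1 and q.endswith("?") else ""
    core = q[:-1] if lazy else q
    if core in ("*", "+", "?"):
        return core + lazy
    if len(core) > 2 and core.startswith(":") and core.endswith(":"):
        return "{" + core[1:-1] + "}" + lazy
    raise ValueError("not a quantifier: " + repr(q))

# Greedy and nongreedy forms differ in how much they consume.
greedy = re.match("a" + quantifier_to_re(":2,4:"), "aaaaa").group(0)
nongreedy = re.match("a" + quantifier_to_re(":2,4:?"), "aaaaa").group(0)
```

Here greedy is "aaaa" and nongreedy is "aa".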

An atom is either a character class, an assertion, a selection, a subgroup,
a capturing group, a backreference, an independent subgroup, an
abbreviation, an escaped character, or an ordinary character.

A character class is either a built-in class or a custom class. A built-in
class is one of the following:
  .      any character
  c!     a control character
  C!     a non-control character
  d!     a digit
  D!     a non-digit
  l!     a lowercase character
  L!     a non-lowercase character
  p!     a printable character
  P!     a non-printable character
  q!     an alphanumeric character
  Q!     a non-alphanumeric character
  s!     a space character
  S!     a non-space character
  u!     an uppercase character
  U!     a non-uppercase character
  w!     a word character, i.e. one of a-zA-Z0-9_
  W!     a non-word character
  x!     a hex digit, i.e. one of 0-9a-fA-F
  X!     a non-hex character

[S] A custom class is a (possibly empty) sequence of characters, enclosed in
'' (single quotes). It matches one of the listed characters, unless the
first character is '^', in which case it matches one of the not listed
characters. Whitespace is significant inside character classes. A range of
characters can be specified by putting a '-' between the endpoints. In
addition, all built-in classes listed above (except for .) and the following
subclasses are available:

  [:alnum:]   equivalent to q!
  [:^alnum:]  equivalent to Q!
  [:alpha:]   equivalent to a!
  [:^alpha:]  equivalent to A!
  [:cntrl:]   equivalent to c!
  [:^cntrl:]  equivalent to C!
  [:digit:]   equivalent to d!
  [:^digit:]  equivalent to D!
  [:graph:]   equivalent to '^P! ', i.e. any printable character except space
  [:^graph:]  equivalent to 'P! ', i.e. any non-printable character or space
  [:lower:]   equivalent to l!
  [:^lower:]  equivalent to L!
  [:print:]   equivalent to p!
  [:^print:]  equivalent to P!
  [:punct:]   equivalent to '^P!q! ', i.e. any non-alphanumeric, non-space
              printable character
  [:^punct:]  equivalent to 'P!q! ', i.e. a non-printable character, an
              alphanumeric character or space
  [:space:]   equivalent to s!
  [:^space:]  equivalent to S!
  [:upper:]   equivalent to u!
  [:^upper:]  equivalent to U!
  [:xdigit:]  equivalent to x!
  [:^xdigit:] equivalent to X!

To include a literal ^, !, - or ' in a character class, escape it with a
following !. Example: '!!'!' is a class that matches ! or '.

An assertion is something that doesn't consume parts of the matched string,
i.e. it matches with zero length. There are built-in assertions and custom
assertions. A built-in assertion is one of the following:

  ^   at beginning of string
  $   at end of string
  A!  at beginning of line
  Z!  at end of line
  b!  at a word boundary
  B!  not at a word boundary

A custom assertion is either a positive assertion, a negative assertion or a
backreference check. A positive assertion consists of "[", followed by a
pattern, followed by "&]" [S].  It succeeds (without consuming parts of the
matched string) if the enclosed pattern succeeds. A negative assertion
consists of "[", followed by a pattern, followed by "^]" [S]. It succeeds if
the enclosed pattern fails. A backreference check consists of "[", followed
by a number, followed by "]" [S]. It succeeds if the corresponding capturing
group succeeded.

A selection consists of "[", optionally followed by a pattern followed by
"|", followed by a branch, followed by an atom, followed by "?]" [S]. It
matches as follows: First the atom is tried. If it matches, the preceding
branch is tried; otherwise matching continues with the preceding pattern if
specified. In other words, it looks like
"[no-pattern|yes-pattern(condition)?]" or "[yes-pattern(condition)?]".
Example: "{(!}?'^()'+[)![0]?]" matches a chunk of non-parentheses, possibly
enclosed in parentheses itself.
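
A selection behaves much like the conditional construct (?(N)yes|no) found
in Perl and Python regexes. As a rough analogy only, the example above can
be written in Python's re syntax as shown below. The name BALANCED is
hypothetical, and Python numbers capturing groups from 1 rather than 0.

```python
import re

# (\()?      optional capturing group holding a literal "("
# [^()]+     one or more characters that are not parentheses
# (?(1)\))   if group 1 matched, require a closing ")"
BALANCED = re.compile(r"(\()?[^()]+(?(1)\))")

def is_chunk(text):
    """True if the whole text is a chunk, optionally wrapped in parentheses."""
    return BALANCED.fullmatch(text) is not None
```

With this definition, is_chunk("abc") and is_chunk("(abc)") hold, while
is_chunk("(abc") and is_chunk("abc)") do not.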

A subgroup is a parenthesized pattern. It succeeds if the enclosed pattern
succeeds.

A capturing group is a pattern surrounded by "{" and "}". It succeeds if the
enclosed pattern succeeds. As a side effect, it sets \N, @-(N) and @+(N)
where N is the number of the capturing group. Capturing groups are numbered
from the right, counting the closing braces (starting with 0).

A backreference is N! where N is a non-negative integer. It succeeds if it
can match the same substring that was matched by the Nth capturing group.

An independent subgroup is a pattern, surrounded by "<" and ">". It succeeds
if the enclosed pattern succeeds, but when it does, it won't backtrack in
the enclosed pattern.

An abbreviation is a (possibly empty) sequence of characters surrounded by
"`" and "`". All metacharacters lose their special meaning inside an
abbreviation, except for ` (which marks the end of the abbreviation) and !
(which escapes the previous character, as usual). `abcd` is equivalent to
(a(b(cd?)?)?)?, i.e. an abbreviation for x matches the longest possible
initial substring of x.
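
The expansion of an abbreviation into nested optional groups can be
illustrated with a Python sketch. The function name abbreviation_pattern is
hypothetical; it builds a Python regex equivalent to the nested form shown
above, using non-capturing groups.

```python
import re

def abbreviation_pattern(word):
    """Build a regex matching the longest possible initial substring of word."""
    pattern = ""
    for ch in reversed(word):
        # Wrap what has been built so far: this character, then the rest.
        pattern = "(?:" + re.escape(ch) + pattern + ")?"
    return pattern

def match_abbreviation(word, text):
    """Return the prefix of text matched by the abbreviation for word."""
    return re.match(abbreviation_pattern(word), text).group(0)
```

For the word "abcd", the text "abcxyz" matches "abc", the text "abxd"
matches "ab", and the text "xyz" matches the empty string.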

[S] An escaped character is a special character like "!" or "+" (or
whitespace), followed by an "!". It matches that character literally.

An ordinary character matches itself.


# vi: set tw=76 et: