view src/ploki/doc/ploki-expr.txt @ 12512:9b31bb5d9ee1 draft default tip
author | HackEso <hackeso@esolangs.org> |
---|---|
date | Wed, 25 Sep 2024 19:54:44 +0100 |
*** THE PLOKI LANGUAGE: EXPRESSIONS ***

GENERAL

ploki is an imperative, unstructured language, similar to BASIC. Some of
its features are borrowed from Perl. ploki's data structures are fully
dynamic, i.e. they resize themselves when necessary. Strings know their
length and may contain binary data.

VALUES

There are six types of values:
 1. String: A (possibly empty) sequence of characters. Any value can be
    converted to a string.
 2. Number: A (double precision) floating point number. Any value can be
    converted to a number.
 3. IO handle: Can be read from or written to.
 4. List: A (possibly empty) sequence of values.
 5. Bound expression: See @OMFG below.
 6. Undefined.

Values aren't necessarily one thing or another. There's no place to declare
a variable to be of type "string", type "number", or anything else. In
fact, you can't declare variables at all. Every variable you use is
initialized with the undefined value ("undef" for short).

Truth

0, "0", "", #<#> and undef are false. Any other value is true.

Conversions

Conversions happen as follows:
 * number -> string: decimal floating point, the "normal" way
 * IO handle -> string: filename
 * list -> string: concatenation of the stringified list elements
 * bound expression -> string: undefined results
 * undef -> string: "" (the empty string)
 * string -> number: initial whitespace is skipped, the rest of the string
   is interpreted as a decimal floating point number (as done by strtod())
 * IO handle -> number: undefined results
 * list -> number: sum of the numified list elements
 * bound expression -> number: undefined results
 * undef -> number: 0.0

Constructors

Anything starting with a digit or a period followed by a digit is taken to
be the beginning of a number (again, as parsed by strtod(), which means
that scientific notation (e.g. 1e-23) is allowed).

String literals start with '"' and end with '"' or end-of-line.
The following escape sequences are recognized:
    \OOO       octal char (where OOO matches [0-7]{1,3})
    \xXX       hex char (where XX matches [0-9a-fA-F]{1,2})
    \a         alarm
    \b         backspace
    \f         form feed
    \n         newline
    \r         carriage return
    \t         tab
    \v         vertical tab
    \cX        the character resulting from (toupper(X) + 64) % 128
               (this depends on the character set used)
    \V<value>  The stringified result of evaluating <value>. In other
               words, this can be used to interpolate expressions in
               strings.
               Example: "2 + 2 = \V(2 + 2)" yields "2 + 2 = 4".

Lists start with #< and end with #> or end-of-line.
Example: #<2 +1 (3 +4)#> is equivalent to #<2 1 7#>.

The empty value yields undef: `-2' is parsed as ` - 2', undef is converted
to 0.0, and the result is 0.0 - 2.0, which is -2.

Special symbols

There are the following special symbols, similar to Perl's punctuation
variables:
 * \N (where N is a non-empty sequence of digits)
   Contains the substring that was matched by the Nth capturing group in
   the previous pattern match.
 * \!
   If used numerically, yields the current value of C's errno variable,
   which is set when a system or library call fails. If used as a string,
   yields the corresponding system error string (i.e. strerror(errno)).
 * \?
   Yields a random number in the range [0,1).
 * \_
   Contains the return value of the last command.
 * \@
   The argument the current function was called with.
 * \ARG
   Number of command line options (C's argc).
 * \AUSG
   The standard output IO handle (open mode: "WF").
 * \EING
   The standard input IO handle (open mode: "RZ").
 * \E
   Euler's number: 2.718281828...
 * \FEHL
   The standard error IO handle (open mode: "WF").
 * \PI
   pi: 3.14159265...
 * \
   undef

VARIABLES

There are two kinds of variables: plain variables and subscripted
variables. A plain variable has the form [A-Za-z$]+, i.e. one or more
alphabetic characters (where '$' is considered alphabetic).
A subscripted variable looks like a plain variable immediately followed by
an opening paren "(", an expression (the result of which is converted to a
string) and possibly a closing paren ")".

UNARY OPERATORS

Each unary operator takes a value and returns a value. The only exception
is @OMFG, which is more a constructor than an operator.
 * \ARG:
   Takes an integer and returns the corresponding command line argument
   (like C's argv[]).
 * \L
   Returns a lowercased version of its operand (like Perl's lc).
 * \Q
   Returns its operand with all non-word characters backslashed, i.e. any
   character not matching [a-zA-Z0-9_] will be preceded by a \ in the
   returned string (like Perl's quotemeta).
 * \R
   Returns its operand with all non-word characters regex-escaped, i.e.
   any character not matching [a-zA-Z0-9_] will be followed by a ! in the
   returned string (like s/(\W)/$1!/g in Perl).
 * \U
   Returns an uppercased version of its operand (like Perl's uc).
 * @+
   For an operand N, returns the offset of the character after the portion
   of the string matched by the Nth capturing group in the last pattern
   matching.
 * @-
   For an operand N, returns the offset of the portion of the string
   matched by the Nth capturing group in the last pattern matching.
 * @ABS
   Returns the absolute value of its operand.
 * @ACOS
   Returns the arc cosine of its operand.
 * @APERS
   Takes a list of two elements, a filename and a mode string. Opens the
   specified file and returns an IO handle or undef on error. If the mode
   string contains "A", the file is created if necessary and opened for
   appending (i.e. every write goes to the end of the file); if the mode
   string contains "W", the file is truncated/created and opened for
   writing; otherwise the file is opened for reading, and the open fails
   if the file doesn't exist. If the mode string contains "+", reading is
   added for "A" and "W", and writing otherwise. "R+" is almost always
   preferred for read/write access; "W+" would clobber the file first.
   The resulting IO handle defaults to text mode; add "B" to open the file
   in binary mode. Adding "F" turns on autoflush mode, i.e. the stream is
   flushed after every write. If the file is opened for reading and the
   mode string contains "Z", the resulting stream can be used with the
   regex match operator (~).
 * @ASIN
   Returns the arc sine of its operand.
 * @ATAN
   Returns the arc tangent of its operand.
 * @ATAN2
   Takes a list of two elements #<y x#> and returns the arc tangent of
   y / x. It is similar to @ATAN (y / x), except that the signs of both
   arguments are used to determine the quadrant of the result.
 * @CHR
   Takes an integer and returns the character represented by that number
   in the character set.
 * @COS
   Returns the cosine of its operand.
 * @DEF-P
   Returns "" if its operand is undef, 1 otherwise.
 * @EDD-P
   Returns 1 if the end-of-file indicator for its operand (which must be
   an IO handle) is set, "" otherwise.
 * @ENV or \ENV
   Returns the value of the specified environment variable.
 * @ERR-P
   Returns 1 if the error indicator for its operand (which must be an IO
   handle) is set, "" otherwise.
 * @EVAL
   Evaluates and returns its operand. If an exception is thrown during the
   evaluation of its operand, assigns the exception value to \_ and
   returns undef.
 * @GET
   Takes an IO handle, reads one character, and returns its numeric value
   or -1 on end-of-file or error.
 * @INT
   Returns the integer portion of its operand.
 * @IO-P
   Returns 1 if its operand is an IO handle, "" otherwise.
 * @LAPERS
   Takes a filename, opens it for reading, and returns the resulting IO
   handle or undef on failure. Equivalent to @APERS #<filename "RB"#>.
 * @LEGS
   Takes an IO handle, reads a line and returns it. Returns "" on
   end-of-file and undef on error.
 * @LENGTH
   If passed a list, returns the number of elements; otherwise, returns
   the length of its operand in characters.
 * @LG
   Returns the base-10 logarithm of its operand.
 * @LN
   Returns the natural logarithm (base e) of its operand.
 * @NEG
   Performs arithmetic negation.
 * @NOT
   Returns 1 if its operand is false, "" otherwise.
 * @NUM
   Converts its operand to a number.
 * @ORD
   Returns the numeric value of the first character of its operand.
 * @OMFG
   Does not evaluate its operand but wraps it up in a bound expression by
   replacing all variables by their current value.
 * @REMOVE
   Tries to remove the specified file. Returns 1 for success, "" for
   failure.
 * @RENAEM
   Takes a list of two filenames #<OLD NEW#> and tries to rename OLD to
   NEW. Returns 1 for success, "" for failure.
 * @REVERSE
   If passed a list, returns a list consisting of the elements of the
   original list in the opposite order. Otherwise, returns a string with
   all the characters in the opposite order.
 * @SAG
   Returns a value representing the current file position of its operand
   (which must be an IO handle) suitable as the second argument of @SUCH.
   If the file was opened in binary mode, this is the number of bytes from
   the beginning of the file.
 * @SAPERS
   Takes a filename, opens it for writing, and returns the resulting IO
   handle or undef on failure. The file is created if it doesn't exist and
   truncated to length 0 otherwise. Equivalent to
   @APERS #<filename "WBF"#>.
 * @SIN
   Returns the sine of its operand.
 * @SQRT
   Returns the square root of its operand.
 * @STR
   Converts its operand to a string.
 * @SUCH
   Takes a list of two or three elements, #<IO POS WHENCE#>. WHENCE
   defaults to 0 if omitted. Sets the current file position of IO (which
   must be an IO handle) to POS (relative to WHENCE). WHENCE must be one
   of 0 (beginning of the file), 1 (current position) or 2 (end of file).
   If IO is in text mode, POS must be a value returned by @SAG and WHENCE
   must be 0, or POS must be 0. If IO is in binary mode, the new position
   (measured in bytes from the beginning of the file) is obtained by
   adding POS to the position specified by WHENCE. In this case POS=0 and
   WHENCE=2 may not work, depending on your C library.
 * @TAN
   Returns the tangent of its operand.
 * @TYPE OF
   Returns a string describing the type of its operand: "string" for
   strings, "number" for numbers, "stream" for IO handles, "list" for
   lists, "stuff" for bound expressions, "nothing" for undef.
 * @<label> (where <label> is a static label)
   Calls the static label <label> with its operand.
 * @
   Calls the dynamic label specified by its operand.

BINARY OPERATORS

Some of the operators (marked with (SL)) below double as string and list
operators. They do the same thing with strings and lists by treating
strings as lists of characters.
Boolean and comparison operators return 1 for true and "" for false.

 _ (SL)
   If given two lists, it concatenates them. Otherwise, it converts its
   operands to strings and concatenates them.
 +
   Adds two numbers.
 -
   Subtracts two numbers.
 *
   Multiplies two numbers.
 /
   Divides two numbers.
 %
   Returns the modulus (remainder from division) of two numbers.
 ^
   Exponentiation.
 [ (SL)
   Given a string/list B and an integer N, returns the substring/sublist
   of B starting at index N. Negative indices start from the end of the
   string/list.
   Example: "foo" [ 1 yields "oo"; "foobar" [ -2 yields "ar".
 ] (SL)
   Given a string/list B and an integer N, returns the substring/sublist
   of B having a length of N. Negative lengths start from the end of the
   string/list.
   Example: "foo" ] 1 yields "f"; "foobar" ] -2 yields "foob"
 <
   Numeric less than.
 >
   Numeric greater than.
 { (SL)
   String/list less than.
 } (SL)
   String/list greater than.
 =
   Numeric equality.
 !
   Numeric inequality.
 : (SL)
   String/list equality.
 ; (SL)
   String/list inequality.
 &
   Logical and.
 |
   Logical or.
 ~
   Pattern match. The left operand is the string or IO stream to be
   matched, the right operand is a string forming a regular expression
   (see PATTERNS below). On failure, "" is returned. On success, returns
   the position of the match and sets \_ to the position after the match.
 ?o~
   Similar to ~ above, except that the right operand is only evaluated the
   first time it is executed.
   This means that (foo ?o~ bar) won't see any changes to bar after the
   first match attempt (like /o in Perl).
 `
   Given a number X and an integer N (which must be in the range 2 .. 36),
   returns a string representation of X in base N.
 '
   Given a string S and an integer N (which must be in the range 2 .. 36),
   interprets S as a number in base N and returns the result.
 (the empty operator) or . (SL)
   This one is really three different operators, depending on the type of
   its left operand:
   - IO handle: Attempts to read and return N bytes where N is the right
     operand.
   - Bound expression: Evaluates the bound expression with \@ temporarily
     set to the right operand.
   - String/list: Returns the element at the index specified by the right
     operand.
 ,
   Returns its right operand.

PATTERNS

The ~ operator tries to match a string or IO handle against a pattern.
A match succeeds if the pattern matches a substring of the matched string.
When matching an IO handle (which must have been opened with "Z"), the
matched characters are removed from the input stream if the match
succeeds.

Whitespace in the pattern is mostly ignored (that is, the pattern " a b"
is equivalent to "ab"). Exceptions are marked [S] below.
There is a special kind of "whitespace": everything between "[" and "#]"
is ignored (except for "!"s), so you can write
"foo [ a comment with [! in it #] bar" instead of "foobar".

A search pattern consists of one or more branches, separated by "|". It
succeeds if one of its branches succeeds. The branches are tried in the
order they're specified.

A branch is a (possibly empty) sequence of pieces. It succeeds if all of
the pieces match in turn.

A piece is an atom, possibly followed by a quantifier. A missing
quantifier means the atom must match exactly once.

A quantifier is one of the following (N, M are nonnegative integers):
    *        0 or more of the preceding atom (greedy)
    *?       0 or more of the preceding atom (nongreedy) [S]
    +        1 or more of the preceding atom (greedy)
    +?       1 or more of the preceding atom (nongreedy) [S]
    ?        0 or 1 of the preceding atom (greedy)
    ??       0 or 1 of the preceding atom (nongreedy) [S]
    :N:      exactly N of the preceding atom [S]
    :N:?     exactly N of the preceding atom [S]
    :N,:     N or more of the preceding atom (greedy) [S]
    :N,:?    N or more of the preceding atom (nongreedy) [S]
    :N,M:    at least N but not more than M of the preceding atom
             (greedy) [S]
    :N,M:?   at least N but not more than M of the preceding atom
             (nongreedy) [S]

An atom is either a character class, an assertion, a selection, a
subgroup, a capturing group, a backreference, an independent subgroup, an
abbreviation, an escaped character, or an ordinary character.

A character class is either a built-in class or a custom class.

A built-in class is one of the following:
    .    a character
    c!   a control character
    C!   a non-control character
    d!   a digit
    D!   a non-digit
    l!   a lowercase character
    L!   a non-lowercase character
    p!   a printable character
    P!   a non-printable character
    q!   an alphanumeric character
    Q!   a non-alphanumeric character
    s!   a space character
    S!   a non-space character
    u!   an uppercase character
    U!   a non-uppercase character
    w!   a word character, i.e. one of a-zA-Z0-9_
    W!   a non-word character
    x!   a hex digit, i.e. one of 0-9a-fA-F
    X!   a non-hex character
  [S]

A custom class is a (possibly empty) sequence of characters, enclosed in
'' (single quotes). It matches one of the listed characters, unless the
first character is '^', in which case it matches one of the not listed
characters. Whitespace is significant inside character classes. A range of
characters can be specified by putting a '-' between the endpoints. In
addition, all built-in classes listed above (except for .) and the
following subclasses are available:
    [:alnum:]    equivalent to q!
    [:^alnum:]   equivalent to Q!
    [:alpha:]    equivalent to a!
    [:^alpha:]   equivalent to A!
    [:cntrl:]    equivalent to c!
    [:^cntrl:]   equivalent to C!
    [:digit:]    equivalent to d!
    [:^digit:]   equivalent to D!
    [:graph:]    equivalent to '^P! ', i.e.
                 any printable character except space
    [:^graph:]   equivalent to 'P! ', i.e.
                 any non-printable character or space
    [:lower:]    equivalent to l!
    [:^lower:]   equivalent to L!
    [:print:]    equivalent to p!
    [:^print:]   equivalent to P!
    [:punct:]    equivalent to '^P!q! ', i.e.
                 any non-alphanumeric, non-space printable character
    [:^punct:]   equivalent to 'P!q! ', i.e.
                 a non-printable character, an alphanumeric character or
                 space
    [:space:]    equivalent to s!
    [:^space:]   equivalent to S!
    [:upper:]    equivalent to u!
    [:^upper:]   equivalent to U!
    [:xdigit:]   equivalent to x!
    [:^xdigit:]  equivalent to X!
To include a literal ^, !, - or ' in a character class, escape it with a
following !. Example: '!!'!' is a class that matches ! or '.

An assertion is something that doesn't consume parts of the matched
string, i.e. it matches with zero length. There are built-in assertions
and custom assertions.

A built-in assertion is one of the following:
    ^    at beginning of string
    $    at end of string
    A!   at beginning of line
    Z!   at end of line
    b!   at a word boundary
    B!   not at a word boundary

A custom assertion is either a positive assertion, a negative assertion or
a backreference check.

A positive assertion consists of "[", followed by a pattern, followed by
"&]" [S]. It succeeds (without consuming parts of the matched string) if
the enclosed pattern succeeds.

A negative assertion consists of "[", followed by a pattern, followed by
"^]" [S]. It succeeds if the enclosed pattern fails.

A backreference check consists of "[", followed by a number, followed by
"]" [S]. It succeeds if the corresponding capturing group succeeded.

A selection consists of "[", optionally followed by a pattern followed by
"|", followed by a branch, followed by an atom, followed by "?]" [S]. It
matches as follows: First the atom is tried. If it matches, the preceding
branch is tried; otherwise matching continues with the preceding pattern
if specified.
In other words, it looks like "[no-pattern|yes-pattern(condition)?]" or
"[yes-pattern(condition)?]".
Example: "{(!}?'^()'+[)![0]?]" matches a chunk of non-parentheses,
possibly enclosed in parentheses.

A subgroup is a parenthesized pattern. It succeeds if the enclosed pattern
succeeds.

A capturing group is a pattern surrounded by "{" and "}". It succeeds if
the enclosed pattern succeeds. As a side effect, it sets \N, @-(N) and
@+(N) where N is the number of the capturing group. Capturing groups are
numbered from the right, counting the closing braces (starting with 0).

A backreference is N! where N is a non-negative integer. It succeeds if it
can match the same substring that was matched by the Nth capturing group.

An independent subgroup is a pattern, surrounded by "<" and ">". It
succeeds if the enclosed pattern succeeds, but when it does, it won't
backtrack in the enclosed pattern.

An abbreviation is a (possibly empty) sequence of characters surrounded by
"`" and "`". All metacharacters lose their special meaning inside an
abbreviation, except for ` (which marks the end of the abbreviation) and !
(which escapes the previous character, as usual). `abcd` is equivalent to
(a(b(cd?)?)?)?, i.e. an abbreviation for x matches the longest possible
initial substring of x. [S]

An escaped character is a special character like "!" or "+" (or
whitespace), followed by an "!". It matches that character literally.

An ordinary character matches itself.
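
EXAMPLE: ABBREVIATIONS

The abbreviation rule above (`abcd` is equivalent to (a(b(cd?)?)?)?) can be
illustrated outside of ploki. The following is a minimal sketch in Python,
not ploki code: it uses Python's re module, whose syntax differs from
ploki's patterns, and the function names abbreviation_to_regex and
match_abbreviation are hypothetical helpers made up for this illustration.
It only models the nesting of optional groups; the ! escape inside
abbreviations is not handled.

```python
import re


def abbreviation_to_regex(word: str) -> str:
    """Build a Python regex mirroring ploki's `word` abbreviation.

    Hypothetical helper: each character is wrapped in a nested optional
    group, so "abcd" becomes (?:a(?:b(?:c(?:d)?)?)?)?.
    """
    pattern = ""
    # Work from the last character outwards so that every character is
    # only tried if all characters before it have matched.
    for ch in reversed(word):
        pattern = "(?:" + re.escape(ch) + pattern + ")?"
    return pattern


def match_abbreviation(word: str, text: str) -> str:
    """Return the longest initial substring of word found at the start of text."""
    match = re.match(abbreviation_to_regex(word), text)
    # The pattern can match the empty string, so a match always exists.
    return match.group(0)


if __name__ == "__main__":
    print(abbreviation_to_regex("abcd"))
    print(match_abbreviation("abcd", "abcz"))
```

Because every group is optional and greedy, the sketch always succeeds and
returns as many leading characters of the abbreviated word as the text
provides, including none at all.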