In the documentation, quotation marks are often used to visually delimit
examples that are embedded in the text. The quotation marks are never part of
the example.
By default, the following
characters have special meaning in patterns. Note that all of these can be
changed through the use of the @set-syntax function. See also the
-literal option.
- *
- In a template, this denotes a wild-card argument that matches any number
of characters, from zero up to a maximum of 4096, or as specified by the
-arglen option. (Some limit is needed for efficiency to avoid reading
all the way to the end of the file before concluding that the match has
failed.) Characters are copied from the input stream into the argument value
until a match is found for the entire remainder of the template. Thus, when a
template has two or more wild card arguments, the input text is divided among
them as necessary for the complete template to be matched. (By contrast, a
``<u>'' argument is similar except that it terminates when a
match is found for whatever sequence of literal characters follows it, up
until the next argument.) If the -line option is in effect or if
``\L'' appeared earlier in the template, then it will not accept a
newline character.
In an action, it denotes the value of the corresponding template argument.
- ?
- Wild-card argument that matches any one character. If the -line
option is in effect or if ``\L'' appeared earlier in the template,
then it will not accept a newline character.
- #
- Recursive argument. In a template, this denotes an argument whose value is
obtained by translating the input text in the same domain as the current rule
until a match is found for whatever sequence of literal characters follows the
argument (up to the next argument, or the end of the template, or
``\G'').
In an action, it denotes the value of the corresponding template argument.
- <name>
- Recursive argument, translated according to the named domain, or a
pre-defined recognizer argument. The name may be empty to denote the default
domain. The name does not have to have been defined before it is referenced.
This can be used only in a template.
- /regexp/
- In a template, this denotes an argument where the characters between the
slashes are used as a regular expression, and the argument value is however
much text it matches. Regular expressions are documented in many other
places, so they will not be detailed here. Suffice it to say that the following
characters and combinations have special meaning:
. \ [ ] * + ^ $ \( \) \< \>
A slash that is to be part of the regular expression
needs to be preceded by a backslash. Regular expression arguments never cross
line boundaries. Unlike other kinds of arguments, they will match as many
characters as they can, without regard to whatever follows in the template.
For example, the template ``a/[a-z]*/x'' will never match anything
because if there is an ending ``x'', it will be swallowed by the
argument; however, in the template ``a<l>x'' the argument will
match on any letter except ``x''.
- =
- This designates the end of a template and the beginning of the
corresponding action.
- $0
- This can be used in an action to copy the matched text to the output. The
template is evaluated as though it were an action, with each argument
designator being replaced by the actual argument value. Note that this does
not necessarily exactly duplicate the input text since any ignored whitespace
will be lost and recursive arguments are shown in their translated form.
- $digit or ${digits}
- In either a template or action, this represents the value of the numbered
argument. The argument number must be enclosed in braces if it needs more than
one digit. In a template, this obviously can only refer to a preceding
argument, and in the current implementation, the value of a ``*''
argument cannot be accessed within the same template.
- $letter
- In either a template or action, this inserts the value of a variable
whose name is a single letter. An error is
reported if the variable is not defined.
- ${name}
- In an action, this outputs the value of the named variable. The name must
not begin with a digit. An error is reported if the variable is not defined.
- ${name;default}
- In an action, this outputs the value of the named variable, if it is
defined, or evaluates the default action if the variable is not defined.
- \
- Escape character; see the section on ``escape sequences'' below.
- ^
- Control key. Together with the following character, this represents the
control character formed by combining the Control key with the character. For
example, either ``^J'' or ``^j'' could be used to denote the
ASCII Line Feed character. This notation is not meaningful if a character set
other than ASCII is being used.
- Space
- In a template, a space character matches one or more whitespace characters
in the input, the same as ``\S''. (In the less likely event that you
really want to match exactly one space character, you can use ``\ ''
or ``\s''.) In an action, a space character causes one space to be
output if the last character output was not a whitespace character, except
that if there are multiple adjacent spaces, all but the first are taken
literally. However, if the -w option is used, then spaces are ignored
except where they serve to separate two identifiers.
- NewLine
- The end of a line denotes the end of a rule or immediate action.
- ;
- The semicolon is used to separate multiple rules on the same line, and to
separate arguments of function calls.
- @name{args}
- In an action, this notation is used to either call a built-in function or
to translate the argument using the rules of the named domain. The name may be
empty to denote the default domain. It is permissible to reference a domain
name that is defined later in the file. The braces may be omitted
for functions that take no arguments.
- @spchar
- When followed by a special character (i.e. not a letter or digit), the
``@'' indicates that the following character has its default meaning,
as documented in this list. This can be used to access the original
functionality of a character that has been changed by the -literal
option or @set-syntax function. For example, if you had done
``-literal /'' and then discovered that you do need to use a regular
expression, you could write it as ``@/regexp@/''.
- :
- The characters to the left of the colon (with any leading and trailing
spaces and surrounding angle brackets removed) constitute the name of the
domain in which the rules that follow on the same line will be defined.
- ::
- A double colon specifies that the domain whose name appears to the left,
inherits from the domain whose name appears to the right.
- !
- Comment - the rest of the line is ignored. This can either appear at the
beginning of a line to cause the whole line to be ignored, or it can be used
at the end of a rule so that the remainder of the line is a comment.
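The way a template with several ``*'' arguments divides the input can be sketched in Python. This is only a model of the behavior described in the ``*'' entry above, not gema's implementation: the names match_template and WILD are hypothetical, the template is given as a list of literal strings and wild-card markers, and line mode and a trailing wild card are not modeled.

```python
WILD = object()  # hypothetical marker standing for a ``*'' argument


def match_template(template, text, pos=0, max_len=4096):
    """Return the list of wild-card values if template matches at pos, else None.

    Each wild card starts empty and grows one character at a time until the
    entire remainder of the template matches, up to max_len characters.
    """
    if not template:
        return []
    head, rest = template[0], template[1:]
    if head is WILD:
        last = min(len(text), pos + max_len)
        for end in range(pos, last + 1):
            tail = match_template(rest, text, end, max_len)
            if tail is not None:
                return [text[pos:end]] + tail
        return None  # length limit or end of input reached without a match
    if text.startswith(head, pos):
        return match_template(rest, text, pos + len(head), max_len)
    return None


if __name__ == "__main__":
    # Models the template ``(*,*)'' applied to the input ``(a,b,c)''.
    print(match_template(["(", WILD, ",", WILD, ")"], "(a,b,c)"))
```

In this model the first argument stops at the first comma that lets the rest of the template succeed, so the two values are ``a'' and ``b,c''. Lowering max_len shows why a limit makes a failing match give up early instead of scanning to the end of the input.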
The backslash character
denotes special handling for the character that follows it.
- When followed by a lower-case letter or a digit, it represents a
particular control character.
- When followed by an upper-case letter, it is a pattern match operator.
- A backslash at the end of a line designates continuation by causing the
newline to be ignored along with any leading white space on the following
line.
- Before any other character, the backslash quotes the character so that it
simply represents itself. In particular, a literal backslash is represented by
two backslashes.
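The four cases above amount to a dispatch on the character that follows the backslash. A minimal Python sketch of that dispatch follows; the function name classify_backslash and the returned labels are hypothetical, and ASCII letters are assumed.

```python
import string


def classify_backslash(next_char):
    """Classify a backslash sequence by the character following the backslash."""
    if next_char == "\n":
        # Backslash at end of line: the newline and the leading white space
        # of the following line are ignored.
        return "continuation"
    if next_char in string.ascii_lowercase or next_char in string.digits:
        return "control character"
    if next_char in string.ascii_uppercase:
        return "pattern match operator"
    # Any other character is quoted and simply represents itself.
    return "quoted literal"


if __name__ == "__main__":
    for ch in ["n", "7", "S", "\n", "\\", "*"]:
        print(repr(ch), "->", classify_backslash(ch))
```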
Following are the defined escape sequences:
- \a
- Alert (a.k.a. bell) character
- \b
- Backspace character
- \cx
- Control key combined with the following character. For example,
``\ci'', ``\cI'', ``^i'', ``^I'', and
``\t'' all have the same effect, namely to represent the ASCII Tab
character.
- \d
- Delete character
- \e
- Escape character (i.e. ESC, not backslash)
- \f
- Form feed character
- \n
- New line character
- \r
- carriage Return character
- \s
- Space character
- \t
- horizontal Tab character
- \v
- Vertical tab character
- \xxx
- character specified by its heXadecimal code
- \digits
- character specified by its octal code
- \A
- Matches the beginning of the input data, either the beginning of a file or
the beginning of the argument for a domain used as a function.
- \B
- Matches the beginning of file. This can be used either by itself to
specify actions to be taken before beginning to read the file, or it can be
used at the beginning of a template that is to match only on the first line of
the file.
- \C
- This causes case-insensitive comparison for letters in the rest of the
template. (See also the -i option which selects case-insensitive mode
globally.)
- \E
- Matches the end of file.
- \G
- Goal point. This can be used in a template to indicate the end of the
literal string that is used to recognize the end of the preceding argument.
For example, if the template ``a(<T>) done'' is applied to the
input data ``a(x) b(y) done'', the argument ``<T>''
will match on the text ``x) b(y'', which is probably not what was
desired. If the template is written as ``a(<T>)\G done'' then
the argument will be terminated by the first right parenthesis, and then the
match will fail if the text following the parenthesis doesn't match
`` done''. This does not yet work for ``*'' arguments.
If ``\G'' immediately follows a recursive argument, then there is
no delimiter, and the argument will continue to accept characters until it
stops itself by executing @end or @terminate.
- \I
- Identifier separator. In a template, this matches an empty string if it is
not within an identifier. In other words, it requires either of the adjacent
characters to not be an identifier constituent in order for the template to
match. In an action, this outputs a space character if the last character
output is an identifier constituent. By default, an identifier constituent is
a letter, digit, or underscore, but this can be extended by the
-idchars option.
- \J
- Join - locally counteracts the -w and/or -t option by
saying that spaces in the input will not be ignored at this position, and an
identifier delimiter is not required here. If neither of these options is
being used, then it has no effect. Not meaningful in an action.
- \L
- Line mode - arguments that follow in the same template are not allowed to
cross line boundaries. This also means that ``\S'' and
``\W'' will not accept newline characters. However, a line boundary
can still be crossed by an explicit ``\n'' or ``\N''.
- \N
- New line boundary. In a template, this matches an empty string if it is at
either the beginning of a line or the end of a line (either before or after a
new line character, or at the beginning or end of the file or data stream). In
an action, it outputs a new line character if the last character output is not
a new line.
- \P
- Position - if the template matches, the input stream will be left at this
position. Thus everything following this is a look-ahead, and will be re-read
for subsequent pattern matches.
- \S
- Space. In a template, this matches one or more whitespace characters. (See
also ``<S>'' which has the same effect except that the spaces
are remembered as an argument value.) In an action, it outputs one space
character if the last character output is not a whitespace character.
- \W
- Optional whitespace. In a template, this specifies that any whitespace
characters in the input stream at this point will be skipped over. (See also
``<s>'' which has the same effect except that the spaces are
remembered as an argument value.) However, if this is followed in the template
by a literal whitespace character, then that character will not be skipped.
For example, in ``\W\n'', the ``\W'' will skip any
whitespace other than a newline. This has no effect in an action. See also the
-w option which ignores spaces everywhere.
- \X
- Word separator. In a template, this matches an empty string if it is not
within a word. In this context, a word consists of letters and digits.
- \Z
- Matches the end of the input data, either the end of a file or the end of
the argument for a domain used as a function, or a look-ahead match of the
terminating string for a recursive argument.
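The effect of ``\G'' in the example given under that entry can be modeled in Python. This is a sketch under simplifying assumptions rather than gema's algorithm: the helper match_argument is hypothetical, the argument is assumed to accept every character (as ``<T>'' does for this input), and the space in the template is treated as a single literal space.

```python
def match_argument(text, prefix, terminator, rest=""):
    """Return the argument value, or None if the template does not match.

    prefix     -- literal text before the argument
    terminator -- literal text that ends the argument (everything up to the
                  goal point, or the whole following literal without one)
    rest       -- literal text required after the goal point
    """
    if not text.startswith(prefix):
        return None
    start = len(prefix)
    end = text.find(terminator, start)
    if end < 0:
        return None
    if not text.startswith(rest, end + len(terminator)):
        return None  # argument already ended; the match simply fails
    return text[start:end]


if __name__ == "__main__":
    data = "a(x) b(y) done"
    # ``a(<T>) done'': the whole of ``) done'' terminates the argument.
    print(match_argument(data, "a(", ") done"))
    # ``a(<T>)\G done'': only ``)'' terminates it, then `` done'' must follow.
    print(match_argument(data, "a(", ")", " done"))
```

The first call yields ``x) b(y''; the second yields None because the text after the first right parenthesis is not `` done''.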
The following argument
designators, consisting of a single letter between angle brackets, can be used
in templates to match on various kinds of characters. Preceding the letter with
``-'' inverts the test. The argument requires at least one matching
character if the letter is uppercase, or is optional if the letter is lowercase.
The letter may be followed by a number to match on that many characters, or up
to that maximum for an optional argument. If the number is 0, the
argument matches if the next character is of the indicated kind, but the input
stream is not advanced past it; in other words, this acts as a one-character
look-ahead.
If the argument is followed in the template by literal characters, then the
argument will be terminated when that literal string is matched, even if those
characters would otherwise qualify for inclusion in the argument.
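This termination rule is what separates the two templates compared in the regular expression entry, ``a/[a-z]*/x'' and ``a<l>x''. The Python sketch below models both; the function names are hypothetical and the code illustrates the described behavior only, not gema's implementation.

```python
import re

LOWER_RUN = re.compile(r"[a-z]*")


def regexp_then_x(text):
    """Model of ``a/[a-z]*/x'': the regexp takes all it can, then 'x' must follow."""
    if not text.startswith("a"):
        return None
    end = LOWER_RUN.match(text, 1).end()  # longest run of lowercase letters
    # The run has already swallowed any 'x', so this check cannot succeed.
    return text[1:end] if text.startswith("x", end) else None


def letters_until_x(text):
    """Model of ``a<l>x'': letters are accepted until the literal 'x' is seen."""
    if not text.startswith("a"):
        return None
    pos = 1
    while pos < len(text):
        if text[pos] == "x":
            return text[1:pos]  # the literal terminates the argument
        if not ("a" <= text[pos] <= "z"):
            return None  # argument ends here, but 'x' does not follow
        pos += 1
    return None


if __name__ == "__main__":
    print(regexp_then_x("abcx"))    # no match
    print(letters_until_x("abcx"))  # argument value is "bc"
```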
- <A>
- Alphanumeric (letters and digits)
- <C>
- Control characters
- <D>
- Digits
- <F>
- File pathname. See the -filechars option.
- <G>
- Graphic characters, i.e. any non-space printable character
- <I>
- Identifier. By default, an identifier consists of letters, digits, and
underscores. See the -idchars option.
- <J>
- lower case letters (in version 1.2 or later)
- <K>
- upper case letters (in version 1.2 or later)
- <L>
- Letters (either upper or lower case)
- <N>
- Number, i.e. digits with optional sign and decimal point
- <O>
- Octal digits
- <P>
- Printing characters, including space
- <S>
- white Space characters (space, tab, newline, FF, VT)
- <T>
- Text characters, including all printing characters and white space
- <U>
- Universal (matches anything except end-of-file)
- <W>
- Word (letters, apostrophe, and hyphen)
- <X>
- hexadecimal digits
- <Y>
- punctuation (graphic characters that are not identifiers)
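The rules for inversion, upper versus lower case, and the optional count can be sketched in Python for a few of the classes listed above. The function recognize and the CLASSES table are hypothetical, the character sets are assumed to be ASCII-only, and termination by a following literal is not modeled. How a lower-case letter combines with a count of 0 is not stated in the text, so the sketch treats it the same as upper case.

```python
import re
import string

# ASCII-only character sets for some of the classes above (an assumption).
CLASSES = {
    "A": string.ascii_letters + string.digits,
    "D": string.digits,
    "J": string.ascii_lowercase,
    "K": string.ascii_uppercase,
    "L": string.ascii_letters,
    "O": string.octdigits,
    "X": string.hexdigits,
}


def recognize(designator, text, pos=0):
    """Return (value, new_pos) if the designator matches at pos, else None."""
    m = re.fullmatch(r"<(-?)([A-Za-z])(\d*)>", designator)
    if m is None:
        raise ValueError("not a recognizer designator: " + designator)
    invert = m.group(1) == "-"
    letter = m.group(2)
    chars = CLASSES[letter.upper()]
    required = letter.isupper()  # upper case needs at least one character
    count = int(m.group(3)) if m.group(3) else None

    def accepts(ch):
        return (ch in chars) != invert

    if count == 0:
        # One-character look-ahead: test the next character, consume nothing.
        if pos < len(text) and accepts(text[pos]):
            return "", pos
        return None

    limit = len(text) if count is None else min(len(text), pos + count)
    end = pos
    while end < limit and accepts(text[end]):
        end += 1
    taken = end - pos
    if required and (taken == 0 or (count is not None and taken != count)):
        return None
    return text[pos:end], end


if __name__ == "__main__":
    for d, t in [("<D>", "123abc"), ("<d>", "abc"), ("<D3>", "12345"),
                 ("<-D>", "abc1"), ("<D0>", "7x")]:
        print(d, repr(t), "->", recognize(d, t))
```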
Table of Contents
1 Introduction
2 Operational Overview
3 Notation
3.1 Special characters
3.2 Escape Sequences
3.3 Recognizer arguments
4 Built-in Functions
4.1 Numbers
4.2 String functions
4.2.1 Output formatting -- padding, filling, and wrapping
4.2.2 String Comparison
4.2.3 Case conversion
4.2.4 Miscellaneous string functions
4.3 Variables
4.4 Files
4.4.1 Pathname manipulation
4.4.2 Using alternate input and output files
4.4.3 File context queries
4.5 Control flow functions
4.6 Other operating system interfaces
4.7 Definitions
4.8 Setting Options
4.9 Informational functions
5 Customized command-line processing
6 Exit codes
7 Status and Future development
8 Acknowledgments