Xrate Macros

From Biowiki
Revision as of 23:43, 1 January 2017 by Move page script (talk | contribs) (Move page script moved page XrateMacros to Xrate Macros: Rename from TWiki to MediaWiki style)

Documentation of the macro language that forms part of the xrate format.

For a longer and complete description of the format, including the grammar as well as these macros, please see the xrate format page.


The xrate macro preprocessor

Several kinds of macro are automatically expanded by xrate before any training or alignment annotation takes place. Macro expansion is a one-off, irreversible event: if the grammar file is saved after macro substitution has taken place, the original macros will not be recoverable.

Preprocessing and parsing take place in the following order:

  1. Parsing of the Stockholm alignment file;
  2. Processing of &include directives (everything from here onwards refers to the grammar file);
  3. Parsing of the alphabet declaration;
  4. Expansion of &define, &foreach and &warn directives (a preorder tree traversal, with some lookahead);
  5. Reduction of list, logic, arithmetic & other functions (a postorder tree traversal);
  6. Evaluation of any &scheme or &scheme-discard expressions (if compiled with Guile Scheme support);
  7. Parsing of the (generated) phylo-grammar.

Including files

(&include ~/dart/grammars/hky85.eg)

The &include directive includes another file (just like #include in C). Shell globbing is performed on the filename, so you can use shorthands like tildes and wildcards, as in the above example.

Printing warnings

(&warn Generating column COLUMN ...)

Prints the atoms following &warn to standard error. This is useful for monitoring progress while generating large grammars, such as this column-specific model (from which the example is taken): DartGrammar:site_specific.eg

Simple substitutions

The &define macro is used to indicate that one expression should be substituted for another.

For example,

(&define X yellow)
curious X
(mellow X)

evaluates to

curious yellow
(mellow yellow)

Currently, only atomic expressions may be substituted in; so, for example, (&define X (bright yellow)) is not a legal usage, since (bright yellow), unlike yellow, is a list rather than an atom.

Note also that the binding is static. You cannot use &define to create a dynamic function that evaluates to different values at different times. If this is the sort of thing you need to do, you will be better off using Scheme (perhaps with the experimental Scheme language extensions to xrate).
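The substitution rule can be sketched in a few lines of Python. This is only a model of the behaviour described above, not xrate's implementation: representing S-expressions as nested Python lists and the helper name substitute are assumptions made for illustration.

```python
def substitute(tree, name, value):
    """Replace every atom equal to `name` with `value` in a nested-list S-expression."""
    if isinstance(tree, list):
        # Recurse into sub-lists so that atoms at any depth are substituted.
        return [substitute(child, name, value) for child in tree]
    return value if tree == name else tree

# The example above, written as nested Python lists (an assumed representation).
body = ["curious", "X", ["mellow", "X"]]
expanded = substitute(body, "X", "yellow")
print(expanded)
```

Because the replacement value is a single atom, the shape of the tree never changes, which matches the restriction that only atomic expressions may be substituted in.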

List operations

The following operators fold a list into a single element during macro preprocessing.

Concatenation

The &cat function (or its shorthand, &.) concatenates a list of atoms into a single atom. For example,

(&. X Y Z)
(&cat X Y Z)

both evaluate to

XYZ

Summation

The &sum function (or its shorthand, &+) takes the sum of a list of integer values. For example,

(&+ 1 10 -5)
(&sum 1 10 -5)

both evaluate to

6

Multiplication

The &mul function (or its shorthand, &*) takes the product of a list of floating-point values. For example,

(&* 2 3 5)
(&mul 2 3 5)

both evaluate to

30

Binary operations

Division

(&/ X Y)
(&div X Y)

Both evaluate to the floating-point division X / Y.

Modulus

(&% A B)
(&mod A B)

Both evaluate to the integer modulus operation A mod B.

Subtraction

(&- X Y)
(&sub X Y)

Both evaluate to the integer subtraction X - Y.
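The list and binary operators are reduced in a postorder traversal, so inner calls are evaluated before the calls that contain them. The Python sketch below models that behaviour. It is not xrate's code: the nested-list representation and the function name reduce_macros are assumptions, and details such as number formatting (Python prints the product as 30.0) and the sign convention of the modulus for negative operands may differ from xrate.

```python
import math

def reduce_macros(tree):
    """Postorder reduction: reduce the children first, then the enclosing call."""
    if not isinstance(tree, list):
        return tree
    args = [reduce_macros(child) for child in tree]
    if not args or not isinstance(args[0], str):
        return args
    head, rest = args[0], args[1:]
    if head in ("&cat", "&."):
        return "".join(str(a) for a in rest)
    if head in ("&sum", "&+"):
        return sum(int(a) for a in rest)
    if head in ("&mul", "&*"):
        return math.prod(float(a) for a in rest)
    if head in ("&div", "&/"):
        return float(rest[0]) / float(rest[1])
    if head in ("&mod", "&%"):
        return int(rest[0]) % int(rest[1])
    if head in ("&sub", "&-"):
        return int(rest[0]) - int(rest[1])
    return args  # not a known function: leave the (already reduced) list alone

print(reduce_macros(["&cat", "X", "Y", "Z"]))      # XYZ
print(reduce_macros(["&+", 1, 10, -5]))            # 6
print(reduce_macros(["&*", 2, 3, 5]))              # 30.0
print(reduce_macros(["&cat", "S", ["&-", 7, 2]]))  # S5
```

The last call shows why the traversal order matters: the inner subtraction is folded to 5 before the outer concatenation sees it.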

Iterations

The following macros generate a list of elements from a template during preprocessing.

foreach

(&foreach VAR (LIST) EXPR)

Inserts one copy of EXPR for every element of LIST. Any occurrences of VAR within EXPR will be replaced by the corresponding element of LIST. For example,

(&foreach VAR (1 2 3) (VAR + 1))

evaluates to

(1 + 1) (2 + 1) (3 + 1)

EXPR can include more than one item, e.g.

(&foreach VAR (1 2 3) VAR *)

evaluates to

1 * 2 * 3 *
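As a rough model of this expansion, the Python sketch below substitutes each list element into a copy of the body and splices the copies into the enclosing list. The nested-list representation and the names substitute and expand_foreach are assumptions for illustration. This is not xrate's implementation, and the sketch ignores corner cases such as an inner &foreach reusing the same variable name.

```python
def substitute(tree, name, value):
    """Replace every atom equal to `name` with `value`, at any depth."""
    if isinstance(tree, list):
        return [substitute(child, name, value) for child in tree]
    return value if tree == name else tree

def expand_foreach(tree):
    """Expand (&foreach VAR (LIST) EXPR...) forms, splicing results into the parent list."""
    if not isinstance(tree, list):
        return tree
    out = []
    for child in tree:
        if isinstance(child, list) and child and child[0] == "&foreach":
            var, values, body = child[1], child[2], child[3:]
            for value in values:
                # One copy of the body per element; the copies are spliced in,
                # not wrapped in an extra list.
                out.extend(expand_foreach(substitute(body, var, value)))
        else:
            out.append(expand_foreach(child))
    return out

single = expand_foreach([["&foreach", "TOK", ["a", "c"], ["state", "TOK"]]])
multi = expand_foreach([["&foreach", "VAR", [1, 2, 3], "VAR", "*"]])
print(single)  # [['state', 'a'], ['state', 'c']]
print(multi)   # [1, '*', 2, '*', 3, '*']
```

The second call corresponds to the multi-item example above: every item of the body is repeated for each element of the list.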

foreach-integer

As foreach, but the list is specified as an integer range (both endpoints inclusive). This can be useful for specifying arrays of states, cf. Profile HMMs.

(&foreach-integer VAR (MINVAL MAXVAL) EXPR)

For example,

(&foreach-integer VAR (1 3) VAR *)

evaluates to

1 * 2 * 3 *
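The range is closed at both ends, as the (1 3) example shows. A minimal Python model follows, assuming a flat body with no nested lists; the function name foreach_integer is hypothetical and not part of xrate.

```python
def foreach_integer(var, minval, maxval, body):
    """Repeat `body` once per integer in the closed range [minval, maxval]."""
    out = []
    for i in range(minval, maxval + 1):  # +1 so that maxval itself is included
        out.extend(i if item == var else item for item in body)
    return out

result = foreach_integer("VAR", 1, 3, ["VAR", "*"])
print(result)  # [1, '*', 2, '*', 3, '*']
```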

foreach-token

As foreach, but the list is taken to be the set of alphabet tokens.

(&foreach-token VAR EXPR)

foreach-node, foreach-branch, foreach-leaf, foreach-ancestor

As foreach, but the list is taken to be [some subset of] the node names in the tree. (NB this macro is data-dependent, and it only works if the input alignment database contains exactly one alignment; "the tree" refers to the tree specified in the #=GF NH field of this alignment.)

(&foreach-node VAR EXPR)

The various forms allow iteration over

  • all named nodes (&foreach-node),
  • all named nodes except the root (&foreach-branch),
  • all named leaf nodes (&foreach-leaf), or
  • all named internal nodes (&foreach-ancestor).

Logic operations

You can do some basic logic in the macro language. For more elaborate computations, use the built-in Scheme interpreter.

Equality

(&= SEXPR1 SEXPR2)
(&eq SEXPR1 SEXPR2)

(&!= SEXPR1 SEXPR2)
(&neq SEXPR1 SEXPR2)

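&= and &eq evaluate to 1 if the two S-expressions SEXPR1 and SEXPR2 are equal, and to 0 otherwise. &!= and &neq do the opposite: they evaluate to 1 if the two S-expressions are unequal, and to 0 otherwise.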

Arithmetic comparisons

(&> EXPR1 EXPR2)
(&gt EXPR1 EXPR2)

(&< EXPR1 EXPR2)
(&lt EXPR1 EXPR2)

(&>= EXPR1 EXPR2)
(&geq EXPR1 EXPR2)

(&<= EXPR1 EXPR2)
(&leq EXPR1 EXPR2)

Arithmetic comparisons between numerical expressions, returning 1 for true and 0 for false.

Conditional operator

(&? TEST_EXPR TRUE_EXPR FALSE_EXPR)
(&if TEST_EXPR TRUE_EXPR FALSE_EXPR)

If the integer value of TEST_EXPR is nonzero, evaluates to TRUE_EXPR; otherwise, evaluates to FALSE_EXPR.

Boolean operations

(&and X Y)
(&or X Y)
(&not X)

These evaluate to the logical AND, OR and NOT of their arguments.

&and and &or are not limited to two arguments; they also work as list operators, e.g.

(&and X Y Z)
(&or A B C D E)
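The logic operators compose in the obvious way. The Python sketch below models them under two assumptions that the text above does not confirm: that &and, &or and &not return 1 or 0 like the comparison operators, and that they treat any nonzero integer as true, as documented for the conditional operator. All function names (is_true, m_eq and so on) are hypothetical.

```python
def is_true(x):
    """Macro truth test: a nonzero integer value counts as true."""
    return int(x) != 0

def m_eq(a, b):
    return 1 if a == b else 0

def m_gt(a, b):
    return 1 if float(a) > float(b) else 0

def m_if(test, true_expr, false_expr):
    return true_expr if is_true(test) else false_expr

def m_and(*args):
    return 1 if all(is_true(a) for a in args) else 0

def m_or(*args):
    return 1 if any(is_true(a) for a in args) else 0

def m_not(x):
    return 0 if is_true(x) else 1

# Roughly (&if (&and (&gt 3 1) (&eq a a) (&not 0)) yes no)
label = m_if(m_and(m_gt(3, 1), m_eq("a", "a"), m_not(0)), "yes", "no")
print(label)                 # yes
print(m_or(0, 0, 0, 1, 0))   # 1
```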

Miscellaneous functions

ASCII character manipulation

(&chr INT)
(&ord CHAR)

&ord evaluates to the ASCII value of character CHAR.

&chr evaluates to the character whose ASCII value is given by INT.
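Python's built-in ord and chr behave in the same way for ASCII input, which makes it easy to model a combined expression such as (&chr (&+ (&ord a) 2)). The helper name shift_char is hypothetical. Python's chr also accepts code points beyond ASCII, so the sketch adds an explicit range check.

```python
def shift_char(char, offset):
    """Model of (&chr (&+ (&ord CHAR) OFFSET)) for a single ASCII character."""
    code = ord(char) + offset
    if not 0 <= code < 128:
        raise ValueError("result is outside the ASCII range")
    return chr(code)

print(ord("A"))            # 65
print(chr(97))             # a
print(shift_char("a", 2))  # c
```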

Numerical functions

(&int EXPR)

&int evaluates to the integer value of EXPR.

Special constants

Some special constants are auto-substituted during macro expansion.

&TOKENS

Evaluates to the number of tokens in the terminal alphabet.

&NODES, &BRANCHES, &LEAVES, &ANCESTORS

Each of these evaluates to the number of tree nodes of a particular class. The respective classes are

  • all nodes: &NODES
  • all nodes except the root: &BRANCHES
  • all leaf nodes: &LEAVES
  • all internal nodes: &ANCESTORS

As with foreach-node, etc., these macros only work if the input alignment database contains exactly one alignment.

&COLUMNS

Evaluates to the number of columns in the alignment.

This macro only works if the input alignment database contains exactly one alignment.

Arbitrary Scheme expressions

At some point, the xrate macros may become too limiting for you, at which point you may decide you need to write an actual program to generate your grammar (hey, it happens). If you compiled xrate on a system with the Guile library present, you can evaluate arbitrary Scheme expressions inside a grammar file.

&scheme

The &scheme macro evaluates a Scheme expression and interpolates the result. It is approximately equivalent to the unquote-splicing expression in a Scheme quasiquote environment.

For example, the following code will be transformed to (blakes 7) by the Scheme evaluation phase of the macro preprocessor, before your grammar is parsed by the xrate phylogrammar interpreter:

(blakes
 (&scheme
  (define x 3)
  (define (y a)
    (+ a 5)))
 (&scheme (y 2)))

Evaluation of Scheme expressions is performed after expansion of all other macros. The order of evaluation is a depth-first recursive traversal of the S-expression tree.

If you want to evaluate a Scheme expression and discard the return value (i.e. to change the Scheme environment without adding anything to the grammar), you can use &scheme-discard in place of &scheme. This is rarely necessary, as the most common way to change the Scheme environment is with a define (as in the above example), which returns an unspecified return value that is automatically discarded anyway.

Within the Scheme expression, the input alignment is bound to the Scheme symbol alignment, and the Scheme primitives provided by xrate are all available.

These keywords and their behavior are currently documented here: Dart Scheme Functions

For example, the following code is equivalent to the &COLUMNS macro:

(&scheme (stockholm-column-count alignment))

Macro debugging

To dump the macro-expanded grammar to a file after post-processing, use the -x option to xrate. (Note: this will not include any constructs that were not parsed correctly. See below for more on this.)

In combination with the --noannotate option, which disables annotation (and hence prevents any dynamic programming from taking place), this can be useful to test correct usage of macros. For example:

cd dart
xrate src/handel/t/short.stk \
    -g grammars/ancestral_gc.eg \
    -x expanded.eg \
    --noannotate > /dev/null
cat expanded.eg

Note that the -x option dumps the grammar after parsing it. If your macros generated any invalid syntax (i.e. text that was not recognized by xrate's grammar parser), this will not appear in the grammar file dumped by the -x option. If you want to debug all macro expansions (including any invalid syntax), use the logging option -log ECFG_MACROS to dump out the grammar file after macro expansion, but prior to parsing.