2.4 Grammar Rules

A grammar rule has the following general form:

     result: components ...
           ;

result is the nonterminal symbol that this rule describes, and components are the various terminal and nonterminal symbols that are put together by this rule (see Terminal and Nonterminal Symbols).

For example, this rule:

     exp: exp PLUS exp
        ;

says that two groupings of type ‘exp’, with a ‘PLUS’ token in between, can be combined into a larger grouping of type ‘exp’.

Multiple rules for the same result are joined with the vertical-bar character | as follows:

     result: rule1-components...
           | rule2-components...
             ...
           ;

If components in a rule is empty, it means that result can match the empty string. For example, here is how to define a comma-separated sequence of zero or more ‘exp’ groupings:

     expseq: ;;Empty
           | expseq1
           ;
     
     expseq1: exp
            | expseq1 COMMA exp
            ;

It is customary to write a comment ;;Empty in each rule with no components.

Please note:
In LALR grammars, a %prec modifier can be written after the components of a rule to specify a terminal symbol whose precedence should be used for that rule. The syntax is:
          %prec token

It assigns the rule the precedence of token, overriding the precedence that would be deduced for it in the ordinary way. For more details, see Grammar format, PRECEDENCE.

Here is a typical example:

          ...
          %left NEG     ;; negation--unary minus
          ...
          
          %%
          ...
          
          exp: ...
             | '-' exp %prec NEG
               (- $2)
             ...
             ;

Scattered among the components can be actions that determine the semantics of the rule. An action is an Emacs Lisp list form, for example:

     (cons $1 $2)

To execute a sequence of forms, you can enclose them between braces like this:

     {
       (message "$2=%s" $2)
       $2
     }

Usually there is only one action and it follows the rule components.

The code in an action can refer to the semantic values of the components matched by the rule with the construct ‘$N’, which stands for the value of the Nth component.

Here is a typical example:

     exp: ...
        | exp PLUS exp
          (+ $1 $3)
        ...
        ;

This rule constructs an ‘exp’ from two smaller ‘exp’ groupings connected by a plus-sign token. In the action, ‘$1’ and ‘$3’ refer to the semantic values of the two component ‘exp’ groupings, which are the first and third symbols on the right-hand side of the rule. The sum becomes the semantic value of the addition expression just recognized by the rule. If there were a useful semantic value associated with the ‘PLUS’ token, it could be referred to as ‘$2’.

Note that the vertical-bar character | is really a rule separator, and actions are attached to a single rule.
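
As a sketch of that point, each of the two alternatives below carries its own action. The ‘MINUS’ token is hypothetical and is assumed to be declared alongside ‘PLUS’:

     exp: exp PLUS exp
          (+ $1 $3)
        | exp MINUS exp
          (- $1 $3)
        ;

The first action runs only when the ‘PLUS’ alternative is matched, and the second only when the ‘MINUS’ alternative is matched.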

By convention, if you don't specify an action for a rule, the value of the first symbol in the rule becomes the value of the whole rule: (progn $1). The default value of an empty rule is nil.

The exact default behavior depends on the parser. For more information, see the Wisent and Bovine manuals.
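
To illustrate the convention, the following two definitions would be expected to yield the same value under it. The ‘NUMBER’ token is hypothetical, and a particular parser may apply a different default:

     exp: NUMBER
        ;

     exp: NUMBER
          (progn $1)
        ;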

Please note:
In LALR grammars, you can have mid-rule actions, that is, semantic actions placed in the middle of a rule. These actions are written just like ordinary end-of-rule actions, but they are executed before the parser recognizes the components that follow them.

The mid-rule action itself counts as one of the components of the rule. This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number N to use in ‘$N’.

The mid-rule action can also have a semantic value, and actions later in the rule can refer to the value using ‘$N’.

There is no way to set the value of the entire rule with a mid-rule action. The only way to set the value for the entire rule is with an ordinary action at the end of the rule.

Here is an example taken from the semantic/grammar.wy grammar in the distribution:

          nonterminal:
              SYMBOL
              ;; Mid-rule action
              (setq semantic/grammar-wy--nterm $1
                    semantic/grammar-wy--rindx 0)
              COLON rules SEMI
              ;; End-of-rule action
              (TAG $1 'nonterminal :children $4)
            ;
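
In that example the mid-rule action is component 2, so ‘rules’ is referred to as ‘$4’ rather than ‘$3’. The sketch below, which is illustrative and not taken from any distributed grammar, shows a later action reading the value of a mid-rule action. The ‘LET’ token and the ‘binding’ nonterminal are hypothetical:

          binding:
              LET
              ;; Mid-rule action: counts as component 2
              (list 'scope)
              SYMBOL
              ;; End-of-rule action: $2 is the mid-rule value, $3 is SYMBOL
              (list $2 $3)
            ;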