4. Advanced example 1: parsing expressions (CForm)

In this chapter we show how Yacas represents expressions and how one can build functions that work on various types of expressions. Our specific example will be CForm(), a standard library function that converts Yacas expressions into C or C++ code. Although the input format of Yacas expressions is already very close to C and perhaps could be converted to C by means of an external text filter, it is instructive to understand how to use Yacas to parse its own expressions and produce the corresponding C code. Here we shall only design the core mechanism of CForm() and build a limited version that handles only expressions using the four arithmetic actions.


4.1 Recursive parsing of expression trees

As we have seen in the tutorial, Yacas represents all expressions as trees, or equivalently, as lists of lists. For example, the expression "a+b+c+d+e" is for Yacas a tree of depth 4 that could be visualized as

  "+"
 a  "+"
   b  "+"
     c  "+"
       d   e
or as a nested list: ("+" a ("+" b ("+" c ("+" d e)))).

Complicated expressions are thus built from simple ones in a general and flexible way. If we want a function that acts on sums of any number of terms, we only need to define this function on a single atom and on a sum of two terms, and the Yacas engine will recursively perform the action on the entire tree.

So our first try is to define rules for transforming an atom and for transforming sums and products. The result of CForm() will always be a string. We can use recursion like this:

In> 100 # CForm(a_IsAtom) <-- String(a);
Out> True;
In> 100 # CForm(_a + _b) <-- CForm(a) : \
  " + " : CForm(b);
Out> True;
In> 100 # CForm(_a * _b) <-- CForm(a) : \
  " * " : CForm(b);
Out> True;

We used the string concatenation operator ":" and we added spaces around the binary operators for clarity. All rules have the same precedence 100 because there are no conflicts in rule ordering so far: these rules apply in mutually exclusive cases. Let's try converting some simple expressions now:

In> CForm(a+b*c);
Out> "a + b * c";
In> CForm(a+b*c*d+e+1+f);
Out> "a + b * c * d + e + 1 + f";

With only three rules, we were able to process even some complicated expressions. How did it work? We could illustrate the steps Yacas went through when simplifying CForm(a+b*c) roughly like this:

CForm(a+b*c)
    ... apply 2nd rule
CForm(a) : " + " : CForm(b*c)
    ... apply 1st rule and 3rd rule
"a" : " + " : CForm(b) : " * " : CForm(c)
    ... apply 1st rule
"a" : " + " : "b" : " * " : "c"
    ... concatenate strings
"a + b * c"


4.2 Handling precedence of infix operations

It seems that recursion will do all the work for us. The power of recursion is indeed great and extensive use of recursion is built into the design of Yacas. We might now add rules for more operators, for example, the unary addition, subtraction and division:

100 # CForm(+ _a) <-- "+ " : CForm(a);
100 # CForm(- _a) <-- "- " : CForm(a);
100 # CForm(_a - _b) <-- CForm(a) : " - "
  : CForm(b);
100 # CForm(_a / _b) <-- CForm(a) : " / "
  : CForm(b);

However, soon we find that we forgot about operator precedence. Our simple-minded CForm() gives wrong C code for expressions like this:

In> CForm( (a+b) * c );
Out> "a + b * c";

We need to get something like "(a+b)*c" in this case. How would we add a rule to insert parentheses around subexpressions? A simple way out would be to put parentheses around every subexpression, replacing our rules by something like this:

100 # CForm(_a + _b) <-- "(" : CForm(a)
  : " + " : CForm(b) : ")";
100 # CForm(- _a) <-- "(- " : CForm(a)
  : ")";
and so on. This will always produce correct C code, e.g. in our case "((a+b)*c)", but generally the output will be full of unnecessary parentheses. It is instructive to find a better solution.

We could improve the situation by inserting parentheses only if the higher-order expression requires them; for this to work, we need to make a call such as CForm(a+b) aware that the enveloping expression has a multiplication by c around the addition a+b. This can be implemented by passing an extra argument to CForm() that will indicate the precedence of the enveloping operation. A compound expression that uses an infix operator must be bracketed if the precedence of that infix operator is higher than the precedence of the enveloping infix operation.

We shall define an auxiliary function also named "CForm" but with a second argument, the precedence of the enveloping infix operation. If there is no enveloping operation, we shall set the precedence to a large number, e.g. 60000, to indicate that no parentheses should be inserted around the whole expression. The new "CForm(expr, precedence)" will handle two cases: either parentheses are necessary, or unnecessary. For clarity we shall implement these cases in two separate rules. The initial call to "CForm(expr)" will be delegated to "CForm(expr, precedence)".

The precedence values of infix operators such as "+" and "*" are defined in the Yacas library but may change in a future version. Therefore, we shall not hard-code these precedence values but instead use the function OpPrecedence() to determine them. The new rules for the "+" operation could look like this:

PlusPrec := OpPrecedence("+");
100 # CForm(_expr) <-- CForm(expr, 60000);
100 # CForm(_a + _b, _prec)_(PlusPrec>prec)
  <-- "(" : CForm(a, PlusPrec) : " + "
  : CForm(b, PlusPrec) : ")";
120 # CForm(_a + _b, _prec) <--
    CForm(a, PlusPrec) : " + "
	: CForm(b, PlusPrec);
and so on. We omitted the predicate for the last rule because it has a later precedence than the preceding rule.

The way we wrote these rules is unnecessarily repetitive but straightforward and it illustrates the central ideas of expression processing in Yacas. The standard library implements CForm() essentially in this way. In addition the library implementation supports standard mathematical functions, arrays and so on, and is somewhat better organized to allow easier extensions and avoid repetition of code.