793 lines
27 KiB
Plaintext
793 lines
27 KiB
Plaintext
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
LET'S BUILD A COMPILER!
|
|
|
|
By
|
|
|
|
Jack W. Crenshaw, Ph.D.
|
|
|
|
24 July 1988
|
|
|
|
|
|
Part II: EXPRESSION PARSING
|
|
|
|
|
|
*****************************************************************
|
|
* *
|
|
* COPYRIGHT NOTICE *
|
|
* *
|
|
* Copyright (C) 1988 Jack W. Crenshaw. All rights reserved. *
|
|
* *
|
|
*****************************************************************
|
|
|
|
|
|
GETTING STARTED
|
|
|
|
If you've read the introduction document to this series, you will
|
|
already know what we're about. You will also have copied the
|
|
cradle software into your Turbo Pascal system, and have compiled
|
|
it. So you should be ready to go.
|
|
|
|
|
|
The purpose of this article is for us to learn how to parse and
|
|
translate mathematical expressions. What we would like to see as
|
|
output is a series of assembler-language statements that perform
|
|
the desired actions. For purposes of definition, an expression
|
|
is the right-hand side of an equation, as in
|
|
|
|
x = 2*y + 3/(4*z)
|
|
|
|
In the early going, I'll be taking things in _VERY_ small steps.
|
|
That's so that the beginners among you won't get totally lost.
|
|
There are also some very good lessons to be learned early on,
|
|
that will serve us well later. For the more experienced readers:
|
|
bear with me. We'll get rolling soon enough.
|
|
|
|
SINGLE DIGITS
|
|
|
|
In keeping with the whole theme of this series (KISS, remember?),
|
|
let's start with the absolutely most simple case we can think of.
|
|
That, to me, is an expression consisting of a single digit.
|
|
|
|
Before starting to code, make sure you have a baseline copy of
|
|
the "cradle" that I gave last time. We'll be using it again for
|
|
other experiments. Then add this code:
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate a Math Expression }
|
|
|
|
procedure Expression;
|
|
begin
|
|
EmitLn('MOVE #' + GetNum + ',D0')
|
|
end;
|
|
{---------------------------------------------------------------}
|
|
|
|
|
|
And add the line "Expression;" to the main program so that it
|
|
reads:
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
begin
|
|
Init;
|
|
Expression;
|
|
end.
|
|
{---------------------------------------------------------------}
|
|
|
|
|
|
Now run the program. Try any single-digit number as input. You
|
|
should get a single line of assembler-language output. Now try
|
|
any other character as input, and you'll see that the parser
|
|
properly reports an error.
|
|
|
|
|
|
CONGRATULATIONS! You have just written a working translator!
|
|
|
|
OK, I grant you that it's pretty limited. But don't brush it off
|
|
too lightly. This little "compiler" does, on a very limited
|
|
scale, exactly what any larger compiler does: it correctly
|
|
recognizes legal statements in the input "language" that we have
|
|
defined for it, and it produces correct, executable assembler
|
|
code, suitable for assembling into object format. Just as
|
|
importantly, it correctly recognizes statements that are NOT
|
|
legal, and gives a meaningful error message. Who could ask for
|
|
more? As we expand our parser, we'd better make sure those two
|
|
characteristics always hold true.
|
|
|
|
There are some other features of this tiny program worth
|
|
mentioning. First, you can see that we don't separate code
|
|
generation from parsing ... as soon as the parser knows what we
|
|
want done, it generates the object code directly. In a real
|
|
compiler, of course, the reads in GetChar would be from a disk
|
|
file, and the writes to another disk file, but this way is much
|
|
easier to deal with while we're experimenting.
|
|
|
|
Also note that an expression must leave a result somewhere. I've
|
|
chosen the 68000 register DO. I could have made some other
|
|
choices, but this one makes sense.
|
|
|
|
|
|
BINARY EXPRESSIONS
|
|
|
|
Now that we have that under our belt, let's branch out a bit.
|
|
Admittedly, an "expression" consisting of only one character is
|
|
not going to meet our needs for long, so let's see what we can do
|
|
to extend it. Suppose we want to handle expressions of the form:
|
|
|
|
1+2
|
|
or 4-3
|
|
or, in general, <term> +/- <term>
|
|
|
|
(That's a bit of Backus-Naur Form, or BNF.)
|
|
|
|
To do this we need a procedure that recognizes a term and leaves
|
|
its result somewhere, and another that recognizes and
|
|
distinguishes between a '+' and a '-' and generates the
|
|
appropriate code. But if Expression is going to leave its result
|
|
in DO, where should Term leave its result? Answer: the same
|
|
place. We're going to have to save the first result of Term
|
|
somewhere before we get the next one.
|
|
|
|
OK, basically what we want to do is have procedure Term do what
|
|
Expression was doing before. So just RENAME procedure Expression
|
|
as Term, and enter the following new version of Expression:
|
|
|
|
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate an Expression }
|
|
|
|
procedure Expression;
|
|
begin
|
|
Term;
|
|
EmitLn('MOVE D0,D1');
|
|
case Look of
|
|
'+': Add;
|
|
'-': Subtract;
|
|
else Expected('Addop');
|
|
end;
|
|
end;
|
|
{--------------------------------------------------------------}
|
|
|
|
|
|
Next, just above Expression enter these two procedures:
|
|
|
|
|
|
{--------------------------------------------------------------}
|
|
{ Recognize and Translate an Add }
|
|
|
|
procedure Add;
|
|
begin
|
|
Match('+');
|
|
Term;
|
|
EmitLn('ADD D1,D0');
|
|
end;
|
|
|
|
|
|
{-------------------------------------------------------------}
|
|
{ Recognize and Translate a Subtract }
|
|
|
|
procedure Subtract;
|
|
begin
|
|
Match('-');
|
|
Term;
|
|
EmitLn('SUB D1,D0');
|
|
end;
|
|
{-------------------------------------------------------------}
|
|
|
|
|
|
When you're finished with that, the order of the routines should
|
|
be:
|
|
|
|
o Term (The OLD Expression)
|
|
o Add
|
|
o Subtract
|
|
o Expression
|
|
|
|
Now run the program. Try any combination you can think of of two
|
|
single digits, separated by a '+' or a '-'. You should get a
|
|
series of four assembler-language instructions out of each run.
|
|
Now try some expressions with deliberate errors in them. Does
|
|
the parser catch the errors?
|
|
|
|
Take a look at the object code generated. There are two
|
|
observations we can make. First, the code generated is NOT what
|
|
we would write ourselves. The sequence
|
|
|
|
MOVE #n,D0
|
|
MOVE D0,D1
|
|
|
|
is inefficient. If we were writing this code by hand, we would
|
|
probably just load the data directly to D1.
|
|
|
|
There is a message here: code generated by our parser is less
|
|
efficient than the code we would write by hand. Get used to it.
|
|
That's going to be true throughout this series. It's true of all
|
|
compilers to some extent. Computer scientists have devoted whole
|
|
lifetimes to the issue of code optimization, and there are indeed
|
|
things that can be done to improve the quality of code output.
|
|
Some compilers do quite well, but there is a heavy price to pay
|
|
in complexity, and it's a losing battle anyway ... there will
|
|
probably never come a time when a good assembler-language pro-
|
|
grammer can't out-program a compiler. Before this session is
|
|
over, I'll briefly mention some ways that we can do a little op-
|
|
timization, just to show you that we can indeed improve things
|
|
without too much trouble. But remember, we're here to learn, not
|
|
to see how tight we can make the object code. For now, and
|
|
really throughout this series of articles, we'll studiously
|
|
ignore optimization and concentrate on getting out code that
|
|
works.
|
|
|
|
Speaking of which: ours DOESN'T! The code is _WRONG_! As things
|
|
are working now, the subtraction process subtracts D1 (which has
|
|
the FIRST argument in it) from D0 (which has the second). That's
|
|
the wrong way, so we end up with the wrong sign for the result.
|
|
So let's fix up procedure Subtract with a sign-changer, so that
|
|
it reads
|
|
|
|
|
|
{-------------------------------------------------------------}
|
|
{ Recognize and Translate a Subtract }
|
|
|
|
procedure Subtract;
|
|
begin
|
|
Match('-');
|
|
Term;
|
|
EmitLn('SUB D1,D0');
|
|
EmitLn('NEG D0');
|
|
end;
|
|
{-------------------------------------------------------------}
|
|
|
|
|
|
Now our code is even less efficient, but at least it gives the
|
|
right answer! Unfortunately, the rules that give the meaning of
|
|
math expressions require that the terms in an expression come out
|
|
in an inconvenient order for us. Again, this is just one of
|
|
those facts of life you learn to live with. This one will come
|
|
back to haunt us when we get to division.
|
|
|
|
OK, at this point we have a parser that can recognize the sum or
|
|
difference of two digits. Earlier, we could only recognize a
|
|
single digit. But real expressions can have either form (or an
|
|
infinity of others). For kicks, go back and run the program with
|
|
the single input line '1'.
|
|
|
|
Didn't work, did it? And why should it? We just finished
|
|
telling our parser that the only kinds of expressions that are
|
|
legal are those with two terms. We must rewrite procedure
|
|
Expression to be a lot more broadminded, and this is where things
|
|
start to take the shape of a real parser.
|
|
|
|
|
|
|
|
|
|
GENERAL EXPRESSIONS
|
|
|
|
In the REAL world, an expression can consist of one or more
|
|
terms, separated by "addops" ('+' or '-'). In BNF, this is
|
|
written
|
|
|
|
<expression> ::= <term> [<addop> <term>]*
|
|
|
|
|
|
We can accomodate this definition of an expression with the
|
|
addition of a simple loop to procedure Expression:
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate an Expression }
|
|
|
|
procedure Expression;
|
|
begin
|
|
Term;
|
|
while Look in ['+', '-'] do begin
|
|
EmitLn('MOVE D0,D1');
|
|
case Look of
|
|
'+': Add;
|
|
'-': Subtract;
|
|
else Expected('Addop');
|
|
end;
|
|
end;
|
|
end;
|
|
{--------------------------------------------------------------}
|
|
|
|
|
|
NOW we're getting somewhere! This version handles any number of
|
|
terms, and it only cost us two extra lines of code. As we go on,
|
|
you'll discover that this is characteristic of top-down parsers
|
|
... it only takes a few lines of code to accomodate extensions to
|
|
the language. That's what makes our incremental approach
|
|
possible. Notice, too, how well the code of procedure Expression
|
|
matches the BNF definition. That, too, is characteristic of the
|
|
method. As you get proficient in the approach, you'll find that
|
|
you can turn BNF into parser code just about as fast as you can
|
|
type!
|
|
|
|
OK, compile the new version of our parser, and give it a try. As
|
|
usual, verify that the "compiler" can handle any legal
|
|
expression, and will give a meaningful error message for an
|
|
illegal one. Neat, eh? You might note that in our test version,
|
|
any error message comes out sort of buried in whatever code had
|
|
already been generated. But remember, that's just because we are
|
|
using the CRT as our "output file" for this series of
|
|
experiments. In a production version, the two outputs would be
|
|
separated ... one to the output file, and one to the screen.
|
|
|
|
|
|
USING THE STACK
|
|
|
|
At this point I'm going to violate my rule that we don't
|
|
introduce any complexity until it's absolutely necessary, long
|
|
enough to point out a problem with the code we're generating. As
|
|
things stand now, the parser uses D0 for the "primary" register,
|
|
and D1 as a place to store the partial sum. That works fine for
|
|
now, because as long as we deal with only the "addops" '+' and
|
|
'-', any new term can be added in as soon as it is found. But in
|
|
general that isn't true. Consider, for example, the expression
|
|
|
|
1+(2-(3+(4-5)))
|
|
|
|
If we put the '1' in D1, where do we put the '2'? Since a
|
|
general expression can have any degree of complexity, we're going
|
|
to run out of registers fast!
|
|
|
|
Fortunately, there's a simple solution. Like every modern
|
|
microprocessor, the 68000 has a stack, which is the perfect place
|
|
to save a variable number of items. So instead of moving the term
|
|
in D0 to D1, let's just push it onto the stack. For the benefit
|
|
of those unfamiliar with 68000 assembler language, a push is
|
|
written
|
|
|
|
-(SP)
|
|
|
|
and a pop, (SP)+ .
|
|
|
|
|
|
So let's change the EmitLn in Expression to read:
|
|
|
|
EmitLn('MOVE D0,-(SP)');
|
|
|
|
and the two lines in Add and Subtract to
|
|
|
|
EmitLn('ADD (SP)+,D0')
|
|
|
|
and EmitLn('SUB (SP)+,D0'),
|
|
|
|
respectively. Now try the parser again and make sure we haven't
|
|
broken it.
|
|
|
|
Once again, the generated code is less efficient than before, but
|
|
it's a necessary step, as you'll see.
|
|
|
|
|
|
MULTIPLICATION AND DIVISION
|
|
|
|
Now let's get down to some REALLY serious business. As you all
|
|
know, there are other math operators than "addops" ...
|
|
expressions can also have multiply and divide operations. You
|
|
also know that there is an implied operator PRECEDENCE, or
|
|
hierarchy, associated with expressions, so that in an expression
|
|
like
|
|
|
|
2 + 3 * 4,
|
|
|
|
we know that we're supposed to multiply FIRST, then add. (See
|
|
why we needed the stack?)
|
|
|
|
In the early days of compiler technology, people used some rather
|
|
complex techniques to insure that the operator precedence rules
|
|
were obeyed. It turns out, though, that none of this is
|
|
necessary ... the rules can be accommodated quite nicely by our
|
|
top-down parsing technique. Up till now, the only form that
|
|
we've considered for a term is that of a single decimal digit.
|
|
|
|
More generally, we can define a term as a PRODUCT of FACTORS;
|
|
i.e.,
|
|
|
|
<term> ::= <factor> [ <mulop> <factor ]*
|
|
|
|
What is a factor? For now, it's what a term used to be ... a
|
|
single digit.
|
|
|
|
Notice the symmetry: a term has the same form as an expression.
|
|
As a matter of fact, we can add to our parser with a little
|
|
judicious copying and renaming. But to avoid confusion, the
|
|
listing below is the complete set of parsing routines. (Note the
|
|
way we handle the reversal of operands in Divide.)
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate a Math Factor }
|
|
|
|
procedure Factor;
|
|
begin
|
|
EmitLn('MOVE #' + GetNum + ',D0')
|
|
end;
|
|
|
|
|
|
{--------------------------------------------------------------}
|
|
{ Recognize and Translate a Multiply }
|
|
|
|
procedure Multiply;
|
|
begin
|
|
Match('*');
|
|
Factor;
|
|
EmitLn('MULS (SP)+,D0');
|
|
end;
|
|
|
|
|
|
{-------------------------------------------------------------}
|
|
{ Recognize and Translate a Divide }
|
|
|
|
procedure Divide;
|
|
begin
|
|
Match('/');
|
|
Factor;
|
|
EmitLn('MOVE (SP)+,D1');
|
|
EmitLn('DIVS D1,D0');
|
|
end;
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate a Math Term }
|
|
|
|
procedure Term;
|
|
begin
|
|
Factor;
|
|
while Look in ['*', '/'] do begin
|
|
EmitLn('MOVE D0,-(SP)');
|
|
case Look of
|
|
'*': Multiply;
|
|
'/': Divide;
|
|
else Expected('Mulop');
|
|
end;
|
|
end;
|
|
end;
|
|
|
|
|
|
|
|
|
|
{--------------------------------------------------------------}
|
|
{ Recognize and Translate an Add }
|
|
|
|
procedure Add;
|
|
begin
|
|
Match('+');
|
|
Term;
|
|
EmitLn('ADD (SP)+,D0');
|
|
end;
|
|
|
|
|
|
{-------------------------------------------------------------}
|
|
{ Recognize and Translate a Subtract }
|
|
|
|
procedure Subtract;
|
|
begin
|
|
Match('-');
|
|
Term;
|
|
EmitLn('SUB (SP)+,D0');
|
|
EmitLn('NEG D0');
|
|
end;
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate an Expression }
|
|
|
|
procedure Expression;
|
|
begin
|
|
Term;
|
|
while Look in ['+', '-'] do begin
|
|
EmitLn('MOVE D0,-(SP)');
|
|
case Look of
|
|
'+': Add;
|
|
'-': Subtract;
|
|
else Expected('Addop');
|
|
end;
|
|
end;
|
|
end;
|
|
{--------------------------------------------------------------}
|
|
|
|
|
|
Hot dog! A NEARLY functional parser/translator, in only 55 lines
|
|
of Pascal! The output is starting to look really useful, if you
|
|
continue to overlook the inefficiency, which I hope you will.
|
|
Remember, we're not trying to produce tight code here.
|
|
|
|
|
|
PARENTHESES
|
|
|
|
We can wrap up this part of the parser with the addition of
|
|
parentheses with math expressions. As you know, parentheses are
|
|
a mechanism to force a desired operator precedence. So, for
|
|
example, in the expression
|
|
|
|
2*(3+4) ,
|
|
|
|
the parentheses force the addition before the multiply. Much
|
|
more importantly, though, parentheses give us a mechanism for
|
|
defining expressions of any degree of complexity, as in
|
|
|
|
(1+2)/((3+4)+(5-6))
|
|
|
|
The key to incorporating parentheses into our parser is to
|
|
realize that no matter how complicated an expression enclosed by
|
|
parentheses may be, to the rest of the world it looks like a
|
|
simple factor. That is, one of the forms for a factor is:
|
|
|
|
<factor> ::= (<expression>)
|
|
|
|
This is where the recursion comes in. An expression can contain a
|
|
factor which contains another expression which contains a factor,
|
|
etc., ad infinitum.
|
|
|
|
Complicated or not, we can take care of this by adding just a few
|
|
lines of Pascal to procedure Factor:
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate a Math Factor }
|
|
|
|
procedure Expression; Forward;
|
|
|
|
procedure Factor;
|
|
begin
|
|
if Look = '(' then begin
|
|
Match('(');
|
|
Expression;
|
|
Match(')');
|
|
end
|
|
else
|
|
EmitLn('MOVE #' + GetNum + ',D0');
|
|
end;
|
|
{--------------------------------------------------------------}
|
|
|
|
|
|
Note again how easily we can extend the parser, and how well the
|
|
Pascal code matches the BNF syntax.
|
|
|
|
As usual, compile the new version and make sure that it correctly
|
|
parses legal sentences, and flags illegal ones with an error
|
|
message.
|
|
|
|
|
|
UNARY MINUS
|
|
|
|
At this point, we have a parser that can handle just about any
|
|
expression, right? OK, try this input sentence:
|
|
|
|
-1
|
|
|
|
WOOPS! It doesn't work, does it? Procedure Expression expects
|
|
everything to start with an integer, so it coughs up the leading
|
|
minus sign. You'll find that +3 won't work either, nor will
|
|
something like
|
|
|
|
-(3-2) .
|
|
|
|
There are a couple of ways to fix the problem. The easiest
|
|
(although not necessarily the best) way is to stick an imaginary
|
|
leading zero in front of expressions of this type, so that -3
|
|
becomes 0-3. We can easily patch this into our existing version
|
|
of Expression:
|
|
|
|
|
|
|
|
{---------------------------------------------------------------}
|
|
{ Parse and Translate an Expression }
|
|
|
|
procedure Expression;
|
|
begin
|
|
if IsAddop(Look) then
|
|
EmitLn('CLR D0')
|
|
else
|
|
Term;
|
|
while IsAddop(Look) do begin
|
|
EmitLn('MOVE D0,-(SP)');
|
|
case Look of
|
|
'+': Add;
|
|
'-': Subtract;
|
|
else Expected('Addop');
|
|
end;
|
|
end;
|
|
end;
|
|
{--------------------------------------------------------------}
|
|
|
|
|
|
I TOLD you that making changes was easy! This time it cost us
|
|
only three new lines of Pascal. Note the new reference to
|
|
function IsAddop. Since the test for an addop appeared twice, I
|
|
chose to embed it in the new function. The form of IsAddop
|
|
should be apparent from that for IsAlpha. Here it is:
|
|
|
|
|
|
{--------------------------------------------------------------}
|
|
{ Recognize an Addop }
|
|
|
|
function IsAddop(c: char): boolean;
|
|
begin
|
|
IsAddop := c in ['+', '-'];
|
|
end;
|
|
{--------------------------------------------------------------}
|
|
|
|
|
|
OK, make these changes to the program and recompile. You should
|
|
also include IsAddop in your baseline copy of the cradle. We'll
|
|
be needing it again later. Now try the input -1 again. Wow!
|
|
The efficiency of the code is pretty poor ... six lines of code
|
|
just for loading a simple constant ... but at least it's correct.
|
|
Remember, we're not trying to replace Turbo Pascal here.
|
|
|
|
At this point we're just about finished with the structure of our
|
|
expression parser. This version of the program should correctly
|
|
parse and compile just about any expression you care to throw at
|
|
it. It's still limited in that we can only handle factors
|
|
involving single decimal digits. But I hope that by now you're
|
|
starting to get the message that we can accomodate further
|
|
extensions with just some minor changes to the parser. You
|
|
probably won't be surprised to hear that a variable or even a
|
|
function call is just another kind of a factor.
|
|
|
|
In the next session, I'll show you just how easy it is to extend
|
|
our parser to take care of these things too, and I'll also show
|
|
you just how easily we can accomodate multicharacter numbers and
|
|
variable names. So you see, we're not far at all from a truly
|
|
useful parser.
|
|
|
|
|
|
|
|
|
|
A WORD ABOUT OPTIMIZATION
|
|
|
|
Earlier in this session, I promised to give you some hints as to
|
|
how we can improve the quality of the generated code. As I said,
|
|
the production of tight code is not the main purpose of this
|
|
series of articles. But you need to at least know that we aren't
|
|
just wasting our time here ... that we can indeed modify the
|
|
parser further to make it produce better code, without throwing
|
|
away everything we've done to date. As usual, it turns out that
|
|
SOME optimization is not that difficult to do ... it simply takes
|
|
some extra code in the parser.
|
|
|
|
There are two basic approaches we can take:
|
|
|
|
o Try to fix up the code after it's generated
|
|
|
|
This is the concept of "peephole" optimization. The general
|
|
idea it that we know what combinations of instructions the
|
|
compiler is going to generate, and we also know which ones
|
|
are pretty bad (such as the code for -1, above). So all we
|
|
do is to scan the produced code, looking for those
|
|
combinations, and replacing them by better ones. It's sort
|
|
of a macro expansion, in reverse, and a fairly
|
|
straightforward exercise in pattern-matching. The only
|
|
complication, really, is that there may be a LOT of such
|
|
combinations to look for. It's called peephole optimization
|
|
simply because it only looks at a small group of instructions
|
|
at a time. Peephole optimization can have a dramatic effect
|
|
on the quality of the code, with little change to the
|
|
structure of the compiler itself. There is a price to pay,
|
|
though, in both the speed, size, and complexity of the
|
|
compiler. Looking for all those combinations calls for a lot
|
|
of IF tests, each one of which is a source of error. And, of
|
|
course, it takes time.
|
|
|
|
In the classical implementation of a peephole optimizer,
|
|
it's done as a second pass to the compiler. The output code
|
|
is written to disk, and then the optimizer reads and
|
|
processes the disk file again. As a matter of fact, you can
|
|
see that the optimizer could even be a separate PROGRAM from
|
|
the compiler proper. Since the optimizer only looks at the
|
|
code through a small "window" of instructions (hence the
|
|
name), a better implementation would be to simply buffer up a
|
|
few lines of output, and scan the buffer after each EmitLn.
|
|
|
|
o Try to generate better code in the first place
|
|
|
|
This approach calls for us to look for special cases BEFORE
|
|
we Emit them. As a trivial example, we should be able to
|
|
identify a constant zero, and Emit a CLR instead of a load,
|
|
or even do nothing at all, as in an add of zero, for example.
|
|
Closer to home, if we had chosen to recognize the unary minus
|
|
in Factor instead of in Expression, we could treat constants
|
|
like -1 as ordinary constants, rather then generating them
|
|
from positive ones. None of these things are difficult to
|
|
deal with ... they only add extra tests in the code, which is
|
|
why I haven't included them in our program. The way I see
|
|
it, once we get to the point that we have a working compiler,
|
|
generating useful code that executes, we can always go back
|
|
and tweak the thing to tighten up the code produced. That's
|
|
why there are Release 2.0's in the world.
|
|
|
|
There IS one more type of optimization worth mentioning, that
|
|
seems to promise pretty tight code without too much hassle. It's
|
|
my "invention" in the sense that I haven't seen it suggested in
|
|
print anywhere, though I have no illusions that it's original
|
|
with me.
|
|
|
|
This is to avoid such a heavy use of the stack, by making better
|
|
use of the CPU registers. Remember back when we were doing only
|
|
addition and subtraction, that we used registers D0 and D1,
|
|
rather than the stack? It worked, because with only those two
|
|
operations, the "stack" never needs more than two entries.
|
|
|
|
Well, the 68000 has eight data registers. Why not use them as a
|
|
privately managed stack? The key is to recognize that, at any
|
|
point in its processing, the parser KNOWS how many items are on
|
|
the stack, so it can indeed manage it properly. We can define a
|
|
private "stack pointer" that keeps track of which stack level
|
|
we're at, and addresses the corresponding register. Procedure
|
|
Factor, for example, would not cause data to be loaded into
|
|
register D0, but into whatever the current "top-of-stack"
|
|
register happened to be.
|
|
|
|
What we're doing in effect is to replace the CPU's RAM stack with
|
|
a locally managed stack made up of registers. For most
|
|
expressions, the stack level will never exceed eight, so we'll
|
|
get pretty good code out. Of course, we also have to deal with
|
|
those odd cases where the stack level DOES exceed eight, but
|
|
that's no problem either. We simply let the stack spill over
|
|
into the CPU stack. For levels beyond eight, the code is no
|
|
worse than what we're generating now, and for levels less than
|
|
eight, it's considerably better.
|
|
|
|
For the record, I have implemented this concept, just to make
|
|
sure it works before I mentioned it to you. It does. In
|
|
practice, it turns out that you can't really use all eight levels
|
|
... you need at least one register free to reverse the operand
|
|
order for division (sure wish the 68000 had an XTHL, like the
|
|
8080!). For expressions that include function calls, we would
|
|
also need a register reserved for them. Still, there is a nice
|
|
improvement in code size for most expressions.
|
|
|
|
So, you see, getting better code isn't that difficult, but it
|
|
does add complexity to the our translator ... complexity we can
|
|
do without at this point. For that reason, I STRONGLY suggest
|
|
that we continue to ignore efficiency issues for the rest of this
|
|
series, secure in the knowledge that we can indeed improve the
|
|
code quality without throwing away what we've done.
|
|
|
|
Next lesson, I'll show you how to deal with variables factors and
|
|
function calls. I'll also show you just how easy it is to handle
|
|
multicharacter tokens and embedded white space.
|
|
|
|
*****************************************************************
|
|
* *
|
|
* COPYRIGHT NOTICE *
|
|
* *
|
|
* Copyright (C) 1988 Jack W. Crenshaw. All rights reserved. *
|
|
* *
|
|
*****************************************************************
|
|
|
|
|
|
|
|
|