Dodajem knjige
This commit is contained in:
@@ -0,0 +1,821 @@
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
LET'S BUILD A COMPILER!
|
||||
|
||||
By
|
||||
|
||||
Jack W. Crenshaw, Ph.D.
|
||||
|
||||
16 April 1989
|
||||
|
||||
|
||||
Part IX: A TOP VIEW
|
||||
|
||||
|
||||
*****************************************************************
|
||||
* *
|
||||
* COPYRIGHT NOTICE *
|
||||
* *
|
||||
* Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. *
|
||||
* *
|
||||
*****************************************************************
|
||||
|
||||
|
||||
INTRODUCTION
|
||||
|
||||
In the previous installments, we have learned many of the
|
||||
techniques required to build a full-blown compiler. We've done
|
||||
both assignment statements (with Boolean and arithmetic
|
||||
expressions), relational operators, and control constructs. We
|
||||
still haven't addressed procedure or function calls, but even so
|
||||
we could conceivably construct a mini-language without them.
|
||||
I've always thought it would be fun to see just how small a
|
||||
language one could build that would still be useful. We're
|
||||
ALMOST in a position to do that now. The problem is: though we
|
||||
know how to parse and translate the constructs, we still don't
|
||||
know quite how to put them all together into a language.
|
||||
|
||||
In those earlier installments, the development of our programs
|
||||
had a decidedly bottom-up flavor. In the case of expression
|
||||
parsing, for example, we began with the very lowest level
|
||||
constructs, the individual constants and variables, and worked
|
||||
our way up to more complex expressions.
|
||||
|
||||
Most people regard the top-down design approach as being better
|
||||
than the bottom-up one. I do too, but the way we did it
|
||||
certainly seemed natural enough for the kinds of things we were
|
||||
parsing.
|
||||
|
||||
You mustn't get the idea, though, that the incremental approach
|
||||
that we've been using in all these tutorials is inherently
|
||||
bottom-up. In this installment I'd like to show you that the
|
||||
approach can work just as well when applied from the top down ...
|
||||
maybe better. We'll consider languages such as C and Pascal, and
|
||||
see how complete compilers can be built starting from the top.
|
||||
|
||||
In the next installment, we'll apply the same technique to build
|
||||
a complete translator for a subset of the KISS language, which
|
||||
I'll be calling TINY. But one of my goals for this series is
|
||||
that you will not only be able to see how a compiler for TINY or
|
||||
KISS works, but that you will also be able to design and build
|
||||
compilers for your own languages. The C and Pascal examples will
|
||||
help. One thing I'd like you to see is that the natural
|
||||
structure of the compiler depends very much on the language being
|
||||
translated, so the simplicity and ease of construction of the
|
||||
compiler depends very much on letting the language set the
|
||||
program structure.
|
||||
|
||||
It's a bit much to produce a full C or Pascal compiler here, and
|
||||
we won't try. But we can flesh out the top levels far enough so
|
||||
that you can see how it goes.
|
||||
|
||||
Let's get started.
|
||||
|
||||
|
||||
THE TOP LEVEL
|
||||
|
||||
One of the biggest mistakes people make in a top-down design is
|
||||
failing to start at the true top. They think they know what the
|
||||
overall structure of the design should be, so they go ahead and
|
||||
write it down.
|
||||
|
||||
Whenever I start a new design, I always like to do it at the
|
||||
absolute beginning. In program design language (PDL), this top
|
||||
level looks something like:
|
||||
|
||||
|
||||
begin
|
||||
solve the problem
|
||||
end
|
||||
|
||||
|
||||
OK, I grant you that this doesn't give much of a hint as to what
|
||||
the next level is, but I like to write it down anyway, just to
|
||||
give me that warm feeling that I am indeed starting at the top.
|
||||
|
||||
For our problem, the overall function of a compiler is to compile
|
||||
a complete program. Any definition of the language, written in
|
||||
BNF, begins here. What does the top level BNF look like? Well,
|
||||
that depends quite a bit on the language to be translated. Let's
|
||||
take a look at Pascal.
|
||||
|
||||
|
||||
THE STRUCTURE OF PASCAL
|
||||
|
||||
Most texts for Pascal include a BNF or "railroad-track"
|
||||
definition of the language. Here are the first few lines of one:
|
||||
|
||||
|
||||
<program> ::= <program-header> <block> '.'
|
||||
|
||||
<program-header> ::= PROGRAM <ident>
|
||||
|
||||
<block> ::= <declarations> <statements>
|
||||
|
||||
|
||||
We can write recognizers to deal with each of these elements,
|
||||
just as we've done before. For each one, we'll use our familiar
|
||||
single-character tokens to represent the input, then flesh things
|
||||
out a little at a time. Let's begin with the first recognizer:
|
||||
the program itself.
|
||||
|
||||
To translate this, we'll start with a fresh copy of the Cradle.
|
||||
Since we're back to single-character names, we'll just use a 'p'
|
||||
to stand for 'PROGRAM.'
|
||||
|
||||
To a fresh copy of the cradle, add the following code, and insert
|
||||
a call to it from the main program:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Parse and Translate A Program }
|
||||
|
||||
procedure Prog;
|
||||
var Name: char;
|
||||
begin
|
||||
Match('p'); { Handles program header part }
|
||||
Name := GetName;
|
||||
Prolog(Name);
|
||||
Match('.');
|
||||
Epilog(Name);
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
The procedures Prolog and Epilog perform whatever is required to
|
||||
let the program interface with the operating system, so that it
|
||||
can execute as a program. Needless to say, this part will be
|
||||
VERY OS-dependent. Remember, I've been emitting code for a 68000
|
||||
running under the OS I use, which is SK*DOS. I realize most of
|
||||
you are using PC's and would rather see something else, but I'm
|
||||
in this thing too deep to change now!
|
||||
|
||||
Anyhow, SK*DOS is a particularly easy OS to interface to. Here
|
||||
is the code for Prolog and Epilog:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Write the Prolog }
|
||||
|
||||
procedure Prolog;
|
||||
begin
|
||||
EmitLn('WARMST EQU $A01E');
|
||||
end;
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Write the Epilog }
|
||||
|
||||
procedure Epilog(Name: char);
|
||||
begin
|
||||
EmitLn('DC WARMST');
|
||||
EmitLn('END ' + Name);
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
As usual, add this code and try out the "compiler." At this
|
||||
point, there is only one legal input:
|
||||
|
||||
|
||||
px. (where x is any single letter, the program name)
|
||||
|
||||
|
||||
Well, as usual our first effort is rather unimpressive, but by
|
||||
now I'm sure you know that things will get more interesting.
|
||||
There is one important thing to note: THE OUTPUT IS A WORKING,
|
||||
COMPLETE, AND EXECUTABLE PROGRAM (at least after it's assembled).
|
||||
|
||||
This is very important. The nice feature of the top-down
|
||||
approach is that at any stage you can compile a subset of the
|
||||
complete language and get a program that will run on the target
|
||||
machine. From here on, then, we need only add features by
|
||||
fleshing out the language constructs. It's all very similar to
|
||||
what we've been doing all along, except that we're approaching it
|
||||
from the other end.
|
||||
|
||||
|
||||
FLESHING IT OUT
|
||||
|
||||
To flesh out the compiler, we only have to deal with language
|
||||
features one by one. I like to start with a stub procedure that
|
||||
does nothing, then add detail in incremental fashion. Let's
|
||||
begin by processing a block, in accordance with its PDL above.
|
||||
We can do this in two stages. First, add the null procedure:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Parse and Translate a Pascal Block }
|
||||
|
||||
procedure DoBlock(Name: char);
|
||||
begin
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
and modify Prog to read:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Parse and Translate A Program }
|
||||
|
||||
procedure Prog;
|
||||
var Name: char;
|
||||
begin
|
||||
Match('p');
|
||||
Name := GetName;
|
||||
Prolog;
|
||||
DoBlock(Name);
|
||||
Match('.');
|
||||
Epilog(Name);
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
That certainly shouldn't change the behavior of the program, and
|
||||
it doesn't. But now the definition of Prog is complete, and we
|
||||
can proceed to flesh out DoBlock. That's done right from its BNF
|
||||
definition:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Parse and Translate a Pascal Block }
|
||||
|
||||
procedure DoBlock(Name: char);
|
||||
begin
|
||||
Declarations;
|
||||
PostLabel(Name);
|
||||
Statements;
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
The procedure PostLabel was defined in the installment on
|
||||
branches. Copy it into your cradle.
|
||||
|
||||
I probably need to explain the reason for inserting the label
|
||||
where I have. It has to do with the operation of SK*DOS. Unlike
|
||||
some OS's, SK*DOS allows the entry point to the main program to
|
||||
be anywhere in the program. All you have to do is to give that
|
||||
point a name. The call to PostLabel puts that name just before
|
||||
the first executable statement in the main program. How does
|
||||
SK*DOS know which of the many labels is the entry point, you ask?
|
||||
It's the one that matches the END statement at the end of the
|
||||
program.
|
||||
|
||||
OK, now we need stubs for the procedures Declarations and
|
||||
Statements. Make them null procedures as we did before.
|
||||
|
||||
Does the program still run the same? Then we can move on to the
|
||||
next stage.
|
||||
|
||||
|
||||
DECLARATIONS
|
||||
|
||||
The BNF for Pascal declarations is:
|
||||
|
||||
|
||||
<declarations> ::= ( <label list> |
|
||||
<constant list> |
|
||||
<type list> |
|
||||
<variable list> |
|
||||
<procedure> |
|
||||
<function> )*
|
||||
|
||||
|
||||
(Note that I'm using the more liberal definition used by Turbo
|
||||
Pascal. In the standard Pascal definition, each of these parts
|
||||
must be in a specific order relative to the rest.)
|
||||
|
||||
As usual, let's let a single character represent each of these
|
||||
declaration types. The new form of Declarations is:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Parse and Translate the Declaration Part }
|
||||
|
||||
procedure Declarations;
|
||||
begin
|
||||
while Look in ['l', 'c', 't', 'v', 'p', 'f'] do
|
||||
case Look of
|
||||
'l': Labels;
|
||||
'c': Constants;
|
||||
't': Types;
|
||||
'v': Variables;
|
||||
'p': DoProcedure;
|
||||
'f': DoFunction;
|
||||
end;
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
Of course, we need stub procedures for each of these declaration
|
||||
types. This time, they can't quite be null procedures, since
|
||||
otherwise we'll end up with an infinite While loop. At the very
|
||||
least, each recognizer must eat the character that invokes it.
|
||||
Insert the following procedures:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process Label Statement }
|
||||
|
||||
procedure Labels;
|
||||
begin
|
||||
Match('l');
|
||||
end;
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process Const Statement }
|
||||
|
||||
procedure Constants;
|
||||
begin
|
||||
Match('c');
|
||||
end;
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process Type Statement }
|
||||
procedure Types;
|
||||
begin
|
||||
Match('t');
|
||||
end;
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process Var Statement }
|
||||
|
||||
procedure Variables;
|
||||
begin
|
||||
Match('v');
|
||||
end;
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process Procedure Definition }
|
||||
|
||||
procedure DoProcedure;
|
||||
begin
|
||||
Match('p');
|
||||
end;
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process Function Definition }
|
||||
|
||||
procedure DoFunction;
|
||||
begin
|
||||
Match('f');
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
Now try out the compiler with a few representative inputs. You
|
||||
can mix the declarations any way you like, as long as the last
|
||||
character in the program is'.' to indicate the end of the
|
||||
program. Of course, none of the declarations actually declare
|
||||
anything, so you don't need (and can't use) any characters other
|
||||
than those standing for the keywords.
|
||||
|
||||
We can flesh out the statement part in a similar way. The BNF
|
||||
for it is:
|
||||
|
||||
|
||||
<statements> ::= <compound statement>
|
||||
|
||||
<compound statement> ::= BEGIN <statement>
|
||||
(';' <statement>) END
|
||||
|
||||
|
||||
Note that statements can begin with any identifier except END.
|
||||
So the first stub form of procedure Statements is:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Parse and Translate the Statement Part }
|
||||
|
||||
procedure Statements;
|
||||
begin
|
||||
Match('b');
|
||||
while Look <> 'e' do
|
||||
GetChar;
|
||||
Match('e');
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
At this point the compiler will accept any number of
|
||||
declarations, followed by the BEGIN block of the main program.
|
||||
This block itself can contain any characters at all (except an
|
||||
END), but it must be present.
|
||||
|
||||
The simplest form of input is now
|
||||
|
||||
'pxbe.'
|
||||
|
||||
Try it. Also try some combinations of this. Make some
|
||||
deliberate errors and see what happens.
|
||||
|
||||
At this point you should be beginning to see the drill. We begin
|
||||
with a stub translator to process a program, then we flesh out
|
||||
each procedure in turn, based upon its BNF definition. Just as
|
||||
the lower-level BNF definitions add detail and elaborate upon the
|
||||
higher-level ones, the lower-level recognizers will parse more
|
||||
detail of the input program. When the last stub has been
|
||||
expanded, the compiler will be complete. That's top-down
|
||||
design/implementation in its purest form.
|
||||
|
||||
You might note that even though we've been adding procedures, the
|
||||
output of the program hasn't changed. That's as it should be.
|
||||
At these top levels there is no emitted code required. The
|
||||
recognizers are functioning as just that: recognizers. They are
|
||||
accepting input sentences, catching bad ones, and channeling good
|
||||
input to the right places, so they are doing their job. If we
|
||||
were to pursue this a bit longer, code would start to appear.
|
||||
|
||||
The next step in our expansion should probably be procedure
|
||||
Statements. The Pascal definition is:
|
||||
|
||||
|
||||
<statement> ::= <simple statement> | <structured statement>
|
||||
|
||||
<simple statement> ::= <assignment> | <procedure call> | null
|
||||
|
||||
<structured statement> ::= <compound statement> |
|
||||
<if statement> |
|
||||
<case statement> |
|
||||
<while statement> |
|
||||
<repeat statement> |
|
||||
<for statement> |
|
||||
<with statement>
|
||||
|
||||
|
||||
These are starting to look familiar. As a matter of fact, you
|
||||
have already gone through the process of parsing and generating
|
||||
code for both assignment statements and control structures. This
|
||||
is where the top level meets our bottom-up approach of previous
|
||||
sessions. The constructs will be a little different from those
|
||||
we've been using for KISS, but the differences are nothing you
|
||||
can't handle.
|
||||
|
||||
I think you can get the picture now as to the procedure. We
|
||||
begin with a complete BNF description of the language. Starting
|
||||
at the top level, we code up the recognizer for that BNF
|
||||
statement, using stubs for the next-level recognizers. Then we
|
||||
flesh those lower-level statements out one by one.
|
||||
|
||||
As it happens, the definition of Pascal is very compatible with
|
||||
the use of BNF, and BNF descriptions of the language abound.
|
||||
Armed with such a description, you will find it fairly
|
||||
straightforward to continue the process we've begun.
|
||||
|
||||
You might have a go at fleshing a few of these constructs out,
|
||||
just to get a feel for it. I don't expect you to be able to
|
||||
complete a Pascal compiler here ... there are too many things
|
||||
such as procedures and types that we haven't addressed yet ...
|
||||
but it might be helpful to try some of the more familiar ones.
|
||||
It will do you good to see executable programs coming out the
|
||||
other end.
|
||||
|
||||
If I'm going to address those issues that we haven't covered yet,
|
||||
I'd rather do it in the context of KISS. We're not trying to
|
||||
build a complete Pascal compiler just yet, so I'm going to stop
|
||||
the expansion of Pascal here. Let's take a look at a very
|
||||
different language.
|
||||
|
||||
|
||||
THE STRUCTURE OF C
|
||||
|
||||
The C language is quite another matter, as you'll see. Texts on
|
||||
C rarely include a BNF definition of the language. Probably
|
||||
that's because the language is quite hard to write BNF for.
|
||||
|
||||
One reason I'm showing you these structures now is so that I can
|
||||
impress upon you these two facts:
|
||||
|
||||
(1) The definition of the language drives the structure of the
|
||||
compiler. What works for one language may be a disaster for
|
||||
another. It's a very bad idea to try to force a given
|
||||
structure upon the compiler. Rather, you should let the BNF
|
||||
drive the structure, as we have done here.
|
||||
|
||||
(2) A language that is hard to write BNF for will probably be
|
||||
hard to write a compiler for, as well. C is a popular
|
||||
language, and it has a reputation for letting you do
|
||||
virtually anything that is possible to do. Despite the
|
||||
success of Small C, C is _NOT_ an easy language to parse.
|
||||
|
||||
|
||||
A C program has less structure than its Pascal counterpart. At
|
||||
the top level, everything in C is a static declaration, either of
|
||||
data or of a function. We can capture this thought like this:
|
||||
|
||||
|
||||
<program> ::= ( <global declaration> )*
|
||||
|
||||
<global declaration> ::= <data declaration> |
|
||||
<function>
|
||||
|
||||
In Small C, functions can only have the default type int, which
|
||||
is not declared. This makes the input easy to parse: the first
|
||||
token is either "int," "char," or the name of a function. In
|
||||
Small C, the preprocessor commands are also processed by the
|
||||
compiler proper, so the syntax becomes:
|
||||
|
||||
|
||||
<global declaration> ::= '#' <preprocessor command> |
|
||||
'int' <data list> |
|
||||
'char' <data list> |
|
||||
<ident> <function body> |
|
||||
|
||||
|
||||
Although we're really more interested in full C here, I'll show
|
||||
you the code corresponding to this top-level structure for Small
|
||||
C.
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Parse and Translate A Program }
|
||||
|
||||
procedure Prog;
|
||||
begin
|
||||
while Look <> ^Z do begin
|
||||
case Look of
|
||||
'#': PreProc;
|
||||
'i': IntDecl;
|
||||
'c': CharDecl;
|
||||
else DoFunction(Int);
|
||||
end;
|
||||
end;
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
Note that I've had to use a ^Z to indicate the end of the source.
|
||||
C has no keyword such as END or the '.' to otherwise indicate the
|
||||
end.
|
||||
|
||||
With full C, things aren't even this easy. The problem comes
|
||||
about because in full C, functions can also have types. So when
|
||||
the compiler sees a keyword like "int," it still doesn't know
|
||||
whether to expect a data declaration or a function definition.
|
||||
Things get more complicated since the next token may not be a
|
||||
name ... it may start with an '*' or '(', or combinations of the
|
||||
two.
|
||||
|
||||
More specifically, the BNF for full C begins with:
|
||||
|
||||
|
||||
<program> ::= ( <top-level decl> )*
|
||||
|
||||
<top-level decl> ::= <function def> | <data decl>
|
||||
|
||||
<data decl> ::= [<class>] <type> <decl-list>
|
||||
|
||||
<function def> ::= [<class>] [<type>] <function decl>
|
||||
|
||||
|
||||
You can now see the problem: The first two parts of the
|
||||
declarations for data and functions can be the same. Because of
|
||||
the ambiguity in the grammar as written above, it's not a
|
||||
suitable grammar for a recursive-descent parser. Can we
|
||||
transform it into one that is suitable? Yes, with a little work.
|
||||
Suppose we write it this way:
|
||||
|
||||
|
||||
<top-level decl> ::= [<class>] <decl>
|
||||
|
||||
<decl> ::= <type> <typed decl> | <function decl>
|
||||
|
||||
<typed decl> ::= <data list> | <function decl>
|
||||
|
||||
|
||||
We can build a parsing routine for the class and type
|
||||
definitions, and have them store away their findings and go on,
|
||||
without their ever having to "know" whether a function or a data
|
||||
declaration is being processed.
|
||||
|
||||
To begin, key in the following version of the main program:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Main Program }
|
||||
|
||||
begin
|
||||
Init;
|
||||
while Look <> ^Z do begin
|
||||
GetClass;
|
||||
GetType;
|
||||
TopDecl;
|
||||
end;
|
||||
end.
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
For the first round, just make the three procedures stubs that do
|
||||
nothing _BUT_ call GetChar.
|
||||
|
||||
Does this program work? Well, it would be hard put NOT to, since
|
||||
we're not really asking it to do anything. It's been said that a
|
||||
C compiler will accept virtually any input without choking. It's
|
||||
certainly true of THIS compiler, since in effect all it does is
|
||||
to eat input characters until it finds a ^Z.
|
||||
|
||||
Next, let's make GetClass do something worthwhile. Declare the
|
||||
global variable
|
||||
|
||||
|
||||
var Class: char;
|
||||
|
||||
|
||||
and change GetClass to do the following:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Get a Storage Class Specifier }
|
||||
|
||||
Procedure GetClass;
|
||||
begin
|
||||
if Look in ['a', 'x', 's'] then begin
|
||||
Class := Look;
|
||||
GetChar;
|
||||
end
|
||||
else Class := 'a';
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
Here, I've used three single characters to represent the three
|
||||
storage classes "auto," "extern," and "static." These are not
|
||||
the only three possible classes ... there are also "register" and
|
||||
"typedef," but this should give you the picture. Note that the
|
||||
default class is "auto."
|
||||
|
||||
We can do a similar thing for types. Enter the following
|
||||
procedure next:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Get a Type Specifier }
|
||||
|
||||
procedure GetType;
|
||||
begin
|
||||
Typ := ' ';
|
||||
if Look = 'u' then begin
|
||||
Sign := 'u';
|
||||
Typ := 'i';
|
||||
GetChar;
|
||||
end
|
||||
else Sign := 's';
|
||||
if Look in ['i', 'l', 'c'] then begin
|
||||
Typ := Look;
|
||||
GetChar;
|
||||
end;
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
Note that you must add two more global variables, Sign and Typ.
|
||||
|
||||
With these two procedures in place, the compiler will process the
|
||||
class and type definitions and store away their findings. We can
|
||||
now process the rest of the declaration.
|
||||
|
||||
We are by no means out of the woods yet, because there are still
|
||||
many complexities just in the definition of the type, before we
|
||||
even get to the actual data or function names. Let's pretend for
|
||||
the moment that we have passed all those gates, and that the next
|
||||
thing in the input stream is a name. If the name is followed by
|
||||
a left paren, we have a function declaration. If not, we have at
|
||||
least one data item, and possibly a list, each element of which
|
||||
can have an initializer.
|
||||
|
||||
Insert the following version of TopDecl:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process a Top-Level Declaration }
|
||||
|
||||
procedure TopDecl;
|
||||
var Name: char;
|
||||
begin
|
||||
Name := Getname;
|
||||
if Look = '(' then
|
||||
DoFunc(Name)
|
||||
else
|
||||
DoData(Name);
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
(Note that, since we have already read the name, we must pass it
|
||||
along to the appropriate routine.)
|
||||
|
||||
Finally, add the two procedures DoFunc and DoData:
|
||||
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process a Function Definition }
|
||||
|
||||
procedure DoFunc(n: char);
|
||||
begin
|
||||
Match('(');
|
||||
Match(')');
|
||||
Match('{');
|
||||
Match('}');
|
||||
if Typ = ' ' then Typ := 'i';
|
||||
Writeln(Class, Sign, Typ, ' function ', n);
|
||||
end;
|
||||
|
||||
{--------------------------------------------------------------}
|
||||
{ Process a Data Declaration }
|
||||
|
||||
procedure DoData(n: char);
|
||||
begin
|
||||
if Typ = ' ' then Expected('Type declaration');
|
||||
Writeln(Class, Sign, Typ, ' data ', n);
|
||||
while Look = ',' do begin
|
||||
Match(',');
|
||||
n := GetName;
|
||||
WriteLn(Class, Sign, Typ, ' data ', n);
|
||||
end;
|
||||
Match(';');
|
||||
end;
|
||||
{--------------------------------------------------------------}
|
||||
|
||||
|
||||
Since we're still a long way from producing executable code, I
|
||||
decided to just have these two routines tell us what they found.
|
||||
|
||||
OK, give this program a try. For data declarations, it's OK to
|
||||
give a list separated by commas. We can't process initializers
|
||||
as yet. We also can't process argument lists for the functions,
|
||||
but the "(){}" characters should be there.
|
||||
|
||||
We're still a _VERY_ long way from having a C compiler, but what
|
||||
we have is starting to process the right kinds of inputs, and is
|
||||
recognizing both good and bad inputs. In the process, the
|
||||
natural structure of the compiler is starting to take form.
|
||||
|
||||
Can we continue this until we have something that acts more like
|
||||
a compiler. Of course we can. Should we? That's another matter.
|
||||
I don't know about you, but I'm beginning to get dizzy, and we've
|
||||
still got a long way to go to even get past the data
|
||||
declarations.
|
||||
|
||||
At this point, I think you can see how the structure of the
|
||||
compiler evolves from the language definition. The structures
|
||||
we've seen for our two examples, Pascal and C, are as different
|
||||
as night and day. Pascal was designed at least partly to be easy
|
||||
to parse, and that's reflected in the compiler. In general, in
|
||||
Pascal there is more structure and we have a better idea of what
|
||||
kinds of constructs to expect at any point. In C, on the other
|
||||
hand, the program is essentially a list of declarations,
|
||||
terminated only by the end of file.
|
||||
|
||||
We could pursue both of these structures much farther, but
|
||||
remember that our purpose here is not to build a Pascal or a C
|
||||
compiler, but rather to study compilers in general. For those of
|
||||
you who DO want to deal with Pascal or C, I hope I've given you
|
||||
enough of a start so that you can take it from here (although
|
||||
you'll soon need some of the stuff we still haven't covered yet,
|
||||
such as typing and procedure calls). For the rest of you, stay
|
||||
with me through the next installment. There, I'll be leading you
|
||||
through the development of a complete compiler for TINY, a subset
|
||||
of KISS.
|
||||
|
||||
See you then.
|
||||
|
||||
|
||||
*****************************************************************
|
||||
* *
|
||||
* COPYRIGHT NOTICE *
|
||||
* *
|
||||
* Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. *
|
||||
* *
|
||||
*****************************************************************
|
||||
|
||||
Reference in New Issue
Block a user