


                     LET'S BUILD A COMPILER!

                                By

                     Jack W. Crenshaw, Ph.D.

                          27 August 1989


                      Part XIII: PROCEDURES


*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************


INTRODUCTION

At last we get to the good part!

At  this point we've studied almost all  the  basic  features  of
compilers  and  parsing.    We  have  learned  how  to  translate
arithmetic expressions, Boolean expressions, control  constructs,
data  declarations,  and  I/O  statements.    We  have defined  a
language, TINY 1.3, that embodies all these features, and we have
written  a  rudimentary  compiler that can translate  them.    By
adding some file I/O we could indeed have a working compiler that
could produce executable object files  from  programs  written in
TINY.  With such a compiler, we could write simple  programs that
could read integer data, perform calculations with it, and output
the results.

That's nice, but what we have is still only a  toy  language.  We
can't read or write even a single character of text, and we still
don't have procedures.

It's  the  features  to  be  discussed  in  the  next  couple  of
installments  that  separate  the men from the toys, so to speak.
"Real" languages have more than one data type,  and  they support
procedure calls.  More than any others, it's  these  two features
that give a language much of its character and personality.  Once
we  have  provided   for   them,  our  languages,  TINY  and  its
successors, will cease to be toys and will take on the character
of real languages, suitable for serious programming jobs.

For several installments now, I've been promising you sessions on
these  two  important  subjects.  Each time, other issues came up
that required me to  digress  and deal with them.  Finally, we've
been able to put all those issues to rest and can get on with the
mainstream  of  things.    In   this   installment,   I'll  cover
procedures.  Next time, we'll talk about the basic data types.


ONE LAST DIGRESSION

This has  been an extraordinarily difficult installment for me to
write.  The reason has nothing to do with the subject  itself ...
I've  known  what I wanted to say for some time, and  in  fact  I
presented  most  of  this at Software Development  '89,  back  in
February.  It has more to do with the approach.  Let me explain.

When I first  began  this  series,  I  told you that we would use
several "tricks" to  make  things  easy,  and to let us learn the
concepts without getting too bogged down in the  details.   Among
these tricks was the idea of looking at individual  pieces  of  a
compiler at  a time, i.e. performing experiments using the Cradle
as a base.  When we studied expressions, for  example,  we  dealt
with only that part of compiler theory.  When we  studied control
structures,  we wrote a different program,  still  based  on  the
Cradle, to do that part. We only incorporated these concepts into
a complete language fairly recently. These techniques have served
us very well indeed, and led us to the development of  a compiler
for TINY version 1.3.

When  I  first  began this session, I tried to build upon what we
had already done, and  just  add the new features to the existing
compiler.  That turned out to be a little awkward and  tricky ...
much too much to suit me.

I finally figured out why.  In this series of experiments,  I had
abandoned the very useful techniques that had allowed  us  to get
here, and  without  meaning  to  I  had  switched over into a new
method of  working, that involved incremental changes to the full
TINY compiler.

You need to understand that what we are doing here is a little
unusual.  There have been a number of articles, such as the Small
C articles by Cain and Hendrix, that presented finished compilers
for one language or another.  This is different.  In  this series
of tutorials, you are  watching  me  design  and implement both a
language and a compiler, in real time.

In the experiments that I've been doing in  preparation  for this
article,  I  was  trying to inject  the  changes  into  the  TINY
compiler  in such a way that, at every step, we still had a real,
working  compiler.     In   other  words,  I  was  attempting  an
incremental enhancement of the language and  its  compiler, while
at the same time explaining to you what I was doing.

That's a tough act to pull off!  I finally  realized  that it was
dumb to try.    Having  gotten  this  far using the idea of small
experiments   based   on   single-character  tokens  and  simple,
special-purpose  programs,  I  had  abandoned  them  in  favor of
working with the full compiler.  It wasn't working.

So we're going to go back to our  roots,  so  to  speak.  In this
installment and the next, I'll be  using  single-character tokens
again as we study the concepts of procedures,  unfettered  by the
other baggage  that we have accumulated in the previous sessions.
As a  matter  of  fact,  I won't even attempt, at the end of this
session, to merge the constructs into the TINY  compiler.   We'll
save that for later.

After all this time, you don't need more buildup  than  that,  so
let's waste no more time and dive right in.


THE BASICS

All modern CPUs provide direct support for procedure calls, and
the 68000 is no exception.  For the 68000, the call is a BSR
(PC-relative version) or JSR, and the return is RTS.  All we have
to do is to arrange for the compiler to issue these instructions
at the proper place.

Actually, there are really THREE things we have to address.   One
of  them  is  the  call/return  mechanism.    The second  is  the
mechanism  for  DEFINING  the procedure in the first place.  And,
finally, there is the issue of passing parameters  to  the called
procedure.  None of these things are really  very  difficult, and
we can of course borrow heavily from what people have done in other
languages ... there's no need to reinvent the wheel here.  Of the
three issues, that of parameter passing will occupy  most  of our
attention, simply because there are so many options available.


A BASIS FOR EXPERIMENTS

As always, we will need some software to  serve  as  a  basis for
what  we are doing.  We don't need the full TINY compiler, but we
do need enough of a program so that some of the  other constructs
are present.  Specifically, we need at least to be able to handle
statements of some sort, and data declarations.

The program shown below is that basis.  It's a vestigial  form of
TINY, with single-character tokens.   It  has  data declarations,
but only in their simplest form ... no lists or initializers.  It
has assignment statements, but only of the kind

     <ident> = <ident>

In  other  words,  the only legal expression is a single variable
name.    There  are no control  constructs  ...  the  only  legal
statement is the assignment.

Most of the program  is  just the standard Cradle routines.  I've
shown the whole thing here, just to make sure we're  all starting
from the same point:


{--------------------------------------------------------------}
program Calls;

{--------------------------------------------------------------}
{ Constant Declarations }

const TAB = ^I;
      CR  = ^M;
      LF  = ^J;

{--------------------------------------------------------------}
{ Variable Declarations }

var Look: char;              { Lookahead Character }

var ST: Array['A'..'Z'] of char;


{--------------------------------------------------------------}
{ Read New Character From Input Stream }

procedure GetChar;
begin
   Read(Look);
end;

{--------------------------------------------------------------}
{ Report an Error }

procedure Error(s: string);
begin
   WriteLn;
   WriteLn(^G, 'Error: ', s, '.');
end;


{--------------------------------------------------------------}
{ Report Error and Halt }

procedure Abort(s: string);
begin
   Error(s);
   Halt;
end;


{--------------------------------------------------------------}
{ Report What Was Expected }

procedure Expected(s: string);
begin
   Abort(s + ' Expected');
end;


{--------------------------------------------------------------}
{ Report an Undefined Identifier }

procedure Undefined(n: string);
begin
   Abort('Undefined Identifier ' + n);
end;


{--------------------------------------------------------------}
{ Report a Duplicate Identifier }

procedure Duplicate(n: string);
begin
     Abort('Duplicate Identifier ' + n);
end;


{--------------------------------------------------------------}
{ Get Type of Symbol }

function TypeOf(n: char): char;
begin
     TypeOf := ST[n];
end;


{--------------------------------------------------------------}
{ Look for Symbol in Table }

function InTable(n: char): Boolean;
begin
   InTable := ST[n] <> ' ';
end;


{--------------------------------------------------------------}
{ Add a New Symbol to Table }

procedure AddEntry(Name, T: char);
begin
     if Intable(Name) then Duplicate(Name);
     ST[Name] := T;
end;


{--------------------------------------------------------------}
{ Check an Entry to Make Sure It's a Variable }

procedure CheckVar(Name: char);
begin
     if not InTable(Name) then Undefined(Name);
     if TypeOf(Name) <> 'v' then
          Abort(Name + ' is not a variable');
end;


{--------------------------------------------------------------}
{ Recognize an Alpha Character }

function IsAlpha(c: char): boolean;
begin
   IsAlpha := upcase(c) in ['A'..'Z'];
end;


{--------------------------------------------------------------}
{ Recognize a Decimal Digit }

function IsDigit(c: char): boolean;
begin
   IsDigit := c in ['0'..'9'];
end;


{--------------------------------------------------------------}
{ Recognize an AlphaNumeric Character }

function IsAlNum(c: char): boolean;
begin
   IsAlNum := IsAlpha(c) or IsDigit(c);
end;


{--------------------------------------------------------------}
{ Recognize an Addop }

function IsAddop(c: char): boolean;
begin
   IsAddop := c in ['+', '-'];
end;


{--------------------------------------------------------------}
{ Recognize a Mulop }

function IsMulop(c: char): boolean;
begin
   IsMulop := c in ['*', '/'];
end;


{--------------------------------------------------------------}
{ Recognize a Boolean Orop }

function IsOrop(c: char): boolean;
begin
   IsOrop := c in ['|', '~'];
end;


{--------------------------------------------------------------}
{ Recognize a Relop }

function IsRelop(c: char): boolean;
begin
   IsRelop := c in ['=', '#', '<', '>'];
end;


{--------------------------------------------------------------}
{ Recognize White Space }

function IsWhite(c: char): boolean;
begin
   IsWhite := c in [' ', TAB];
end;


{--------------------------------------------------------------}
{ Skip Over Leading White Space }

procedure SkipWhite;
begin
   while IsWhite(Look) do
      GetChar;
end;


{--------------------------------------------------------------}
{ Skip Over an End-of-Line }

procedure Fin;
begin
   if Look = CR then begin
      GetChar;
      if Look = LF then
         GetChar;
   end;
end;


{--------------------------------------------------------------}
{ Match a Specific Input Character }

procedure Match(x: char);
begin
   if Look = x then GetChar
   else Expected('''' + x + '''');
   SkipWhite;
end;


{--------------------------------------------------------------}
{ Get an Identifier }

function GetName: char;
begin
   if not IsAlpha(Look) then Expected('Name');
   GetName := UpCase(Look);
     GetChar;
     SkipWhite;
end;


{--------------------------------------------------------------}
{ Get a Number }

function GetNum: char;
begin
   if not IsDigit(Look) then Expected('Integer');
   GetNum := Look;
     GetChar;
     SkipWhite;
end;


{--------------------------------------------------------------}
{ Output a String with Tab }

procedure Emit(s: string);
begin
   Write(TAB, s);
end;


{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }

procedure EmitLn(s: string);
begin
   Emit(s);
   WriteLn;
end;


{--------------------------------------------------------------}
{ Post a Label To Output }

procedure PostLabel(L: string);
begin
   WriteLn(L, ':');
end;


{--------------------------------------------------------------}
{ Load a Variable to the Primary Register }

procedure LoadVar(Name: char);
begin
     CheckVar(Name);
     EmitLn('MOVE ' + Name + '(PC),D0');
end;


{--------------------------------------------------------------}
{ Store the Primary Register }

procedure StoreVar(Name: char);
begin
     CheckVar(Name);
     EmitLn('LEA ' + Name + '(PC),A0');
   EmitLn('MOVE D0,(A0)')
end;


{--------------------------------------------------------------}
{ Initialize }

procedure Init;
var i: char;
begin
     GetChar;
     SkipWhite;
     for i := 'A' to 'Z' do
          ST[i] := ' ';
end;


{--------------------------------------------------------------}
{ Parse and Translate an Expression }
{ Vestigial Version }

procedure Expression;
begin
     LoadVar(GetName);
end;


{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
var Name: char;
begin
     Name := GetName;
     Match('=');
     Expression;
     StoreVar(Name);
end;


{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }

procedure DoBlock;
begin
     while not(Look in ['e']) do begin
          Assignment;
          Fin;
   end;
end;


{--------------------------------------------------------------}
{ Parse and Translate a Begin-Block }

procedure BeginBlock;
begin
     Match('b');
     Fin;
     DoBlock;
     Match('e');
     Fin;
end;


{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
     if InTable(N) then Duplicate(N);
   ST[N] := 'v';
     WriteLn(N, ':', TAB, 'DC 0');
end;


{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }

procedure Decl;
var Name: char;
begin
   Match('v');
     Alloc(GetName);
end;


{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }

procedure TopDecls;
begin
     while Look <> 'b' do begin
      case Look of
        'v': Decl;
      else Abort('Unrecognized Keyword ' + Look);
          end;
          Fin;
     end;
end;


{--------------------------------------------------------------}
{ Main Program }

begin
     Init;
     TopDecls;
     BeginBlock;
end.
{--------------------------------------------------------------}


Note  that we DO have a symbol table, and there is logic to check
a variable name to make sure it's a legal one.    It's also worth
noting that I  have  included  the  code  you've  seen  before to
provide for white space  and  newlines.    Finally, note that the
main program is delimited, as usual, by BEGIN-END brackets.

Once you've copied  the  program  to  Turbo, the first step is to
compile it and make sure it  works.   Give it a few declarations,
and then a begin-block.  Try something like:


     va             (for VAR A)
     vb             (for VAR B)
     vc             (for VAR C)
     b              (for BEGIN)
     a=b
     b=c
     e.             (for END.)


As usual, you should also make some deliberate errors, and verify
that the program catches them correctly.


DECLARING A PROCEDURE

If you're satisfied that our little program works, then it's time
to deal with the procedures.  Since we haven't talked about
parameters yet, we'll begin by considering only procedures that
have no parameter lists.

As a start, let's consider a simple program with a procedure, and
think about the code we'd like to see generated for it:


     PROGRAM FOO;
     .
     .
     PROCEDURE BAR;                     BAR:
     BEGIN                                   .
     .                                       .
     .                                       .
     END;                                    RTS

     BEGIN { MAIN PROGRAM }             MAIN:
     .                                       .
     .                                       .
     BAR;                                    BSR BAR
     .                                       .
     .                                       .
     END.                                    END MAIN


Here I've shown  the  high-order language constructs on the left,
and the desired assembler code on the right.  The first  thing to
notice  is that we certainly don't have  much  code  to  generate
here!  For  the  great  bulk  of  both the procedure and the main
program,  our existing constructs take care of  the  code  to  be
generated.

The key to dealing with the body of the procedure is to recognize
that  although a procedure may be quite  long,  declaring  it  is
really no different than  declaring  a  variable.   It's just one
more kind of declaration.  We can write the BNF:


     <declaration> ::= <data decl> | <procedure>


This means that it should be easy to modify TopDecls to deal with
procedures.  What about the syntax of a procedure?   Well, here's
a suggested syntax, which is essentially that of Pascal:


     <procedure> ::= PROCEDURE <ident> <begin-block>


There is practically no code generation required, other than that
generated within the begin-block.    We need only emit a label at
the beginning of the procedure, and an RTS at the end.

Here's the required code:

{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }

procedure DoProc;
var N: char;
begin
     Match('p');
     N := GetName;
     Fin;
     if InTable(N) then Duplicate(N);
     ST[N] := 'p';
     PostLabel(N);
     BeginBlock;
     Return;
end;
{--------------------------------------------------------------}


Note that I've added a new code generation routine, Return, which
merely emits an RTS instruction.  The creation of that routine is
"left as an exercise for the student."
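If you'd like to check your answer, here's one way Return might
look.  It's a minimal sketch, assuming Return simply hands 'RTS'
to EmitLn like the other code generators do; the LastLine variable
and the little main program are hypothetical scaffolding, added
only so the routine can be compiled and run on its own.

```pascal
program ReturnDemo;

const TAB = ^I;

var LastLine: string;   { hypothetical: the last line emitted, for checking }

{ Output a String with Tab and CRLF }
procedure EmitLn(s: string);
begin
   WriteLn(TAB, s);
   LastLine := s;
end;

{ Post a Label To Output }
procedure PostLabel(L: string);
begin
   WriteLn(L, ':');
end;

{ Emit a Return From a Procedure }
procedure Return;
begin
   EmitLn('RTS');
end;

begin
   { The skeleton DoProc produces around an empty procedure B }
   PostLabel('B');
   Return;
end.
```

Running it prints the label B: followed by a tabbed RTS, which is
the skeleton DoProc wraps around the body of every procedure.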

To finish this version, add the following line within the Case
statement in TopDecls:


            'p': DoProc;


I should mention that  this  structure  for declarations, and the
BNF that drives it, differs from standard Pascal.  In  the Jensen
& Wirth  definition of Pascal, variable declarations, in fact ALL
kinds of declarations,  must  appear in a specific sequence, i.e.
labels,   constants,  types,  variables,  procedures,  and   main
program.  To  follow  such  a  scheme, we should separate the two
declarations, and have code in the main program something like


     DoVars;
     DoProcs;
     DoMain;


However,  most implementations of Pascal, including Turbo,  don't
require  that  order  and  let  you  freely  mix up  the  various
declarations,  as  long  as  you  still  don't  try to  refer  to
something  before  it's  declared.    Although  it  may  be  more
aesthetically pleasing to declare all the global variables at the
top of the  program,  it  certainly  doesn't do any HARM to allow
them to be sprinkled around.   In  fact,  it may do some GOOD, in
the  sense  that it gives you the  opportunity  to  do  a  little
rudimentary  information  hiding.     Variables  that  should  be
accessed only by the main program, for example,  can  be declared
just before it and will thus be inaccessible by the procedures.

OK, try this new version out.  Note that we  can  declare as many
procedures as we choose (as long  as  we don't run out of single-
character names!), and the  labels  and RTS's all come out in the
right places.

It's  worth  noting  here  that  I  do  _NOT_  allow  for  nested
procedures.   In TINY, all procedures must  be  declared  at  the
global level,  the  same  as  in  C.    There  has  been  quite a
discussion about this point in  the  Computer  Language  Forum of
CompuServe.  It turns out that there is a significant  penalty in
complexity that must be paid for the luxury of nested procedures.
What's  more,  this  penalty gets paid at RUN TIME, because extra
code must be added and executed every time a procedure is called.
I also happen to believe that nesting is not a good  idea, simply
on the grounds that I have seen too many abuses of the feature.

Before going on to the next step, it's also worth noting that the
"main program" as it stands  is incomplete, since it doesn't have
the label and END statement.  Let's fix that little oversight:


{--------------------------------------------------------------}
{ Parse and Translate a Main Program }

procedure DoMain;
begin
     Match('b');
     Fin;
     Prolog;
     DoBlock;
     Epilog;
end;
{--------------------------------------------------------------}
.
.
.
{--------------------------------------------------------------}
{ Main Program }

begin
     Init;
     TopDecls;
     DoMain;
end.
{--------------------------------------------------------------}
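DoMain leans on Prolog and Epilog, which don't appear in the
listing at the top of this installment.  A minimal pair consistent
with the desired output shown earlier (a MAIN: label at the start
and END MAIN at the end) might look like the following.  Treat the
contents as an assumption: a real Epilog may also need to emit a
system-dependent return to the operating system, which is left out
here, and LastLine is hypothetical scaffolding for checking output.

```pascal
program PrologDemo;

const TAB = ^I;

var LastLine: string;   { hypothetical: the last line written, for checking }

{ Output a String with Tab and CRLF }
procedure EmitLn(s: string);
begin
   WriteLn(TAB, s);
   LastLine := s;
end;

{ Post a Label To Output }
procedure PostLabel(L: string);
begin
   WriteLn(L, ':');
   LastLine := L + ':';
end;

{ Write the Prolog: label the entry point of the main program }
procedure Prolog;
begin
   PostLabel('MAIN');
end;

{ Write the Epilog: close the assembly with END MAIN }
procedure Epilog;
begin
   EmitLn('END MAIN');
end;

begin
   Prolog;     { writes  MAIN:       }
   Epilog;     { writes  <tab>END MAIN }
end.
```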


Note  that  DoProc  and DoMain are not quite symmetrical.  DoProc
uses a call to BeginBlock, whereas DoMain cannot.  That's because
a procedure  is signaled by the keyword PROCEDURE (abbreviated by
a 'p' here), while the main program gets no  keyword  other  than
the BEGIN itself.

And _THAT_ brings up an interesting question: WHY?

If  we  look  at the structure of C programs, we  find  that  all
functions are treated just  alike,  except  that the main program
happens to be identified by its name, "main."  Since  C functions
can appear in any order, the main program can also be anywhere in
the compilation unit.

In Pascal, on the other hand, all variables  and  procedures must
be declared before they're  used,  which  means  that there is no
point putting anything after the  main program ... it could never
be accessed.  The "main program" is not identified at  all, other
than  being that part of the code that  comes  after  the  global
BEGIN.  In other words, if it ain't anything else, it must be the
main program.

This  causes  no  small  amount   of   confusion   for  beginning
programmers, and for big Pascal programs sometimes it's difficult
to  find the beginning of the main program at all.  This leads to
conventions such as identifying it in comments:


     BEGIN { of MAIN }


This  has  always  seemed  to  me to be a bit of a kludge.    The
question comes up:    Why  should  the main program be treated so
much  differently  than  a  procedure?   In fact, now that  we've
recognized that  procedure declarations are just that ... part of
the global declarations ... isn't  the main program just one more
declaration, also?

The answer is yes, and by  treating  it that way, we can simplify
the code and make  it  considerably  more  orthogonal.  I propose
that  we  use  an explicit keyword, PROGRAM, to identify the main
program (Note that this  means  that we can't start the file with
it, as in Pascal).  In this case, our BNF becomes:


     <declaration> ::= <data decl> | <procedure> | <main program>


     <procedure> ::= PROCEDURE <ident> <begin-block>


     <main program> ::= PROGRAM <ident> <begin-block>


The code  also  looks  much  better,  at  least in the sense that
DoMain and DoProc look more alike:


{--------------------------------------------------------------}
{ Parse and Translate a Main Program }

procedure DoMain;
var N: char;
begin
     Match('P');
     N := GetName;
     Fin;
     if InTable(N) then Duplicate(N);
     Prolog;
     BeginBlock;
end;
{--------------------------------------------------------------}
.
.
.
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }

procedure TopDecls;
begin
     while Look <> '.' do begin
      case Look of
            'v': Decl;
            'p': DoProc;
            'P': DoMain;
          else Abort('Unrecognized Keyword ' + Look);
          end;
          Fin;
     end;
end;


{--------------------------------------------------------------}
{ Main Program }

begin
     Init;
     TopDecls;
     Epilog;
end.
{--------------------------------------------------------------}


Since the declaration of the main program is now within  the loop
of TopDecls, that does present some difficulties.  How do we
ensure that it's  the last thing in the file?  And how do we ever
exit  from  the  loop?  My answer for the second question, as you
can see, was to bring back our old friend the  period.   Once the
parser sees that, we're done.

To  answer  the first question:  it  depends  on  how  far  we're
willing to go to  protect  the programmer from dumb mistakes.  In
the code that I've shown,  there's nothing to keep the programmer
from adding code after  the  main  program  ... even another main
program.   The code will just not be  accessible.    However,  we
COULD access it via a FORWARD statement, which we'll be providing
later. As a  matter  of fact, many assembler language programmers
like to use  the  area  just  after the program to declare large,
uninitialized data blocks, so there may indeed be  some  value in
not  requiring the main program to be last.  We'll leave it as it
is.

If we decide  that  we  should  give the programmer a little more
help than that, it's pretty easy to add some logic to kick us out
of the loop  once  the  main  program  has been processed.  Or we
could  at least flag an error if someone  tries  to  include  two
mains.
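For the second option, here's a sketch of the bookkeeping.  To keep
it self-contained it scans a string of one-letter keywords instead
of the input stream, and the real calls to Decl, DoProc and DoMain
are stubbed out; the function CheckDecls and its MainSeen flag are
hypothetical names, not part of the compiler as it stands.

```pascal
program OneMain;

{ Scan a string of one-letter declaration keywords the way TopDecls
  does, stopping at the period.  Returns '' if all is well, otherwise
  an error message.  Decl, DoProc and DoMain are stubbed out. }
function CheckDecls(Src: string): string;
var i: integer;
    MainSeen: Boolean;        { has a 'P' (main program) been seen? }
begin
   CheckDecls := '';
   MainSeen := False;
   i := 1;
   while (i <= Length(Src)) and (Src[i] <> '.') do begin
      case Src[i] of
         'v', 'p': ;          { Decl or DoProc would go here }
         'P': begin
                 if MainSeen then begin
                    CheckDecls := 'Duplicate Main Program';
                    Exit;
                 end;
                 MainSeen := True;     { DoMain would go here }
              end;
      else
         begin
            CheckDecls := 'Unrecognized Keyword ' + Src[i];
            Exit;
         end;
      end;
      Inc(i);
   end;
end;

begin
   WriteLn('[', CheckDecls('vvpP.'), ']');   { [] }
   WriteLn('[', CheckDecls('vPpP.'), ']');   { [Duplicate Main Program] }
end.
```

A real version would call Abort rather than return a message, but
the idea is the same: one Boolean is all it takes.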


CALLING THE PROCEDURE

If you're satisfied that  things  are  working, let's address the
second half of the equation ... the call.

Consider the BNF for a procedure call:


     <proc_call> ::= <identifier>


For an assignment statement, on the other hand, the BNF is:


     <assignment> ::= <identifier> '=' <expression>


At this point we seem to  have  a problem. The two BNF statements
both begin on the  right-hand  side  with the token <identifier>.
How are we supposed to know, when we see the  identifier, whether
we have a procedure call or an assignment statement?   This looks
like a case where our  parser ceases being predictive, and indeed
that's exactly the case.  However, it turns  out  to  be  an easy
problem to fix, since all we have to do is to look at the type of
the identifier, as  recorded  in  the  symbol  table.    As we've
discovered before, a  minor  local  violation  of  the predictive
parsing rule can be easily handled as a special case.

Here's how to do it:


{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment(Name: char);
begin
     Match('=');
     Expression;
     StoreVar(Name);
end;


{--------------------------------------------------------------}
{ Decide if a Statement is an Assignment or Procedure Call }

procedure AssignOrProc;
var Name: char;
begin
     Name := GetName;
     case TypeOf(Name) of
          ' ': Undefined(Name);
          'v': Assignment(Name);
          'p': CallProc(Name);
          else Abort('Identifier ' + Name +
                                   ' Cannot Be Used Here');
     end;
end;


{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }

procedure DoBlock;
begin
     while not(Look in ['e']) do begin
          AssignOrProc;
          Fin;
   end;
end;
{--------------------------------------------------------------}


As you can see, procedure DoBlock now calls AssignOrProc instead of
Assignment.  The function of this new procedure is to simply read
the identifier,  determine  its  type,  and  then  call whichever
procedure  is  appropriate  for  that  type.  Since the name  has
already been read,  we  must  pass  it to the two procedures, and
modify Assignment to match.   Procedure CallProc is a simple code
generation routine:


{--------------------------------------------------------------}
{ Call a Procedure }

procedure CallProc(N: char);
begin
     EmitLn('BSR ' + N);
end;
{--------------------------------------------------------------}
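To watch the type-driven decision in isolation, here's a
stripped-down program built around the same symbol table and type
codes ('v' for variables, 'p' for procedures).  The function
Classify is a hypothetical stand-in for AssignOrProc that returns
the name of the routine it would call, rather than calling it, so
the dispatch can be checked without any input.

```pascal
program Dispatch;

var ST: array['A'..'Z'] of char;   { symbol table: ' ', 'v' or 'p' }
    i: char;

{ Name the routine AssignOrProc would hand this identifier to }
function Classify(Name: char): string;
begin
   case ST[Name] of
      ' ': Classify := 'Undefined';
      'v': Classify := 'Assignment';
      'p': Classify := 'CallProc';
   else
      Classify := 'Cannot Be Used Here';
   end;
end;

begin
   for i := 'A' to 'Z' do
      ST[i] := ' ';
   ST['A'] := 'v';                { as if declared by  va      }
   ST['B'] := 'p';                { as if declared by  pb ...  }
   WriteLn(Classify('A'));        { Assignment }
   WriteLn(Classify('B'));        { CallProc   }
   WriteLn(Classify('C'));        { Undefined  }
end.
```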


Well,  at  this  point  we  have  a  compiler  that can deal with
procedures.    It's  worth  noting  that   procedures   can  call
procedures to any depth.  So even though we  don't  allow  nested
DECLARATIONS, there  is certainly nothing to keep us from nesting
CALLS, just as  we  would  expect  to  do in any language.  We're
getting there, and it wasn't too hard, was it?
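Why does nesting calls come for free?  Because BSR saves the return
address on the stack and RTS pops the most recent one, each return
goes back to the right caller no matter how deep the chain gets.
The toy model below mimics that in Pascal; the integer "addresses"
and the routines Call and Ret are purely illustrative, not part of
the compiler.

```pascal
program CallStack;

const MaxDepth = 16;

var Stack: array[1..MaxDepth] of integer;   { saved return addresses }
    SP: integer;                            { entries currently in use }

{ Model of BSR: save the address to come back to }
procedure Call(ReturnAddr: integer);
begin
   if SP = MaxDepth then begin
      WriteLn('Stack overflow');
      Halt(1);
   end;
   Inc(SP);
   Stack[SP] := ReturnAddr;
end;

{ Model of RTS: pop the most recently saved address }
function Ret: integer;
begin
   if SP = 0 then begin
      WriteLn('Stack underflow');
      Halt(1);
   end;
   Ret := Stack[SP];
   Dec(SP);
end;

begin
   SP := 0;
   Call(100);       { MAIN calls A; MAIN resumes at 100 }
   Call(200);       { A calls B;    A resumes at 200    }
   WriteLn(Ret);    { B returns to A:    200 }
   WriteLn(Ret);    { A returns to MAIN: 100 }
end.
```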

Of course, so far we can  only  deal with procedures that have no
parameters.    The  procedures  can  only operate on  the  global
variables  by  their  global names.  So at this point we have the
equivalent of BASIC's GOSUB construct.  Not too bad ... after all,
lots of serious programs were written using GOSUBs, but we can do
better, and we will.  That's the next step.


PASSING PARAMETERS

Again, we all know the basic idea of passed parameters, but let's
review them just to be safe.

In general, the procedure is given a parameter list, for example

     PROCEDURE FOO(X, Y, Z)

In  the declaration of a procedure,  the  parameters  are  called
formal  parameters, and may be referred to in  the  body  of  the
procedure  by  those  names.    The  names  used for  the  formal
parameters  are  really  arbitrary.    Only  the  position really
counts.  In  the  example  above,  the name 'X' simply means "the
first parameter" wherever it is used.

When a procedure is called,  the "actual parameters" passed to it
are associated  with  the  formal  parameters,  on  a one-for-one
basis.
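Since only position counts, one way to picture the association is
to note each formal name's position in the declaration and then
pick the actual parameter in the same position.  The sketch below
does that for PROCEDURE FOO(X, Y, Z) called as FOO(A, B, C);
FormalIndex and ActualFor are hypothetical helpers for
illustration, not code from the compiler.

```pascal
program Positions;

{ Position (1, 2, 3 ...) of a formal parameter name in the declared
  list, e.g. 'XYZ' for FOO(X, Y, Z); 0 if Name isn't a formal. }
function FormalIndex(Formals: string; Name: char): integer;
begin
   FormalIndex := Pos(Name, Formals);
end;

{ The actual parameter bound to formal Name, matched by position;
  returns ' ' if Name isn't a formal parameter. }
function ActualFor(Formals, Actuals: string; Name: char): char;
var k: integer;
begin
   k := FormalIndex(Formals, Name);
   if (k = 0) or (k > Length(Actuals)) then
      ActualFor := ' '
   else
      ActualFor := Actuals[k];
end;

begin
   { PROCEDURE FOO(X, Y, Z) called as FOO(A, B, C) }
   WriteLn('Y is parameter ', FormalIndex('XYZ', 'Y'));           { 2 }
   WriteLn('Inside FOO, Y means ', ActualFor('XYZ', 'ABC', 'Y')); { B }
end.
```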

The BNF for the syntax looks something like this:


     <procedure> ::= PROCEDURE <ident>
                    '(' <param_list> ')' <begin-block>


     <param_list> ::= <parameter> ( ',' <parameter> )* | null

Similarly, the procedure call looks like:


     <proc_call> ::= <ident> '(' <param_list> ')'


Note that there is already an implicit decision  built  into this
syntax.  Some languages, such as Pascal and Ada, permit parameter
lists to be  optional.    If  there are no parameters, you simply
leave off the parens  completely.    Other  languages, like C and
Modula 2, require the parens even if the list is empty.  Clearly,
the example we just finished corresponds to the  former  point of
view.  But to tell the truth I prefer the latter.  For procedures
alone, the  decision would seem to favor the "listless" approach.
The statement


     Initialize;


standing alone, can only  mean  a procedure call.  In the parsers
we've  been  writing,  we've  made  heavy  use  of  parameterless
procedures, and it would seem a  shame  to have to write an empty
pair of parens for each case.

But later on we're going to be using functions, too.  And since
functions can appear in the same places as simple scalar
identifiers, a parameterless function written without parens
can't be told apart from a variable.  You have to go back to the
declarations to find out.  Some folks
consider  this to be an advantage.  Their  argument  is  that  an
identifier gets replaced by a value, and what do you care whether
it's done by  substitution  or  by  a function?  But we sometimes
_DO_ care, because the function may be quite time-consuming.  If,
by  writing  a  simple identifier into a given expression, we can
incur a heavy run-time penalty, it seems to  me  we  ought  to be
made aware of it.

Anyway,  Niklaus  Wirth  designed both Pascal and Modula 2.  I'll
give him the benefit of the doubt and assume that  he  had a good
reason for changing the rules the second time around!

Needless to say, it's an easy thing to accommodate either point of
view as we design a language, so this one is strictly a matter of
personal preference.  Do it whichever way you like best.

Before we go any further, let's alter the translator to  handle a
(possibly empty) parameter list.  For now we  won't  generate any
extra code ... just parse the syntax.  The  code  for  processing
the declaration has very  much  the  same  form we've seen before
when dealing with VAR-lists:


{--------------------------------------------------------------}
{ Process the Formal Parameter List of a Procedure }

procedure FormalList;
begin
     Match('(');
     if Look <> ')' then begin
          FormalParam;
          while Look = ',' do begin
               Match(',');
               FormalParam;
          end;
     end;
     Match(')');
end;
{--------------------------------------------------------------}


Procedure DoProc needs to have a line added to call FormalList:


{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }

procedure DoProc;
var N: char;
begin
     Match('p');
     N := GetName;
     FormalList;
     Fin;
     if InTable(N) then Duplicate(N);
     ST[N] := 'p';
     PostLabel(N);
     BeginBlock;
     Return;
end;
{--------------------------------------------------------------}


For now, the code for FormalParam is just a dummy one that simply
skips the parameter name:


{--------------------------------------------------------------}
{ Process a Formal Parameter }

procedure FormalParam;
var Name:  char;
begin
     Name := GetName;
end;
{--------------------------------------------------------------}


For  the actual procedure call, there must  be  similar  code  to
process the actual parameter list:


{--------------------------------------------------------------}
{ Process an Actual Parameter }

procedure Param;
var Name:  char;
begin
     Name := GetName;
end;


{--------------------------------------------------------------}
{ Process the Parameter List for a Procedure  Call }

procedure ParamList;
begin
     Match('(');
     if Look <> ')' then begin
          Param;
          while Look = ',' do begin
               Match(',');
               Param;
          end;
     end;
     Match(')');
end;


{--------------------------------------------------------------}
{ Process a Procedure Call }

procedure CallProc(Name: char);
begin
     ParamList;
     Call(Name);
end;
{--------------------------------------------------------------}


Note  here  that  CallProc  is  no  longer  just  a  simple  code
generation  routine.  It has some structure to  it.    To  handle
this, I've renamed the code  generation routine to just Call, and
called it from within CallProc.

OK, if you'll add all this code to  your  translator  and  try it
out, you'll find that you can indeed parse the syntax properly.
I'll note in  passing  that  there  is _NO_ checking to make sure
that  the  number  (and,  later,  types)  of  formal  and  actual
parameters match up.  In a production compiler, we must of course
do  this.  We'll ignore the issue now if for no other reason than
that the structure of our  symbol table doesn't currently give us
a place to store the necessary information.  Later on, we'll have
a place for that data and we can deal with the issue then.
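The parsing logic above is plain recursive descent, and the same
shape carries over to any language.  Here is a minimal Python
sketch of it, assuming single-character names and no embedded
white space, as in our vestigial program.  The class and method
names are illustrative only.  arity_ok is a hypothetical version
of the count check just mentioned; it compares lengths only,
since we have no types yet.

```python
class ParamParser:
    """Recursive-descent sketch of <param-list> for one-letter names."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    @property
    def look(self):
        # Lookahead character, or '\0' at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else "\0"

    def match(self, ch):
        if self.look != ch:
            raise SyntaxError(f"'{ch}' Expected")
        self.pos += 1

    def get_name(self):
        if not self.look.isalpha():
            raise SyntaxError("Name Expected")
        name = self.look.upper()
        self.pos += 1
        return name

    def param_list(self):
        # <param-list> ::= <parameter> ( ',' <parameter> )* | null
        self.match("(")
        names = []
        if self.look != ")":
            names.append(self.get_name())
            while self.look == ",":
                self.match(",")
                names.append(self.get_name())
        self.match(")")
        return names


def arity_ok(formals, actuals):
    """Hypothetical check: formal and actual counts must agree."""
    return len(formals) == len(actuals)
```

For example, ParamParser("(x,y)").param_list() returns ["X", "Y"],
and ParamParser("()").param_list() returns an empty list.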


THE SEMANTICS OF PARAMETERS

So  far we've dealt with the SYNTAX  of  parameter  passing,  and
we've got the parsing mechanisms in place to handle it.  Next, we
have to look at the SEMANTICS, i.e., the actions to be taken when
we encounter parameters. This brings  us  square  up  against the
issue of the different ways parameters can be passed.

There is more than one way to pass a parameter, and the way we do
it can have a  profound  effect on the character of the language.
So  this is another of those areas where I can't just give you my
solution.  Rather, it's important that we spend some time looking
at the  alternatives  so  that  you  can  go another route if you
choose to.

There are two main ways parameters are passed:

     o By value
     o By reference (address)

The differences are best seen in the light of a little history.

The old FORTRAN compilers passed all parameters by reference.  In
other  words, what was actually passed was  the  address  of  the
parameter.  This meant  that  the  called  subroutine was free to
either read or  write  that  parameter,  as often as it chose to,
just  as though it were a global variable.    This  was  actually
quite an efficient  way  to  do  things, and it was pretty simple
since  the  same  mechanism  was  used  in  all cases,  with  one
exception that I'll get to shortly.

There were problems, though.  Many people felt  that  this method
created entirely too much coupling between the  called subroutine
and  its  caller.    In  effect, it gave the subroutine  complete
access to all variables that appeared in the parameter list.

Many  times,  we  didn't want to actually change a parameter, but
only use it as an input.  For example, we  might  pass an element
count  to a subroutine, and wish we could  then  use  that  count
within a DO-loop.    To  avoid  changing the value in the calling
program, we had to make a local copy of the input  parameter, and
operate only on the  copy.    Some  FORTRAN programmers, in fact,
made it a practice to copy ALL parameters except those  that were
to be used as return values.    Needless to say, all this copying
defeated  a  good  bit  of  the  efficiency  associated with  the
approach.

There was, however, an even more insidious problem, which was not
really just the fault of  the "pass by reference" convention, but
a bad convergence of several implementation decisions.

Suppose we have a subroutine:


     SUBROUTINE FOO(X, Y, N)


where N is some kind of  input  count  or flag.  Many times, we'd
like  to be able to pass a literal or even an expression in place
of a variable, such as:


     CALL FOO(A, B, J + 1)


Here the third  parameter  is  not  a  variable, and so it has no
address.    The  earliest FORTRAN compilers did  not  allow  such
things, so we had to resort to subterfuges like:


     K = J + 1
     CALL FOO(A, B, K)


Here again, there was copying required, and the burden was on the
programmer to do it.  Not good.

Later  FORTRAN  implementations  got  rid  of  this  by  allowing
expressions  as  parameters.   What they  did  was  to  assign  a
compiler-generated variable, store the value of the expression in
the variable, and then pass the address of the variable.

So far, so good.    Even if the subroutine mistakenly altered the
anonymous variable, who was to know  or  care?  On the next call,
it would be recalculated anyway.

The  problem  arose  when  someone  decided to make  things  more
efficient.  They  reasoned,  rightly enough, that the most common
kind of "expression" was a single integer value, as in:


     CALL FOO(A, B, 4)


It seemed inefficient to go to the trouble of "computing" such an
integer and storing it  in  a temporary variable, just to pass it
through  the  calling  list.  Since we had to pass the address of
the  thing  anyway,  it seemed to make lots of sense to just pass
the address of the literal integer, 4 in the example above.

To make matters  more  interesting, most compilers, then and now,
identify all literals and store  them  separately  in  a "literal
pool,"  so that we only have to store one  value  for each unique
literal.    That  combination  of  design  decisions:     passing
expressions, optimization for literals as a special case, and use
of a literal pool, is what led to disaster.

To  see  how  it works, imagine that we call subroutine FOO as in
the example above, passing  it  a literal 4.  Actually, what gets
passed  is  the  address of the literal 4, which is stored in the
literal pool.  This address corresponds to the formal parameter,
N, in the subroutine itself.

Now suppose that, unbeknownst to the programmer, subroutine FOO
actually modifies N to be, say, -7.  Suddenly, that literal 4 in
the literal pool  gets  CHANGED,  to  a  -7.  From then on, every
expression that uses  a  4  and  every subroutine that passes a 4
will be using the value of -7 instead!  Needless to say, this can
lead to some  bizarre  and difficult-to-find behavior.  The whole
thing gave  the concept of pass-by-reference a bad name, although
as we have seen, it was really a combination of  design decisions
that led to the problem.
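The chain of events is easy to reproduce in a toy model.  The
Python sketch below is an illustration only, not a description of
how any particular FORTRAN compiler was built.  It treats memory
as a list, keeps one pool slot per distinct literal, and passes
addresses (list indices) to a stand-in for FOO that assigns -7 to
its parameter N.  Only N is modeled; A and B are left out.

```python
memory = []          # simulated data memory; an "address" is an index
literal_pool = {}    # literal value -> address of its single copy


def literal_address(value):
    """Return the pool address of a literal, storing it once if new."""
    if value not in literal_pool:
        literal_pool[value] = len(memory)
        memory.append(value)
    return literal_pool[value]


def foo(n_addr):
    """Stand-in for SUBROUTINE FOO; it quietly sets N to -7."""
    memory[n_addr] = -7


def use_literal(value):
    """Any later expression that 'uses a 4' fetches it from the pool."""
    return memory[literal_address(value)]


before = use_literal(4)
foo(literal_address(4))      # CALL FOO(A, B, 4): pass the address of 4
after = use_literal(4)
```

After the call, every use of the literal 4 yields -7.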

In spite of  the  problem,  the  FORTRAN  approach  had  its good
points.    Chief  among them is the fact that we  don't  have  to
support  multiple  mechanisms.    The  same  scheme,  passing the
address of  the argument, works for EVERY case, including arrays.
So the size of the compiler can be reduced.

Partly because of the FORTRAN  gotcha, and partly just because of
the reduced coupling involved, modern languages  like  C, Pascal,
Ada, and Modula 2 generally pass scalars by value.

This means that the value of the scalar is COPIED into a separate
value  used only for the call.  Since the value passed is a copy,
the called procedure can use it as a local variable and modify it
any way it likes.  The value in the caller will not be changed.

It may seem at first that  this  is a bit inefficient, because of
the need to copy the parameter.  But remember that we're going to
have  to  fetch SOME value to pass  anyway,  whether  it  be  the
parameter  itself  or  an address for it.  Inside the subroutine,
using  pass-by-value  is  definitely  more  efficient,  since  we
eliminate one level of indirection.  Finally, we saw earlier that
with  FORTRAN,  it  was often necessary to make copies within the
subroutine anyway, so pass-by-value reduces the  number  of local
variables.  All in all, pass-by-value is better.

Except for one small little detail:  if all parameters are passed
by value, there is no way for a called procedure to return a
result to its caller!  The parameter passed is NOT altered in the
caller,  only  in  the called procedure.  Clearly, that won't get
the job done.

There  have  been   two   answers  to  this  problem,  which  are
equivalent.   In Pascal, Wirth provides for VAR parameters, which
are  passed-by-reference.    What a VAR parameter is, in fact, is
none other than our old friend the FORTRAN parameter, with  a new
name and paint job for disguise.  Wirth neatly  gets  around  the
"changing a literal"  problem  as  well  as  the  "address  of an
expression" problem, by  the  simple expedient of allowing only a
variable to be the actual parameter.  In other  words,  it's  the
same restriction that the earliest FORTRANs imposed.

C does the same thing, but explicitly.  In  C,  _ALL_  parameters
are passed  by  value.    One  kind  of variable that C supports,
however, is the pointer.  So  by  passing a pointer by value, you
in effect pass what it points to by reference.  In some ways this
works even better, because even though you can change the
variable  pointed to all you like, you  still  CAN'T  change  the
pointer itself.  In a function such as strcpy, for example, where
the  pointers are incremented as the string  is  copied,  we  are
really only incrementing copies of the pointers, so the values of
those  pointers in the calling procedure  still  remain  as  they
were.  To modify a  pointer,  you  must  pass  a  pointer  to the
pointer.
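The difference between the two mechanisms can be shown with a
simulated memory.  In this Python sketch the variable X lives in
a dictionary that stands in for memory.  The by-value routine
gets a copy of X's contents.  The by-reference routine gets X's
"address" (the key) and writes through it, which is what a Pascal
VAR parameter or a C pointer argument amounts to.  The names are
illustrative only.

```python
memory = {"X": 5}    # simulated memory cell for the caller's X


def bump_by_value(n):
    """Receives a copy; the increment never reaches the caller."""
    n = n + 1
    return n         # only the callee's copy changed


def bump_by_reference(addr):
    """Receives an address and writes through it, like a VAR parameter."""
    memory[addr] = memory[addr] + 1


bump_by_value(memory["X"])
x_after_value_call = memory["X"]       # still 5
bump_by_reference("X")
x_after_reference_call = memory["X"]   # now 6
```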

Since we are simply  performing  experiments  here, we'll look at
BOTH pass-by-value and pass-by-reference.    That  way,  we'll be
able to use either one as we need to.  It's worth mentioning that
it's  going  to  be tough to use the C approach to pointers here,
since a pointer is a different type and we haven't  studied types
yet!


PASS-BY-VALUE

Let's just try some simple-minded  things and see where they lead
us.    Let's begin with the pass-by-value  case.    Consider  the
procedure call:


     FOO(X, Y)


Almost the only reasonable way to pass the data  is  through  the
CPU stack.  So the code we'd like  to  see  generated  might look
something like this:


     MOVE X(PC),-(SP)    ; Push X
     MOVE Y(PC),-(SP)    ; Push Y
     BSR FOO             ; Call FOO


That certainly doesn't seem too complex!

When the BSR is executed, the CPU pushes the return  address onto
the stack and jumps to FOO.    At  this point the stack will look
like this:

          .
          .
          Value of X (2 bytes)
          Value of Y (2 bytes)
  SP -->  Return Address (4 bytes)


So the values of  the  parameters  have  addresses that are fixed
offsets from the stack pointer.  In this  example,  the addresses
are:


     X:  6(SP)
     Y:  4(SP)
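The offsets follow a simple rule.  Parameter n of NumParams
(counting from 1 in call order) lies above the 4-byte return
address, and each parameter pushed after it sits between it and
the return address, adding 2 bytes to its offset.  Here is a
Python sketch of the arithmetic, assuming 2-byte values and a
4-byte return address as in the picture above:

```python
RETURN_ADDRESS_BYTES = 4   # pushed by BSR
VALUE_PARAM_BYTES = 2      # one 16-bit word per value parameter


def sp_offset(n, num_params):
    """Offset from SP, on entry, of parameter n (1-based, call order).

    Parameters pushed later lie closer to SP, so the last one sits
    just above the return address.
    """
    return RETURN_ADDRESS_BYTES + VALUE_PARAM_BYTES * (num_params - n)


offsets = {name: sp_offset(n, 2) for n, name in enumerate(["X", "Y"], start=1)}
```

This gives 6 for X and 4 for Y, matching the table above.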


Now consider what the called procedure might look like:


     PROCEDURE FOO(A, B)
     BEGIN
          A = B
     END

(Remember, the names  of  the formal parameters are arbitrary ...
only the positions count.)

The desired output code might look like:


     FOO: MOVE 4(SP),D0
          MOVE D0,6(SP)
          RTS


Note that, in order to address the formal parameters, we're going
to have to know  which  position they have in the parameter list.
This means some changes to the symbol table stuff.  In  fact, for
our single-character case it's best to just create  a  new symbol
table for the formal parameters.

Let's begin by declaring a new table:


     var Params: Array['A'..'Z'] of integer;


We  also  will  need to keep track of how many parameters a given
procedure has:


     var NumParams: integer;


And we need to initialize the new table.  Now, remember  that the
formal parameter list  will  be different for each procedure that
we process, so we'll need to initialize that table anew  for each
procedure.  Here's the initializer:


{--------------------------------------------------------------}
{ Initialize Parameter Table to Null }

procedure ClearParams;
var i: char;
begin
     for i := 'A' to 'Z' do
          Params[i] := 0;
     NumParams := 0;
end;
{--------------------------------------------------------------}


We'll put a call to this procedure in Init, and  also  at the end
of DoProc:


{--------------------------------------------------------------}
{ Initialize }

procedure Init;
var i: char;
begin
     GetChar;
     SkipWhite;
     for i := 'A' to 'Z' do
          ST[i] := ' ';
     ClearParams;
end;
{--------------------------------------------------------------}
.
.
.
{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }

procedure DoProc;
var N: char;
begin
     Match('p');
     N := GetName;
     FormalList;
     Fin;
     if InTable(N) then Duplicate(N);
     ST[N] := 'p';
     PostLabel(N);
     BeginBlock;
     Return;
     ClearParams;
end;
{--------------------------------------------------------------}


Note that the call  within  DoProc ensures that the table will be
clear when we're in the main program.


OK, now  we  need  a  few procedures to work with the table.  The
next few functions are  essentially  copies  of  InTable, TypeOf,
etc.:


{--------------------------------------------------------------}
{ Find the Parameter Number }

function ParamNumber(N: char): integer;
begin
     ParamNumber := Params[N];
end;


{--------------------------------------------------------------}
{ See if an Identifier is a Parameter }

function IsParam(N: char): boolean;
begin
     IsParam := Params[N] <> 0;
end;


{--------------------------------------------------------------}
{ Add a New Parameter to Table }

procedure AddParam(Name: char);
begin
     if IsParam(Name) then Duplicate(Name);
     Inc(NumParams);
     Params[Name] := NumParams;
end;
{--------------------------------------------------------------}
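For readers following along in another language, here is a
Python sketch of the same parameter table: a map from each letter
to its position, with 0 meaning "not a parameter."  The class and
method names are illustrative, chosen to mirror ClearParams,
IsParam, ParamNumber, and AddParam.

```python
import string


class ParamTable:
    """Mirror of Params/NumParams for one-letter names 'A'..'Z'."""

    def __init__(self):
        self.clear()

    def clear(self):
        # ClearParams: 0 means "not a formal parameter".
        self.params = {c: 0 for c in string.ascii_uppercase}
        self.num_params = 0

    def is_param(self, name):
        return self.params[name] != 0

    def param_number(self, name):
        return self.params[name]

    def add_param(self, name):
        # AddParam: reject duplicates, then number in order of appearance.
        if self.is_param(name):
            raise NameError(f"Duplicate Identifier {name}")
        self.num_params += 1
        self.params[name] = self.num_params
```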


Finally, we need some code generation routines:


{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }

procedure LoadParam(N: integer);
var Offset: integer;
begin
     Offset := 4 + 2 * (NumParams - N);
     Emit('MOVE ');
     WriteLn(Offset, '(SP),D0');
end;


{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }

procedure StoreParam(N: integer);
var Offset: integer;
begin
     Offset := 4 + 2 * (NumParams - N);
     Emit('MOVE D0,');
     WriteLn(Offset, '(SP)');
end;


{--------------------------------------------------------------}
{ Push The Primary Register to the Stack }

procedure Push;
begin
     EmitLn('MOVE D0,-(SP)');
end;
{--------------------------------------------------------------}
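These generators are easy to check by hand.  The following Python
sketch mirrors LoadParam and StoreParam, returning each
instruction as a string instead of writing it.  It applies them to
the procedure FOO(A, B) with the body A = B.  The result matches
the desired code shown earlier, minus the label and the RTS.

```python
def load_param(n, num_params):
    """MOVE parameter n (1-based) from the stack into D0."""
    offset = 4 + 2 * (num_params - n)
    return f"MOVE {offset}(SP),D0"


def store_param(n, num_params):
    """MOVE D0 into parameter n's stack slot."""
    offset = 4 + 2 * (num_params - n)
    return f"MOVE D0,{offset}(SP)"


def push():
    return "MOVE D0,-(SP)"


# PROCEDURE FOO(A, B)  with body  A = B:  A is parameter 1, B is 2.
body = [load_param(2, 2), store_param(1, 2)]
```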


(The last routine is one we've seen before, but it wasn't in
this vestigial version of the program.)

With those preliminaries in place, we're ready to  deal  with the
semantics of procedures with calling lists (remember, the code to
deal with the syntax is already in place).

Let's begin by processing a formal parameter.  All we have  to do
is to add each parameter to the parameter symbol table:


{--------------------------------------------------------------}
{ Process a Formal Parameter }

procedure FormalParam;
begin
     AddParam(GetName);
end;
{--------------------------------------------------------------}


Now, what about dealing with a formal parameter  when  it appears
in the body of the procedure?  That takes a little more work.  We
must first determine that it IS a formal parameter.  To  do this,
I've written a modified version of TypeOf:


{--------------------------------------------------------------}
{ Get Type of Symbol }

function TypeOf(n: char): char;
begin
     if IsParam(n) then
          TypeOf := 'f'
     else
          TypeOf := ST[n];
end;
{--------------------------------------------------------------}


(Note that, since  TypeOf  now  calls  IsParam, it may need to be
relocated in your source.)

We also must modify AssignOrProc to deal with this new type:


{--------------------------------------------------------------}
{ Decide if a Statement is an Assignment or Procedure Call }

procedure AssignOrProc;
var Name: char;
begin
     Name := GetName;
     case TypeOf(Name) of
          ' ': Undefined(Name);
          'v', 'f': Assignment(Name);
          'p': CallProc(Name);
          else Abort('Identifier ' + Name + ' Cannot Be Used Here');
     end;
end;
{--------------------------------------------------------------}


Finally,  the  code  to process an assignment  statement  and  an
expression must be extended:


{--------------------------------------------------------------}
{ Parse and Translate an Expression }
{ Vestigial Version }

procedure Expression;
var Name: char;
begin
     Name := GetName;
     if IsParam(Name) then
          LoadParam(ParamNumber(Name))
     else
          LoadVar(Name);
end;


{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment(Name: char);
begin
     Match('=');
     Expression;
     if IsParam(Name) then
          StoreParam(ParamNumber(Name))
     else
          StoreVar(Name);
end;
{--------------------------------------------------------------}


As you can see, these procedures will treat  every  variable name
encountered as either a  formal  parameter  or a global variable,
depending  on  whether  or not it appears in the parameter symbol
table.   Remember  that  we  are  using  only a vestigial form of
Expression.  In the  final  program,  the  change shown here will
have to be added to Factor, not Expression.
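The decision made in Expression and Assignment amounts to a table
lookup.  Here is a Python sketch with the parameter table reduced
to a plain dictionary.  The global-variable instructions are
assumed to follow the LoadVar/StoreVar patterns of earlier
installments (MOVE X(PC),D0 to load; LEA X(PC),A0 and MOVE D0,(A0)
to store).  Translating A = X and X = B inside FOO(A, B) looks
like this:

```python
def load_operand(name, params, num_params):
    """Expression: a formal parameter comes off the stack, else a global."""
    if name in params:
        return [f"MOVE {4 + 2 * (num_params - params[name])}(SP),D0"]
    return [f"MOVE {name}(PC),D0"]                  # assumed LoadVar pattern


def store_target(name, params, num_params):
    """Assignment: store D0 into a parameter slot or a global."""
    if name in params:
        return [f"MOVE D0,{4 + 2 * (num_params - params[name])}(SP)"]
    return [f"LEA {name}(PC),A0", "MOVE D0,(A0)"]   # assumed StoreVar pattern


def assignment(target, source, params):
    num_params = len(params)
    code = load_operand(source, params, num_params)
    return code + store_target(target, params, num_params)


foo_params = {"A": 1, "B": 2}   # PROCEDURE FOO(A, B)
```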

The rest is easy.  We need only add the  semantics  to the actual
procedure call, which we can do with one new line of code:


{--------------------------------------------------------------}
{ Process an Actual Parameter }

procedure Param;
begin
     Expression;
     Push;
end;
{--------------------------------------------------------------}


That's  it.  Add these changes to your program and give it a try.
Try declaring one or two procedures, each with a formal parameter
list.  Then do some assignments, using combinations of global and
formal  parameters.    You  can  call one procedure  from  within
another, but you cannot DECLARE a nested procedure.  You can even
pass formal parameters from one procedure to another.  If  we had
the  full  syntax  of the language here, you'd also be able to do
things like read  or  write  formal  parameters  or  use  them in
complicated expressions.


WHAT'S WRONG?

At this point, you might be thinking: Surely there's more to this
than a few pushes and  pops.    There  must  be  more  to passing
parameters than this.

You'd  be  right.    As  a  matter  of fact, the code that  we're
generating here leaves a lot to be desired in several respects.

The most glaring oversight is that it's wrong!   If  you'll  look
back at the code for a procedure call, you'll see that the caller
pushes each actual parameter onto the stack before  it  calls the
procedure.  The  procedure  USES that information, but it doesn't
change the stack  pointer.    That  means that the stuff is still
there when we return. SOMEBODY needs to clean up  the  stack,  or
we'll soon be in very hot water!

Fortunately,  that's  easily fixed.  All we  have  to  do  is  to
increment the stack pointer when we're finished.

Should  we  do  that  in  the  calling  program,  or  the  called
procedure?   Some folks let the called  procedure  clean  up  the
stack,  since  that  requires less code to be generated per call,
and since the procedure, after  all,  knows  how  many parameters
it's got.   But  that  means  that  it must do something with the
return address so as not to lose it.

I prefer letting  the  caller  clean  up, so that the callee need
only execute a return.  Also, it seems a bit more balanced, since
the caller is  the  one  who  "messed  up" the stack in the first
place.  But  THAT  means  that  the caller must remember how many
items  it  pushed.    To  make  things  easy, I've  modified  the
procedure  ParamList to be a function  instead  of  a  procedure,
returning the number of bytes pushed:


{--------------------------------------------------------------}
{ Process the Parameter List for a Procedure  Call }

function ParamList: integer;
var N: integer;
begin
     N := 0;
     Match('(');
     if Look <> ')' then begin
          Param;
          inc(N);
          while Look = ',' do begin
               Match(',');
               Param;
               inc(N);
          end;
     end;
     Match(')');
     ParamList := 2 * N;
end;
{--------------------------------------------------------------}


Procedure CallProc then uses this to clean up the stack:


{--------------------------------------------------------------}
{ Process a Procedure Call }

procedure CallProc(Name: char);
var N: integer;
begin
     N := ParamList;
     Call(Name);
     CleanStack(N);
end;
{--------------------------------------------------------------}


Here I've created yet another code generation procedure:


{--------------------------------------------------------------}
{ Adjust the Stack Pointer Upwards by N Bytes }

procedure CleanStack(N: integer);
begin
     if N > 0 then begin
          Emit('ADD #');
          WriteLn(N, ',SP');
     end;
end;
{--------------------------------------------------------------}
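Putting the pieces of the call together, here is a Python sketch
of the whole sequence for FOO(X, Y).  It assumes each actual
parameter is a simple global loaded with the MOVE X(PC),D0 form
used by LoadVar.  As a result, each push costs two instructions
rather than the single MOVE X(PC),-(SP) shown in the ideal code
earlier, which makes it one more spot an optimizer could improve.

```python
def clean_stack(n_bytes):
    """ADD #n,SP after the call; nothing to emit if no bytes were pushed."""
    return [f"ADD #{n_bytes},SP"] if n_bytes > 0 else []


def call_proc(name, args):
    """Push each argument by value, call, then let the caller pop them."""
    code = []
    for arg in args:
        code.append(f"MOVE {arg}(PC),D0")   # Expression (simple variable)
        code.append("MOVE D0,-(SP)")        # Push
    code.append(f"BSR {name}")              # Call
    return code + clean_stack(2 * len(args))   # 2 bytes per parameter
```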


OK, if you'll add this code to your compiler, I think you'll find
that the stack is now under control.

The next problem has to do with our way of addressing relative to
the stack pointer.  That works fine in our simple examples, since
with our rudimentary  form  of expressions nobody else is messing
with the stack.  But consider a different example as simple as:


     PROCEDURE FOO(A, B)
     BEGIN
          A = A + B
     END


The code generated by a simple-minded parser might be:


     FOO: MOVE 6(SP),D0       ; Fetch A
          MOVE D0,-(SP)       ; Push it
          MOVE 4(SP),D0       ; Fetch B
          ADD (SP)+,D0        ; Add A
          MOVE D0,6(SP)       ; Store A
          RTS


This would be wrong.  When we push the value of A onto the stack,
the offsets for the two formal parameters are no longer 4 and  6,
but are 6 and 8.  So the second fetch, from 4(SP), would not get B
at all; it would pick up the low-order half of the return address.
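A byte-by-byte picture makes the drift obvious.  This Python
sketch models the stack as a list indexed by offset from SP,
assuming the same 2-byte parameters and 4-byte return address as
before.  It shows what each offset holds on entry to FOO and again
after the push.

```python
def frame_on_entry(params):
    """Label each byte at offsets 0, 1, 2, ... from SP right after BSR.

    params is in call order: the first was pushed first, so it is deepest.
    """
    layout = ["return address"] * 4
    for name in reversed(params):
        layout += [name] * 2
    return layout


def push_word(layout, label):
    """MOVE D0,-(SP): SP drops by 2, so every old offset grows by 2."""
    return [label] * 2 + layout


entry = frame_on_entry(["A", "B"])
after_push = push_word(entry, "copy of A")
```

On entry, offset 4 holds B and offset 6 holds A.  After the push,
offset 4 falls inside the return address, B has moved to 6, and A
has moved to 8.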

This is not  the  end of the world.  I think you can see that all
we really have to do is to alter the offset every  time  we  do a
push, and that in fact is what's done if the  CPU  has no support
for other methods.

Fortunately,   though,   the   68000   does  have  such  support.
Recognizing that this CPU  would  be  used  a lot with high-order
language compilers, Motorola decided to  add  direct  support for
this kind of thing.

The problem, as you  can  see, is that as the procedure executes,
the stack  pointer  bounces  up  and  down,  and so it becomes an
awkward  thing  to  use  as  a  reference  to access  the  formal
parameters.  The solution is to define some _OTHER_ register, and
use  it instead.  This register is typically  set  equal  to  the
original stack pointer, and is called the frame pointer.

The 68000 instruction LINK lets you declare such a frame
pointer, and  sets  it  equal  to  the  stack pointer, all in one
instruction.  As a matter of  fact,  it does even more than that.
Since this register may have been in use for  something  else  in
the calling procedure, LINK also pushes the current value of that
register onto the stack.  It  can  also  add a value to the stack
pointer, to make room for local variables.

The complement of LINK is UNLK, which simply  restores  the stack
pointer and pops the old value back into the register.
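The effect of this pair on the registers can be modeled in a few
lines.  In this Python sketch addresses are plain integers and the
stack grows toward lower addresses.  The starting values (SP =
992, A6 = 5000) are arbitrary.  The point is that A6-relative
offsets stay put while SP moves.  A negative displacement in LINK
would reserve space for locals, which we don't need yet.

```python
class Frame68k:
    """Toy model of the 68000 LINK/UNLK pair acting on SP and A6."""

    def __init__(self, sp, a6):
        self.sp = sp
        self.a6 = a6
        self.mem = {}            # address -> saved long word

    def link(self, displacement):
        # LINK A6,#d: push old A6, copy SP to A6, add d to SP.
        self.sp -= 4
        self.mem[self.sp] = self.a6
        self.a6 = self.sp
        self.sp += displacement

    def unlk(self):
        # UNLK A6: copy A6 to SP, then pop the saved A6.
        self.sp = self.a6
        self.a6 = self.mem[self.sp]
        self.sp += 4

    def push_word(self):
        self.sp -= 2             # e.g. MOVE D0,-(SP)

    def pop_word(self):
        self.sp += 2             # e.g. ADD (SP)+,D0


# Caller pushed A at 998 and B at 996; BSR left SP at 992.
f = Frame68k(sp=992, a6=5000)
f.link(0)
```

After LINK, A is at 10(A6) and B at 8(A6).  A push moves SP but
leaves A6, and therefore those offsets, unchanged.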

Using these two  instructions,  the code for the previous example
becomes:


     FOO: LINK A6,#0
          MOVE 10(A6),D0      ; Fetch A
          MOVE D0,-(SP)       ; Push it
          MOVE 8(A6),D0       ; Fetch B
          ADD (SP)+,D0        ; Add A
          MOVE D0,10(A6)      ; Store A
          UNLK A6
          RTS


Fixing the compiler to generate this code is a lot easier than it
is  to  explain  it.    All we need to do is to modify  the  code
generation created by DoProc.  Since that makes the code a little
more than one line, I've created new procedures to deal  with it,
paralleling the Prolog and Epilog procedures called by DoMain:


{--------------------------------------------------------------}
{ Write the Prolog for a Procedure }

procedure ProcProlog(N: char);
begin
     PostLabel(N);
     EmitLn('LINK A6,#0');
end;


{--------------------------------------------------------------}
{ Write the Epilog for a Procedure }

procedure ProcEpilog;
begin
     EmitLn('UNLK A6');
     EmitLn('RTS');
end;
{--------------------------------------------------------------}


Procedure DoProc now just calls these:


{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }

procedure DoProc;
var N: char;
begin
     Match('p');
     N := GetName;
     FormalList;
     Fin;
     if InTable(N) then Duplicate(N);
     ST[N] := 'p';
     ProcProlog(N);
     BeginBlock;
     ProcEpilog;
     ClearParams;
end;
{--------------------------------------------------------------}


Finally, we need to  change  the  references  to SP in procedures
LoadParam and StoreParam:


{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }

procedure LoadParam(N: integer);
var Offset: integer;
begin
     Offset := 8 + 2 * (NumParams - N);
     Emit('MOVE ');
     WriteLn(Offset, '(A6),D0');
end;


{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }

procedure StoreParam(N: integer);
var Offset: integer;
begin
     Offset := 8 + 2 * (NumParams - N);
     Emit('MOVE D0,');
     WriteLn(Offset, '(A6)');
end;
{--------------------------------------------------------------}


(Note that the Offset computation  changes to allow for the extra
push of A6.)
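To check the new offsets, here is a Python sketch that strings
together ProcProlog, the A6-relative LoadParam/StoreParam, and
ProcEpilog for FOO(A, B) with the body A = B.  The 8 is 4 bytes of
saved A6 plus 4 bytes of return address.  The function names only
mirror the Pascal routines.

```python
def proc_prolog(name):
    return [f"{name}:", "LINK A6,#0"]      # PostLabel, then LINK


def proc_epilog():
    return ["UNLK A6", "RTS"]


def load_param(n, num_params):
    # 8 = saved A6 (4 bytes) + return address (4 bytes)
    return f"MOVE {8 + 2 * (num_params - n)}(A6),D0"


def store_param(n, num_params):
    return f"MOVE D0,{8 + 2 * (num_params - n)}(A6)"


# PROCEDURE FOO(A, B)  BEGIN  A = B  END
foo = proc_prolog("FOO") + [load_param(2, 2), store_param(1, 2)] + proc_epilog()
```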

That's all it takes.  Try this out and see how you like it.

At this point we  are  generating  some  relatively nice code for
procedures and procedure calls.  Within the limitation that there
are no local variables  (yet)  and  that  no procedure nesting is
allowed, this code is just what we need.

There is still one small problem remaining:


     WE HAVE NO WAY TO RETURN RESULTS TO THE CALLER!


But  that,  of course, is not a  limitation  of  the  code  we're
generating, but  one  inherent  in  the  call-by-value  protocol.
Notice that we CAN use formal parameters in any  way  inside  the
procedure.  We  can  calculate  new  values for them, use them as
loop counters (if we had loops, that is!), etc.   So  the code is
doing what it's supposed to.   To  get over this last problem, we
need to look at the alternative protocol.


CALL-BY-REFERENCE

This  one is easy, now that we have  the  mechanisms  already  in
place.    We  only  have  to  make  a few  changes  to  the  code
generation.  Instead of  pushing  a value onto the stack, we must
push an address.  As it turns out, the 68000 has  an instruction,
PEA, that does just that.

We'll be  making  a  new  version  of  the test program for this.
Before we do anything else,

>>>> MAKE A COPY <<<<

of  the program as it now stands, because  we'll  be  needing  it
again later.

Let's begin by looking at the code we'd like to see generated for
the new case. Using the same example as before, we need the call


     FOO(X, Y)


to be translated to:


     PEA X(PC)           ; Push the address of X
     PEA Y(PC)           ; Push the address of Y
     BSR FOO             ; Call FOO


That's a simple matter of a slight change to Param:


{--------------------------------------------------------------}
{ Process an Actual Parameter }

procedure Param;
begin
     EmitLn('PEA ' + GetName + '(PC)');
end;
{--------------------------------------------------------------}


(Note that with pass-by-reference, we can't  have  expressions in
the calling list, so Param can just read the name directly.)

At the other end, the references to the formal parameters must be
given one level of indirection:


     FOO: LINK A6,#0
          MOVE.L 12(A6),A0    ; Fetch the address of A
          MOVE (A0),D0        ; Fetch A
          MOVE D0,-(SP)       ; Push it
          MOVE.L 8(A6),A0     ; Fetch the address of B
          MOVE (A0),D0        ; Fetch B
          ADD (SP)+,D0        ; Add A
          MOVE.L 12(A6),A0    ; Fetch the address of A
          MOVE D0,(A0)        ; Store A
          UNLK A6
          RTS


All  of  this  can  be   handled  by  changes  to  LoadParam  and
StoreParam:


{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }

procedure LoadParam(N: integer);
var Offset: integer;
begin
     Offset := 8 + 4 * (NumParams - N);
     Emit('MOVE.L ');
     WriteLn(Offset, '(A6),A0');
     EmitLn('MOVE (A0),D0');
end;


{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }

procedure StoreParam(N: integer);
var Offset: integer;
begin
     Offset := 8 + 4 * (NumParams - N);
     Emit('MOVE.L ');
     WriteLn(Offset, '(A6),A0');
     EmitLn('MOVE D0,(A0)');
end;
{--------------------------------------------------------------}
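In Python form, the by-reference versions just add the indirection
through A0.  This sketch mirrors the two Pascal routines, with a
4-byte address in each parameter slot, and reproduces the fetches
of A and B shown in the listing above.

```python
def load_param_ref(n, num_params):
    """Fetch parameter n's address into A0, then the value into D0."""
    offset = 8 + 4 * (num_params - n)      # each slot holds a 4-byte address
    return [f"MOVE.L {offset}(A6),A0", "MOVE (A0),D0"]


def store_param_ref(n, num_params):
    """Fetch parameter n's address into A0, then store D0 through it."""
    offset = 8 + 4 * (num_params - n)
    return [f"MOVE.L {offset}(A6),A0", "MOVE D0,(A0)"]
```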

To  get  the  count  right,  we  must  also  change  one line  in
ParamList:


     ParamList := 4 * N;


That  should  do it.  Give it a try and see  if  it's  generating
reasonable-looking code.  As  you  will  see,  the code is hardly
optimal,  since  we  reload  the  address register every  time  a
parameter  is  needed.    But  that's  consistent  with our  KISS
approach  here,  of  just being sure to generate code that works.
We'll  just  make  a  little  note here, that here's yet  another
candidate for optimization, and press on.
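On the calling side, the by-reference sequence consists of the PEA
version of Param plus the 4-byte-per-parameter cleanup.  Here is a
Python sketch that mirrors Param, Call, and CleanStack:

```python
def call_by_reference(name, args):
    """PEA each argument, call, then pop 4 bytes per address pushed."""
    code = [f"PEA {arg}(PC)" for arg in args]   # Param: variables only
    code.append(f"BSR {name}")
    n_bytes = 4 * len(args)                     # ParamList := 4 * N
    if n_bytes > 0:
        code.append(f"ADD #{n_bytes},SP")       # CleanStack
    return code
```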

Now we've learned to process parameters  using  pass-by-value and
pass-by-reference.  In the real world, of course, we'd like to be
able  to  deal  with BOTH methods.  We can't do that yet, though,
because we have not yet had a session on types,  and  that has to
come first.

If  we can only have ONE method, then of course it has to be  the
good ol' FORTRAN method of  pass-by-reference,  since  that's the
only way procedures can ever return values to their caller.

This, in fact, will be one of the differences  between  TINY  and
KISS.  In the next version of TINY,  we'll  use pass-by-reference
for all parameters.  KISS will support both methods.


LOCAL VARIABLES

So  far,  we've  said  nothing  about  local  variables, and  our
definition of procedures doesn't allow  for  them.    Needless to
say, that's a big gap in our language, and one  that  needs to be
corrected.

Here again we are faced with a choice: Static or dynamic storage?

In those  old FORTRAN programs, local variables were given static
storage just like global ones.  That is, each local  variable got
a  name  and  allocated address, like any other variable, and was
referenced by that name.

That's easy for us to do, using the allocation mechanisms already
in place.  Remember, though, that local variables can have the
same names as global ones.  We'd need to deal with that somehow,
for example by giving these variables unique names.

The characteristic of static storage, of course, is that the data
survives  a procedure call and return.   When  the  procedure  is
called  again,  the  data will still be there.  That  can  be  an
advantage in some applications.    In the FORTRAN days we used to
do tricks like initialize a flag, so that you could tell when you
were entering a  procedure  for  the  first time and could do any
one-time initialization that needed to be done.

Of  course,  the  same  "feature"  is also what  makes  recursion
impossible with static storage.  Any new call to a procedure will
overwrite the data already in the local variables.
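The problem is easy to demonstrate.  In this Python simulation (not
generated code; both functions are hypothetical examples),
fact_static keeps its local in a single shared cell, the way static
storage would, while fact_dynamic gives each activation its own
cell on a stack.

```python
# Static storage: ONE memory cell for FACT's local, shared by every call.
static_n = 0


def fact_static(n: int) -> int:
    global static_n
    static_n = n                    # store the argument in the static local
    if static_n <= 1:
        return 1
    r = fact_static(static_n - 1)   # the recursive call overwrites static_n ...
    return static_n * r             # ... so this reads the callee's value


# Dynamic storage: each activation gets its own cell on a stack.
frames: list[int] = []


def fact_dynamic(n: int) -> int:
    frames.append(n)                # allocate a fresh local for this call
    try:
        if frames[-1] <= 1:
            return 1
        r = fact_dynamic(frames[-1] - 1)
        return frames[-1] * r       # our own copy is still intact
    finally:
        frames.pop()                # release the local on return


print(fact_static(4), fact_dynamic(4))   # 1 24
```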

The alternative is dynamic storage, in which storage is allocated
on the stack just as for passed parameters.  We also have the
mechanisms for doing this already.  In fact, the same routines
that deal with passed (by value) parameters on the stack can
easily deal with local variables as well ... the code to be
generated is the same.  The offset in the 68000 LINK instruction
is there for just that reason: we can use it to adjust the stack
pointer to make room for locals.  Dynamic storage, of course,
inherently supports recursion, since each call gets a fresh set
of locals.
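To see exactly what LINK does with its offset, here's a toy Python
model of the 68000 stack.  It assumes the documented LINK/UNLK
semantics (LINK: push An, copy SP to An, add the displacement to
SP; UNLK: copy An to SP, pop An).  The class and method names are
made up, and memory is modeled as a dictionary of longwords.

```python
class Stack68k:
    """Toy 68000 stack: SP and A6 registers plus a memory of longwords."""

    def __init__(self, sp: int) -> None:
        self.sp = sp
        self.a6 = 0
        self.mem: dict[int, int] = {}   # address -> longword stored there

    def push_long(self, value: int) -> None:
        """Like MOVE.L value,-(SP), or the return-address push of BSR/JSR."""
        self.sp -= 4
        self.mem[self.sp] = value

    def link(self, disp: int) -> None:
        """LINK A6,#disp: push old A6, point A6 at it, add disp to SP."""
        self.push_long(self.a6)
        self.a6 = self.sp
        self.sp += disp                 # negative disp reserves room for locals

    def unlk(self) -> None:
        """UNLK A6: drop the locals, then pop the caller's A6."""
        self.sp = self.a6
        self.a6 = self.mem[self.sp]
        self.sp += 4


if __name__ == "__main__":
    m = Stack68k(sp=0x1000)
    m.push_long(0x2000)          # BSR pushes a (made-up) return address
    m.link(-2 * 2)               # room for two one-word locals
    print(hex(m.a6), hex(m.sp))  # 0xff8 0xff4: locals at -2(A6) and -4(A6)
    m.unlk()
    print(hex(m.sp))             # 0xffc: only the return address remains
```

After LINK, 4(A6) holds the return address and 8(A6) is where the
last-pushed parameter sits, which is where the 8 in the offset
formulas comes from.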

When  I  first  began  planning  TINY,  I  must  admit  to  being
prejudiced in favor of static  storage.    That's  simply because
those old FORTRAN  programs  were pretty darned efficient ... the
early FORTRAN compilers  produced  a quality of code that's still
rarely matched by modern compilers.   Even today, a given program
written  in  FORTRAN  is likely to outperform  the  same  program
written in C or Pascal, sometimes  by  wide margins. (Whew!  Am I
going to hear about THAT statement!)

I've always supposed that the reason had to do with the  two main
differences  between  FORTRAN  implementations  and  the  others:
static  storage  and  pass-by-reference.    I  know  that dynamic
storage  supports  recursion,  but it's always seemed to me a bit
peculiar to be willing to accept slower code in the 95%  of cases
that don't need recursion, just to get that feature when you need
it.  The idea is that, with static storage, you can  use absolute
addressing  rather than indirect addressing, which should  result
in faster code.

More recently, though, several folks  have pointed out to me that
there really is no performance  penalty  associated  with dynamic
storage.  With the 68000, for example, you shouldn't use absolute
addressing  anyway  ...  most  operating systems require position
independent code.  And the 68000 instruction

     MOVE 8(A6),D0

has exactly the same timing as

     MOVE X(PC),D0

So  I'm  convinced,  now, that there is no good reason NOT to use
dynamic storage.

Since this use of local variables fits so well into the scheme of
pass-by-value  parameters,  we'll  use   that   version   of  the
translator to illustrate it. (I _SURE_ hope you kept a copy!)

The general idea is to keep track of how many local variables
there are.  Then we use the integer in the LINK instruction to
adjust the stack pointer downward to make room for them.  Formal
parameters are addressed as positive offsets from the frame
pointer, and locals as negative offsets.  With a little bit of
work, the same procedures we've already created can take care of
the whole thing.

Let's start by creating a new variable, Base:


     var Base: integer;

We'll use this  variable,  instead of NumParams, to compute stack
offsets.  That means changing  the two references to NumParams in
LoadParam and StoreParam:


{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }

procedure LoadParam(N: integer);
var Offset: integer;
begin
     Offset := 8 + 2 * (Base - N);
     Emit('MOVE ');
     WriteLn(Offset, '(A6),D0');
end;


{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }

procedure StoreParam(N: integer);
var Offset: integer;
begin
     Offset := 8 + 2 * (Base - N);
     Emit('MOVE D0,');
     WriteLn(Offset, '(A6)');
end;
{--------------------------------------------------------------}
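Here's the new offset arithmetic as a Python sketch, assuming the
numbering that FormalList sets up: formals are numbered 1..Base,
and locals start at Base + 5.  The function names are hypothetical.

```python
def frame_offset(n: int, base: int) -> int:
    """A6-relative offset of item n, mirroring Offset := 8 + 2 * (Base - N).

    Formals (1..base) come out positive; locals (base+5, base+6, ...)
    come out negative, just below the saved A6.
    """
    return 8 + 2 * (base - n)


def load_param(n: int, base: int) -> str:
    return f"MOVE {frame_offset(n, base)}(A6),D0"   # one word into D0


def store_param(n: int, base: int) -> str:
    return f"MOVE D0,{frame_offset(n, base)}(A6)"   # one word from D0


if __name__ == "__main__":
    # Two formals (1, 2) and two locals (7, 8), so Base = 2.
    for n in (1, 2, 7, 8):
        print(n, frame_offset(n, 2))
```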


The idea is that the value of Base will be frozen after we have
processed the formal parameters, and won't increase further as
the new local variables are inserted in the symbol table.  This
is taken care of at the end of FormalList:


{--------------------------------------------------------------}
{ Process the Formal Parameter List of a Procedure }

procedure FormalList;
begin
     Match('(');
     if Look <> ')' then begin
          FormalParam;
          while Look = ',' do begin
               Match(',');
               FormalParam;
          end;
     end;
     Match(')');
     Fin;
     Base := NumParams;
     NumParams := NumParams + 4;
end;
{--------------------------------------------------------------}


(We add four words to make allowances for the return address and
old frame pointer, two longwords that end up between the formal
parameters and the locals.)

About all we  need  to  do  next  is to install the semantics for
declaring local variables into the parser.  The routines are very
similar to Decl and TopDecls:


{--------------------------------------------------------------}
{ Parse and Translate a Local Data Declaration }

procedure LocDecl;
begin
     Match('v');
     AddParam(GetName);
     Fin;
end;


{--------------------------------------------------------------}
{ Parse and Translate Local Declarations }

function LocDecls: integer;
var n: integer;
begin
     n := 0;
     while Look = 'v' do begin
          LocDecl;
          inc(n);
     end;
     LocDecls := n;
end;
{--------------------------------------------------------------}


Note that LocDecls is a  FUNCTION, returning the number of locals
to DoProc.
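The bookkeeping done by FormalList and LocDecls can be modeled in a
few lines of Python.  ParamTable and its methods are hypothetical
stand-ins for the tutorial's parameter table, AddParam, FormalList
and LocDecls; the point is just to show the number each name
receives and the count that comes back to DoProc.

```python
class ParamTable:
    """Hypothetical stand-in for the parameter table and NumParams."""

    def __init__(self) -> None:
        self.num: dict[str, int] = {}   # name -> parameter number
        self.num_params = 0
        self.base = 0

    def add_param(self, name: str) -> None:
        if name in self.num:
            raise ValueError(f"duplicate identifier {name}")
        self.num_params += 1
        self.num[name] = self.num_params

    def formal_list(self, formals: list[str]) -> None:
        for name in formals:
            self.add_param(name)
        self.base = self.num_params     # freeze Base after the formals
        self.num_params += 4            # skip return address + saved A6 (4 words)

    def loc_decls(self, local_names: list[str]) -> int:
        for name in local_names:
            self.add_param(name)
        return len(local_names)         # k, the word count handed to ProcProlog


t = ParamTable()
t.formal_list(["A", "B"])
k = t.loc_decls(["X", "Y"])
print(t.num, t.base, k)   # {'A': 1, 'B': 2, 'X': 7, 'Y': 8} 2 2
```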

Next, we modify DoProc to use this information:


{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }

procedure DoProc;
var N: char;
    k: integer;
begin
     Match('p');
     N := GetName;
     if InTable(N) then Duplicate(N);
     ST[N] := 'p';
     FormalList;
     k := LocDecls;
     ProcProlog(N, k);
     BeginBlock;
     ProcEpilog;
     ClearParams;
end;
{--------------------------------------------------------------}


(I've made a couple of changes here that weren't really
necessary.  Aside from rearranging things a bit, I moved the call
to Fin to within FormalList, and placed one inside LocDecl as
well.  Make sure your FormalList ends with that call to Fin, as
shown above, so that we're together here.)

Note the change in the call  to  ProcProlog.  The new argument is
the number of WORDS (not bytes) to allocate space  for.    Here's
the new version of ProcProlog:


{--------------------------------------------------------------}
{ Write the Prolog for a Procedure }

procedure ProcProlog(N: char; k: integer);
begin
     PostLabel(N);
     Emit('LINK A6,#');
     WriteLn(-2 * k)
end;
{--------------------------------------------------------------}


That should do it.  Add these changes and see how they work.
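As a sanity check, here's a Python sketch that hand-assembles the
code for a tiny procedure P with one formal A, one local X, and the
body X = A.  proc_prolog mirrors ProcProlog; proc_epilog assumes an
epilog of UNLK A6 followed by RTS.  All three function names are
hypothetical.

```python
def proc_prolog(name: str, k: int) -> list[str]:
    """Like ProcProlog: post the label, then LINK reserving k words."""
    return [f"{name}:", f"LINK A6,#{-2 * k}"]   # 2 bytes per word; stack grows down


def proc_epilog() -> list[str]:
    """Assumed epilog: UNLK A6 frees the locals and restores A6, RTS returns."""
    return ["UNLK A6", "RTS"]


def offset(n: int, base: int) -> int:
    """Offset := 8 + 2 * (Base - N), as in the new LoadParam/StoreParam."""
    return 8 + 2 * (base - n)


# Procedure P with formal A (number 1, so Base = 1) and local X
# (number 1 + 4 + 1 = 6); the body is  X = A.
BASE = 1
listing = (proc_prolog("P", 1)
           + [f"MOVE {offset(1, BASE)}(A6),D0",    # load A
              f"MOVE D0,{offset(6, BASE)}(A6)"]    # store X
           + proc_epilog())
print("\n".join(listing))
```

Note that LINK A6,#0 is still emitted when there are no locals; it
costs a little, but keeps every frame laid out the same way.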


CONCLUSION

At this point you know how to compile procedure declarations and
procedure calls, with parameters passed by reference and by
value.  You can also handle local variables.  As you can see, the
hard part is not in providing the mechanisms, but in deciding
just which mechanisms to use.  Once we make these decisions, the
code to translate the constructs is really not that difficult.

I didn't show you how to deal with the combination of local
variables and pass-by-reference parameters, but that's a
straightforward extension to what you've already seen.  It just
gets a little more messy, that's all, since we need to support
both mechanisms instead of just one at a time.  I'd prefer to
save that one until after we've dealt with ways to handle
different variable types.

That will be the next installment, which will be coming soon to a
Forum near you.  See you then.

