Adding books

2026-05-29 00:39:46 +02:00
parent 34af1ebdc7
commit 4ce48dfb1a
309 changed files with 92526 additions and 0 deletions
@@ -0,0 +1,398 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+                            LET'S BUILD A COMPILER!
+
+                                       By
+
+                            Jack W. Crenshaw, Ph.D.
+
+                                  24 July 1988
+
+
+                              Part I: INTRODUCTION
+
+
+*****************************************************************
+*                                                               *
+*                        COPYRIGHT NOTICE                       *
+*                                                               *
+*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *
+*                                                               *
+*****************************************************************
+
+
+INTRODUCTION
+
+
+This series of articles is a tutorial on the theory  and practice
+of  developing language parsers and compilers.    Before  we  are
+finished,  we  will  have  covered  every   aspect   of  compiler
+construction, designed a new programming  language,  and  built a
+working compiler.
+
+Though I am not a computer scientist by education (my Ph.D. is in
+a different  field, Physics), I have been interested in compilers
+for many years.  I have  bought  and tried to digest the contents
+of virtually every  book  on  the  subject ever written.  I don't
+mind  telling you that it was slow going.    Compiler  texts  are
+written for Computer  Science  majors, and are tough sledding for
+the rest of us.  But over the years a bit of it began to seep in.
+What really caused it to jell was when I began to branch off on my
+own and try things on my own computer.  Now I plan to
+share with you what I have  learned.    At the end of this series
+you will by no means be  a  computer scientist, nor will you know
+all the esoterics of  compiler  theory.    I intend to completely
+ignore the more theoretical  aspects  of  the  subject.  What you
+_WILL_ know is all  the  practical aspects that one needs to know
+to build a working system.
+
+This is a "learn-by-doing" series.  In the course of the series I
+will be performing  experiments  on  a  computer.    You  will be
+expected to follow along,  repeating  the  experiments that I do,
+and  performing  some  on your own.  I will be using Turbo Pascal
+4.0 on a PC  clone.   I will periodically insert examples written
+in TP.  These will be executable code, which you will be expected
+to copy into your own computer and run.  Without a copy of Turbo,
+you will be severely limited in how well you can follow what's
+going on, so if you don't have one, I urge you to get one.  After
+all, it's an excellent product, good for many other uses!
+
+Some articles on compilers show you examples, or show you  (as in
+the case of Small-C) a finished product, which you can  then copy
+and  use without a whole lot of understanding of how it works.  I
+hope to do much more  than  that.    I  hope to teach you HOW the
+things get done,  so that you can go off on your own and not only
+reproduce what I have done, but improve on it.
+
+This is admittedly an ambitious undertaking, and it won't be done
+in  one page.  I expect to do it in the course  of  a  number  of
+articles.    Each  article will cover a single aspect of compiler
+theory,  and  will  pretty  much  stand  alone.   If  all  you're
+interested in at a given time is one  aspect,  then  you  need to
+look only at that one article.  Each article will be  uploaded as
+it  is complete, so you will have to wait for the last one before
+you can consider yourself finished.  Please be patient.
+
+
+
+The average text on  compiler  theory covers a lot of ground that
+we won't be covering here.  The typical sequence is:
+
+ o An introductory chapter describing what a compiler is.
+
+ o A chapter or two on syntax equations, using Backus-Naur Form
+   (BNF).
+
+ o A chapter or two on lexical scanning, with emphasis on
+   deterministic and non-deterministic finite automata.
+
+ o Several chapters on parsing theory, beginning with top-down
+   recursive descent, and ending with LALR parsers.
+
+ o A chapter on intermediate languages, with emphasis on P-code
+   and similar reverse polish representations.
+
+ o Many chapters on alternative ways to handle subroutines and
+   parameter passing, type declarations, and such.
+
+ o A chapter toward the end on code generation, usually for some
+   imaginary CPU with a simple instruction set.  Most readers
+   (and in fact, most college classes) never make it this far.
+
+ o A final chapter or two on optimization. This chapter often
+   goes unread, too.
+
+
+I'll  be taking a much different approach in  this  series.    To
+begin  with,  I  won't dwell long on options.  I'll be giving you
+_A_ way that works.  If you want  to  explore  options,  well and
+good ...  I  encourage  you  to do so ... but I'll be sticking to
+what I know.   I also will skip over most of the theory that puts
+people  to  sleep.  Don't get me  wrong:  I  don't  belittle  the
+theory, and it's vitally important  when it comes to dealing with
+the more tricky  parts  of  a  given  language.  But I believe in
+putting first things first.    Here we'll be dealing with the 95%
+of compiler techniques that don't need a lot of theory to handle.
+
+I  also  will  discuss only one approach  to  parsing:  top-down,
+recursive descent parsing, which is the  _ONLY_  technique that's
+at  all   amenable  to  hand-crafting  a  compiler.    The  other
+approaches are only useful if you have a tool like YACC, and also
+don't care how much memory space the final product uses.
+
+I  also take a page from the work of Ron Cain, the author of  the
+original Small C.  Whereas almost all other compiler authors have
+historically  used  an  intermediate  language  like  P-code  and
+divided  the  compiler  into two parts (a front end that produces
+P-code,  and   a  back  end  that  processes  P-code  to  produce
+executable   object  code),  Ron  showed  us   that   it   is   a
+straightforward  matter  to  make  a  compiler  directly  produce
+executable  object  code,  in  the  form  of  assembler  language
+statements.  The code will _NOT_ be the world's tightest code ...
+producing optimized code is  a  much  more  difficult job. But it
+will work, and work reasonably well.  Just so that I  don't leave
+you with the impression that our end product will be worthless, I
+_DO_ intend to show you how  to  "soup up" the compiler with some
+optimization.
+
+
+
+Finally, I'll be  using  some  tricks  that I've found to be most
+helpful in letting  me  understand what's going on without wading
+through a lot of boiler plate.  Chief among these  is  the use of
+single-character tokens, with no embedded spaces,  for  the early
+design work.  I figure that  if  I  can get a parser to recognize
+and deal with I-T-L, I can  get  it  to do the same with IF-THEN-
+ELSE.  And I can.  In the second "lesson,"   I'll  show  you just
+how easy it  is  to  extend  a  simple parser to handle tokens of
+arbitrary length.  As another  trick,  I  completely  ignore file
+I/O, figuring that  if  I  can  read source from the keyboard and
+output object to the screen, I can also do it from/to disk files.
+Experience  has  proven  that  once  a   translator   is  working
+correctly, it's a  straightforward  matter to redirect the I/O to
+files.    The last trick is that I make no attempt  to  do  error
+correction/recovery.   The   programs   we'll  be  building  will
+RECOGNIZE errors, and will not CRASH, but they  will  simply stop
+on the first error ... just like good ol' Turbo does.  There will
+be  other tricks that you'll see as you go. Most of them can't be
+found in any compiler textbook, but they work.
+
+A word about style and efficiency.    As  you will see, I tend to
+write programs in  _VERY_  small, easily understood pieces.  None
+of the procedures we'll  be  working with will be more than about
+15-20 lines long.  I'm a fervent devotee  of  the  KISS  (Keep It
+Simple, Sidney) school of software development.  I  try  to never
+do something tricky or  complex,  when  something simple will do.
+Inefficient?  Perhaps, but you'll like the  results.    As  Brian
+Kernighan has said,  FIRST  make  it  run, THEN make it run fast.
+If, later on,  you want to go back and tighten up the code in one
+of  our products, you'll be able to do so, since the code will be
+quite understandable. If you  do  so, however, I urge you to wait
+until the program is doing everything you want it to.
+
+I  also  have  a  tendency  to  delay  building  a module until I
+discover that I need  it.    Trying  to anticipate every possible
+future contingency can  drive  you  crazy,  and  you'll generally
+guess wrong anyway.    In  this  modern day of screen editors and
+fast compilers, I don't hesitate to change a module when I feel I
+need a more powerful one.  Until then,  I'll  write  only  what I
+need.
+
+One final caveat: One of the principles we'll be sticking to here
+is that we don't  fool  around with P-code or imaginary CPUs, but
+that we will start out on day one  producing  working, executable
+object code, at least in the form of  assembler  language source.
+However, you may not  like  my  choice  of assembler language ...
+it's 68000 code, which is what works on my system (under SK*DOS).
+I think you'll find, though, that the translation to any other
+CPU such as the 80x86 will be quite obvious, so I don't see a
+problem here.  In fact, I hope someone out there who knows
+the '86 language better than I do will offer  us  the  equivalent
+object code fragments as we need them.
+
+
+THE CRADLE
+
+Every program needs some boiler plate ... I/O routines, error
+message routines, etc.  The programs we develop here will be no
+exception.  I've tried to hold this stuff to an absolute
+minimum, however, so that we can concentrate on the important
+stuff without losing it among the trees.  The code given below
+represents about the minimum that we need to  get  anything done.
+It consists of some I/O routines, an error-handling routine and a
+skeleton, null main program.   I  call  it  our  cradle.    As we
+develop other routines, we'll add them to the cradle, and add the
+calls to them as we  need to.  Make a copy of the cradle and save
+it, because we'll be using it more than once.
+
+There are many different ways to organize the scanning activities
+of  a  parser.   In Unix systems, authors tend to  use  getc  and
+ungetc.  I've had very good luck with the  approach  shown  here,
+which is to use  a  single, global, lookahead character.  Part of
+the initialization procedure  (the  only part, so far!) serves to
+"prime  the  pump"  by reading the first character from the input
+stream.  No other special  techniques are required with Turbo 4.0
+... each successive call to  GetChar will read the next character
+in the stream.
+
+
+{--------------------------------------------------------------}
+program Cradle;
+
+{--------------------------------------------------------------}
+{ Constant Declarations }
+
+const TAB = ^I;
+
+{--------------------------------------------------------------}
+{ Variable Declarations }
+
+var Look: char;              { Lookahead Character }
+
+{--------------------------------------------------------------}
+{ Read New Character From Input Stream }
+
+procedure GetChar;
+begin
+   Read(Look);
+end;
+
+{--------------------------------------------------------------}
+{ Report an Error }
+
+procedure Error(s: string);
+begin
+   WriteLn;
+   WriteLn(^G, 'Error: ', s, '.');
+end;
+
+
+{--------------------------------------------------------------}
+{ Report Error and Halt }
+
+procedure Abort(s: string);
+begin
+   Error(s);
+   Halt;
+end;
+
+
+{--------------------------------------------------------------}
+{ Report What Was Expected }
+
+procedure Expected(s: string);
+begin
+   Abort(s + ' Expected');
+end;
+
+{--------------------------------------------------------------}
+{ Match a Specific Input Character }
+
+procedure Match(x: char);
+begin
+   if Look = x then GetChar
+   else Expected('''' + x + '''');
+end;
+
+
+{--------------------------------------------------------------}
+{ Recognize an Alpha Character }
+
+function IsAlpha(c: char): boolean;
+begin
+   IsAlpha := upcase(c) in ['A'..'Z'];
+end;
+
+
+{--------------------------------------------------------------}
+{ Recognize a Decimal Digit }
+
+function IsDigit(c: char): boolean;
+begin
+   IsDigit := c in ['0'..'9'];
+end;
+
+
+{--------------------------------------------------------------}
+{ Get an Identifier }
+
+function GetName: char;
+begin
+   if not IsAlpha(Look) then Expected('Name');
+   GetName := UpCase(Look);
+   GetChar;
+end;
+
+
+{--------------------------------------------------------------}
+{ Get a Number }
+
+function GetNum: char;
+begin
+   if not IsDigit(Look) then Expected('Integer');
+   GetNum := Look;
+   GetChar;
+end;
+
+
+{--------------------------------------------------------------}
+{ Output a String with Tab }
+
+procedure Emit(s: string);
+begin
+   Write(TAB, s);
+end;
+
+
+
+
+{--------------------------------------------------------------}
+{ Output a String with Tab and CRLF }
+
+procedure EmitLn(s: string);
+begin
+   Emit(s);
+   WriteLn;
+end;
+
+{--------------------------------------------------------------}
+{ Initialize }
+
+procedure Init;
+begin
+   GetChar;
+end;
+
+
+{--------------------------------------------------------------}
+{ Main Program }
+
+begin
+   Init;
+end.
+{--------------------------------------------------------------}
+
+
+That's it for this introduction.  Copy the code above into TP and
+compile it.  Make sure that it compiles and runs  correctly. Then
+proceed to the first lesson, which is on expression parsing.
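+
+One way to confirm that the cradle "runs correctly" is to give
+the null main program something small to do.  The fragment below
+is only a sketch, not part of the cradle itself: it assumes you
+are working in a COPY of the cradle, and it replaces the main
+program at the very end of that copy.  It reads one digit with
+GetNum and uses EmitLn to write a single line of 68000 assembler
+that loads the digit into a register.  The choice of instruction
+and of register D0 is an assumption, made purely for
+illustration.
+
+
+{--------------------------------------------------------------}
+{ Main Program (trial version, for a copy of the cradle) }
+
+begin
+   Init;                                { prime the lookahead }
+   EmitLn('MOVE #' + GetNum + ',D0');   { e.g. "1" gives MOVE #1,D0 }
+end.
+{--------------------------------------------------------------}
+
+
+Type a single digit and press Enter, and the program should
+print a tab followed by the MOVE line.  Type anything else, such
+as a letter, and GetNum calls Expected, so the program reports
+"Error: Integer Expected." and halts.  Keep the original cradle
+unchanged, with its null main program, for the lessons ahead.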
+
+
+*****************************************************************
+*                                                               *
+*                        COPYRIGHT NOTICE                       *
+*                                                               *
+*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *
+*                                                               *
+*****************************************************************
+
+
+
+