\author{Simon Cozens} \title{The Basis for a Recursive-Descent Parser} @* Recurse. On the 26th of October, I attended a lecture by Dr. Carroll Morgan on compiler and grammar design. He demonstrated how to program a right-recursive recursive-descent parser in some bizarre programming language, probably ML{}. Well, I'm attempting to try it out in Perl and see what happens. This parser works on a fairly silly grammar which is defined below, but which proves that different `types' of terminal symbols aren't a problem. @p @ @ @ @ @ Examine Parameters The parameter is a string to parse. I've replaced the |eof| idea in the lecture by a full stop, so I'm assuming the ``lexer\footnote{But of course, it's all happening by hand until someone (eg me) comes up with a clever way of lexing Japanese}'' produces a string something like this: \begin{verbatim} A A N V . \end{verbatim} We don't need to do anything fancy to the parameter just yet, other than throw the CRLF away. @= chomp($target=shift); @* Parse. The parsing function itself is fairly straightforward, but the non-terminal-based functions are quite strange. Basically, parsing in this instance means constructing a sentence and complaining if there's anything left. @= init($target); $p=sentence($target); if (nexttoken($target) eq ".") { print "S\n\n";} else { print "! Garbage after expression.\n"} @* Auxilliary Functions. The three functions defined here set up and operate the tokeniser. @= @ @ @ @ Initialise Tokens The tokenisation routines skip down a global array, which is built up here by taking the input and splitting it up by spaces. The pointer is set to the beginning of the array, a fairly good place to start. @= sub init{ $tokenstring=shift; @@tokenlist=split / /, $tokenstring; $tokencount=0; } @ Get and skip. These two small helper functions do the tokenisation work. We could do this inline, but I want to make the basis as much like Dr. Morgan's example as possible. I added |lasttoken| simply for producing more human error messages. @= sub nexttoken { return $tokenlist[$tokencount] } sub lasttoken { return $tokenlist[$tokencount-1] } @= sub skiptoken { $tokencount++ } @* Non Terminals. These functions deal with the non-terminal tokens, implementing the PS\footnote{Or BNF, Chomsky II, Chomsky IV, Context-free or whatever you want to call them.} rules. I like the way this happens. The rules we're implementing are: \begin{eqnarray} S & \rightarrow & NP \quad VP \\ NP & \rightarrow & N \quad | \quad AP \quad N \\ VP & \rightarrow & V \\ AP & \rightarrow & A \quad | \quad A \quad AP \end{eqnarray} @= @ @ @ @ @ Deal with Sentences @= sub sentence{ print "Looking for a sentence...\n"; nounphr(); verbphr(); } @ Deal with Noun Phrases @= sub nounphr{ print "Looking for a noun phrase...\n"; if (nexttoken() eq "N") { print "NP\n"; skiptoken(); } elsif (nexttoken() eq "A") { adjphr(); nounphr() } else { die "! Syntax Error\nDidn't expect a ".nexttoken()." after ".lasttoken().".\n"; } } @ Deal with Verb Phrases @= sub verbphr{ print "Looking for a verb phrase...\n"; if (nexttoken() eq "V") { print "VP\n"; skiptoken(); } else { die "! Syntax Error\nDidn't expect a ".nexttoken()." after ".lasttoken().".\n"; } } @= sub adjphr{ print "Looking for an adjective phrase...\n"; while (nexttoken() eq "A") { print "A "; skiptoken(); } print "-> AP\n"; }