[plug] Using YACC

Fri Aug 2 17:45:39 WST 2002

Hi all,

This may be moderately off-topic (it is actually a linux problem - albeit a
programming problem  8-)  ); I am teaching myself to use LEX and YACC in
order to solve a problem we have here.  We have a bunch of source-code with
in-line comments all through it.  We want to re-arrange those comments so
that we can use the DOC++ tool to generate documentation.  The comments as
they exist have been written to a company standard so they are reasonably
consistent, however they are not how DOC++ wants to see them.  Basically,
rather than manually going through the 1-2 million lines of code we have
here and re-arranging the comments for DOC++, we would like to get a program
to do that for us....

To do this, I have begun writing a very simple grammar using lex and yacc.
I have never written a full grammar before - I have only ever hacked someone
else's existing working code, and so I am still a learner when it comes to
the nitty-gritty details.  Anyway, here is my problem.

I have written this grammar to recognise comments in the code (either C or
C++ style), and simply to print them to stdout for now (see the code at the
end of this email).

What I am trying to do is to define a "file" as containing 0 or more lines.
It is fairly straightforward in lex (using extended regular expressions) to
define a token that has repeating parts, but I haven't yet figured out how
to define a grammatical component in yacc that has repeating components.

My "file" rule in parser.y shown below, needs to consist of 0 or more lines.
At present, when reading an input stream, it reads the first line in the
input stream perfectly, but subsequently it causes a syntax error (because a
file is defined as having only one line at this point).

Does anyone know how to define a grammatical element whose parts can repeat
any number of times - i.e. a file that can have an unlimited number of
lines?
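
For reference, the idiom usually given for "zero or more" in yacc is a rule
that refers to itself (left recursion).  A minimal sketch, not tested against
the scanner in this email, assuming it simply replaces the "file" rule in
parser.y and leaves the "line" rule as it is:

```yacc
/* A file is either empty, or a (shorter) file followed by one more line. */
file:           /* empty */
              | file line              { printf("line!\n"); }
              ;
```

Because the rule is left-recursive, yacc can reduce each line as soon as it
has been read, so the parser stack should not grow with the length of the
input.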

thanks heaps guys

David Buddrige

parser.y:

%{
#include <stdio.h>
#include <string.h>

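int yylex(void);    /* provided by the lex-generated scanner */
int yyparse(void);  /* generated by yacc from the rules below */

void yyerror(const char *str)
{
	fprintf(stderr, "error: %s\n", str);
}

int main(void)
{
	/* yyparse() returns 0 on success, non-zero on a syntax error */
	return yyparse();
}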

%}

%token C_COM_START C_COM_END CPP_COM TEXT EOLN

%%

file:           line                   { printf("line!\n"); }
              ;

line:           EOLN                   { printf("blank line!\n"); }
              | TEXT c_comment EOLN    { printf("c_comment!\n"); }
              | TEXT cpp_comment       { printf("cpp_comment!\n"); }
              | TEXT EOLN              { printf("ordinary line!\n"); }
	      ;

cpp_comment:    CPP_COM TEXT EOLN
              ;

c_comment:      C_COM_START TEXT C_COM_END
              ;

%%

This parser takes its tokens from the following lex-built scanner:

scanner.l:

/*

**  INCLUDE FILES
*/

%{

#include <stdio.h>

/*Note that y.tab.h is generated by using yacc with the -d option*/

#include "y.tab.h"

%}

C_COM_START           \/\*
C_COM_END             \*\/
CPP_COM               \/\/
TEXT                  [a-zA-Z0-9]
EOLN                  \n
WHITESPACE            [\t\ ]

%%

{C_COM_START}        {
                         return C_COM_START;
                     } 

{C_COM_END}          {
                         return C_COM_END;
                     }

{CPP_COM}            {
                         return CPP_COM;
                     }

{TEXT}+              {  /* one or more characters, never an empty match */
                         return TEXT;
                     }
{EOLN}               {
                         return EOLN;
                     }

{WHITESPACE}         {  /* Ignore Whitespace */
                     }

%%

int yywrap()
{
    return 1;
}

/*
int main(int argc, char **argv)
{
    printf(yytext);
    yylex();
}
*/
