Analyzing C++ code using awk

Status
Not open for further replies.

PerFnurt

Programmer
Feb 25, 2003
972
SE
Hiya AWK gurus!

I'm trying to analyze C++ files, counting c-style casts and class definitions (possibly more stuff in the future).

All in all it works fairly well, but I have some problems.

1. Line by line parsing

I have to resort to hacks to detect statements broken across lines, like
Code:
class Foo
{
where { is on a new line.

I still want to differentiate a statement like
Code:
class Foo;
(which should be ignored) from
Code:
class Foo : public Bar
{
...
class Foo {
...
class Foo 
{
which should be detected. Is there any clever way to accomplish this? (Or is awk perhaps not suitable for this kind of thing? I don't wanna write a full C++ parser ;-) )

2. c-style cast detection
This is my c-style detection pattern:
Code:
#--- Match c cast ---
# (Identifier*) Identifier
# With optional * and optional whitespaces here and there
/\([ \t]*[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\**[ \t]*\)[ \t]*[a-zA-Z_][a-zA-Z0-9_]*/ {

When detecting c-style casts I get false hits on statements like
Code:
void getBar() { if (foo) return bar; }
Any cool way to ignore those?

It'd also be really neat to skip commented sections entirely. // comments are manageable; I can just check whether a // precedes the first (, but what about /*...*/ sections? Is that even feasible?

Any ideas are welcome.

Using some old awk.exe (yes - we're talking Windows here) from the MKS Toolkit.

Thank you

/Per

"It was a work of art, flawless, sublime. A triumph equaled only by its monumental failure."
 
Mmm, well you're always going to have trouble with deliberately awkward code which is still valid code, but which simple pattern matching will fail to find.
Code:
int main ( ) {
    double d = 123.456;
    printf( "%d\n", (
    int /* find me if you can */
    )d );
    return 0;
}

If your code is pretty well disciplined in terms of formatting, you can usually get away with what you're doing, but you have to keep being prepared for the next funny to emerge.
For example, writing
Code:
if (foo) { return bar; }
That is, a 'don't omit optional {}' policy: with the braces in place, "(foo) {" no longer matches the cast pattern.

If you don't want to write a complete parser, then consider the following options
1. Use Perl.
I think there are a few Perl modules which actually parse C++ code, and from there it may be possible to extract the info you require. The search for "C++ parser" throws up a lot of matches though, so actually finding something may take a while.
Then there's the whole "learn perl well enough to make use of it" problem, if you don't know it already.

2. Source Navigator
It is a pretty poor (compared to say VC++) IDE, but the information it extracts and stores in easy to handle databases is really quite nifty.

I use the 'dbdump' tool (Documentation->Programmers reference guide) to export the database files as text files, so they're easy to handle with Perl/Awk/Etc.

One thing you would find useful from this would be a list of class names and typedef names, which would simplify somewhat the location of casts, since you could tell whether the "(identifier)" was actually a type/class name and not something else.
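
To sketch how such a list might be used: load the names into an awk array first, then report a "(identifier)" hit only when the identifier is in the array. Everything specific here is an assumption for illustration: the function name known_casts is made up, and the list is assumed to have been reduced to one type name per line (the raw dbdump output would need massaging into that form first).

```shell
# Hypothetical helper: $1 is a file with one known type name per line,
# the remaining arguments are the source files to scan.
known_casts() {
  awk '
    # First file on the command line: remember every type name.
    NR == FNR { types[$1] = 1; next }

    {
      s = $0
      # Same shape as the cast pattern from the original post, cut off
      # after the first character of the identifier following the ")".
      while (match(s, /\([ \t]*[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\**[ \t]*\)[ \t]*[a-zA-Z_]/)) {
        m = substr(s, RSTART, RLENGTH)
        close_at = index(m, ")")
        name = substr(m, 2, close_at - 2)
        gsub(/[ \t*]/, "", name)          # drop blanks and pointer stars
        if (name in types)
          print FNR ": " name
        s = substr(s, RSTART + close_at)  # continue after the ")"
      }
    }
  ' "$@"
}
```

Called as something like "known_casts types.txt foo.cpp", it prints the line number and type name of each hit, so "if (foo) return bar;" drops out as long as foo is not a known type. Built-in types such as int would have to be added to the list by hand.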

--
 
Right, thanks.

Expected to get a perlish suggestion. Never worked with it though, so...

The code is fairly well disciplined in terms of formatting, but since it consists of about 10000 .h/.cpp files, there are bound to be exceptions/diversions from policy X-)


/Per

"It was a work of art, flawless, sublime. A triumph equaled only by its monumental failure."
 
Sure you're not trying to implement Lint?

FlexeLint/PC-lint from Gimpel has this very message:
[tt]
1924 C-style cast -- A C-style cast was detected. This can be
replaced by one of the newer C++ casts having the form:
Name<Type>(Expression) where Name is one of static_cast,
dynamic_cast, const_cast or reinterpret_cast. [23, Item 2].[/tt]

--
 
Quite sure :). We already use lint quite extensively, but with c-style casts allowed (we already have a bunch of them - that's legacy for you). What we'd like is overall measurements, to get some fancy charts on code metrics (autogenerated and updated daily).

/Per

"It was a work of art, flawless, sublime. A triumph equaled only by its monumental failure."
 
You can eliminate the problems caused by oddly placed linebreaks by accumulating all lines you read into one string until you hit a blank line:
[tt]
/^[ \t]*$/ {  # Blank line: process what we've accumulated.
    if ( !stuff ) next
    # Remove /* ... */ comments, putting a space in their place as the
    # compiler does. The pattern stops at the first "*/", so it never
    # runs on into the next comment, and it copes with closers like "**/".
    gsub( /\/\*[^*]*\*+([^\/*][^*]*\*+)*\//, " ", stuff )
    # Reduce whitespace.
    gsub( /[ \t]+/, " ", stuff )

    # More processing here.

    print stuff
    stuff = ""
    next
}

{ stuff = stuff $0 " " }
[/tt]
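
For the "class Foo;" versus "class Foo {" question, the same accumulate-then-decide idea can be narrowed down so it does not depend on blank lines. A rough sketch: once a line starts with "class Name", keep appending lines until the first ";" or "{" shows up, and count it only if the "{" came first. The function name count_classes is made up for illustration, and the sketch assumes comments have already been removed; it ignores struct, nested classes on the same line, and "class" inside strings.

```shell
# Hypothetical helper: prints the number of class definitions found
# in the given files (or on standard input).
count_classes() {
  awk '
    # Start collecting when a line begins with "class <identifier>".
    !pending && /^[ \t]*class[ \t]+[a-zA-Z_][a-zA-Z0-9_]*/ { pending = 1; buf = "" }

    pending {
      buf = buf " " $0
      semi  = index(buf, ";")
      brace = index(buf, "{")
      if (semi || brace) {
        # A definition reaches "{" before any ";".
        # A forward declaration hits ";" first.
        if (brace && (!semi || brace < semi)) n++
        pending = 0
      }
    }

    END { print n + 0 }
  ' "$@"
}
```

Because the pattern is anchored at the start of the line, "friend class Foo;" and "template <class T>" never start a collection.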

 
More elaborate:
[tt]
/^[ \t]*$/ {  # Blank line: process what we've accumulated.
    proc()
    stuff = ""
    next
}

{ stuff = stuff $0 " " }

END { proc() }

function proc()
{
    gsub( /^[ \t]+|[ \t]+$/, "", stuff )
    if ( !stuff ) return
    # Remove /* ... */ comments, putting a space in their place as the
    # compiler does. The pattern stops at the first "*/", so it never
    # runs on into the next comment, and it copes with closers like "**/".
    gsub( /\/\*[^*]*\*+([^\/*][^*]*\*+)*\//, " ", stuff )
    # Reduce whitespace.
    gsub( /[ \t]+/, " ", stuff )

    casts( stuff )
}

# Print every cast-like match in s.
function casts( s )
{
    while ( match( s,
      /\([ \t]*[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\**[ \t]*\)[ \t]*[a-zA-Z_][a-zA-Z0-9_]*/ ) )
    {
        Match( s )
        # Discard things like " if (foo) ".
        if ( !( RLEFT ~ /(^|[^a-zA-Z0-9_])if *$/ ) )
            print RMATCH
        s = RRIGHT
    }
}

# Split s around the most recent match() into RLEFT, RMATCH and RRIGHT.
function Match( s )
{
    if ( RSTART ) {
        RMATCH = substr( s, RSTART, RLENGTH )
        RLEFT  = substr( s, 1, RSTART - 1 )
        RRIGHT = substr( s, RSTART + RLENGTH )
    }
    else {
        RMATCH = RLEFT = ""
        RRIGHT = s
    }
    return RSTART
}
[/tt]
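
One caveat with splitting on blank lines: a /* ... */ block that itself contains a blank line ends up in two chunks, so the gsub never sees both ends of it. A possible workaround, sketched here under assumptions, is to strip comments line by line with an "inside a comment" flag before accumulating anything. The function name strip_comments is made up, and the sketch does not know about string literals, so a "//" or "/*" inside a string is treated as a comment.

```shell
# Hypothetical helper: copies input to output with // and /* */ comments
# removed, keeping the line count unchanged.
strip_comments() {
  awk '
    {
      line = $0
      out = ""
      while (line != "") {
        if (in_comment) {
          # Inside a block comment: skip up to the closing marker, if any.
          p = index(line, "*/")
          if (p) {
            line = substr(line, p + 2)
            in_comment = 0
            out = out " "               # a comment counts as one space
          } else
            line = ""
        } else {
          b = index(line, "/*")
          l = index(line, "//")
          if (l && (!b || l < b)) {
            # A // comment runs to the end of the line.
            out = out substr(line, 1, l - 1)
            line = ""
          } else if (b) {
            # Keep the text before the opener, then switch state.
            out = out substr(line, 1, b - 1)
            line = substr(line, b + 2)
            in_comment = 1
          } else {
            out = out line
            line = ""
          }
        }
      }
      print out
    }
  ' "$@"
}
```

Its output can be piped into the accumulating script, which then no longer needs its own comment-removing gsub. Since the line count is preserved, line numbers in any later report still point at the right place in the source.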

 
> We're already using lint quite extensively but allowing c-style casts
So do two lint runs - one with and one without, and display the differences

--
 
Thanks for the advice...I'll look into it.

/Per

"It was a work of art, flawless, sublime. A triumph equaled only by its monumental failure."
 