when I add 2 lines, I lose 2

Status
Not open for further replies.

Clayster

Programmer
Oct 16, 2001
2
US
Hi,

I'm writing a pretty involved spam filter in awk ... so far so good, for the most part.

Basically, the program reads in a queued mail file, stores each line in an array:

tempMail[i++] = $0

... then scans each line against several spam regexps, and then in the END portion of the script, loops through the temp array and writes out to a new file.

While writing out to this new temporary file, each line is checked to see if it's the "Subject" header ... if it is, two lines are thrown into the temporary file: the overall (X-spam-score) score of the spam tests, and a second header (X-spam-failed) listing out codes for the tests the message failed. AFTER printing out those two lines, the matched line (Subject: *) is then written out, and the program continues writing out lines to the file.

Finally, the temp file is moved to overwrite the original file in the queue, and then the mail is delivered.

THE PROBLEM: when I write out the two X-spam headers, I lose the last two lines of the original message!

I've done some bug testing here and looped through withOUT printing out my two X-spam headers ... and when I don't print those out, I get all of the lines of the original file from the array.

Any ideas on why that would be happening, and how I might prevent it?

Thanks in advance -- once the script is final and bug-tested, I plan to release it publicly. (If anyone here is using CommuniGate Pro on a UNIX system and would like to beta-test, please let me know.)

Program tested on: gawk 3.1.0, mawk 1.3.3

-Clay
 
Without seeing the script it's quite tricky to say. However, it sounds as though you may be assigning the lines to the array like so:
line 1
line 2
line 3
The array now thinks it has three elements.

Printing the two extra lines within the loop may be forcing the last two elements out of the array. I would need to see the script and a representative sample of the input file to say for sure.
SOL
Yeah people they won't understand,
Girlfriends they don't understand,
In spaceships they don't even understand,
and me I aint ever gonna understand
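
For what it's worth, here is a minimal sketch of the store-then-rewrite pattern described above, wrapped in a shell snippet so it can be run as-is. The five-line sample message and the header names (X-Example-Score, X-Example-Failed) are made up for illustration and do not come from Clay's script. Since awk arrays are associative, printing extra lines inside the END loop should not by itself remove any elements; in this sketch all five original lines come out alongside the two added ones.

```shell
# Hypothetical sample message; file contents are made up for illustration.
msg=$(mktemp)
printf '%s\n' 'From: a@example.com' 'Subject: hello' '' 'body line 1' 'body line 2' > "$msg"

# Store every input line in an array, then replay the array in END,
# printing two extra header lines just before the Subject line.
out=$(awk '
    { mailTemp[m++] = $0 }                  # m is uninitialized, so it starts at 0
    END {
        for (h = 0; h < m; h++) {
            if (mailTemp[h] ~ /^Subject/ && !printed) {
                print "X-Example-Score: 5"              # placeholder header
                print "X-Example-Failed: DEAR_FRIEND"   # placeholder header
                printed = 1
            }
            print mailTemp[h]
        }
    }' "$msg")

printf '%s\n' "$out"
rm -f "$msg"
```

If this much works on your awk but the full script still drops lines, the loss is probably happening somewhere other than the array itself.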
 
I can e-mail you a complete copy of the script ... it's too long to post here.

However, here are a few relevant snippets of an abridged version:

BEGIN {
    # mailTemp array counter
    m = 0
}
{
    # Add to temp mail array
    mailTemp[m++] = $0
    # begin a series of over 100 regexps on $0
    # looking for common spam contents.
    # assign each match a user-defined severity
    # score.
    # EXAMPLE:
    # DEAR_FRIEND Test
    if (tolower($0) ~ /dear friend/)
        TESTS_FAILED["DEAR_FRIEND"] = 1;
}
END {
    # HITS of all messages is 0 to start off with
    HITS = 0

    # Loop through all tests results
    for (thisTest in scores) {
        if (TESTS_FAILED[thisTest] == 1)
            HITS += scores[thisTest];
    }

    if (HITS >= REQUIRED_SCORE) {
        # prepare failures output
        FAILURES = ""

        for (failedTest in scores) {
            if (TESTS_FAILED[failedTest] == 1)
                FAILURES = FAILURES ", " failedTest
        }
        sub(/^, /, "", FAILURES)
        # INSERT X-SpamSniper- Headers
        xssprinted = 0
        # write out with new headers
        mailTempFile = mailFile ".tmp"
        system("touch " mailTempFile)
        for (h = 0; h < m; ++h) {
            tempLine = mailTemp[h];
            if (tempLine ~ /^(CC|To|Sender|Subject)/ && xssprinted == 0) {
                # HERE IS WHERE MY PROBLEM IS
                print "X-SpamSniper-Score: " HITS >> mailTempFile
                print "X-SpamSniper-Failed: " FAILURES >> mailTempFile
                print tempLine >> mailTempFile
                ++xssprinted
            } else {
                print tempLine >> mailTempFile
            }
        }
        # OVERWRITE message with tmp file contents
        system("mv " mailTempFile " " mailFile)
        # Mail Log warning
        logWarning = "SpamSniper: Score " HITS "; " FAILURES
        # print logWarning to STDERR (the file name must be a quoted string)
        print logWarning > "/dev/stderr"
        exit(1)
    } else {
        # Below threshold score, let it go
        exit(0)
    }
}


Thanks very much for the help!

-Clay
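
One thing that may be worth trying: the snippet above runs mv on the temp file while awk still has it open for writing. Awk buffers output written to files, so closing the file with close() before handing it to another program makes sure every line has reached disk first. Whether that explains the two missing lines is not confirmed anywhere in this thread, so treat the following as a sketch under that assumption. The sample message, the header names, and the temporary directory are all hypothetical.

```shell
# Hypothetical queue file in a scratch directory (names are made up).
work=$(mktemp -d)
mailFile="$work/message"
printf '%s\n' 'To: b@example.com' 'Subject: hello' '' 'body' 'last line' > "$mailFile"

awk -v mailFile="$mailFile" '
    { mailTemp[m++] = $0 }                  # keep every line of the message
    END {
        mailTempFile = mailFile ".tmp"
        for (h = 0; h < m; h++) {
            if (mailTemp[h] ~ /^Subject/ && !printed) {
                print "X-Example-Score: 5" > mailTempFile             # placeholder header
                print "X-Example-Failed: DEAR_FRIEND" > mailTempFile  # placeholder header
                printed = 1
            }
            print mailTemp[h] > mailTempFile
        }
        # Flush and close the temp file BEFORE another program touches it.
        close(mailTempFile)
        system("mv \"" mailTempFile "\" \"" mailFile "\"")
    }' "$mailFile"

cat "$mailFile"
```

Within a single awk run, `>` truncates the file only when it is first opened and appends after that, so the separate touch and `>>` are not needed in this version. The scratch directory in `$work` is left in place so the result can be inspected.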


 
I can't see any obvious problems with the section you've sent through. If you can send a copy of the script and a sample file to steve.boardman@vivendiwater.com, I'll be happy to have a go at debugging it for you.
SOL
 
Clay,
I've looked at your script and I still can't see what is causing the problem, although I don't appear to have received the full script, only 1434 lines. Another problem is that there appear to be subtle differences between awk and mawk. I've tried downloading a copy of mawk for NT, but it doesn't like your script either. As soon as I get my Linux laptop back from the operating table (it's poorly) I'll have a go under Linux, but until then... Anybody else got any ideas?
SOL
 