List too large in grep

Status
Not open for further replies.

lhg1

IS-IT--Management
Mar 29, 2005
134
DK
Hi

I am having trouble finding a solution for a script where I use grep -v.

The list has become too large.

The problem is the line in bold. I need to figure out how to handle large lists.



Code:
echo "Start Medium"
# SECTION TO MAKE SURE THAT YOU ARE NOT PAGED TWICE FOR THE SAME LOG
echo 1
for item in `cat $NAME_FILTER_MEDIUM`
do
    cat $TMP_LIST.New_Logs|grep $item >> $TMP_LIST.medium_Logs
done
[b]cat $TMP_LIST.medium_Logs|grep -f $NAME_FILTER_MEDIUM|egrep -v "$FILTER_OUT_MEDIUM"|while read LINE[/b]
do
    LOG=$LINE
    STATUS=`grep $LOG $SCRIPT_LOG_FILE`
    if [ ${#STATUS} -gt 0 ]
    then
        echo "Previously checked this file. Skipping!">null
    else
        # this log has never been checked before
        echo $LOG >> $TMP_LIST.New_Logs_Filtered_MEDIUM
    fi
done

Regards
LHG
 
Can you describe more generally what you are trying to achieve here, because I can't make much sense of that code.

How many lines are there in the $NAME_FILTER_MEDIUM file?

Why is it not working? Does grep return an error message? Incidentally, if the lines do not contain regular expressions, you should use fgrep (or grep -F) for efficiency.
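
To illustrate the fgrep point, here is a minimal sketch. The file name and its contents are made up for the demonstration and do not come from the original script; it only shows how a filter entry containing a dot behaves differently as a regular expression and as a fixed string.

```shell
#!/bin/sh
# Hypothetical sample data; none of these names come from the original script.
demo_dir=$(mktemp -d)
printf '%s\n' 'app.log' 'appXlog' 'other.log' > "$demo_dir/new_logs"

# As a regular expression, "." matches any character, so two lines match.
regex_hits=$(grep -c 'app.log' "$demo_dir/new_logs")

# With -F the pattern is a fixed string, so only the literal name matches.
fixed_hits=$(grep -c -F 'app.log' "$demo_dir/new_logs")

echo "regex: $regex_hits, fixed: $fixed_hits"   # prints: regex: 2, fixed: 1
rm -rf "$demo_dir"
```

Besides being faster, -F avoids accidental matches when the list holds plain file names.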

Annihilannic.
 
It would help to see the contents of the file [tt]$NAME_FILTER_MEDIUM[/tt]. Also a description of what you're trying to do with the code would help (is the comment accurate?).

Regardless, it looks like this bit of code...
Code:
for item in `cat $NAME_FILTER_MEDIUM`
do
    cat $TMP_LIST.New_Logs|grep $item >> $TMP_LIST.medium_Logs
done
...is essentially doing the same thing as...
Code:
cat $TMP_LIST.medium_Logs|grep -f $NAME_FILTER_MEDIUM|...
...so it looks like you are filtering it twice. How about changing it to this...
Code:
while read ITEM
do
    grep "$ITEM" "$TMP_LIST.New_Logs"
done < "$NAME_FILTER_MEDIUM"|egrep -v "$FILTER_OUT_MEDIUM"|while read LINE
do
    LOG=$LINE
    STATUS=`grep "$LOG" "$SCRIPT_LOG_FILE"`
    if [ ${#STATUS} -gt 0 ]
    then
        # /dev/null discards the message; ">null" would create a file named "null"
        echo "Previously checked this file. Skipping!" >/dev/null
    else
        # this log has never been checked before
        echo "$LOG" >> "$TMP_LIST.New_Logs_Filtered_MEDIUM"
    fi
done
Or better yet, get rid of some of those greps...
Code:
# Assumes $FILTER_OUT_MEDIUM names a file with one entry per line
# (the original script passes it to egrep as a pattern instead)
sort "$FILTER_OUT_MEDIUM"  > "$FILTER_OUT_MEDIUM.sorted"
sort "$NAME_FILTER_MEDIUM" > "$NAME_FILTER_MEDIUM.sorted"
sort "$TMP_LIST.New_Logs"  > "$TMP_LIST.New_Logs.sorted"

# "join" matches on the first field of each line.
# If you don't have "join", or want whole-line matches, change to "comm -12"
join "$NAME_FILTER_MEDIUM.sorted" "$TMP_LIST.New_Logs.sorted" > "$TMP_LIST.medium_Logs"

# "comm -23" leaves only lines unique to the first file listed
comm -23 "$TMP_LIST.medium_Logs" "$FILTER_OUT_MEDIUM.sorted" | while read LINE
do
    STATUS=`grep "$LINE" "$SCRIPT_LOG_FILE"`
    if [ ${#STATUS} -gt 0 ]
    then
        # /dev/null discards the message; ">null" would create a file named "null"
        echo "Previously checked this file. Skipping!" >/dev/null
    else
        # this log has never been checked before
        echo "$LINE" >> "$TMP_LIST.New_Logs_Filtered_MEDIUM"
    fi
done
I haven't tested this, but if you sort the files first, you can use "[tt]comm[/tt]" and "[tt]uniq[/tt]" (and on some unixes "[tt]join[/tt]") to match lines across the files. The "[tt]comm[/tt]" and "[tt]join[/tt]" commands can show the lines that are common or unique between the two files, which is what you're trying to do with the [tt]grep[/tt]s anyway. Note that they compare whole lines (or, for "[tt]join[/tt]", a key field) rather than substrings, so the entries have to match exactly. See the man pages for more info.
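
As a small illustration of how "[tt]comm[/tt]" behaves on sorted input, here is a sketch using made-up file names and contents (stand-ins for the filter list and the list of new logs, not the real files):

```shell
#!/bin/sh
# Hypothetical sample data standing in for the filter list and the new logs.
demo_dir=$(mktemp -d)
printf '%s\n' 'c.log' 'a.log' 'b.log' | sort > "$demo_dir/filter.sorted"
printf '%s\n' 'b.log' 'd.log' 'a.log' | sort > "$demo_dir/new.sorted"

# -12 suppresses columns 1 and 2, leaving lines present in both files.
common=$(comm -12 "$demo_dir/filter.sorted" "$demo_dir/new.sorted")

# -23 suppresses columns 2 and 3, leaving lines found only in the first file.
only_filter=$(comm -23 "$demo_dir/filter.sorted" "$demo_dir/new.sorted")

printf 'common:\n%s\nonly in filter:\n%s\n' "$common" "$only_filter"
rm -rf "$demo_dir"
```

Here the common lines are a.log and b.log, and c.log is the only line unique to the filter file.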

You can also get rid of that "[tt]grep[/tt]" in the loop to get [tt]STATUS[/tt], but I'll leave that to you.
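
One possible way to do that, as a sketch only: if [tt]$SCRIPT_LOG_FILE[/tt] holds exactly one log name per line (an assumption this thread does not confirm), the whole loop can be replaced by a single "[tt]comm -23[/tt]". The file names and contents below are hypothetical.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
SCRIPT_LOG_FILE="$demo_dir/script.log"   # hypothetical: one checked log name per line
CANDIDATES="$demo_dir/candidates"        # hypothetical stand-in for the piped list

printf '%s\n' 'old2.log' 'old1.log' > "$SCRIPT_LOG_FILE"
printf '%s\n' 'new2.log' 'old1.log' 'new1.log' > "$CANDIDATES"

# comm needs both inputs sorted.
sort "$SCRIPT_LOG_FILE" > "$SCRIPT_LOG_FILE.sorted"
sort "$CANDIDATES" > "$CANDIDATES.sorted"

# Lines found only in the candidate list have never been checked before.
never_checked=$(comm -23 "$CANDIDATES.sorted" "$SCRIPT_LOG_FILE.sorted")
echo "$never_checked"
rm -rf "$demo_dir"
```

This leaves new1.log and new2.log, and it runs one process instead of one "[tt]grep[/tt]" per line.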

Hope this helps.






 
Hi

The grep version from SamBones works. Thanks.

The files are not alike, so sorting is not a great option in this case.

The files are large, and the problem occurred with grep -f $NAME_FILTER_MEDIUM. The error message was:
list to large

But the new way of doing it worked great.

/Lhg

 