Removing multiple records from a file

Status
Not open for further replies.

ryanc2

MIS
Apr 18, 2003
73
US
I need to remove a lot of records (38k) from a file. I have a bad file containing all of the keys and then the master file needing cleaning.

I started using a while loop with grep -v, appending the records to a new file, but it just doesn't seem to be a logical way to do it.

Any help would be appreciated.

Thanks
 
Can you be slightly more specific? You want to remove 38k records... Is every record on one line? In one column? Is there anything specific you want removed? Any specific number of lines?

perl -e 'print $i=pack(c5,(40*2),sqrt(7600),(unpack(c,Q)-3+1+3+3-7),oct(104),10,oct(101));'
 
Hi

You mentioned a master file. Sounds like you want something like this :
Code:
[gray]# data file[/gray]
[blue]master #[/blue] cat first.txt
one line
two line
three line
four line
five line

[gray]# pattern file[/gray]
[blue]master #[/blue] cat second.txt
five
two
three

[gray]# keep only data matching a pattern[/gray]
[blue]master #[/blue] grep -f second.txt first.txt
two line
three line
five line

[gray]# remove data matching a pattern[/gray]
[blue]master #[/blue] grep -v -f second.txt first.txt
one line
four line

Feherke.
 
Why not something simpler:

sed '/dont_want_this_line/{d;}' file

Where dont_want_this_line is a string that occurs in every line he doesn't want in the file:

[root@mybox ]# tail -n 5 /var/log/messages
Nov 3 13:06:58 mybox dhcpd: DHCPREQUEST for 192.168.162.249 (192.168.162.91) from 00:04:13:24:67:57 via eth1
Nov 3 13:06:58 mybox dhcpd: DHCPACK on 192.168.162.249 to 00:04:13:24:67:57 via eth1
Nov 3 13:08:02 mybox dhcpd: DHCPINFORM from 192.168.162.138 via eth1: not authoritative for subnet 192.168.162.0
Nov 3 13:08:05 mybox dhcpd: DHCPINFORM from 192.168.162.138 via eth1: not authoritative for subnet 192.168.162.0
Nov 3 13:10:41 mybox dhcpd: DHCPINFORM from 192.168.162.132 via eth1: not authoritative for subnet 192.168.162.0
[root@mybox ]# tail -n 5 /var/log/messages|sed '/INFORM/{d;}'
Nov 3 13:06:58 mybox dhcpd: DHCPREQUEST for 192.168.162.249 (192.168.162.91) from 00:04:13:24:67:57 via eth1
Nov 3 13:06:58 mybox dhcpd: DHCPACK on 192.168.162.249 to 00:04:13:24:67:57 via eth1

It's a vague question, hence my asking for more info ;)


perl -e 'print $i=pack(c5,(40*2),sqrt(7600),(unpack(c,Q)-3+1+3+3-7),oct(104),10,oct(101));'
 
Anyhow... Thought you could fiddle with these...

Matching a pattern
sed -n '/dont_show_me_lines_with_this/!p'
sed '/dont_show_me_lines_with_this/d'
awk '!/dont_show_me_lines_with_this/'

Based on line numbers... (prints lines 1-2000)
sed -n '1,2000p' or sed '1,2000!d'
awk 'NR==1,NR==2000'



perl -e 'print $i=pack(c5,(40*2),sqrt(7600),(unpack(c,Q)-3+1+3+3-7),oct(104),10,oct(101));'
 
Sorry - I normally assume people can read my mind when I speak.

More info:

Master file is one line per record and each record begins with 10 numbers (one unique record per line).

0000000001
0000000002

Bad file is a file containing all of the bad record numbers that need removing. So I basically need to remove the bad records from the master file based on the record numbers being inputted from the bad file.

I understand most of your approaches except how to feed in the record keys from the bad file.
 
man comm

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
diff goodfile badfile

Wow. comm, how underrated is that!

perl -e 'print $i=pack(c5,(40*2),sqrt(7600),(unpack(c,Q)-3+1+3+3-7),oct(104),10,oct(101));'
 
Sorry again, but I guess I'm not being very clear. The bad file doesn't contain the entire record, just the key that identifies the bad record in the master file - so comm or diff would kick out everything.

master file: (19,435,954 records)
1234567891 some data more data even more data

bad file: (32,727 records)
1234567891

Thanks for the help.

 
Can you show me:

1 line from good_file
1 line from bad_file

perl -e 'print $i=pack(c5,(40*2),sqrt(7600),(unpack(c,Q)-3+1+3+3-7),oct(104),10,oct(101));'
 
Master File:

371110794 00607320060731A200017 YYYNN 007 61.07
371113005 00607320068991A200017 NNNNN 007 59.04

Bad File:

371110794
371110795
371110796
371110797
371110798
371110799

In this example, I need to remove the first record from master and not the second record.
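
For data shaped like the sample above, one possible way to feed the keys in is to let awk read the bad file first and remember every key, then filter the master file against that list. This is only a sketch: the scratch directory and the file names master_file, bad_file and master_file.new are placeholders, and it assumes the key is the first whitespace-separated field of every master record.

```shell
# Sample data copied from the thread, written to a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/master_file" <<'EOF'
371110794 00607320060731A200017 YYYNN 007 61.07
371113005 00607320068991A200017 NNNNN 007 59.04
EOF
cat > "$workdir/bad_file" <<'EOF'
371110794
371110795
371110796
371110797
371110798
371110799
EOF

# While reading the first file (NR==FNR), remember each bad key and move on.
# For the second file, print a record only if its first field is not a bad key.
awk 'NR==FNR { bad[$1]; next } !($1 in bad)' \
    "$workdir/bad_file" "$workdir/master_file" > "$workdir/master_file.new"

cat "$workdir/master_file.new"
```

The master file is read only once and each key is compared as a whole field, so a key cannot accidentally match data further along the line. Two caveats: an empty bad file would make awk treat the master file as the key list, and a very old awk may not accept the "in" test, in which case nawk may be needed.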
 
grep -v -f bad_file master_file

but I would prepend every line in bad_file with a caret (^) so that it matches the key only in the beginning of a record in the master_file.

in order to do that:
vi badfile
:1,$ s/^/\^/
:wq
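
The same caret edit can probably be done without opening vi, which may be handier for a 32k-line key file. A minimal sketch, assuming a grep that accepts -f; the scratch directory and the names bad_patterns and master_file.new are placeholders:

```shell
# Sample data copied from the thread, written to a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/master_file" <<'EOF'
371110794 00607320060731A200017 YYYNN 007 61.07
371113005 00607320068991A200017 NNNNN 007 59.04
EOF
cat > "$workdir/bad_file" <<'EOF'
371110794
371110795
371110796
EOF

# Prepend a caret to every key so each pattern only matches at the start of a record.
sed 's/^/^/' "$workdir/bad_file" > "$workdir/bad_patterns"

# -f reads one pattern per line from the file; -v keeps only the lines that match none of them.
grep -v -f "$workdir/bad_patterns" "$workdir/master_file" > "$workdir/master_file.new"

cat "$workdir/master_file.new"
```

Note that a pattern such as ^371110794 would also match a longer key that merely starts with those digits, so this relies on all keys having the same length.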


HTH,

p5wizard
 
sys_dir> grep -v -f bad_invoice_keys.txt master_invoice_file.dat > master_invoice_file.new
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .

What am I missing?
 
Or this (no modifications required presuming the files are sorted):

[tt]join -1 1 -v 1 master badfile[/tt]
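
If the files are not already sorted, a sketch along these lines might do, sorting both on the key first. The scratch directory and the .sorted and .new file names are placeholders, and LC_ALL=C is an assumption made so that sort and join agree on the ordering:

```shell
# Sample data copied from the thread, written to a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/master" <<'EOF'
371110794 00607320060731A200017 YYYNN 007 61.07
371113005 00607320068991A200017 NNNNN 007 59.04
EOF
cat > "$workdir/badfile" <<'EOF'
371110794
371110795
371110796
EOF

# join expects both inputs sorted on the join field (field 1 here).
LC_ALL=C sort -k1,1 "$workdir/master"  > "$workdir/master.sorted"
LC_ALL=C sort       "$workdir/badfile" > "$workdir/badfile.sorted"

# -1 1 joins on field 1 of the first file; -v 1 prints only the lines of the
# first file whose key has no partner in the second file.
LC_ALL=C join -1 1 -v 1 "$workdir/master.sorted" "$workdir/badfile.sorted" \
    > "$workdir/master.new"

cat "$workdir/master.new"
```

One thing to watch: join writes its output fields separated by single spaces, so any column padding made of multiple spaces in the master records would be collapsed.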

Annihilannic.
 
/usr/xpg4/bin/grep ?
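
Assuming this is a Solaris-style box, where the default grep lacks -f and the XPG4 one lives under /usr/xpg4/bin, a script could pick whichever is available. A rough sketch only; the GREP variable, the scratch directory and the file names are made up for illustration:

```shell
# Sample data copied from the thread, written to a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/master_invoice_file.dat" <<'EOF'
371110794 00607320060731A200017 YYYNN 007 61.07
371113005 00607320068991A200017 NNNNN 007 59.04
EOF
cat > "$workdir/bad_invoice_keys.txt" <<'EOF'
371110794
371110795
EOF

# Prefer the XPG4 grep when it exists; otherwise fall back to the first
# grep on PATH, which is assumed here to understand -f.
if [ -x /usr/xpg4/bin/grep ]; then
    GREP=/usr/xpg4/bin/grep
else
    GREP=grep
fi

"$GREP" -v -f "$workdir/bad_invoice_keys.txt" "$workdir/master_invoice_file.dat" \
    > "$workdir/master_invoice_file.new"

cat "$workdir/master_invoice_file.new"
```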

perl -e 'print $i=pack(c5,(40*2),sqrt(7600),(unpack(c,Q)-3+1+3+3-7),oct(104),10,oct(101));'
 