Hi All,
I have 2 large files (~2GB each).
They both contain a list of URLs.
I believe they should roughly overlap.
So I have a script that greps for the URLs that exist in file1 but not in file2:
#!/bin/bash
file1=urls1.txt
file2=urls2.txt
file3=not_found_in_1.txt
while IFS= read -r line; do
    grep -q "$line" "$file2" || echo "$line not found in file2" >> "$file3"
done < "$file1"
It works, but I am sure there is something faster. (It currently takes 48+ hours.)
I know Perl/Python could probably speed this up, but I am more curious about tips/thoughts on keeping this in bash.
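
For what it's worth, the main cost seems to be spawning one grep per URL. Below is a rough sketch of two single-pass alternatives, assuming both files really are one URL per line and keeping the same file names as above. It is untested at this size, and note that it matches whole lines, whereas the loop above matches substrings:

#!/bin/bash
# Option 1: load all of urls2.txt as fixed-string patterns in one grep call.
#   -F  fixed strings (no regex metacharacters)
#   -x  whole-line matches only
#   -f  read the patterns from a file
#   -v  print the lines of urls1.txt that match none of the patterns
grep -Fxvf urls2.txt urls1.txt > not_found_in_1.txt

# Option 2: sort both files once and compare them in a single merge pass.
#   comm needs sorted input; -23 hides lines unique to urls2.txt and
#   lines common to both, leaving only lines that appear in urls1.txt alone.
comm -23 <(sort urls1.txt) <(sort urls2.txt) > not_found_in_1.txt

The grep variant may need a lot of memory to hold a ~2GB pattern file, so the sort/comm route is probably the safer bet at this scale.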
Thanks
-jouell