
Find all files in a directory and process only the ones that have a specific line, skip those that do not

Status
Not open for further replies.

Tester_V

Technical User
Nov 22, 2019
54
US
I need help, I'm new to Python.
I wrote a script that should find all files in a directory, process only the ones that contain a specific line, and skip the ones that do not. The specific line is '," Run Time'. The script fails to process ONLY the files I need; it processes all files.

All lines to find:

1. '," Run Time'
2. '," Start Time'
3. '," End Time'
4. 'Test_ID:'
5. 'Test Program Name:'
6. 'Product:'

Also,
lines 1, 2 and 3 are repeating lines and I need all of them;
lines 4, 5 and 6 are also repeating, but I need to capture each of them only once.
Here is the script I have:
Python:
import os

runtime_l = ',"  Run  Time'
start_tm  = ',"  Start Time'
end_tm    = ',"  End  Time'
test_ID   = ' Host Name: '
program_n = 'Test Program Name:'
prod_n    = 'Product:'

given_path = 'C:\\02\\en15\\TST'
for filename in os.listdir(given_path):
    filepath = os.path.join(given_path, filename)
    if os.path.isfile(filepath):
        print("File Name:   ", filename) 
        print("File Name\\Path:", filepath) 
        with open(filepath) as mfile:        
            for line in mfile:
                if runtime_l in line:
                    # do something with the line
                    print(line)
                    
                if start_tm in line:
                    # do something with the line
                    print(line)  

                if end_tm in line:
                    # do something with the line
                    print(line) 
                    
                if test_ID in line:
                    # do something with the line
                    print (line)
        
                if program_n in line:
                    # do something with the line
                    print (line)
        
                if prod_n in line:
                    # do something with the line
                    print (line)
                else:                    
                    continue
 
Hi

It is hard to tell which way would be best without a sample file, but generally there are two cases:
  • If the line containing runtime_l always occurs before the other lines, then use a flag: 1) initialize it to false; 2) switch it to true when runtime_l is met; 3) when encountering another line while the flag is still false, stop processing the file.
  • If the line containing runtime_l may occur anywhere in the file, then loop over the file twice: first look only for runtime_l, and if it was found, start the reading loop again from the beginning, this time looking for the rest of the lines.
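
The two cases above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function names scan_with_flag and scan_twice are made up here, and both functions take a list of lines rather than an open file so they are easy to try out.

```python
def scan_with_flag(lines, marker, others):
    """Case 1: the marker line is expected before any other wanted line."""
    seen_marker = False              # 1) start with the flag off
    kept = []
    for line in lines:
        if marker in line:
            seen_marker = True       # 2) marker met: switch the flag on
            kept.append(line)
        elif any(text in line for text in others):
            if not seen_marker:      # 3) another wanted line came first: stop
                return []
            kept.append(line)
    return kept


def scan_twice(lines, marker, others):
    """Case 2: the marker may be anywhere, so check first, then collect."""
    lines = list(lines)              # keep the lines so they can be read twice
    if not any(marker in line for line in lines):      # first pass
        return []
    wanted = [marker, *others]
    return [line for line in lines                     # second pass
            if any(text in line for text in wanted)]
```

With a real file, `scan_twice(open(path), ...)` would read the whole file into memory; for very large logs, opening the file twice would be the alternative.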


Feherke.
feherke.github.io
 
I really appreciate your suggestion, but I do not know what that means; I'm new to Python.
The file can be anything, with the 6 lines inserted at random positions.

The two questions I need help with are:
1. How I can test a file to make sure it has a line "Run Time", process it if it has the line, and skip the rest of the files in a directory?
2. how to print only first match and ignore the rest of the matched lines?
For example, I'm looking for a line in a file that has the string "Test Program Name:"; the file can have 2 to 5 or even more lines with this string.
I need to find only the first matched line that has the string 'Test Program Name:'. If it is found, I want the code to print the line and start looking for the next string to match, which is 'Test_ID:'.
 
Hi Tester_V,

I'm trying to answer your two questions:
Tester_V said:
1. How I can test a file to make sure it has a line "Run Time", process it if it has the line, and skip the rest of the files in a directory?
2. how to print only first match and ignore the rest of the matched lines?


1. You could select for processing only those files which contain "Run Time".
Look at the command
Code:
find $PWD -name "*.txt" | xargs grep -l "Run Time"
used with popen() in my example below

2. You can mark when you have found one of the strings.
See the usage of the dictionary strings_found in my example below

Here is the example:
I created some files in a directory tree
Code:
.
├── dirname_01
│   └── file_04.txt
├── file_01.txt
├── file_02.txt
└── file_03.txt
The files file_01.txt, file_03.txt and file_04.txt contain "Run Time".

tester_v.py
Code:
import os

def process_file(file_path):
  # Track which search strings have already been printed for this file
  strings_found = {
    "Run Time":False,
    "Start Time":False,
    "End Time":False,
    "Test_ID":False,
    "Test Program Name":False,
    "Product":False
  }
  print("*** Processing %s:" % file_path)
  txt_file = open(file_path, "r")
  for line in txt_file:
    for key in strings_found.keys():
      # Print only the first line that contains each search string
      if key in line and not strings_found[key]:
        print(line.rstrip())
        strings_found[key] = True
  #
  txt_file.close()
  print("*** Done.\n")

# -- main program----
if __name__ == "__main__":
  # List only the *.txt files below the current directory that contain "Run Time"
  cmd = 'find $PWD -name "*.txt" | xargs grep -l "Run Time"'
  file_names = os.popen(cmd)
  for file_name in file_names:
    process_file(file_name.rstrip())

Running and output:
Code:
$ python3 tester_v.py
*** Processing /home/mikrom/Work/Python3/process_files/file_01.txt:
," Run Time
,” Start Time
,” End Time
Test_ID:
Test Program Name:
Product:
*** Done.

*** Processing /home/mikrom/Work/Python3/process_files/dirname_01/file_04.txt:
," Run Time
,” Start Time
,” End Time
Test_ID:
Test Program Name:
Product:
*** Done.

*** Processing /home/mikrom/Work/Python3/process_files/file_03.txt:
," Run Time
,” Start Time
,” End Time
Test_ID:
Test Program Name:
Product:
*** Done.
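
Since find, xargs and grep are Unix tools and the original script uses a Windows path, the same idea might also be written in Python alone. The following is a minimal sketch under stated assumptions, not part of the posted program: the names REPEATING, ONCE_ONLY, files_containing and collect_lines are invented for illustration, and the split into "keep every match" and "keep the first match only" follows the description in the first post.

```python
import os

REPEATING = ("Run Time", "Start Time", "End Time")           # keep every match
ONCE_ONLY = ("Test_ID:", "Test Program Name:", "Product:")   # keep first match only


def files_containing(root, marker, suffix=".txt"):
    """Yield paths under root whose text contains marker (like grep -l)."""
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(names):
            if not name.endswith(suffix):
                continue
            path = os.path.join(folder, name)
            with open(path, errors="replace") as handle:
                if any(marker in line for line in handle):
                    yield path


def collect_lines(path):
    """Return the wanted lines: all repeating ones, first hit of the rest."""
    seen = set()
    kept = []
    with open(path, errors="replace") as handle:
        for line in handle:
            if any(text in line for text in REPEATING):
                kept.append(line.rstrip())
                continue
            for text in ONCE_ONLY:
                if text in line and text not in seen:
                    seen.add(text)
                    kept.append(line.rstrip())
                    break
    return kept
```

A loop such as `for path in files_containing(given_path, "Run Time", suffix=".log"): print(collect_lines(path))` would then visit only the files that have the line. The suffix value is a guess and would need to match the real file names.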
 
Thank you! The code you wrote looks amazing!
And I'm sure it works great but it is too complicated.
Is there another, simpler way to do this?
I need to incorporate your snippet into my code and I cannot do that, it is too complicated.
I really appreciate your help, thank you again!
 
And what, in your opinion, is complicated?
Using the command with find and grep to select only those files which contain the strings you want to process, or using the dictionary to mark the strings found in a file?
The best way would be to take the program apart step by step and print some variables to see how it works; then you will see that it is not complicated.
 
You guys are trying to help, and I see it and appreciate it, but I'm not on the same level as you in Python.
That is why things that are obvious to you look very complicated to me, and I'm trying to do everything in a very simple way.
Here is how I would try to test whether a file has a "Run Time" line in it, and it works, kind of.
In the directory 'C:\\02\\en15\\TST' that I'm scanning I have 4 files:
Debug_1.log
Debug_2.log
Debug_3_NO_Run Time
Debug_3_NO_Run Time
The first two Debug log files have the "Run Time" line and the last two do not. When I execute my script with the "else" block disabled, the script prints correctly:

<_io.TextIOWrapper name='C:\\02\\en15\\TST\\Debug_1.log' mode='r' encoding='cp1252'>
<_io.TextIOWrapper name='C:\\02\\en15\\TST\\Debug_2.log' mode='r' encoding='cp1252'>
>>>

When I enable the "else" block the script produces an error for some reason.
else:
^
IndentationError: expected an indented block
>>>



Python:
import os

runtime_l = ',"  Run  Time'

given_path = 'C:\\02\\en15\\TST'
for filename in os.listdir(given_path):
    filepath = os.path.join(given_path, filename)
    if os.path.isfile(filepath):
        #print("File Name:   ", filename) 
        #print("File Name\\Path:", filepath) 
        with open(filepath) as mfile:        
            for line in mfile:
                if runtime_l in line:
                    # do something with the line
                    print(mfile)
                 
                #else:                    
                #    print(mfile)
 
Hi Tester_V,
The code you posted works for me with the else too. As your error message says, you must have an indentation error near the else. Maybe you mixed tabs and spaces for indentation.
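
One other way this message can appear, offered only as a guess because the exact edited script was not posted: if the else: line is uncommented while the line below it stays commented out, the else has no body, and Python reports "expected an indented block". The small sketch below reproduces that situation with two made-up source strings and a hypothetical helper named compiles.

```python
broken = (
    "if True:\n"
    "    print('yes')\n"
    "else:\n"
    "#    print('no')\n"        # a comment does not count as a body
    "print('done')\n"
)

fixed = (
    "if True:\n"
    "    print('yes')\n"
    "else:\n"
    "    pass  # placeholder body keeps the block valid\n"
    "print('done')\n"
)


def compiles(source):
    """Return True if source parses, False on an IndentationError."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


print(compiles(broken))   # the else: has nothing indented under it
print(compiles(fixed))    # the else: has a body, even if it is only pass
```

So when enabling the else block, both lines need the leading # removed, and the body must be indented one level deeper than the else.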
 