Problem with File I/O

Status
Not open for further replies.

stormholder

IS-IT--Management
Jul 1, 2006
1
US
Hey all,

I think I'm having a pretty basic problem - just started using python today. The code reads as follows:

f=open("sortjuly.txt","r")
for line in f:
    line_txt = line
    page_txt = open(line_txt,"r").read()
    text = open(page_txt,"r").read()
    print(page_txt + text)

"sortjuly.txt" is a directory listing of about 1000 htm files I need to process. Each of the htm files contains a 4-6 digit number. I need output that gives the file name followed by the text from inside the file. For example:

hmd-dyster.htm contains the string '43921'.

I need the final output to read:

"hmd-dyster.htm 43921"

The error I keep getting with this code reads:

Traceback (most recent call last):
  File "C:/Documents and Settings/chagin/Desktop/HardinMD/sort-script.py", line 5, in -toplevel-
    page_txt = open(line_txt,"r").read()
IOError: [Errno 2] No such file or directory: 'hmd-about.htm\n'
>>>

For some reason the string "line_txt" keeps getting the "\n" tacked onto it and I can't figure out how to get it off. I tried printing the variable and it (the "\n") didn't show up.

Thanks for any assistance!

Charles.
 
Here is one solution to your problem. I modified the names in the program to make it easier for me to follow. This solution is based on the fact that each line read from a file ends with the newline character ("\n"), which has to be removed before the text can be used as a file name. The test data I used is as follows:

HTMFileList.txt is the name of the file containing the list of HTM files with data:
HTMfile1.htm
HTMfile2.htm
HTMfile3.htm

HTMfile1.htm data is:
11111
HTMfile2.htm data is:
22222
HTMfile3.htm data is:
33333

The program code is:

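# Open the list of HTM file names, one name per line
HTMFileList = open("HTMFileList.txt", "r")
for HTMfile in HTMFileList:
    HTMfile = HTMfile.rstrip("\n")  # remove the trailing newline, if there is one
    HTMdata = open(HTMfile, "r").read()
    print(HTMfile + " contains " + HTMdata.rstrip("\n"))
HTMFileList.close()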

The program output is:

HTMfile1.htm contains 11111
HTMfile2.htm contains 22222
HTMfile3.htm contains 33333
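
A slightly more defensive variant of the same idea, offered as a sketch rather than the definitive way to do it: it uses strip() so that a last line with no trailing newline does not lose a character, and it skips blank lines in the listing. The function name read_numbers is made up for this example, and it assumes a Python version that has the with statement.

```python
def read_numbers(list_path):
    """Return (file name, file contents) pairs for each name listed in list_path."""
    results = []
    with open(list_path, "r") as file_list:
        for line in file_list:
            name = line.strip()  # drops "\n" and any surrounding whitespace
            if not name:
                continue  # skip blank lines in the listing
            with open(name, "r") as htm_file:
                # strip() also removes the newline at the end of the file contents
                results.append((name, htm_file.read().strip()))
    return results
```

With the test data above, read_numbers("HTMFileList.txt") should return [('HTMfile1.htm', '11111'), ('HTMfile2.htm', '22222'), ('HTMfile3.htm', '33333')]. Printing name + " " + number for each pair gives the "hmd-dyster.htm 43921" style asked for.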

I am also new to python and I hope this helps you in some way!
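
P.S. On why the "\n" did not show up when you printed the variable: print writes the newline out as an actual line break, so it is easy to miss. A small sketch using repr(), which shows the escape sequence instead (the string here is just the file name from your traceback, typed in by hand):

```python
line_txt = "hmd-about.htm\n"  # what iterating over the file hands back

print(line_txt)                     # newline comes out as a line break, so it looks absent
print(repr(line_txt))               # repr() shows the "\n" escape explicitly
print(repr(line_txt.rstrip("\n")))  # after rstrip the newline is gone
```

Checking a suspicious string with repr() is a quick way to spot stray whitespace before passing it to open().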
 