Edit a text file

Status
Not open for further replies.

hisham

IS-IT--Management
Nov 6, 2000
194
I have a text file:
-----------------------------











text1
some text gows here
















another text
some text goes here too
















etc ...
-------------------------------

I need to replace each block of whitespace with a single character, i.e.
-----------------------------
#
text1
some text gows here
#
another text
some text goes here too
#
etc ...
-------------------------------
so that I can use the functions strpos and substr and then add the content to a database.

Any suggestions?

Thanks in advance

 
You have to consider what whitespace characters are. It's not just newlines; it's all the spacing characters, visible or invisible, such as tabs, carriage returns and line feeds.
If you want to replace all empty lines, I recommend a regular expression.
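
A minimal sketch of that regular-expression approach, assuming the file content is already in a string and PHP 7 or later. The helper name collapseBlankLines and the "#" marker are only illustrative; they are taken from the desired output in the question, not from any library.

```php
<?php
// Hypothetical helper: collapse every run of blank (whitespace-only) lines
// into a single line that holds only the marker.
function collapseBlankLines(string $text, string $marker = '#'): string
{
    // Normalize Windows (\r\n) and old Mac (\r) line endings to \n first.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // With the m modifier, ^ matches at the start of every line, so the group
    // matches one blank line and the + swallows the whole run of them.
    return preg_replace('/(?:^[ \t]*\n)+/m', $marker . "\n", $text);
}

$sample = "\n\n\ntext1\nsome text goes here\n\n \t\n\nanother text\nsome text goes here too\n";
echo collapseBlankLines($sample);
```

A single blank line is treated like a longer run, so every gap between text blocks ends up as exactly one marker line.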
 
Thank you Vragabond and DRJ478,
but how can I determine all the visible and invisible spacing characters using PHP?
 
Whitespace characters can be written as escape sequences in regular expressions and in PHP double-quoted strings, e.g.
\t is a tab
\n is a newline
\r is a carriage return
\s in regex stands for any whitespace character

You can also use str_replace() but that would take several passes.
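
To make the difference concrete, here is a rough sketch that squeezes every run of whitespace down to one space, once with str_replace() and once with a regex. The sample string and the variable names are made up for illustration.

```php
<?php
$raw = "text1\t\tsome\r\n\r\n   text";

// Escape sequences are only interpreted inside double quotes.
var_dump(strlen("\t")); // int(1): a real tab character
var_dump(strlen('\t')); // int(2): a backslash followed by the letter t

// str_replace() only knows fixed strings, so every character needs its own entry...
$noControl = str_replace(["\t", "\r", "\n"], ' ', $raw);
// ...and squeezing runs of spaces takes repeated passes.
while (strpos($noControl, '  ') !== false) {
    $noControl = str_replace('  ', ' ', $noControl);
}

// One regex pass: \s+ matches any run of spaces, tabs, CRs and LFs.
$viaRegex = preg_replace('/\s+/', ' ', $raw);

var_dump($noControl === $viaRegex); // bool(true)
```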

If you have the text file in an array (using file() to read it) you can use a foreach loop and examine the content of the line. If empty -> discard. Then write out the new, compacted file. If you need some code, let me know.
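
A sketch of that loop, assuming PHP 7 or later. The helper name compactFile and the temporary files are placeholders so that the example runs on its own; in practice you would pass your own paths.

```php
<?php
// Hypothetical helper: copy $source to $target without the whitespace-only
// lines and return the number of lines that were kept.
function compactFile(string $source, string $target): int
{
    // file() returns the file as an array, one element per line.
    $lines = file($source, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $source");
    }

    $kept = [];
    foreach ($lines as $line) {
        $line = rtrim($line); // drops a stray \r and trailing blanks
        if ($line === '') {
            continue;         // empty or whitespace-only: discard
        }
        $kept[] = $line;
    }

    file_put_contents($target, implode("\n", $kept) . "\n");
    return count($kept);
}

// Demo on throwaway files.
$source = tempnam(sys_get_temp_dir(), 'src');
$target = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($source, "\n\ntext1\nsome text goes here\n\n\t\nanother text\n");

$count = compactFile($source, $target);
echo file_get_contents($target);
```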
 
Thanks again DRJ478
I use this code to read a series of HTML files, then remove the HTML tags, JavaScript sections and whitespace. It also converts some common HTML entities to their text equivalents.
i.e
start:
end:
Code:
<?php
include "mode.php";
if (!$submit){
?>
<form action="<?=$_SERVER['PHP_SELF']?>" method="post">
<input type="text" name="url">
<input type="text" name="cant">
<input type="submit" name="submit" value="submit">
</form>
<?
}else {
$rest = substr("$url", 0, -8); 

for ($i = 1; $i <= $cant; $i++) {
if ($i < 10) {
    $fname = $rest . "00$i.html";
} elseif ($i < 100) {
   $fname = $rest . "0$i.html";
}else 

$fname = $url . "$i.html";

$document = file ($fname);

$search = array ("'<script[^>]*?>.*?</script>'si",  
                 "'<[\/\!]*?[^<>]*?>'si",           
                 "'([\r\n])[\s]+'",                
                 "'&(quot|#34);'i",               
                 "'&(amp|#38);'i",
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i",
                 "'&#(\d+);'e");                  

$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "&",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169),
                  "chr(\\1)");

$text = preg_replace ($search, $replace, $document);
foreach ($text as $line_num => $mytext) {
echo "$mytext";
//here must go the database code
}
}
}
?>
But when I look at the source of the generated page, I find the text just as I described before, and when I insert the data into the MySQL database I get many empty records because of the whitespace that is generated.
Do you have any idea how to fix this code?
 
Although your regex is from the PHP manual, it isn't really safe. For example, if someone uses an HTML comment inside a <script> tag to hide the script from older browsers, this regex will fail.
What do the files you want to edit look like? Point to an example.
Suggestion:
Code:
...
foreach ($text as $line_num => $mytext) {
   # skip empty array elements
   if (trim($mytext) == '') continue;
   echo "$mytext";
//here must go the database code
}
...

You might also consider posting your code within the [code] .... [/code] tags on this site. It will preserve indentation.
 
Here is an example of the HTML files:

Code:
<html>
<head>

	<title></title>
	<link rel=stylesheet type="text/css" href="../text_style.css">
</head>

<body>



<h3>text1:</h3>
<p>some text goes here.
</p>
 

</body>
</html>
 
That makes me believe you just want the content within the <body> ... </body> section.
I would read the entire file into a string with file_get_contents and cut the body content using the regex.
Code:
$pattern = '/<body[^>]*?>(.*)<\/body/sie';
With the e modifier you can then evaluate any PHP expression in the replacement.
Have a look at the function strip_tags() and at the options for preg_replace().
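
A possible way to put those pieces together, as a sketch only. It captures the body with preg_match() and drops the e modifier, because PHP removed that modifier in version 7.0 and preg_replace_callback() is its replacement when code has to run per match. The helper name bodyTextLines is hypothetical, and the inline HTML stands in for what file_get_contents($fname) would return.

```php
<?php
// Hypothetical helper: return the non-empty text lines found inside <body>.
function bodyTextLines(string $html): array
{
    // s: the dot also matches newlines; i: tag names are case-insensitive.
    if (!preg_match('/<body[^>]*>(.*)<\/body/si', $html, $m)) {
        return [];
    }

    // Remove the remaining tags and turn entities such as &amp; back into text.
    $text = html_entity_decode(strip_tags($m[1]), ENT_QUOTES, 'UTF-8');

    $lines = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line !== '') {
            $lines[] = $line; // keep only lines that still carry text
        }
    }
    return $lines;
}

// In the script from this thread the string would come from file_get_contents($fname).
$html = "<html>\n<head>\n<title></title>\n</head>\n\n<body>\n\n\n<h3>text1:</h3>\n"
      . "<p>some text goes here.\n</p>\n \n\n</body>\n</html>";

print_r(bodyTextLines($html));
```

Because empty lines are filtered out before anything is returned, each array element can go straight into the database code without producing empty records.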
 