Can anyone help me turn a gibberish string into human language?

Status
Not open for further replies.

lupidol

Programmer
Apr 23, 2008
125
0
0
IL
Hi everyone,
Here is my text file:
100055,䪰䬬񺐬054-4889012,,,򥮸 (south),并쬲8,78390,,,erans@hotmail,
102958,򮩬,񺙾󜀸-6418346,052-3964300,,⡸ 񡲠(south),⪺-񠯬35/2,84812,,,amiliv@hotmail,
Here is my code:
Code:
<?php
	$myText = 'test_gibrish.txt';
	$myHandle = fopen($myText,'r');
	$myRead = fread($myHandle, filesize($myText));
	$enc = mb_detect_encoding($myRead, "UTF-8,ISO-8859-1");
	$textArr = explode(',',$myRead);
	$str = $textArr[15];
	echo $str."<br>";
	echo iconv($enc, "ISO-8859-1", $str)."<br />";
	echo iconv($enc, "utf-8", $str)."<br />";
	echo iconv($enc, "utf-8", $str)."<br />";
	echo iconv($enc, "ISO-8859-1//TRANSLIT", $str), PHP_EOL;
	echo iconv($enc, "ISO-8859-1//IGNORE", $str), PHP_EOL;
	echo iconv($enc, "ISO-8859-1", $str), PHP_EOL;
	echo $str.'<br>';
?>
My output shows either gibberish or errors, as shown in the attachment.
Any advice on how to get rid of the gibberish in my text file?
Thanks
 
 http://files.engineering.com/getfile.aspx?folder=c8545d74-f3ee-42b0-96ee-8a0f200238d2&file=to_sitepoint.jpg
If it is Unicode gibberish (or Chinglish) to start with, ... the chances are that it WILL be ASCII/Unicode gibberish no matter what you try to do with it.

Is it something vital?

is it UTF-8?
or
Is it UTF-16 and missing the Byte Order Marker?
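One way to answer those two questions is to look at the first raw bytes of the file rather than at how an editor renders them. The following is only a minimal sketch: `sniffBom()` is a hypothetical helper name, the 64-byte window is an arbitrary choice, and the NUL-byte check is a heuristic (ASCII digits stored as UTF-16 leave a zero byte next to each digit), not a reliable detector. It does not try to tell UTF-32 apart from UTF-16.

```php
<?php
// Hypothetical helper: guess a Unicode encoding from the leading bytes of a file.
function sniffBom(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 (BOM)';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE (BOM)';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE (BOM)';
    }
    // Heuristic only: NUL bytes near the start hint at UTF-16 without a BOM.
    if (strpos(substr($bytes, 0, 64), "\0") !== false) {
        return 'possibly UTF-16 without BOM';
    }
    return 'no BOM, no NUL bytes';
}

// Read at most 64 bytes from the file named in the question, if it exists.
$file = 'test_gibrish.txt';
$head = file_exists($file) ? (string) file_get_contents($file, false, null, 0, 64) : '';

echo sniffBom($head), PHP_EOL;
echo bin2hex($head), PHP_EOL; // the hex dump is what to post when asking for help
```

If the result is "no BOM, no NUL bytes" and the text is still unreadable, a single-byte legacy code page is a more likely candidate than UTF-16.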






Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 
Looking at the string, it seems to be comma separated values with some special characters that got lost somewhere along the way and are now just ???? (question marks).

What exactly produced this?

If it lost some data at some point, there will be no way to get coherent data out of this.

You may need to alter whatever is producing this output to preserve the special characters.



----------------------------------
Phil AKA Vacunita
----------------------------------
Ignorance is not necessarily Bliss, case in point:
Unknown has caused an Unknown Error on Unknown and must be shutdown to prevent damage to Unknown.

Web & Tech
 
the difficulty with deciphering this is that the text has already been through several broken transformations to utf8: the uploading of the text to the TT servers, probably the storing of the data in the database, and the delivery of this page to the browser.

if you get us the actual original text as received by you, and you can tell us the charset of the method used to send you the data and that of the page (if via web), then there's a fighting chance of recasting to the original Chinese characters.

once you have the bytes, split them out into the values between the commas and then attempt a decode. something like this might work

Code:
echo iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-16", $text);

or if you don't support UTF-16, try UTF-8; it can encode the same characters, as long as the page is served with a matching charset and the font has the glyphs.
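Before converting anything, it can help to see which fields are actually affected. A caveat about detection: every byte string is valid ISO-8859-1, so a candidate list of only UTF-8 and ISO-8859-1 cannot identify any other legacy code page. The sketch below therefore only reports, per field, the raw bytes and whether they are already valid UTF-8. `inspectFields()` is a hypothetical helper, and the sample line is made up for the demonstration; it is not taken from the poster's file.

```php
<?php
// Hypothetical helper: for each comma-separated field, report the raw bytes
// as hex and whether they already form valid UTF-8.
function inspectFields(string $line): array
{
    $report = [];
    foreach (explode(',', rtrim($line, "\r\n")) as $i => $field) {
        $report[$i] = [
            'hex'  => bin2hex($field),
            'utf8' => mb_check_encoding($field, 'UTF-8'),
        ];
    }
    return $report;
}

// Made-up sample: the middle field holds five bytes that are not valid UTF-8.
$line = "100055,\xE0\xF4\xF8\xE9\xED,78390\r\n";

foreach (inspectFields($line) as $i => $info) {
    printf(
        "field %d: %s  %s\n",
        $i,
        $info['utf8'] ? 'valid UTF-8' : 'NOT UTF-8',
        $info['hex']
    );
}
```

Fields flagged as not UTF-8 are the ones that need a source charset from whoever produced the export; the hex values are what that person would need to see.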
 
This is rather important to me.
It is a backup of a SQL Server 2005 database that I exported to a text file 6 years ago on an XP platform.
Now I want to transfer it into a MySQL database, but some of the text has become gibberish on a Windows 7 platform (or on a different machine).
Something important about that text: if I copy a gibberish string from the text file and paste it directly into the code, it does show me the original string!
 
Does it look okay in the original text file?

Chris.

 
The original file is OK except for the gibberish fields.
At the beginning, if I copied and pasted the gibberish strings from the text file into the code page, I received the desired string, but when I read them directly from the text file they remained gibberish.
I have since changed the encoding of both the code file and the text file, and now copy/paste doesn't work either.
 
probably you messed up harmonising the encoding of the connection between the application and the database and the table at the moment that you wrote the data to the table. unless you get them all the same, and all dense enough to take the intended content, you've probably lost the data.
 
lupidol said:
Something important about that text: if I copy a gibberish string from the text file and paste it directly into the code, it does show me the original string!

In what program are you viewing the gibberish? Try a different text editor.

lupidol said:
...if I copied and pasted the gibberish strings from the text file into the code page, I received the desired string, but when I read them directly from the text file they remained gibberish.

Can't you just copy everything and save as a new file?
 
To simplify, I copied one string: 'àôøéí' into a new text file which I named: "xxx.txt" and saved as "utf-8".
Then I copied/pasted the string into the following code:
PHP:
<?php
	$myText = 'xxx.txt';
	$myHandle = fopen($myText,'r');
	$myRead = fread($myHandle, filesize('xxx.txt'));
	$enc = mb_detect_encoding($myRead, "UTF-8,ISO-8859-1");
	$textArr = explode(',',$myRead);
	$str = $textArr[0];
	
	echo iconv("UTF-8", "ISO-8859-1", "àôøéí")."<br />";
	echo iconv($enc, "ISO-8859-1", $str)."<br />";
	echo iconv($enc, "utf-8", $str)."<br />";
	echo iconv($enc, "ISO-8859-1//TRANSLIT", 'àôøéí'), PHP_EOL;
	echo iconv($enc, "ISO-8859-1//IGNORE", $str), PHP_EOL;
	echo iconv($enc, "ISO-8859-1", $str), PHP_EOL;
	echo iconv($enc, "ISO-8859-1", 'àôøéí')."<br />";
	echo iconv($enc, "UTF-8", 'àôøéí')."<br />";
	echo iconv($enc, "ISO-8859-1//TRANSLIT", 'àôøéí')."<br />", PHP_EOL;
	echo iconv($enc, "UTF-8", 'àôøéí')."<br />", PHP_EOL;
	echo iconv('UTF-8', "ISO-8859-1", 'àôøéí')."<br />", PHP_EOL;
?>
The result I got is shown in the attached screenshot. Thank you!
 
 http://files.engineering.com/getfile.aspx?folder=83bf6ccd-babb-4536-a731-e19f61405e7f&file=tek-tips.jpg
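One possibility worth testing, which nobody in the thread confirms: the characters 'àôøéí' are exactly what the bytes E0 F4 F8 E9 ED look like when read as ISO-8859-1, and those same bytes spell a Hebrew word when read as Windows-1255. If the original export used that code page (an assumption, suggested only by the byte values and the poster's IL location), the text saved as UTF-8 in xxx.txt can be recovered in two steps. The sketch below writes the strings as byte escapes so that it does not depend on the encoding of the script file itself.

```php
<?php
// What xxx.txt holds after 'àôøéí' was saved as UTF-8.
$mojibake = "\xC3\xA0\xC3\xB4\xC3\xB8\xC3\xA9\xC3\xAD";

// Step 1: undo the wrong decode, turning each character back into one byte.
$rawBytes = iconv('UTF-8', 'ISO-8859-1', $mojibake);

// Step 2 (assumption): treat those bytes as Windows-1255 and re-encode as UTF-8.
$hebrew = iconv('WINDOWS-1255', 'UTF-8', $rawBytes);

echo bin2hex($rawBytes), PHP_EOL; // e0f4f8e9ed
echo $hebrew, PHP_EOL;            // אפרים (when the output is viewed as UTF-8)
```

For the untouched export, where the fields still hold the original single bytes, step 1 would not apply and only step 2 is needed. When viewing the result in a browser, the page also has to declare UTF-8, otherwise correct output will still render as gibberish.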