Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations John Tel on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Extra Spaces in Spool File 4

Status
Not open for further replies.

johnnybee

Programmer
Dec 16, 2002
24
GB
I am currently extracting data into a file using the following script:

Code:
set linesize 1180
set pagesize 0
set head off
set term off
set wrap off

column CONTACT_ID format 9999999999
column FIRST_NAME format a78
column LAST_NAME format a78
column JOB_TITLE format a78
column ADDRESS_LINE_1 format a78
column ADDRESS_LINE_2 format a78
column ADDRESS_LINE_3 format a78
column ADDRESS_LINE_4 format a78
column CITY format a78
column STATE format a78
column POSTAL_CODE format a78
column COUNTRY format a78
column LAST_SOURCE_SYSTEM format a78
column COMPANY_NAME format a78
column COUNTY format a78
column NVL(STATE,COUNTRY) format a78

spool uk_contacts

select * from
(select
a.CONTACT_ID,
a.FIRST_NAME,
a.LAST_NAME,
a.JOB_TITLE,
ADDRESS_LINE_1,
ADDRESS_LINE_2,
ADDRESS_LINE_3,
ADDRESS_LINE_4,
CITY,
NVL(STATE,COUNTY),
ZIP_POSTAL_CODE,
COUNTRY,
a.LAST_SOURCE_SYSTEM,
COMPANY_NAME,
COUNTY
from cccadm.gcr_contacts_ss a, cccadm.gcr_postal_ss b
where a.contact_id = b.contact_id
and country_of_origin = 'UK')
where rownum < 2000;

spool off
set term on
set wrap on

All of the fields are VARCHAR2, except for CONTACT_ID which is a NUMBER field.

The database is using UTF8 for NLS_CHARACTERSET.

A record contains the following in the ADDRESS_LINE_1 field:

VIA DEI SALICI N°38

This record is one character longer than all the other records (despite the 'linesize' setting). It seems to have inserted a space in the ADDRESS_LINE_1 field. When I extract a greater number of records I notice that this happens for all records with diacritical characters in them.

I will be using the spooled file to import into parsing and cleansing software (Trillium) which requires a fixed length of record. These extra spaces are therefore causing this import to fail.

Any ideas?

Thanks.
 
1) Check your NLS_LANG characterset is set correctly (vg. WE8ISO8859P15 for Europe) Remember that the file may be created correctly, but when 'viewed' with an editor or other utility, characters may be translated incorrectly.

2) If you think the address is overflowing, then either lengthen the output field or truncate it:

column ADDRESS_LINE_1 format a80
-- or --
column ADDRESS_LINE_1 format a78 tru



----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 
Thanks for the reply, however:

1) I think the NLS setting is correct as UTF8. Here are the settings anyway:

PARAMETER VALUE

NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET UTF8
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZH:TZM
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS CHAR
NLS_NCHAR_CONV_EXCP FALSE

I've looked at the data in a few different editors, including: Unipad, Textpad and Notepad. All seem to show the characters correctly (identically to how they are displayed in SQL*Plus), and all have the extra space.

2) I tried both changing the length of the columns and truncating them using "tru". Neither made any difference.

Any further suggestions?



 
1) I was refering to the NLS_LANG environment variable which is set at the client level and controls the presentation on the particular client, even exports and imports!
In W2k HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE KEY=NLS_LANG...
In Unix: export NLS_LANG=TTTTT_LLLL.CCCCCC


----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 
I've checked the NLS_LANG environment variable in the registry and it is: AMERICAN_AMERICA.WE8ISO8859P1

This character set includes both of the unusual characters  and °, which means this is the correct setting.

This means unfortunately my problem is not resolved.

 
Hi.
You could check the output of a dump of that field, like:
Code:
select dump(ADDRESS_LINE_1) 
from table_containing_column
where row=causes_problem;
and compare it with the result of
Code:
select dump('VIA DEI SALICI N°38') from dual;
This may reveal hidden characters.


Stefan
 
No differences I am afraid, as the following shows:

select dump(address_line_1)
rom cccadm.gcr_postal_ss
where contact_id = 15942373

DUMP(ADDRESS_LINE_1)
--------------------
Typ=1 Len=20: 86,73,65,32,68,69,73,32,83,65,76,73,67,73,32,78,194,176,51,56

select dump('VIA DEI SALICI N°38') from dual;

DUMP('VIADEISALICIN°38')
-------------------------
Typ=96 Len=20: 86,73,65,32,68,69,73,32,83,65,76,73,67,73,32,78,194,176,51,56

 
Hello,
I think the problem is with your NLS_LANG environment variable, as LKBwrnDBA suggested above:
Your frontend setting is AMERICAN_AMERICA.WE8ISO8859P1,
and that means that each character is represented by exactly one byte, and vice versa. The two bytes 194,176 (decimal) obviously mean ° .
In your database you use unicode, UTF8. And this means that each character may be represented by one ore more bytes. The two bytes 194,176 (decimal) together mean one single character. (But don't ask me, what language!)
As a short rule, in unicode, bytes greater than 128 are not a simple character, but part of a character.
So try AMERICAN_AMERICA.UTF8.
hth
 
I don't think you can set NLS_LANG environment variable to AMERICAN_AMERICA.UTF8.

Here is a quote from this Oracle Documentation

The NLS_LANG character set should reflect the setting of the operating system client. For example, if the database character set is UTF8 and the client has a Windows operating system, you should not set UTF8 as the client character set because there are no UTF8 WIN32 clients. Instead the NLS_LANG setting should reflect the code page of the client.
 
Tried changing it anyway to AMERICAN_AMERICA.UTF8 , it seems to run OK, but still has the extra space.
 
Hm, ...,

I won't argue Oracle's documentation. When they say that there are no UTF8 WIN32 clients, your new setting will perhaps not do what it should do.
My answer was inspired by our oracle client for unix, we have set NLS_LANG to AMERICAN_AMERICA.UTF8.
But I don't know much about WIN, sorry.

But to verify whether my diagnosis is correct, would you please start this query:
Code:
select dump(address_line_1), length(address_line_1)
from cccadm.gcr_postal_ss
where contact_id = 15942373

Imho the problem is not an extra space;
Imho what seems to be '°', "really" is one character only.
And I think 'N°' should be some well-known shorthand for 'Number', shouldn't it?

regards
 
I've already done DUMP(address_line_1) in an earlier post. However

Code:
select length(address_line_1)
from cccadm.gcr_postal_ss
where contact_id = 15942373

Gives...

LENGTH(ADDRESS_LINE_1)
----------------------
19

Which confirms that ° is only one character.

Not sure how this helps me... I guess I am going to have to look more closely at the Trillium software to see how it reads the text file.
 
I posted the NLS parameters above.

For NLS_LENGTH_SEMANTICS, it is CHAR.

 
I tried the following on my DB without much luck:
SELECT CONVERT('VIADEISALICIN°38', 'WE8ISO8859P1', 'UTF8') FROM DUAL;

Oracle documentation about the CONVERT function:

The funny thing is that even -
SELECT CONVERT('VIADEISALICIN°38', 'WE8ISO8859P1','WE8ISO8859P1') FROM DUAL;
takes me nowhere. It doesn't display the original string.

I do it via SQL Navigator 4.3.

Regards,
Dan
 
ddrillich, very interesting!
I tried the same in SQL*plus:
SELECT CONVERT('VIADEISALICIN°38', 'WE8ISO8859P1', 'UTF8') FROM DUAL;

And I got this:
CONVERT('VIADEISA
-----------------
VIADEISALICIN°38

This seems to be the correct output!
And I did some reading at
The WE8ISO8859P1 coding for ° is 176 (decimal),
whereas the UTF8 coding for ° is 194,176 (decimal).

hth
 
hoinz,

You are right! It looks great via SQL*Plus.

So, it seems that in johnnybee’s case, a UTF-8 value ended up in a VARCHAR2 column converted to WE8ISO8859P1 (also known as Latin-1).

johnnybee, if you want to hold UTF-8 data as is, then you should probably use the new Oracle datatype NVARCHAR2.

Regards,
Dan
 
This has indeed solved my problem. I can use the Convert function in the SQL and it does the trick. I will also talk to those with the necessary authority to look at changing the datatypes.

Thank you all for your help :)

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top