Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Number to Raw datatype conversion for DW 1

Status
Not open for further replies.

raygg

Technical User
Jun 14, 2000
397
US
I am picking up the work of another fellow engineering professional who assembled the initial data for a data warehouse, mapped out a dimensional dw and then created then initial fact and dimension tables. I am using oracle enterprise 11gR2. The person was new to oracle and used datatypes of NUMBER (without a length - defaulting to number(38) for dimension keys. The dw has 1 fact table and about 20 dimension tables at this point.

Before refining the dw further, I have to translate all these dimension tables and convert all columns of Number and Number(n) (where n=1-38) to raw datatype with a length. The goal is to compact the size of the dw database significantly. With only a few exceptions every number column is a dimension key or attribute.

The entire dw db is now sitting in a datapump dmp file. this has to be imported to the db instance and then somehow converted so all occurrences of a number datatype into raw datatypes. BTW, there are other datatypes present such as varchar2 and date.

I discovered that datapump cannot convert number to raw in an import or export, so the instance tables once loaded using impdp will be the starting point.

I found there is a utl_raw package delivered with oracle to facilitate using the raw datatype. This has a numbertoraw function. Never used it and am unsure how to incorporate this in the table conversions. I also hope to use OWB capabilities at some point but I have never used it and only know that it has a lot of analytical capabilities. As a preliminary step I have done partial imports and determined the max length of every number column so I can alter the present schema number columns tp be an apporpriate max length for each column in each table.

Right now I am not sure what the next step is. Any suggestions for the data conversion steps would be appreciated.
 
raygg,

you should not be attempting this counterproductive conversion. The oracle documentation clearly states that the RAW data type is deprecated, and only present for backwards compatibility.

shows
Code:
RAW and LONG RAW Datatypes

--------------------------------------------------------------------------------
Note: 
The LONG RAW datatype is provided for backward compatibility with existing applications. For new applications, use the BLOB and BFILE datatypes for large amounts of binary data.

--------------------------------------------------------------------------------
 

The RAW and LONG RAW datatypes are used for data that is not to be interpreted (not converted when moving data between different systems) by Oracle. These datatypes are intended for binary data or byte strings. For example, LONG RAW can be used to store graphics, sound, documents, or arrays of binary data. The interpretation depends on the use.

By converting from a current data type to raw you are doing the wrong thing.

Have you tried compression to reduce the size of the data warehouse?

Regards

T
 
I know about this and I tend to agree with you to a point.
For those who consult in the datawarehouse community RAW datatype is used a lot to substantially reduce storage space and possiubly increase performance. A primary goal of this project is to use as small a footprint for the db storage as possible. When you're talking multi-TB this gets critical and expensive.

I know of the backwards compatibility and the suggestion to use blob etc for large amount of binary data. These are not large amounts of binary data - we're talking 3-6 bytes per raw data column, although raw supports a data item length up to 2000 characters.

While raw is backwards compatible it will not soon be denigrated - I will guess not before Oracle 13 or 14 - 5-8 years away. Once that happens I am sure Oracle will add a datatype that will be even more efficient as the world of data warehouses is expanding at a very fast pace.

Once the number datatype items are converted to raw then they will only be used as dimension keys. I am told that a Number datatype without a specified length defaults to a fixed 38 bytes. With a length it is roughly 1 byte per character. Convert the Number(11) to Raw(11) and you save 5 bytes.

My problem is to get there. Any ideas on converting the tables?
 
raygg,

although I'm not answering your question, I still would like to know why compression is not being used. This will give an immediate significant benefit, with no changes needed to the database.

Regards

T
 
We will use compression also, not instead of. Again the goal is a small footprint. The compression will not improve the storage space consumed by raw usage but it will help reduce space used by varchar2 and maybe other datatypes.
 
raygg,

I'm glad to hear that you're intending to use compression. I encountered some resistance in my company from managers who thought they were an Oracle DBA. In the end, we scripted the change to compressed storage (with a reversal script if the need arose) and demonstrated the before and after sizes.

I would recommend that you do this before altering internal storage. If the compression gives enough of a saving, then there would be no need to do the conversion. That's a business decision, not a technical one. If the mass storage requirement is sufficiently reduced, stop right there, and do nothing more. It saves money, which seems to go down well with the bean counters.

As an aside, you mention that the compression won't help with raw usage - how did you determine this? From what I read in the on-line documentation, the compression algorithm is not published by Oracle - did you do some "before and after" tests, or what?

Regards

T
 
I know very little about compression as I really have not worked with it much except with zip. tar and gz files. This is more intuition than anything. THe raw items will be rather short - 2-5 or 6 bytes (allowing up to 11 digits - raw stores two digits to a byte I am told. These are mainly keys - I am not sure because I do not know the data whether there are any non-unique keys. But data compression does things like take repeating patterns and summarizes or compresses them with algorithms (if you have 500 repetitions of 'abc' then maybe the compression thingy can put a counter with a value 'abc' and save 499 occurrences if that's the way compression algorithms work. Its hard to compress 881 any further when expressed as x'1000 1000 0001 xxxx' where xxxx is probably a length or counter of some sort. But where the raw item looks like x'0000 0000 0010 xxxx' maybe there is some chance of compression due to the repeating '0' bits - I don't know.

I look at the raw items as relatively free of repeating patterns probably more because they are short - so such algorithms will not likely compress them further. I liken it to trying to zip a zip file - you ain't gunna get much compression. But with the varchar fields I know there are repeating patterns and embedded blanks etc so there I see an opportunity for compression to help there.
 
Raygg (and Thargy),

Perhaps I'm missing a salient issue here, but I understood that your consideration was to convert Oracle NUMBER to Oracle RAW for the purposes of saving space, is that correct?

In an Oracle NUMBER expression, Oracle stores only significant digits, and even then, it stores those significant digits two digits per byte. Examples:
Code:
select dump(100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000) x from dual;

X
------------------
Typ=2 Len=2: 242,2

Notice that the length of this 99-digit number is only 2 bytes.
Code:
select dump(12345678901234567890123456789012345678) x from dual;

X
--------------------------------------------------------------------------
Typ=2 Len=20: 211,13,35,57,79,91,13,35,57,79,91,13,35,57,79,91,13,35,57,79

Notice that the length of the 38-digit number, above, is only 20 bytes long.
Raygg said:
raw stores two digits to a byte I am told.
So, given your presumption of two digits per RAW byte, what are your savings by going RAW (besides not having to budget for clothing <grin>)?


[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”
 
Santa,

you just did elegantly what I failed to do, i.e. post example data. I was basically refusing to indulge in the (forgive me raygg) madness of deliberately moving to RAW storage.

I really must swot up on the internal storage mechanisms of the various data types. Your detailed knowledge shows itself once again.

Regards

T
 
@SantaMufasa
I agree that was an elegant way to make the point but there are some aspects of your demo I do not understand, although I duplicated your results perfectly on my pc. However I never ran across 'select dump' as you use it.

select dump(100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000) x from dual;


X
------------------
Typ=2 Len=2: 242,2

A few questions: Where or how can I verify your example is really using NUMBER datatype?

What does 242,2 mean?

I ran a couple of tests and it looks like the combination of characters that affects the length strongly.


SQL> select dump(10000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000) x from dual;

X
------------------
Typ=2 Len=2: 242,2

SQL>
SQL> select dump(10000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000001) x from dual;

X
-------------------
Typ=2 Len=2: 241,11

SQL>
SQL> select dump(19999900000000000000000000000000000000000000000000000000000000000000000000000
000000000000001) x from dual;

X
---------------------------
Typ=2 Len=4: 241,20,100,100

SQL> select dump(19999900000000000000000000000000000000000000000000000000000000000000000000000
00000000999991) x from dual;

X
-----------------------------
Typ=2 Len=5: 241,2,100,100,91

SQL>
SQL> select dump(11111111111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111) x from dual;

X
----------------------------------------------------------------------------
Typ=2 Len=21: 242,2,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12
So the same length item can vary from 2 bytes in your example to 21 bytes in mine depending on the repetition and combination of values.
SQL>




 
Good questions, Raygg. The Oracle DUMP function is one of ~320 functions (depending upon Oracle version) available in Oracle. (The more Oracle functions you know about, the more powerful Oracle becomes for you.) You can obtain a briefing and descriptions about each of the 320 functions by clicking on this link.

In the case of the DUMP function, you can specify as an argument to the DUMP function, any expression (e.g., literal, column name, embedded function, or any combination of these). The function produces a three-part result:[ul][li]Part 1: Oracle datatype. A numeric identifier that classifies the dumped data: 1=VARCHAR2 column data, 2=NUMBER, 96=Literal character, et cetera.[/li][li]Part 2: Length. The number of bytes that the dumped data consumes internally.[/li][li]Part 3: Internal representation. The internal contents of each byte (in internal-decimal representation, by default).[/li][/ul]For example, here is a DUMP of some literal characters:
Code:
select dump('ABCD') from dual;

DUMP('ABCD')
-------------------------
Typ=96 Len=4: 65,66,67,68

Notice the ASCII representation of the letters "ABCD"?
Here is a DUMP of two columns from an employee table:
Code:
select id,dump(id) "ID_Dumped", last_name,dump(last_name) "LAST_NAME_Dumped" from s_emp;

ID ID_Dumped            LAST_NAME    LAST_NAME_Dumped
-- -------------------- ------------ -------------------------------------------------------
 1 Typ=2 Len=2: 193,2   Jones        Typ=1 Len=5: 74,111,110,101,115
 2 Typ=2 Len=2: 193,3   Ngao         Typ=1 Len=4: 78,103,97,111
 3 Typ=2 Len=2: 193,4   Ubu          Typ=1 Len=3: 85,98,117
 4 Typ=2 Len=2: 193,5   Quick-To-See Typ=1 Len=12: 81,117,105,99,107,45,84,111,45,83,101,101
 5 Typ=2 Len=2: 193,6   Ropeburn     Typ=1 Len=8: 82,111,112,101,98,117,114,110
 6 Typ=2 Len=2: 193,7   Urguhart     Typ=1 Len=8: 85,114,103,117,104,97,114,116
 7 Typ=2 Len=2: 193,8   Menchu       Typ=1 Len=6: 77,101,110,99,104,117
 8 Typ=2 Len=2: 193,9   Biri         Typ=1 Len=4: 66,105,114,105
 9 Typ=2 Len=2: 193,10  Catchpole    Typ=1 Len=9: 67,97,116,99,104,112,111,108,101
10 Typ=2 Len=2: 193,11  Havel        Typ=1 Len=5: 72,97,118,101,108
If you have questions after reading the above links and after your own experimentation, we encourage you to ask here.

Regards,

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”
 
BTW, I failed to explain why some of your results in your earlier posting appear to be anomalous.

You posted this code:
Code:
SQL> select dump(19999900000000000000000000000000000000000000000000000000000000000000000000000
00000000999991) x from dual;

X
-----------------------------
Typ=2 Len=5: 241,2,100,100,91
For numeric values, Oracle stores a maximum of 40 digits of precision. That is why the DUMP ignores the least significant "...999991" values -- because they are beyond the most significant 40 digits of precision.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”
 
using your last example, SantaMafusa, what does this mean?
The only thing I understand is it's 5 bytes long.

-----------------------------
Typ=2 Len=5: 241,2,100,100,91
 
Raygg said:
...what does this mean?
[ul][li]Typ=2: Type 2 is Oracle's type identifier for a number expression.[/li][li]241,2,100,100,91: This is the decimal representation of those 5 bytes of Oracle's internal storage for the the exponential notation of "1.99999 x 10+91". (whew!) For a much more complete explanation, click on this link[/li][/ul]Please post again, Raygg, if the link does not answer your questions or if you have additional questions.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top