Varchar 1

gregsimpson · Feb 28, 2006

Does anyone know how the internal structure of Varchard works on LUW V8.2. I have someome proposing a VARCHAR(6) for a column which is predominantly empty. Essentially I am wondering how many bytes an empty VARCHAR takes up. Does that make sense?

Cheers
Greg

Hattusas · Mar 2, 2006

Hmm.There must be more sophisticated methods for that but..
If I were you I'd try this..
I would create an empty tablespace managed by system and would note its initial size.
Then create a table of 2 columns one is a fixed type (integer,character etc) and a column of varchar(6) with cursor or a program I'd insert a reasonable amount of rows.
100 rows,1.000 rows,10.000 rows etc.
(Varchar(6) will remain null of course)..

After this operation, I 'd check the size of the tablespace again. The difference'd give me an idea I think.

Salih Sipahi
Software Engineer.
City of Istanbul Turkey
s.sipahi@sahinlerholding.com.tr

MarcLodge · Mar 2, 2006

Greg,
I'm not sure about your specific platform, so I'll talk generically. I've always been under the impression that a VARCHAR defined as NOT NULL will, if given a length of 0, will take up no room at all. In other words, if you give it a length of 0, it will become NULL.

I've never tested this out, and it would be interesting to know the results if you do.

Marc

gregsimpson · Mar 2, 2006

Marc,

for this query, I'm on AIX. However I also think it applies in a mainframe environment. I believe there is an overhead, of approx a couple of bytes. These bytes are used internally to hold the length of the data in the field etc. Else how's the database to know whether there's data in there or not for instance and also how long it is. I don't see otherwise how it could work out from the physical location for each row on the disk, without having some sort of indicator? Surely it needs to know where a column starts and ends, so how does it do this without some sort of reserved bytes to signify how the row containing varchars is constructed?

Greg

MarcLodge · Mar 2, 2006

That's a good question Greg, and leads me to wonder how DB2 treats NULL fields. I was under the impression that NULL and VARCHAR length 0 were dealt with by DB2 as one and the same thing, so I think I will have to do some reading up on it. Does a field with a NULL value take up any space in the table? Hmmm.... food for thought. I'll try to find some time over the next couple of days to investigate and will get back to you. If you find anything out, let me know.......

Marc

gregsimpson · Mar 2, 2006

Marc,

I think with NULL, there's an indicator overhead on each field to say whether it's null or not. I've found the following after many hours of google as regards VARCHAR.

Choosing CHAR or VARCHAR: VARCHAR
saves DASD space, but costs a 2-byte
overhead for each value and the additional processing required for
varying-length records. Thus, CHAR is preferable to VARCHAR,
unless the space saved by the use of VARCHAR is significant. The
savings are not significant if the maximum length is small or the
lengths of the values do not have a significant variation. In
general, do not define a column as VARCHAR

unless n is at least
18.

Cheers
Greg

MarcLodge · Mar 2, 2006

Greg,
You're right with regards NULL. I've just had a look on Mullins and, to quote from the article at

http://www.craigsmullins.com/bp7.htm:

Let’s take a moment to clear up a common misunderstanding right here: nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.

DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though

I'll keep digging regards VARCHAR and let you know what I find.

Marc

MarcLodge · Mar 21, 2006

Greg,
I dug around some more, and was unable to come up with any further help, so I decided to take the bull by the horns, and dropped an email to Craig Mullins. I informed him of the general nature of the question and quoted your 'do not define a column as varchar unless at least 18' find above, and this was his reply:

I think I’ve seen that guidance (at least 18) before (either on DB2-L or elsewhere). It seems like a reasonable guideline but I would hesitate to make it a strict rule. I think the basic premise that the column should be large enough to be able to solicit at least some savings is sensible… and 18 is as good as any number to get to that target (it would be hard to argue forcibly though whether it is better than 20 or even 17). And it might make some sense to play up the variation angle. Even if the text is at least 18 bytes it wouldn’t make sense to make it variable unless the actual values varied from a few bytes in length up to 18. For example, if 90% of the actual values are 15 bytes or more, I wouldn’t use VARCHAR.

Another thing you might consider adding is to use DB2 compression instead of variable length columns (in some cases). With data compression built-in to DB2 now - and operating very efficiently - compression is frequently a better choice for conserving disk space than variable length columns. This is so because DB2 compresses automatically - behind the scenes. With variable length columns a two-byte prefix is required for each column to indicate the actual length of the data in the variable length column. This prefix must be set and maintained by application code - this is not necessary with compression. Also, DB2 compression will compress the entire row – possibly eliciting extra savings, whereas VARCHAR potentially saves space only for that specific column. However, the overhead of compression in some cases may be problematic. DB2 must incur CPU in order to compress and de-compress the data as it is modified and read. However, I/O benefits can outweigh the additional CPU because more data will be stored per page due to the smaller size of the rows.

Hope this is of some use.

Marc

gregsimpson · Mar 22, 2006

Marc,

your sheer tenacious effort alone, deserves a STAR. Thanks you very much for taking the time to speak with the main man.

Thanks Again
Greg

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Varchar 1

gregsimpson

Programmer

Hattusas

Programmer

MarcLodge

Programmer

gregsimpson

Programmer

MarcLodge

Programmer

gregsimpson

Programmer

MarcLodge

Programmer

MarcLodge

Programmer

gregsimpson

Programmer

Similar threads

Part and Inventory Search

Sponsor