Data types and sizes

Andrzejek · May 3, 2022

Moving from Oracle to SQL Server. I took some on-line classes to know more about SQL Server. Looks to me in setting up a table I should be using the smallest data type possible to do what I need. Makes sense, like with most other programming, right? So, if I have a table that I will not exceed 100 records (number of Counties in a state), I should be using [tt]tinyint[/tt] (0 to 255) or when setting a field with the County Name, the most I will ever need is 20 characters, so [tt]Varchar(20)[/tt] will do just fine.

But most of the examples I see in 'real world' is: set your number (PK field) data type to [tt]Int[/tt] (or, if you need bigger numbers, use [tt]Decimal[/tt]), for text just use [tt]Varchar(max)[/tt]. That will cover what you need and a lot more. SQL Server will use what it needs when it needs.

What’s the ‘right’ approach?

---- Andy

"Hmm...they have the internet on computers now"--Homer Simpson

Mike Lewis · May 4, 2022

Andy,

I'm not the world's greatest expert on SQL Server, but I completely agree with your approach. Given that the table will never exceed 100 rows, a tinyint is the correct choice for this particular numeric field. And if - however unlikely - the table grows to more than 255 rows, it would be the work of a moment to increase the key to an int, without having to change any code.

However, with your County Name, there would be no harm in setting the size slightly larger, given that it is a varchar. Varchar columns only take up the amount of space of the actual data, so there is no penalty in setting the size larger than you are likely to need.

Note that there is a two byte overhead with varchars compared to chars, but that doesn't outweigh the benefit of the variable length of data.

At least, that is my understanding. If I am wrong, someone will correct me.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

Chris Miller · May 4, 2022

I can't confirm your observation about varchar(max), there is a big difference in storage. I do see and use short varchar or ncvarchar especially in short size ranges. I learned somewhere the benefit of variable length only starts at 8, so surely 20 works already and neither needs varchar(MAX) not char(20).

There is a detail to learn about storage of records exceeding page size, for example see

https://www.microsoftpressstore.com/articles/article.aspx?p=2225060

or

https://aboutsqlserver.com/2013/11/05/sql-server-storage-engine-lob-storage/

I'd think about MAX for 8000, to be honest, as I don't see a case for such almost page sized fields, but as you can see several so large fields don't go straight to the LOB, they can be stored in row_overflow pages. It still is more I/O to get a long record from several pages.

Regarding tinyint, I think I never used that, which says I'm okay with wasting 3 bytes and don't care for the existence of small numeric types. But you're right about that. Identity columns, for example, can also be tinyint and other numeric field types. I'd go for uniqueidentifier with merge replication in mind and then caring for 3 byte also seems awkward. I am aware there were other times and each byte used means a factor of bytes necessary in backups in, for example, a grandfather/father/son backup scheme, but that's taken more care of by compression than choice of smallest data type. I'm very fine to think about size of varchars as necessary, I'd not truncate them to the longest value I start with, give it some room, especially in varchar.

I think using large object storage only for cases they are necessary is more important, as it's not just about saving storage space, but about read access performance. Tinyints could be interesting, if there are a bunch of them and they help reducing a record size to fit into the normal page type.

And as genral tip about this topic, look for a decent optimization book by rank and reviews, as column sizes or record composition of course are just one tiny bit of what to care for in database optimization concerns. I'd first tackle the topic how indexes are stored and used, which automatically also covers column sizes, as they matter to indexes, too.

Chriss

Andrzejek · May 4, 2022

Thank you guys, I really appreciate your input.
When I said: County Name is the most 20 characters, so Varchar(20) - I always leave a little (or more than just a 'little') wiggle room and I do set them up bigger. Better safe than sorry.
And the use of tinyint as a PK field in look-up tables, that would make the FK fields in other tables also tinyint (right?) to match the data type. So, if for any reason I would have to increase the size in a look-up table, I would have to find and increase the FKs in other tables as well. Is that correct?
If it is so, I can understand why people use int and forget about it, even if it is way too large for that particular instance at this time.

As for optimization - over the years the rules have changed and what used to be a 'no-no' just a few years ago, nowdays nobody cares because the DB engines become a lot more 'sophisticated' and take care of a lot of those 'no-no's from the past. But still, it is good to organize your DB with the best optimization you can afford.

---- Andy

"Hmm...they have the internet on computers now"--Homer Simpson

Chris Miller · May 4, 2022

Right, the same type PK FK is another reason to use uniqueidentifier and don't need to care anymore. You could at some time think about newsequentialid() vs newid().

Chriss

Mike Lewis · May 4, 2022

Chris said:
I learned somewhere the benefit of variable length only starts at 8,

I don't think you can make that generalisation. The benefit of variable length must surely depend on the degree of variability. If the fields was four characters wide in 99 rows, and eight characters wide in the 100th row, then there would be a benefit in making it a varchar. But if it was 50 x 8 chars and 50 x 9 chars, then a char would be better than a varchar (if I have done the sums correctly).

Andy said:
And the use of tinyint as a PK field in look-up tables, that would make the FK fields in other tables also tinyint (right?) to match the data type. So, if for any reason I would have to increase the size in a look-up table, I would have to find and increase the FKs in other tables as well. Is that correct?

That's a good point. But my point was that it would be easy to change the data types retrospectively. Doing that wouldn't affect any client processes or applications.

Andy said:
As for optimization - over the years the rules have changed

I don't know about "rules". What has not changed is the benefit of reducing the width of each row as much as possible, with the aim of storing more rows in each physical page, which in turn means less time to read and write the pages. That will surely always be an important consideration.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

Mike Lewis · May 4, 2022

Sorry to disagree, Chris. There are of course benefits in using a uniqueidentifier data type, but you have to weigh those against the fact that it will take 16 bytes, compared to one byte for a tinyint - which is relevant to my above point re maximising the number of rows stored in a physical page.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

pjw001 · May 4, 2022

I would be fairly wary of trying to keep the size of fields to the base minimum. If there is any code supporting the database you may find increasing field size more trouble than you expected.

You don't say which country you are dealing with, but there are a number of US states which seem to have more than 100 counties. (See

https://en.wikipedia.org/wiki/List_of_counties_by_U.S._state_and_territory.)

Also, depending on what exactly you mean by a county you may find that some name are longer that 20 characters. (See

https://en.wikipedia.org/wiki/List_of_United_States_counties_and_county_equivalents.)

Andrzejek · May 4, 2022

Good point pjw001,
But no matter which US State, tinyint (0 to 255) could even cover Texas with its 254 Counties according to this place, page 11 (Coincidence...? [upsidedown]

)

And County Name of 20 characters was overkill anyway. Where I live, the longest County is Pottawattamie (13 chars), but in Pennsylvania and Virginia there are Northumberland County (14 chars)

---- Andy

"Hmm...they have the internet on computers now"--Homer Simpson

Chris Miller · May 4, 2022

Mike, the comment about 8 is about a break even point in overall bytes used of varchar vs char. Each value in a varchar field has a length that's also stored and while bytes used for that don't count towards net data bytes, they take space.

There is a break even when even needing to waste bytes for trailing spaces in char is less than what varchar needs. And what I remember is that limit is 8, so you better use char

for n below 8, no matter if many of the values you store there are only a single char or they mostly use the full 7 characters, you won't save with varchar(7), despite time to not trim result vallues.

Chriss

Chris Miller · May 4, 2022

According to this the break even is at 4 chars as there are 2 bytes used for length:

https://www.tektutorialshub.com/sql-server/char-varchar-data-types-in-sql-server/

I'd like to see a more official description in SQL books online or such documentation, because for varchar(255) or shorter the length could also be stored in 1 byte. There is no need for 3 or more length bytes, but I still remember 8 as break even point, not 4 already. That would be based on further additional meta data to store. What I known from the top of my head is that all variable length fields are stored at the end of a record, there must be some information about which part of that is for which field, and that could add to the lengths bytes needed.

I rarely have the need for short char fields other than perhaps precisely char(1), so I tend to use varchar a lot more commonly than char. 4 or 8, lengths start at 10 perhaps mostly for me. I also see MySQL developers proposing to only use varchar(255) as in MySQL that's the limit anyway and length occupies 1 byte, otherwise the real length determines bytes necessary, so there is no point in limiting it down. Other sources say there is space reserved for sorting and that estimates 50% of varchar length. Which would make a difference, too, that's not occurring in storage already.

Chriss

Andrzejek · May 4, 2022

Chriss said:
I rarely have the need for short char fields other than perhaps precisely char(1)

Could you elaborate? What do you use such a field for?
'Middle Initial' comes to mind, but I would rather keep 'Middle Name' and if I need just an initial, I would just grab first letter from 'Middle Name' field.

The DB I work with (Oracle not set-up my me) uses Char(1) for 'Y' and 'N' data (for 'Yes' and 'No', True/False), which drives me up the wall. I would just use Number(1) since Oracle does not have BIT (or Boolean) which what I would use in these cases.

---- Andy

"Hmm...they have the internet on computers now"--Homer Simpson

Chris Miller · May 4, 2022

Char(1) usage example is any kind of state that's associated with a letter and therefore has more options than just a bit, which, if I remember correctly also takes 1 byte in MSSQL.
You could have a foreign key to a short table for it, too, but then this also often is printed, becomes part of a barcode (QR) or such things.

Chriss

Andrzejek · May 4, 2022

This kind of corresponds (I think) to what I have now: it is a Status of the record:[tt]
A - Active
N - New
C - Cancelled
... Etc.[/tt]
Which limits to A-Z (and maybe a-z), but I would rather use Numbers instead. No limits. Users do not see those single letters anyway, they just want to see the spelled-out Status.

---- Andy

"Hmm...they have the internet on computers now"--Homer Simpson

Mike Lewis · May 4, 2022

Chris, the page that you referenced (at TekTutorialHub) appears to support my argument. I don't see where it specifies a break-even of 4 chars.

But it also says:

Performance [of a varchar] is slower compared to the char as the SQL Server needs to calculate the size of the string

I hadn't taken in that point. But I wonder if any loss of performance wouldn't be more than offset by the benefit of an increase in the number of rows per page.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

Mike Lewis · May 4, 2022

Char(1) usage example is any kind of state that's associated with a letter and therefore has more options than just a bit, which, if I remember correctly also takes 1 byte in MSSQL.

If I remember rightly a bit data type takes one bit, but up to eight of them are packed into a byte. So, if you have just one bit field in a record, it would need a full byte. But if you have eight bit fields, they would together still take a single byte.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

Chris Miller · May 4, 2022

Mike,

this section:

tektutorialshub said:
Which one to Use
If String is always going to contain full Length like country code, Zip Codes, etc. then use char data type.

[highlight #FCE94F]If the string length is less than or equal to 4, it is better to use char[/highlight]. This is because the varchar will always add extra 2 bytes irrespective of its size.

Well, and again, I remember 8 as break even, though this alone does only justify 4.

PS: You're right about the bitfield, and surely I'd never use Y/N in char(1) to replace bit, even if it is the only bit field and would also take 1 byte.

Chriss

Chris Miller · May 7, 2022

Andrzejek said:
Users do not see those single letters anyway, they just want to see the spelled-out Status.

You overlooked:

myself said:
but then this also often [highlight #FCE94F]is printed[/highlight], becomes part of a barcode (QR) or such things.

It may not tell users much, but to those defining the rule for serial numbers or product codes the letters have meaning to them.

Chriss

gmmastros · May 10, 2022

As always... it depends.

What follows is my opinion. It is only opinion, which means you are free to ignore most or all of it.

As for primary keys, Int versus TinyInt.
I would never use anything less than an Int for a primary key. There are times when I would consider BigInt or UniqueIdentifier, but my go-to is usually int. If you have a table with less than 255 rows, and you are worried about optimizing the data storage, then I would easily argue that you are wasting your time. Most likely, SQL server will cache this data in RAM and access it from there. It's a tiny table and easily stored.

I do use TinyInt, SmallInt, and lots of other data types, but only for actual data. For example, "speed limit" in my database is stored in miles per hour and is a tiny int.

Regarding char versus varchar.
I kinda detest the char type because it is "space padded". What I mean is, if you have a char(8) column, with the value in one row of "Yes", SQL will actually store Y-e-s-(space)-(space)-(space)-(space)-(space). When you retrieve this data into your front end, you will have the spaces on the end, which is easy enough to accommodate with a trim, but there will likely be a bunch of places in code where you need to do this. For example, I have a license plate column in one of my tables, it is defined as "[License] [varchar](10) NULL". Most of the data has 7 characters. Do I care? Not even a little bit.

It's true that you want as many rows to fit on a data page as possible because there are speed benefits to doing so. SQL Server stores data on disk as 8k pages. Each page is always read in its entirety and written in its entirety. Therefore, the more rows you can fit on a data page, the better.

For practical purposes, I don't really try to optimize small things. I only worry about tables with more than about 100,000 rows. And in those circumstances, the optimizations usually revolve around indexes, not data types. In my 25 years of owning my own business, with a niche app, I've gone through several iterations of making fields bigger. Every time has been a pain. Best to avoid this if possible. For example, in version 1, I had "student's homeroom assignment" as a varchar(4). I needed to expand this to varchar(10) several years later. This required me to distribute a new application because the text box had a maxlength hardcoded in it.

I strongly suggest you limit the number of varchar(max) data types. There are times when you need them, and therefore should use them when appropriate. Licese Plate would not be an appropriate use for varchar(max). I do have several columns in several tables with the varchar(max) data type, but they are generally used for comment columns. I also use varchar(max) for driving directions from one stop to another. Only use this data type if you are expecting a very large amount of text. Varchar(max) columns are generally a lot slower than varchar(size) columns. There are technical reasons for this based on the way SQL Server stores the data on disk. Suffice to say, only use them when you need to.

PS. There are some county names greater than 20 characters. In the US, I see some county names up to 24 characters. There's a "SAINT JOHN THE BAPTIST" county in Louisiana and several in Alaska. In Canada, I see longer ones.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Data types and sizes

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Technical User

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor