Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How is the data stored? 1

Status
Not open for further replies.

tdion

MIS
Dec 18, 2003
61
US
Something doesn't make sense to me about Access I want someone to explain it.

When a text field is set to 50 and a record only has 2 characters, people in this forum have said that only 2 characters will be stored (and the 48 unused will be ignored and NOT stored.)

What happens when an update is made and the 2 character entry is replaced by a 3 character entry? There is no space to store it (right?) so does the entry get put at the end of the file?

Assuming this is true, does this make a file highly fragmented over time? - Thanks
 
The storage issue you are talking about is space....or the bytes it takes to store the data. When you define a field as being 50 characters wide, even if the field holds no data, there is space allocated in storage for all fifty characters. That is the issue. Basically, if you know a field will never be more than a certain length, set the max to that. It will keep the database smaller, thereby reducing the chance for corruption and also increasing access time to the records.

=======================================
People think it must be fun to be a super genius, but they don't realize how hard it is to put up with all the idiots in the world. (Calvin from Calvin And Hobbs)

Robert L. Johnson III
MCSA, CNA, Net+, A+
w: rljohnso@stewart.com
h: wildmage@tampabay.rr.com
 
Robert,
I believe you are wrong. It makes very little difference (in Access) if you set a text field to 10 or 100 characters. The space consumed on your hard drive is equal to the number of actual characters you are storing in the field. Editing, updating, and deleting may/will cause bloat in an MDB which can be recovered by compacting your database.

The specific way that MS uses to store the information in an MDB is not available for public consumption. I think it is "smoke and mirrors" but could be wrong. If you use SQL as a backend, then using CHAR fields vs VARCHAR will make a difference in the size of the database.

Duane
MS Access MVP
Find out how to get great answers faq219-2884.
 
Duane,

Sorry but I agree with mstrmage1768. That was the first thing I was taught within an access class during college.

Currently seeking for position around Boston, MA. kenphu@yahoo.com
 
Sorry folks. I just created two small mdb files with just one table and the below function in a module. In one MDB the field [text1] was set to 255 characters and the other [text1] was set to 2 characters. The mdb size at the start was 128kb for both files. After running the code, the mdb with the 2 character field was actually slightly larger which was due to changing the size of the field from 255 to 2 without compacting. After compacting both mdb files, they were both 908kb.

This test suggests that having a field set to 2 characters versus 255 characters has virtually no affect on the mdb size.


[Blue]
Code:
Function FillTable()
    Dim db As DAO.database
    Dim rs As DAO.Recordset
    Dim lngI As Long
    Set db = CurrentDb
    Set rs = db.OpenRecordset("SELECT * FROM Table1")
    For lngI = 1 To 50000
        rs.AddNew
        rs.Fields("text1") = "a"
        rs.Update
    Next
    rs.Close
    Set rs = Nothing
    Set db = Nothing
End Function
[/blue]

Duane
MS Access MVP
Find out how to get great answers faq219-2884.
 
I was thinking about doing that myself but didn't bother.

Perhaps Perhaps Robert gets a star?
 
I'm fairly certain Jet allocates 50 bytes for a field you declare as 50 bytes whether you put 2 or 50 bytes into it. The reason is as originally mentioned - it would create a lot of shuffling of data when an update was made from eg 2 to 36 bytes.

You can't tell this by looking at the size of the file as the way Jet allocates pages is complex in order to balance space usage versus speed. Plus there is other stuff in the mdbs apart from your data.

From what I recall Jet behaves rather like DB2, if that helps.

 
I can't believe you are all questioning this. How did my test not prove that the size of the field has little or no effect on the size of the mdb file.

Maybe one of you that thinks different could create a couple test mdb files that proves that size makes a difference. I would really be interested if the size of your text fields (rather than the amount of data actually stored) made a difference in the file size.

Duane
MS Access MVP
Find out how to get great answers faq219-2884.
 
Part of our work involves processing about 250k mailing records/wk and the supervisor was having the staff run a field size program, print out a report, and change each field size to the min required.

They told me I was nuts!

I tested about a dozen tables and occasionally found negligible space savings.(rounding errors?) The procedure has been eliminated. We save about five minutes per file imported, eliminate the risk of truncation from errors, and get rid of a mind-numbing task.

I was wondering what circumstances might make this an issue. Any change in size due to numerous updates in files over a long period or file fragmentation differences would probably be eliminated if compact/repair is run.

Does anyone know the truth?

al

 
alr0,
I think we have proved the truth through our experience that adjusting the field size saves little or nothing in mdb size. I know that the old dbf format files were larger based on the size of the fields. I don't know how MS stores data and object definitions within the mdb file and I doubt too many people outside of MS have this knowledge.

Microsoft MVPs are privy to some confidential information but file data structures isn't included in the NDA info. It isn't a secret that setting a smaller field size will not save you hard drive space. Again, if someone could prove this wrong, I would like to hear about it.

Duane
MS Access MVP
Find out how to get great answers faq219-2884.
 
dhookom I'm not 100% clear what you are saying but I think you are saying this:

1.Create Database A with Field T declared as 2 characters

2. Load A with 50,000 single characters

3. Measure size of .mdb

4.Create Database B with Field T declared as 255 characters

5. Load A with 50,000 single characters

6. Measure size of .mdb

If in this case the two files are the same size then you argue the text fields must be taking up the same space. Sounds persuasive except 1) it's completely illogical and 2) why does Access warn you when you shorten a text field and then afterwards your data is lost if you try and move the size back to what it was?

What happens with 5 million records?

 
Did you mean step 5 to say "Load B with 50,000 single characters"
 
Hi!

I just had to test myself too, out of curiosity (well, like to test things to be convinced too;-)). I've also been told (and believed) that the field sizes determines the space allocation/usage, but, both by Dhokom, alr0 and my own test, I think I'll have to abandon that view, at least when dealing with Access.

BNPMike, did do any testing? Do you have any interesting results? The please enlighten us. Your first question, what is “completely illogical”, that the databases really are the same size, or the assumption that since they are the same size, the fieldsize does not matter? As to your second question, this discussion has not been about Access storing 255 characters nomatter fieldsize. It's been related to that it doesn't matter for the size of the DB what your field size is, Access seems to store only the characters actually in the field, setting a fieldsize less than number of characters you want to store of course cut's it off.

Testenvironment:
WinXP/AXP, 1 table, 1 field, no indexes, using Dhokom's code to append 1 000 000 records with the letter "a". First example, the field size of the three databases where 1, 10 and 255 characters (as in the heading below).

All sizes in KB
[tt]
Field size 1 10 255
Before adding 112 112 112
After adding 15 812 15 812 15 812
After compact 15 820 15 820 15 820[/tt]

Using decompile did not do any further size alterations, but I'm a bit curious to why compact/repair added 8KB. Also a bit curious, 1 mill “a”'s should equal 1MB storage, what are the other 14.8 MB's?

Further testing, now appending "abcdefghij" in the two latter bases, emptying them first.
[tt]
Field size 10 255
Before adding 112 112
After adding 22 724 22 724
After compact 22 732 22 732[/tt]

Again I'm a bit uncertain as to why compact/repair adds 8KB.

Last test, this time again with 1 000 000 records, appending the previous string * 10 (=100 characters) into the databases, now having field sizes 100 and 255.
[tt]
Field size 100 255
Before adding 116 116
After adding 111 220 111 220
After compact 111 244 111 244[/tt]

Start size is a bit larger, probably because I've altered the code.

If field size matters to the allocation/size of db, I would expect there to be significant differences in size when storing the exact same date in two/three different databases with different field size. There are no difference at all in size. I don't dare to make a conclusive statement, but I think this little test supports everything said by Dhokom and alr0.

Still, there are some troubling thingies to me:
1 - should this knowledge alter the way one decides upon fieldsizes of text fields?
2 - back to tdion's original question - how is additional data stored (when adding a third character in a field which originally had 2 characters)?

Per 1 - We've only discussed the storage of the data. Would the field size on BE tables effect the network traffic? (Would there be more traffic if retrieving a table with fields having field size 255 vs field size 1 - or would also the network traffic be related to actual size/number of characters per field per record? I'm not able to do any testing on that, would be nice if someone could) Another thing to consider when deciding upon field sizes, is the possibility of your app being upsized/upscaled to run on another database. I've had more then one app, originally intended to be running only on access, but when the number of users increased -> SQL server or other db's. I think other databases might use field size in allocating space, so it might be worth considering.

Per 2 - I haven't any Idea, but I'd believe the compact/repair should do some housecleaning of that type (it should at least reindex the db)

Roy-Vidar
 
Folks,

Duane is right. Don't trust his test, run it for yourself. It's quite simple. And it is the way Access works.

Jeremy

==
Jeremy Wallace
AlphaBet City Dataworks
Access Databases for Non-Profit Organizations

Please post in the appropriate forum with a descriptive subject; code and SQL, if referenced; and expected results. See thread181-473997 for more pointers.
 
Upsizing un-necesarily large text fields to SQL Server may bite you but this depends on the field types in SQL Server. As I mentioned in my 12/30 post, there is a significant difference between CHAR and VARCHAR field types in SQL Server.

Roy-Vidar, thanks for the effort in setting up and conducting the tests.

Duane
MS Access MVP
Find out how to get great answers faq219-2884.
 
I wonder if this has got anything to do with what's happening?:

Microsoft Knowledge Base Article - 275561

"The use of the Text keyword without an accompanying length specification has been changed. It is now a synonym for Memo. This better aligns the use of the Microsoft Jet Text keyword with its use in Microsoft SQL Server. You can continue to use the Text keyword with a length specification, for example Text (100) to define a fixed-length character data field."



 
Looks like I am completely wrong. I've just done some tests. 1,022,976 rows of autonumber plus 255 byte text field only allocates 30,868,000 bytes when loaded with 1 character. The same data in a table sized at 1 byte takes up exactly the same space. Update the field to larger sizes and the database grows. Update back down and the database shrinks.

The only thing I can think is that Microsoft argued that the biggest speed constraint would be network traffic - networks were as slow as 4Mbits when it first came out.

Varying the space like this would create immense cpu and i/o load but with pcs getting ever faster it was better to load them but keep network traffic at the minimum.

The bottom line is there is absolutely no point in declaring text fields smaller than 255 characters unless you are using the field size as a data checking mechanism.

You live and learn...

 
BNP,

Yeah, and the important part is the learning! Data validation is pretty much the only thing I use field size for in Access. I'm about to launch into learning PostreSQL, where I'm sure I'll have to completely unlearn that way of thinking. Should be a good little challenge to look back at old databases and actually put some thought into how long fields should be in future databases.

Jeremy

PS: Roy, I sure wish I'd waited a few minutes to make my last post. It's nice to see test results in a thread like this.

==
Jeremy Wallace
AlphaBet City Dataworks
Access Databases for Non-Profit Organizations

Please post in the appropriate forum with a descriptive subject; code and SQL, if referenced; and expected results. See thread181-473997 for more pointers.
 
"What happens when an update is made and the 2 character entry is replaced by a 3 character entry? There is no space to store it (right?) so does the entry get put at the end of the file?"


My previous guessing in this subject was completely wrong but that's not going to stop me now.

Jet is organised into pages (4Kbytes I think). What I suspect happens is Jet first shuffles data on the page to accomodate the larger record, and I would guess it keeps the new data contiguous with the old, because this is all happenning in real memory. When the page gets full it can either create another page and insert that between the current one and the next one, or it can attempt to shuffle data onto the next page. My guess is it creates a new page. The alternative would be to check the empty space on the next page (presumably there is a free space list on each page) to see if there is enough. Anything more comprehensive would risk huge i/o as shuffling was propagated throughout the database. This means that if you had a field with 2 characters and then updated your database to three characters you might get lots of nearly empty pages.

I feel another test coming on...

 
Seems we are getting into some performance issues. I just create the proper indexes, compact regularly, build forms for efficiency, correctly structure my tables, use good coding practices, and let Access/Jet handle what's going on in the file. Tony Toews has a nice resource on performance at
Duane
MS Access MVP
Find out how to get great answers faq219-2884.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top