Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

frequency table

Status
Not open for further replies.

helloman

Technical User
Mar 3, 2004
5
AU
I am a beginner. I have to create a compression program using the huffman algorithm. My question ism before that I must create a "frequency table." That would be used to compare how many times characters are used when reading from a file input. How would I approach in creating that table?
 
First of all, I would suggest trying to implement something easier.

Now, to the frequency table:

Can we be sure that the number of character types to take statistics of will remain constant with each file? Then we can use a constant length table, essentially just an array, which we can implement as such.

If we cannot be sure of that, then we will be using a pair of data: one of them will be the character type, the other the frequency of that character type. That's a bit more complicated, but with some imagination you can implement it EASY. Just implement a routine whose argument is the character type, have it search through the table for the proper entry, and update that entry. "Information has a tendency to be free. Which means someone will always tell you something you don't want to know."
 
AmkG, thanks for your quick reply.
I understand that it will have something to do with a tree...maybe something that might resemble a binary search tree. But how will I create such a tree in Assembly? And I understand that I have to create a counter for each character, but how will I do that with only 4 available registers? Thanks
 
Whoever said you had only 4 registers? There are AX,BX,CX,DX,DS,ES,SS,CS,F,IP,SI,DI...!
And whoever said you need to use ONLY registers?? Use memory, that's what memory is for!!
And what's more, whoever said you need a binary search tree???
In assembly, it's often significantly easier to use a one-dimensional strructure, NOT a two-dimensional tree. Just use pointers to keep track of stuff.
"Information has a tendency to be free. Which means someone will always tell you something you don't want to know."
 
Thanks, a few other questions have came up.

I want to get a user input and output by using the dos command line. After I get the filenames, I will open them using INT21h.

ex. compress FILENAME1 FILENAME2

I understand the FILENAME1 would be located at 80H and FILENAME2 would be located in 81h.

would my code be correct:

.data
FILE1 DW ?

.code
MOV FILE1, 80h


then move DX, FILE1
open using 03Dh?

Thanks
 
This is the first time you REALLY programmed in Assembly, haven't you? Because what you're trying to do is to load the number 80h into DX. I would really, really, really recommend you try something far simpler than what you are attempting.

Disclaimers finished with, let me start this discussion of the command line. First. The command line is in what is called the PSP. Now, at the start of your program, DS and ES point to the PSP; HOWEVER, most programs start with something like this:
mov ax,@DATA
mov ds,ax
ASSUME ds:mad:DATA

which means that DS will no longer point to the PSP. However, most programs leave ES well enough alone, so that it can still be used to point to the PSP.

Now, the byte at PSP offset 80h (which you can refer to with: mov al, byte ptr es:[80h]) contains the length, in characters (also known as bytes) of the ENTIRE command line. The buffer at PSP 81h contains the ENTIRE command line. Note that this includes the name of your program; worse, if the guy at the keyboard typed a couple of spaces before the name of your program, then those spaces get expressed on the buffer. The buffer is COMPLETELY unformatted, so if the user used extra spaces, it's up to your program to take them out.

DOS however does SOME formatting and assumes that the first two string words (bounded by spaces) after the command itself are filenames. It then puts these filenames in two buffers, the first into PSP offset 5ch and the second into PSP offset 6ch. However I have never used these two buffers because they're not big enough for paths...

One last thing: I really think you need to do something simpler. In assembly, it's the parts which APPEAR simple that are often quite complex. "Information has a tendency to be free. Which means someone will always tell you something you don't want to know."
 
Thanks AmkG, you don't know how much you've helped me.
But here's my next problem. I know I need to read all the characters in the input file, but how would I store it?

ex. ABCDEADASDA
Character - Frequency
A - 4
B - 1
C - 1
D - 3
etc...

I know I'm suppose to create an array, ex: BUFFER DB 256 DUP (0)

and store it in there...but how would I store it?
If I store each character into BUFFER[n], where would the frequency go to represent the count of each character?

Please help!!
 
I really, really, really think you should start with something far, far, far simpler.

Why don't you just make a SEPARATE array that is parallel to your BUFFER array??
Two arrays -
BUFFER DB 256 dup (?)
FREQ DW 256 dup (?)

of course this kind of limits you to having only 256 patterns...
two dynamically allocated, parallel arrays are good... "Information has a tendency to be free. Which means someone will always tell you something you don't want to know."
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top