Hi, my name is Tolga. I have just started learning Java, although I am already familiar with C++. I have this project to do; is there anybody out there who can help me with it?
I want to write a program that reads a text file, makes a tally of all the different words used in the file, and prints out a list of the words that appear most frequently, along with the number of times each word appears. This list should be printed in descending order of frequency, so that the most frequently used word is printed first, then the second most frequently used word, and so on. The alphanumeric characters are the upper- and lower-case letters and the digits; all other characters are separator characters. A word is any sequence of consecutive alphanumeric characters that is preceded immediately by either the beginning of the file or a separator character, and followed immediately by either the end of the file or a separator character.
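The word definition above amounts to "every maximal run of alphanumeric characters is a word". A minimal sketch of that rule, assuming the whole text is already in a String and using `Character.isLetterOrDigit` as the alphanumeric test. Note that this method also accepts non-ASCII letters and digits such as 'é', which may be broader than the assignment intends. The class name `WordSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into words as defined in the post,
// i.e. maximal runs of alphanumeric characters; anything else is a separator.
final class WordSplitter {

    // Assumption: Character.isLetterOrDigit is the alphanumeric test.
    // It also accepts non-ASCII letters/digits; swap in an ASCII-only
    // check if the assignment requires it.
    static boolean isAlphanumeric(char c) {
        return Character.isLetterOrDigit(c);
    }

    static List<String> split(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isAlphanumeric(c)) {
                current.append(c);              // still inside a word
            } else if (current.length() > 0) {
                words.add(current.toString());  // a separator ends the word
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString());      // end of input ends the last word
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("It's 42, isn't it?"));
    }
}
```

Running it prints `[It, s, 42, isn, t, it]`: apostrophes are separators under this definition, so each contraction becomes two words.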
I want the program to take one command-line argument, the name of the file to be processed. It should read words from the specified file, count how many times each word appears, and print to standard output all the words together with their frequency of occurrence. The output should be sorted by decreasing frequency and look like this:
a 1573
the 439
an 128
i 23
if 10
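Once the counts are in a map, producing output in this shape is a sort of the map's entries by count, descending. A sketch, assuming the counts live in a `Map<String, Integer>` and that words with equal counts should be listed alphabetically (the assignment doesn't say how to order ties). `FrequencyReport` and `format` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a word -> count map into lines like "a 1573",
// most frequent first. Ties are broken alphabetically (an assumption).
final class FrequencyReport {

    static List<String> format(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Primary key: count, descending. Secondary key: the word, ascending.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.comparingByKey()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                Map.of("the", 439, "a", 1573, "if", 10, "an", 128, "i", 23);
        for (String line : format(counts)) {
            System.out.println(line);
        }
    }
}
```

With the sample counts from the listing above, `main` prints the same five lines in the same order, regardless of the map's internal ordering.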
I want to use some of the classes from the java.io package, such as java.io.FileReader, and the java.lang.Character class, which has methods for telling whether a character is alphanumeric. I have also found the java.util.Hashtable class useful for storing words and their counts, and I will probably need the java.util.Enumeration interface to iterate over the words stored in the hash table.
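Counting with `Hashtable` and walking the result with `Enumeration` could look like the sketch below. The hard-coded word array stands in for whatever the tokenizer produces, and `HashtableCounter` is a hypothetical name. `Hashtable.keys()` returns the words in no particular order, so sorting by frequency is still a separate step.

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Hypothetical helper: counts words in a Hashtable and lists them
// through an Enumeration over the table's keys.
final class HashtableCounter {

    static Hashtable<String, Integer> count(String[] words) {
        Hashtable<String, Integer> counts = new Hashtable<>();
        for (String w : words) {
            Integer old = counts.get(w);               // null if not seen yet
            counts.put(w, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> counts =
                count(new String[] {"a", "the", "a", "an", "a"});
        // keys() returns an Enumeration<String>; the order is unspecified.
        Enumeration<String> keys = counts.keys();
        while (keys.hasMoreElements()) {
            String word = keys.nextElement();
            System.out.println(word + " " + counts.get(word));
        }
    }
}
```

In current Java code, `HashMap` is the more common choice when only one thread touches the table, and the get/put pair can be shortened to `counts.merge(w, 1, Integer::sum)`; `Hashtable` supports the same call because it implements `Map`.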
One thing I find confusing is the difference between byte and char values in Java. A byte is a signed 8-bit value, big enough to hold, for example, the ASCII code of a text character. A char, in contrast, is an unsigned 16-bit value that holds a Unicode (UTF-16) code unit. When I use, e.g., the read() method of the abstract class java.io.InputStream, I get back an int that I would cast to byte before using it as part of a word. The read() method of the abstract class java.io.Reader, on the other hand, returns an int that I must cast to char. If I don't apply the cast, my program treats the value as a number rather than a character, and I get strange results. For both classes, end of file is signalled by read() returning -1, so I have to test for -1 before applying the cast.
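Putting the pieces together with a `Reader`: the loop below stores each `read()` result in an `int`, compares it with -1, and only then casts it to `char`. A `Reader` is also the layer that decodes the file's bytes into chars, which is why it suits text better than an `InputStream`. This is a sketch under stated assumptions, not a reference solution: it swaps `Hashtable`/`Enumeration` for `HashMap` and `Map.merge`, uses `Character.isLetterOrDigit` as the alphanumeric test, breaks frequency ties alphabetically, and the names `WordFrequency`, `countWords` and `flush` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical complete program: java WordFrequency <file>
final class WordFrequency {

    static Map<String, Integer> countWords(Reader in) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder word = new StringBuilder();
        int r;
        while ((r = in.read()) != -1) {   // test for -1 (end of file) first...
            char c = (char) r;            // ...then cast; r is now 0..65535
            if (Character.isLetterOrDigit(c)) {
                // append(char) adds the character; append(r) without the cast
                // would call append(int) and add digits such as "97" for 'a'.
                word.append(c);
            } else {
                flush(word, counts);      // a separator ends the current word
            }
        }
        flush(word, counts);              // the file may end inside a word
        return counts;
    }

    // Records the pending word, if any, and clears the buffer.
    private static void flush(StringBuilder word, Map<String, Integer> counts) {
        if (word.length() > 0) {
            counts.merge(word.toString(), 1, Integer::sum);
            word.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordFrequency <file>");
            return;
        }
        Map<String, Integer> counts;
        try (Reader in = new BufferedReader(new FileReader(args[0]))) {
            counts = countWords(in);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent first; alphabetical among equal counts (an assumption).
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.comparingByKey()));
        for (Map.Entry<String, Integer> e : entries) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

After compiling, run it as `java WordFrequency input.txt`. Because `countWords` takes a `Reader` rather than a file name, it can also be tried directly on a `java.io.StringReader`.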
Thanks