Tek-Tips is the largest IT community on the Internet today!

Autogenerating text based on analyzed text

Status
Not open for further replies.

BkevinT

Technical User
Nov 30, 2004
3
NO
Hi, all....

I'm posting here to see if anybody might be able to give me a push in the right direction.

I have created an application that reads a text file and analyzes the statistics of character combinations within it. I store these statistics in a HashMap, with the character combo as the key and the number of times each key occurs as the value. So if a file contains the text "dentist", the HashMap stores de, en, nt, ti, is, st as keys and sets the value of each to 1.
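
For readers following along, the counting step described above might look roughly like this. It is only a sketch, not the poster's actual code: the class name NGramCounter and the method count are made up for illustration, and it assumes overlapping combinations of a fixed length n.

```java
import java.util.HashMap;
import java.util.Map;

final class NGramCounter {
    // Counts every run of n consecutive characters in text (overlapping windows).
    static Map<String, Integer> count(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so the printed order may vary
        System.out.println(count("dentist", 2));
    }
}
```

Passing n as a parameter matches the idea of letting the user choose the combination length when the text is analyzed.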

I'm not quite sure how I'm gonna tackle this task, and would really appreciate any ideas you might have.
And simpler is better, of course ;)

Thanks :)
 
Sorry, hit <submit> too fast.

Would you like to produce something that looks like French but isn't French, or looks like German but isn't, depending on the input?

I guess a distinction between the start of a word and its ending would be helpful too: produc(tion), consump(tion); dent(ist), capital(ist); (mis)informed, (mis)aligned; you name it... and those pieces are not always 2 characters long.

don't visit my homepage:
 
Yes, I would like it to produce something that looks like French/German, without it actually being that language. The program is already set up so that it asks the user how many characters should be in the combinations when a text is analyzed, so that's taken care of. I guess 3-character combinations should work pretty well. But I don't think I'm gonna worry about whether a combination is at the start or the end of a word....
 
Hm.
If you have combinations of two chars, and ignore upper/lowercase, accents, the German umlauts (ä, ü, ö) and ß, you get at most 26*26 = 676 pairs to analyze.
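
One possible way to do that kind of folding before counting, as a rough sketch. The class name TextNormalizer is hypothetical, and mapping ß to "ss" is an assumption here (the post only says to ignore it). Spaces, digits and punctuation are left in, so the 676 limit only holds if those are filtered out as well.

```java
import java.text.Normalizer;
import java.util.Locale;

final class TextNormalizer {
    // Lowercases the text and strips accents and umlauts (e.g. "ä" becomes "a").
    static String normalize(String text) {
        // Assumption: treat the sharp s (U+00DF) as "ss"
        String lower = text.toLowerCase(Locale.ROOT).replace("\u00DF", "ss");
        // NFD splits an accented letter into base letter + combining mark
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // \p{M} matches the combining marks, which are then dropped
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("M\u00FC\u00DFig Caf\u00E9"));
    }
}
```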

Let's assume you analyze the frequencies, sort them by frequency ascending, and keep a running sum of the frequencies:

Let's take a very very simplified example:

erelere

er=2
re=2
el=1
le=1
----
sum 6

sorted   running sum
el=1     1
le=1     2
er=2     4
re=2     6
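
The sorted table with its running sums could be built along these lines. This is a sketch only: the class name CumulativeTable is made up, and breaking ties alphabetically is an assumption that happens to reproduce the order shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CumulativeTable {
    // Maps each key to the running sum of counts, in ascending order of count.
    static Map<String, Integer> runningSums(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            // Assumption: equal counts are ordered alphabetically by key
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        // LinkedHashMap keeps the sorted insertion order
        Map<String, Integer> sums = new LinkedHashMap<>();
        int sum = 0;
        for (Map.Entry<String, Integer> e : entries) {
            sum += e.getValue();
            sums.put(e.getKey(), sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("er", 2);
        counts.put("re", 2);
        counts.put("el", 1);
        counts.put("le", 1);
        System.out.println(runningSums(counts)); // prints {el=1, le=2, er=4, re=6}
    }
}
```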

Now you generate, multiple times, a random number from 1 to 6:
k = random.nextInt(6) + 1;

For k=1 you choose 'el'; for k=2, 'le'; for k=3 or k=4, 'er'; for k=5 or k=6, 're'. So each pair comes up in proportion to its frequency.
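
The lookup from k to a pair might be sketched like this: walk the running sums in order and take the first pair whose running sum is at least k. The class name WeightedPicker is hypothetical, and the map is assumed to iterate in ascending running-sum order (a LinkedHashMap filled in that order).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class WeightedPicker {
    // Returns the first key whose running sum is >= k, for k in 1..total.
    static String pick(Map<String, Integer> runningSums, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1: " + k);
        }
        for (Map.Entry<String, Integer> e : runningSums.entrySet()) {
            if (k <= e.getValue()) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("k exceeds the total: " + k);
    }

    public static void main(String[] args) {
        // Running sums from the "erelere" example, in ascending order
        Map<String, Integer> sums = new LinkedHashMap<>();
        sums.put("el", 1);
        sums.put("le", 2);
        sums.put("er", 4);
        sums.put("re", 6);

        Random random = new Random();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            int k = random.nextInt(6) + 1; // uniform in 1..6
            out.append(pick(sums, k));
        }
        System.out.println(out); // random output, differs from run to run
    }
}
```

This just glues independently chosen pairs together. It does not make the next pair depend on the previous one.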

You get the idea?

don't visit my homepage:
 