Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Number compressor

Status
Not open for further replies.

godonga

Technical User
Mar 19, 2006
25
ZA
Hi
This question may sound wierd but I really need your help. I need to come up with a program that takes in at least a 16 digit number and outputs say a 6 digit number. I should be able to do the reverse as well i.e. put in the 6 digit number and output the exact 16 digit number.

Any ideas?

G
 
What is the transformation logic? That is, are you changing radix? Are you using a cryptographic transform?

_________________
Bob Rashkin
 
I would convert the number to String... then I can check the length and add or remove the digits as required.
 
I don't know if it can be made.

If every 16 digit number must be transformed into a 6 digit number several different 16 digit numbers will be transformed to the same 6 digit number, so how would the app know which is the correct one to do the reverse transform.

This happens with hash codes like MD5, with the original one you can obtain the hash, but with the hash you cannot obtain the original.
 
Just looking at this mathematically, if the transformation is not isomorphic, that is, one-to-one in both directions, the reverse transform is undefined.

Let's say you were using truncation. Then 1234568888888888 would become 123456, but so would 1234564444444444. The best you could do on reverse is assume, say, the lowest: 1234560000000000.

_________________
Bob Rashkin
 
Well, perhaps there is a way it could be done, and it is by encoding the original number to alphabetical characters (maybe including special characters like . : , ; + - _ * / \ ! # @ % & $ etc...

This way the target set of characters is wider than the original one, so you can encode several digits into a single character (or perhaps 3 digits to 2 characters). The bigger your target set is, the more you can reduce.

What you want is like decoding a base64 string to its original form, and encoding it back to base64.

 
Just an addition to what I said above, you need to reduce 16 digit to digit/character, this is a reduction greater than 2to1, so this means you need a really big target set.

For example, if you want to reduce 16 to 8 (2to1), you need 100 characters (alphabetical, special charecters, etc...) because you need to encode from 00 to 99, all into a single a charecter.
 
If your number is represented as String, only.

For each character you normally use at least one byte, which may represent 256 differnt numbers - not just the chiffres 0-9.

You may not reduce any number in stringform of length 8 like "19421887" to a String of length 5 in the same encoding.
Or store any possible long as int.

don't visit my homepage:
 
You might as well convert it then directly to a byte array. Since the input value is a long then this should do the trick to convert a 16 digit number to 8 bytes.

import java.nio.LongBuffer;
import java.nio.ByteBuffer;

public class LongToByte {
public static byte[] longToByteArray(long l) {
byte[] bArray = new byte[8];
ByteBuffer bBuffer = ByteBuffer.wrap(bArray);
LongBuffer lBuffer = bBuffer.asLongBuffer();
lBuffer.put(0, l);
return bArray;
}
}

 
Thanks guys for all your input. Morefeo raised an interesting point about MD5. The logic of this would be to have something like a failed implementation of MD5 where one can get the original number from the hash message. the following is an example of what I am looking for


12345678912345678919 --> f(x) --> 234679

37677772399998633091 --> f(x) --> 346723

234679 --> f(x) --> 12345678912345678919

346723 --> f(x) --> 37677772399998633091

The resultant must'nt be limited to an Integer , it can be anything as long as it be reconverted to Integer in the reverse.

Your thoughts???

 
Your inverse function should be labelled f'(x) or g(x), or shall it be the same as before?

If the resultant might be as long as the input, a good candidate would be the identity-function.

i = f(i)
i = f(i) = f(f(i))

don't visit my homepage:
 
Well, take a look at Base64 encoding/decoding algorithm.

It makes the reverse of what you need (it encodes 3 into 4, and decodes 4 into 3), so you just need to change the order (base64 decoding is your encoding, and base64 encoding is your decoding).

The only problem is that it is a 4to3, so you could encode 16 digit/character into 12 digit/character, not as much compression as you need, but perhaps it can give you a clue on how to do it.
 
I assuming that your 16 digit numbers are in base 10 (i.e. decimal numbers) then as implied by MoreFeo, Bong and tom62 above you could use converting from base 10 to a higher base (e.g. 16,64,256,... 1000) as a method of reducing the number of digits required to represent your numbers.

I found this problem interesting and had a further look and gathered some thoughts below. Please accept my apologies if i repeat the same things as the others in this thread have already pointed out.


Converting a decimal number to base 256

An unsigned long number has 64 bits (on a 32 bit platform). So the biggest unsigned (i.e. positive) decimal number than can be stored in an unsigned long is: (2^64)-1=18,446,744,073,709,551,615.If you count the digits of this biggest number you will see that there are 20 digits. Your decimal numbers have 16 digits so they are well within the above range be able to be stored in an unsigned long. In fact you could even use simply long (signed) if you are also dealing with negative numbers. So if you represent an input number using a long variabe as suggested by tom62 you could then simply retreive each byte of of the long variable and then treat the symbol(character) it represents as a single digit. However this would give you up to 8 digits so i guess it is not good for you as you are requested to have 6.

The biggest 16 digit decimal number 9999,9999,9999,9999 in hexadecimal base would be 23 86 F2 6F C0 FF FF (so it would require 7 bytes). Therefore the above method can only be used if your task was to convert the numbers on up to 7 digit, or if your numbers were less than 2^48 and greater than zero which would all fit into 6 bytes.

The above method could be seen as a method of converting numbers from base 10 to base 256 (i.e. from a number base where only 10 symbols (digits) exist to a number base where 256 symbols(digits) exist)




Converting a decimal number to base 4096

However if you group the hexadecimal digits of the max number above into groups of 3 you would get 023 86F 26F C0F FFF and then treat each group as a single digit then you have max of 5 digits (giving you one reserve symbol for the number sign should you need it). You can then try to use the method suggested by MoreFeo and create one single symbol for each group. This has the downside of you having to somehow have 2^12=4096 symbols (perhaps by mapping to 4096) characters.


The above could be seen as a method of converting numbers from base 10 to base 4096 (i.e. from a number base where only 10 symbols (digits) exist to a number base where 4096 symbols(digits) exist).



Converting a decimal number to base 65536

In fact you could group them into groups of four and then you would have 0023 86F2 6FC0 FFFF. If you then treat each group as one digit then you could represent each of your 16 digit numbers with maximum 4 digits only. Similarly here you have to choose a set of symbols. Because each group is 2 bytes you could use unicode characters as your symbols. So each group would be represented by a single unicode character. (There are 2^16=65536 unicode characters)



Choosing smallest possible base to convert to

I am not sure if any of the above provide you with a method towards the solution that you require however they did raise a question in my mind. "What is the smallest base required to represent a 16 digits decimal number into a 6 digit number?" Certainly converting to base 65536 or 4069 would do the trick but could there be a smaller number base that would work just as well. Choosing a smaller base will reduce the number of symbols required and therefore choosing the smallest possible base is perhaps the most elegant solution. The answer to this question may be found by first solving the following equation: X^6 = 10^16

X^6 = 10^16
X = 10^(16/6)
X = 464.159

So the smallest base we could probably choose would be 465. So you will need at least 465 different symbols to convert a 16 digit decimal number into a 6 (465mal) digit number. Algorithms for converting from one base to another are simple and you should be able to work out by looking at some sample.


I hope the above helps (or at least I hope it does not lead to the wrong direction).

Good luck




"It is in our collective behaviour that we are most mysterious" Lewis Thomas
 
Interesting stuff from Bledazami and I liked your analysis with the different bases. This then led me to think that assuming on a 32 bit system where we can have an input of up to 20 digits and using base 4096 I can get an output of 6 digits right? Whats the maximum number of 16 or 20 digit records that I can convert to say base 4096 without having diferent 16 or 20 digit giving me the same output of 6 digit?

Is one able to manipulate the base 4096 resultant as one is able to do with base 10 digits in a mathematical formula? e.g. in base 10 .... (20*30)/100 =6

to take a simple formula , of the quadratic equation e.g f(x)= ax^2+bx + c , x= (-b +/- squareroot (b^2 -4ac))/2a

can I substitute the base 4096 equivalence of base 10 in the formula and put the output through a base 4096 to base 10 conversion and get the same answer had I just used base 10 values?

thaks in advance
 
StefanWagner is right converting numbers to a higher base does not compress the number (which is held in a long [64 bits] in computer storage or memory). It does however provide a way for reducing the number of digits to required to represent the number when shown on screen or on paper. True compressing would require the use of techniques of compressing this sequence of zeros and ones on computer storage as suggested by links that tom62 already provided.


However it seemed to me from the problem descriptions by Godonga (regardless of the title of the thread) that maybe Godonga was perhaps interested in reducing the number of digits required to represent the number. Anyway just to follow on the last questions of Godonhga.

"Whats the maximum number of 16 or 20 digit records that I can convert to say base 4096 without having diferent 16 or 20 digit giving me the same output of 6 digit"

For each number in base 10 there will be a unique corresponding number in base 4096. So there will be no case where two different numbers on base 10 give you the same number on base 4069. This would be true if you were converting to any other base.

"Is one able to manipulate the base 4096 resultant as one is able to do with base 10 digits in a mathematical formula?"
Yes. The numbers of any base can be used to do calculations and then revert the result to the previous base. Thats exactly what happens when we perform any calculations in the computer (the numbers we represent as decimal symbols are in fact converted and used in processor calculations as binary numbers, the results are then converted back to us in base 10 when we show them on the screen).

As an example consider the following simple calculations on base 10

(200*300)/1000 = 60


Converting the above numbers from base 10 to base 16
200 = C8
300 = 12C
1000 = 3E8
60 = 3C


Performing the calculation on base 16 (try it using Hex option on Calculator program that comes with Windows)
(C8*12C)/3E8 = 3C --> 3C in base 16 is equivalent to 60 in base 10


I hope the above helps a bit more (or at least I hope it does not lead more to the wrong direction).









"It is in our collective behaviour that we are most mysterious" Lewis Thomas
 
I'm a bit confused by this thread. Computers use bits to represent data, regardless of what number base you want to use to display it. So if you have a sixteen digit base-10 value, you are going to need 54 bits to hold that number of discrete values, and 55 if you want to allow negative numbers. Assuming we aren't bothered about wasting one or two bits for each value, this equates to seven bytes, as bledazimi has already noted.

What is the purpose of this transformation? If it is for display, then using anything other than base 10 won't mean anything to the reader, so it must be for saving space either on disk or in transmission? In which case holding it as a long is probably quite adequate, or a 7-byte bit string if you want to make life hard for yourself.



Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
I fully agree with you steveff. Like you said we still have no clue about the real requirements and why it would be necessary to do some magical tricks with numbers. Godonga, who started this thread probably already gave up. Unless he'd give us some real examples about what is needed, we might as well close this thread.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top