Assembly to C/C++ compiler

Status
Not open for further replies.

theenlighted

Programmer
Apr 25, 2003
1
GR
Hello,
Is there a compiler that can generate C or C++ source code from assembly code?
Can we write one, or is it too difficult? Would it be worth the struggle?
 
Hi,

Is there a compiler that can generate C or C++ source code from assembly code?
If you mean a code converter from asm to C/C++, I don't think you will find one.

Can we write one, or is it too difficult? Would it be worth the struggle?
It is not worth it. Asm code is far too flexible: you have to analyze more than one line of code to see what it is trying to do. In my opinion it is much easier to make a converter from C/C++ to asm (I'm not saying that is an easy task), and I've seen Delphi2Asm and VB2Asm.

PS: Don't take that personally, it's just my opinion :)

Regards
 
If you mean converting source code from assembler to C (or C++) for support issues then I am not sure such a beast exists....but I could be wrong.

If you mean taking a program module and reverse engineering it back to an assembler source deck, then you may be more successful. However, I have no idea whether there is anything out there to reverse engineer a program back to C or C++ source. Let the ethics debate commence! :)
 
A converter from ASM to C++ is impossible.

There are so many tricks and shortcuts in ASM that can't be achieved in higher-level languages.

A converter from C++ to ASM ????????

That's a bit silly, considering C++ is a compiled language and we all know compilers convert high-level languages to machine code.
Maybe a disassembler will convert the MC to ASM.



"There are 10 types of people in this world, those who know binary and those who don't!"
 
Most C++ compilers allow you to use asm in the actual program by using the keyword

asm {
    // insert assembly code here
}

or

__asm {
    // insert assembly code here
}

depending on which compiler you use... Microsoft Visual C++ uses __asm (it also accepts _asm), and I believe Borland C++ compilers just use asm.
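
For comparison, here is a minimal sketch of the same idea in the GCC/Clang "extended asm" syntax, which neither post above covers. The function name add_ints is made up for illustration, and the inline assembly is only compiled on x86 targets; everywhere else the sketch falls back to plain C so it still builds.

```c
#include <assert.h>

/* Hypothetical helper: adds two small ints, using inline assembly
   only where the GCC/Clang x86 syntax is known to apply. */
int add_ints(int a, int b)
{
    const int expected = a + b;   /* plain C result, used as a cross-check */
    int sum = a;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T operand order: add b (%1) into sum (%0); flags are clobbered. */
    __asm__("addl %1, %0" : "+r"(sum) : "r"(b) : "cc");
#else
    sum += b;                     /* portable fallback for other compilers/targets */
#endif
    assert(sum == expected);
    return sum;
}
```

Calling add_ints(2, 3) should give 5 on either path; the point is only that the asm block lives inside an ordinary C function.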
 
Because of the nature of the 2 languages, I seriously do not believe it is possible...

ASSEMBLY (ASM) is a low-level language... It is the closest language to machine language (opcodes) that a *normal* person can read and understand (without a super photographic memory)... these 2 sets of code are directly related to each other...
ASM contains REGISTERS, SEGMENTS, Indexes, Pointers, flags, and very few other limited data storage elements.

C/C++ is a High level language, the next step up from ASM...
C contains variables, procedures calls, constants, types, objects, definitions, etc... which are all lost when compiled...

for example...

Y = 2
Z = 4
...
X = (Y + Z) / 2

... means nothing to a computer, where, in assembly, it might read...

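PUSH CX        ; save Y
PUSH BX        ; save Z
...
POP BX         ; BX = Z
MOV AX, BX     ; AX = Z
POP BX         ; BX = Y
ADD AX, BX     ; AX = Y + Z
MOV CX, 02h    ; divisor
XOR DX, DX     ; clear the high word of the DX:AX dividend
DIV CX         ; AX = (Y + Z) / 2

...which corresponds to exactly 1 set of opcodes.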

The point is, the terms assembler and compiler have 2 totally different yet still similar meanings...

An assembler takes mnemonic code and converts it directly to opcodes, straight across; there is no optimizing or anything like all the steps a compiler goes through...
just translation from assembly to opcodes...

Compilers, on the other hand (such as C/C++ compilers), convert all of the variables into memory addresses, convert your code into assembly code, optimize your code, and perform many other steps... to produce opcodes...

In conclusion, going from C/C++ to (true) assembly code will remove all of the variables, and many other luxuries of your high-level language, in a way that can NOT be reversed...

for example...

int X, Y, Z;
char a, b, c;

in C/C++ might be converted to

ds:[si+0]
ds:[si+2]
ds:[si+4]
ds:[si+6]
ds:[si+7]
ds:[si+8]

in assembly...
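
The offsets above can be reproduced in C with offsetof. This is only a sketch: the struct data_segment and the helper var_offset are made-up names, and int16_t stands in for the 16-bit int of a DOS-era compiler. The offsets shown are what mainstream ABIs produce for this layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the data segment in the example above. */
struct data_segment {
    int16_t X, Y, Z;   /* offsets 0, 2, 4 */
    char a, b, c;      /* offsets 6, 7, 8 */
};

/* Returns the byte offset that is all that remains of a variable's
   name once the code has been compiled. */
size_t var_offset(char name)
{
    switch (name) {
    case 'X': return offsetof(struct data_segment, X);
    case 'Y': return offsetof(struct data_segment, Y);
    case 'Z': return offsetof(struct data_segment, Z);
    case 'a': return offsetof(struct data_segment, a);
    case 'b': return offsetof(struct data_segment, b);
    case 'c': return offsetof(struct data_segment, c);
    default:
        assert(!"unknown variable name");
        return (size_t)-1;
    }
}
```

Going from a name to an offset is a simple lookup; nothing in the compiled program lets you go from offset 6 back to the name "a".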

BUT if int F is used in 1 function and char G is used in another...

they might both be located at

ds:[si+9]

so going the other way...
A program written to reverse the process of compiling would have no idea what ds:[si+9] was in the first place...

The best you might get would be a C/C++ program that looks just like an ASM program... which would be more or less pointless...
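
To give a feel for what "C that looks just like ASM" means, here is a rough sketch of the (Y + Z) / 2 example translated instruction by instruction. The register variables, the little stack, and both function names are invented for illustration; this is not the output of any real tool.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit "registers" and a tiny stack, modelled as globals. */
static uint16_t ax, bx, cx, dx;
static uint16_t stack_mem[16];
static int sp = 16;

static void push(uint16_t v) { assert(sp > 0);  stack_mem[--sp] = v; }
static uint16_t pop(void)    { assert(sp < 16); return stack_mem[sp++]; }

/* Mechanical translation: one C statement per assembly instruction. */
uint16_t average_asm_style(uint16_t y, uint16_t z)
{
    cx = y;                      /* registers as they stand before the snippet */
    bx = z;
    push(cx);                    /* PUSH CX */
    push(bx);                    /* PUSH BX */
    bx = pop();                  /* POP BX */
    ax = bx;                     /* MOV AX, BX */
    bx = pop();                  /* POP BX */
    ax = (uint16_t)(ax + bx);    /* ADD AX, BX (wraps at 16 bits) */
    cx = 2;                      /* MOV CX, 02h */
    dx = 0;                      /* clear the high word of the dividend */
    uint32_t dividend = ((uint32_t)dx << 16) | ax;
    ax = (uint16_t)(dividend / cx);   /* divide DX:AX by CX, quotient in AX */
    dx = (uint16_t)(dividend % cx);   /* remainder in DX */
    return ax;
}

/* What a human would have written in the first place. */
uint16_t average_readable(uint16_t y, uint16_t z)
{
    return (uint16_t)(y + z) / 2;
}
```

Both functions return the same values, but only the second one tells you what the code is for.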

A compiler is specially designed to break down high level code into lower levels, step by step...

There is virtually NO WAY to rebuild the lower-level code into a higher level, because the "intelligent" information is lost during compilation.

Now for assemblers such as MASM and TASM... these are MACRO assemblers... which are like a cross between a high-level language and an assembler... these upper-level assemblers allow you to use variables, arrays, etc... that are translated the same way as the compiler variables are... but are still translated to machine code straight across...
yet when disassembled back to assembly code, the variables will seem as though they never existed...

You MIGHT be able to make a program convert a macro asm program into a C/C++ program with a little bit of use... but still...

due to the fact that assemblers DO NOT optimize code...
and Compilers do...

you can take a risk either way you go, on having slower code...

after all, the main reason for using asm in the first place is for speed...

where, depending on your skills in both asm and C/C++, converting the asm to C/C++ could either slow the program down by allowing the compiler to add unnecessary steps, or allow the compiler to optimize the code and remove unneeded steps...

I am getting tired of typing now... ;-)... so...

for more in depth info...
try a google search on "Compiler Theory" or "Compiler Design"
Most of them will explain the difference between a compiler and an assembler and the reasons why the process is irreversible...

Good Luck

Have Fun, Be Young... Code BASIC
-Josh Stribling

 
<<<
straiph
That's a bit silly, considering C++ is a compiled language and we all know compilers convert high-level languages to machine code.
Maybe a disassembler will convert the MC to ASM.
>>>
I don't think it's silly. That a disassembler will convert MC to ASM is something most programmers know. There are disassemblers, debuggers, sourcers or whatever (you name it), but the point here is that all the sources from RE are meaningless (not many people can read them).

A source (C) to source (Asm) converter would be much more meaningful!! But as I already stated, this is not an easy job! It really isn't. And if someone asks me whether it is worth doing, I will say no. But the fact is some people want tools as a bridge between languages (a converter, not RE). And they expect the results to be easy to read.



<<<
Cube
A program written to reverse the process of compiling would have no idea what ds:[si+9] was in the first place...

The best you might get would be a C/C++ program that looks just like an ASM program... which would be more or less pointless...
>>>
That's exactly what I have in mind.

Regards

-- AirCon --
 
Hm, I'm not certain about quite a lot that's being written here.
(1) An experienced assembler/C programmer could definitely take an assembler programme and convert the vast majority of it to a readable C program. And if s/he can do it, then someone, somewhere, could probably automate the process and make a computer do it (though I'm not denying the task would be horrendous..) If they can't then Turing was barking up the wrong tree...

(2) But it would be utterly pointless! If you could have written the original stuff more easily in C, then you shouldn't have written it in assembler.

When I write assembler I already use a procedure/function sort of format, because spaghetti is hard to maintain. It wouldn't be difficult to keep the same structure on translating to C. Of course I refer to everything I possibly can by name rather than offset. It's much better to have labels than try to remember ds:[0018]. These names could be retained in the C version.

The harder task is to convert compiled-code to C.

For a flavour of how you might go about doing things like this, look at functions and variables. If you have a block of code starting after a ret or retf instruction, then it can only be reached by jumps or calls. If the block ends in a ret or retf, then it must be called with the return address on the stack. Therefore it is reasonable to convert it into a function. All code not handled that way would have to be left as a spaghetti-code feature of the main function.
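
A very rough sketch of that ret-to-ret rule follows. Everything in it is an assumption made for illustration: the enum op stands in for real decoded instructions, the start of the code is treated as if it followed a ret, and functions with more than one ret are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction kinds; a real tool
   would decode actual opcodes. */
enum op { OP_OTHER, OP_CALL, OP_JMP, OP_RET };

struct range { size_t first, last; };   /* inclusive instruction indices */

/* Splits code[0..n) at every RET. Each span that ends in a RET is
   reported as a function candidate; trailing code with no RET is not.
   Returns the number of candidates found; at most max_out are stored. */
size_t find_function_candidates(const enum op *code, size_t n,
                                struct range *out, size_t max_out)
{
    size_t count = 0;
    size_t start = 0;   /* first instruction after the previous RET */

    assert(code != NULL || n == 0);
    assert(out != NULL || max_out == 0);

    for (size_t i = 0; i < n; i++) {
        if (code[i] == OP_RET) {
            if (count < max_out) {
                out[count].first = start;
                out[count].last = i;
            }
            count++;
            start = i + 1;
        }
    }
    return count;
}
```

Given the sequence OTHER, CALL, RET, OTHER, OTHER, RET, JMP, this reports two candidates (indices 0 to 2 and 3 to 5) and leaves the final JMP as leftover spaghetti.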
Within the blocks we've now identified, we need to find memory references and registers that are not forced to fixed values by the local code. These are clearly variables or passed parameters. Anything referred to by stack segment, or handled in registers, is best treated as a passed parameter or local variable... and so on.
The real problem in producing readable C is that we need to play psychology and guess names for the variables. But even here a "clever" computer could be a little helpful. Yes, it could call the variables a, b, c etc (unhelpful) but if it notes that one is set once, and then decremented and tested against zero, it would be logical to call it "loop" or something. So already little bits of high-level help are emerging from a careful analysis of the low-level code...
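
The naming heuristic could be sketched like this, assuming an earlier analysis pass has already counted how each variable is used. The struct var_usage, its fields, and suggest_name are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-variable usage counts gathered by an earlier pass. */
struct var_usage {
    unsigned assignments;   /* times the variable is set to a value */
    unsigned decrements;    /* times it is decremented */
    unsigned zero_tests;    /* times it is tested against zero */
};

/* Suggests "loop" for a variable that is set once, decremented and
   tested against zero; otherwise falls back to a, b, c, d by index. */
const char *suggest_name(const struct var_usage *u, size_t index)
{
    static const char *const fallback[] = { "a", "b", "c", "d" };

    assert(u != NULL);
    if (u->assignments == 1 && u->decrements >= 1 && u->zero_tests >= 1)
        return "loop";
    return fallback[index % 4];
}
```

It is a crude rule and will misname things, but it shows how a usage pattern can stand in for the name that compilation threw away.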
The other problem would be the library procedures and things. But these are actually quite simple for the reverse-compiler. If it is aware what libraries you might have used, it just has to match the sequences.
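
Matching library sequences might look roughly like the sketch below: compare the bytes at a given position against a table of known byte patterns. The struct signature and match_library_routine are hypothetical names, and any byte patterns you feed it would have to come from the actual libraries; real code also contains addresses that vary from build to build, which this simple exact match ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature entry: a routine name and the exact bytes
   that routine is assumed to start with. */
struct signature {
    const char *name;
    const unsigned char *bytes;
    size_t len;
};

/* Returns the name of the first signature whose bytes appear at
   code[pos], or NULL if none of them match there. */
const char *match_library_routine(const unsigned char *code, size_t code_len,
                                  size_t pos,
                                  const struct signature *sigs, size_t nsigs)
{
    assert(code != NULL && pos <= code_len);
    assert(sigs != NULL || nsigs == 0);

    for (size_t i = 0; i < nsigs; i++) {
        if (sigs[i].len > 0 &&
            sigs[i].len <= code_len - pos &&
            memcmp(code + pos, sigs[i].bytes, sigs[i].len) == 0)
            return sigs[i].name;
    }
    return NULL;
}
```

A reverse-compiler would run something like this at every call target and replace the matched block with a call to the named library function.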

But I'm not trying to code any of this!
 
I'm with lionelhill concerning (1). Normally the final executable has clear traces of the HL compiler used; if you have ever tried IDA you have a feeling for what I mean. Sure, such disassemblers do not restore the HL source, but normally one gets standard library references and some variables and structures determined. With some experience one can clearly see from the assembly whether it was, say, a FOR or a WHILE statement, and so on.
Concerning (2), I feel a bit puzzled; I probably misunderstood the point. I'm not sure how it goes now, but in the good old days one just added a switch to the compiler telling it to generate an asm file. Have things changed??
 
No, sorry to be confusing. So far as I know, most compilers still happily generate asm-type stuff if asked. I was just wibbling and thinking that it's a good deal harder to go from an .exe file to C than to go from a well-written bit of assembly to C.

Every time I see this sort of question come up I feel for the person posting it: usually it means "help, I've lost my source code..."
 
Every time I see this sort of question come up I feel for the person posting it: usually it means "help, I've lost my source code..."

I guess it's because of me that this thread is turning into a debate [blush]. OK then, my apologies for causing this trouble. Let's get back to the topic and to the person (theenlighted) who started the thread, which is:

Is it worth the struggle or not? :)

Again I'm sorry
Regards

-- AirCon --
 
Sorry to carry on a bit like a debate rather than answering the question. I don't think you're going to see this sort of utility get written.
(1) It would be a real security hazard for the authors of commercial programmes, and would encourage tinkering by users and hacking by cheats.
(2) It would be hell to write, and would sell to not a lot of people (mostly geeks who'd probably copy it off each other anyway).
(3) It would probably produce C code that was nearly as hard to read as the original assembler, so the chances are that anyone with enough skill to use the C code probably wouldn't have needed to....
(4) If anyone disagreed with the above 3 points they'd probably be selling it by now.

But I'll regret writing that when an .exe to C translator is selling for thousands this time next year.

It's an interesting question.
 
OK... the question was...
Is there a compiler that can generate C or C++ source code from assembly code?
Can we write one, or is it too difficult? Would it be worth the struggle?


>>is there a compiler
Since by definition a compiler takes high-level code and generates low-level and/or machine code...
No...
the closest thing is a disassembler...
which unassembles machine code back to asm code from a com, obj, exe, or similar binary file.
So assuming that a decompiler was the program in question...
No, there is no program that can take a standard exe and create C/C++ code that you could read any better than an asm program...

Now I do agree with lionelhill:
An experienced assembler/C programmer could definitely take an assembler programme and convert the vast majority of it to a readable C program

But I do not believe that anyone could automate it... due to the fact that there are so many variations of programming styles, language variations, 16-bit / 32-bit variations (which might or might not be a big deal), and registers for which a program would have no clue what they were in the first place...

why would you want to go from assembly to c/c++?

usually if it was written in assembly it was for a reason (namely, speed)

The only LOGICAL reason to do this is, as mentioned above, to regain your lost source code.
If this is what you are looking for, the hard truth is, you will probably NEVER see your source code again from program-generated, reverse-engineered executable code.

As mentioned above, you can always use...
asm {
    // assembly code
}

so technically you might be able to use a disassembler to get ASM code, then insert it into a C program like so...

int main(void) {
    asm {
        // insert disassembled code here...
    }
    return 0;
}

and, technically, it would be decompiled back to C/C++ code... Though I seriously DOUBT that is what was intended in the original post...

If such a program, to do what was intended, does exist... Microsoft probably owns and operates it, and as you can see, they aren't sharing ;-) It is probably locked in a closet somewhere waiting for someone to try to create a program like it, wait 10 years through the development process, then when the person starts to brag, they can release it, patent it, sue the other person, buy out Intel, and create the Pentium 25, all on the same day ;-)
...but I still seriously doubt it

If anyone thinks they feel up to the task...
GOOD LUCK...
and MORE POWER to ya... ;-)
but, when you go around and round in circles after months and find out you were in the same place that you started...
Don't say I didn't tell you so... ;-)

Have Fun, Be Young... Code BASIC
-Josh Stribling

 
While I was at school (a mere few months ago) I actually asked this question of one of my professors. He told me that people have been working on programs that would construct procedural C code from assembly, but at this point they are mainly an academic exercise and they don't work well at all.

So if academic researchers devoting all their time and effort can only make a crappy one, I doubt that it will be worth your while to try and develop one.

Rob
[atom]
 
Resource Hacker or
ResHack, takes an executable and
gives a C-program!
/laskar
 
laskar,

Which "Resource Hacker" can do that? Are you sure?????

As far as I know, Resource Hacker (or Resource whatever) can only get a RESOURCE from an EXE, not the SOURCE CODE.

A resource file (RC) is not the same as SOURCE CODE.

What am I missing here???

-- AirCon --
 