Why won't this bootsector code work?

phipple · Jan 20, 2005

Hi

I am pretty new to assembly. I am trying to write some real mode bootsector code. I am using NASM.

I seem to be having trouble understanding how NASM or x86 real mode handle segments and offsets. Or something. In any case, something is going wrong somewhere and I don't understand why.

For the moment, I am mixing my code and data into one plain binary file. So what I want to do is set DS to the same as whatever CS is, so that I can use the location of data in the binary as the actual offset when referring to it in my code, which is what NASM does if you don't give it an origin, right? Although the bootsector code always gets loaded up at linear address 0x07C00, I like the idea of having my code be able to sit anywhere in memory.

Aaanyway, to my mind, the following code should work:

Code:

[BITS 16]                      ; real mode is 16 bit
entry:
    mov        ax, cs          ; copy CS to AX
    mov        ds, ax          ; DS is now same as CS
    mov        al, [message]   ; load AL with first byte of
                               ;     message
    mov        ah, 0x0E        ; print byte in AL using BIOS
                               ;     interrupt 10h:0Eh
    mov        bx, 0x07        ; set character attributes 
    int        0x10            ; call BIOS interrupt
hang:
    jmp        hang
message:
    db         'Hello World', 0x00

    ; plus code to fill up to 512 bytes with 0x55AA at the
    ; end - works fine, not important here

But it doesn't work.

When I boot using this code, my VMware BIOS prints a single 'S' at the top left of the screen. I haven't tried it on other BIOSes yet. In any case, this is not the intended result. Shouldn't it print an 'H'?

I am aware of an easy workaround. I can just tell NASM to start at 0x07C00 (instead of the default 0?) by adding a [ORG 0x07C00] to the start of the NASM code, and then setting DS to 0x0000, rather than setting it to CS. That's fine, and that's what I do for the moment, but I'd still like to know why the above code doesn't work - what am I missing?
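
For reference, a minimal sketch of the ORG workaround described above. It assumes the BIOS loaded the sector at linear address 0x07C00; the padding and signature lines at the end are the usual way to fill the sector and are an assumption here, since the original fill code is not shown. Without ORG, NASM assembles `message` as an offset from 0, so if the BIOS enters the code at 0x0000:0x7C00, DS = CS = 0 makes `[message]` read from low memory rather than from the sector.

```nasm
[BITS 16]
[ORG 0x7C00]                   ; labels are now offsets from 0x7C00

entry:
    xor     ax, ax
    mov     ds, ax             ; DS = 0, so DS:message is 0x0000:0x7Cxx
    mov     al, [message]      ; first byte of message
    mov     ah, 0x0E           ; BIOS teletype output
    mov     bx, 0x0007         ; BH = page 0, BL = attribute 0x07
    int     0x10
hang:
    jmp     hang
message:
    db      'Hello World', 0x00

    times   510-($-$$) db 0    ; pad with zeros up to byte 510
    dw      0xAA55             ; boot signature (bytes 0x55, 0xAA on disk)
```

With DS forced to 0 the result no longer depends on which CS:IP pair the BIOS happened to use.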

When I make a memory reference such as [message] in my NASM code, the segment is going to be DS, right?

Hmmm, I just had a bit of a think about it, and I think part of the problem is that I'm assuming that the IP starts at 0 when the bootsector code is loaded. But I can't get at the contents of IP, can I? NASM doesn't recognise it as a register. Hmmmmmmmmmm

Thanks in advance,
Phipple

Salem · Jan 20, 2005

Try setting SS:SP to something useful as well.

Just had a quick look at the bootloader on this machine, and it begins by setting the stack to just below where the program is loaded.

Code:

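CLI                 ; no interrupts while SS:SP is half set
XOR     AX,AX
MOV     SS,AX       ; SS = 0
MOV     SP,0x7C00   ; stack grows down from just below the boot sector
STI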

http://www.nondot.org/sabre/os/articles/TheBootProcess/

--

phipple · Jan 20, 2005

Thanks Salem - the link was awesome, just what I needed.
I tried setting SS and SP but it didn't seem to help.
I DID however come up with a bit of a hack that seems to work.
It's not ideal, but at least I think I understand what's going on, which is the important bit (to me, anyway).
As it turns out, the IP is NOT necessarily 0 when the bootsector is loaded (duh, because if it WAS 0, then the code segment would HAVE to be 0x07C0 in order to have the code loaded to linear address 0x07C00 - only realised this as I type it now).
Now I essentially want DS:BP or DS:BX or something to be an exact reflection of CS:IP when the program starts. But we can't access IP directly, as far as I can tell, so I had to use some trickery by calling a subroutine exclusively to get at the IP that is pushed onto the stack

Code:

entry:
  call  putipinbp   ; Puts IP into BP
  sub   bp, ($-$$)  ; The IP would have been pointing at
                       ; THIS line, not the first line, so
                       ; we subtract (from BP) the distance
                       ; into the code of this line ($-$$)
  push  cs
  pop   ds          ; DS now reflects CS. So now DS:BP
                       ; should look exactly like CS:IP
                       ; did when the code first started -
                       ; Happy Days!
putmessage:
  mov   al, [ds:bp+message] ; Load up the (next) character of
                            ; message. BP-based addresses default
                            ; to SS, so name DS explicitly
  or    al, al      ; Test if zero
  jz    hang        ; Jump to hang if zero (end of message)
  mov   ah, 0x0E    ; BIOS vid service 0x0E = print teletype
                       ; character
  mov   bx, 0x07    ; char colour, 0x07 = white on black
  int   0x10        ; BIOS vid service interrupt
  inc   bp          ; Point to next char in message
  jmp   putmessage  ; Do it all again
hang:
  jmp   hang
putipinbp:
  pop   bp          ; Here's the trickery - pop the IP off
                        ; the stack into BP. Using BP
                        ; because BX will be used in
                        ; putmessage loop, and I'm not
                        ; really doing much with the stack
                        ; that requires use of BP anyway
  push  bp          ; Better put the IP back on the stack,
                        ; else things will get funky. But a
                        ; copy of IP is still in BP. Woot.
  ret
message:
  db    "Hello World", 0x00

And it works too. Coolness
I'm a little annoyed that I couldn't access IP directly and had to resort to trickery, but at least I understand what's going on now (At least I think I do - tell me if I'm wrong). It seems quite hard to make NASM make a "push ip" statement - the manual mentioned some trickery with macros, but I couldn't work it out. Usually it just says something like "error: symbol `ip' undefined". I guess it would be possible to write the machine code for "push ip" directly at the start, with something like

Code:

    db   0x50

but I need to add the 'register' to the 0x50, and I'm struggling to find resources that tell me the numbers I need - any hints anyone?

Aiyaiyai
Well I'll sleep a little easier tonight I think.

Any help with 'cleaner' ways of getting at IP would be appreciated, or links to resources telling me about the actual machine code numbers to use. Otherwise I think (hope) this little problem is wrapped up. Although DO tell me if I've got it wrong and was just lucky.

Cheers,
Phipple

TessaBonting · Jan 21, 2005

You can use an offset from ip if you just
specify that MESSAGE is also in the code segment.
This works with MASM and I think also with NASM.

Code:

entry:
    xor   bp, bp
putmessage:
    mov   al, cs:[bp+message] ; Load up the (next) character of
                              ; message
    or    al, al              ; Test if zero
    jz    hang                ; Jump to hang if zero (end of message)
    mov   ah, 0x0E            ; BIOS vid service 0x0E = print teletype
                              ; character
    mov   bx, 0x07            ; char colour, 0x07 = white on black
    int   0x10                ; BIOS vid service interrupt
    inc   bp                  ; Point to next char in message
    jmp   putmessage          ; Do it all again
hang:
    jmp   hang
message:
    db    "Hello World", 0x00

Or like this ( si is incremented by the lodsb )

Code:

entry:
    mov   si, messageoffset
putmessage:
    lodsb cs:[si]             ; Load up the (next) character of
                              ; message
    or    al, al              ; Test if zero
    jz    hang                ; Jump to hang if zero (end of message)
    mov   ah, 0x0E            ; BIOS vid service 0x0E = print teletype
                              ; character
    mov   bx, 0x07            ; char colour, 0x07 = white on black
    int   0x10                ; BIOS vid service interrupt
    jmp   putmessage          ; Do it all again
hang:
    jmp   hang
messageoffset equ $
message:
    db    "Hello World", 0x00

Good luck, Tessa

phipple · Jan 23, 2005

Thanks Tessa, but unfortunately neither of these code samples worked for me. I'm not sure why. I am using NASM, so I had to change the syntax for 1 line in each of the samples:
4: mov al, cs:[bp+message] -> mov al, [bp+cs:message]
and
4: lodsb cs:[si] -> cs lodsb
But I don't think either of these syntax changes should have affected it.
I think the problem is that we can't assume that the IP register is set to 0 when the code is loaded by the BIOS, i.e. the start of the code is NOT necessarily at offset 0 from CS. Therefore, the offset of a piece of data in the raw binary code will not necessarily work as an offset from CS, but only relative to where CS:IP pointed when the code started. I found that adding the following code to the start fixed the problem:

Code:

entry:
    jmp    0x07C0:start    ; far jump: CS = 0x07C0, IP = offset of start
start:
    ; <rest of code here>

This code will cause CS and IP to be set to known values, and furthermore the IP will be set such that the offset (from CS) of the first instruction in the code will be 0.
I believe this technique is relatively common. It's either this or using the "[ORG 0x07C00] command and then setting DS to 0" technique.
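
A minimal sketch of a complete sector built around that far jump, assuming the BIOS loaded it at linear address 0x07C00 and left a usable stack. The padding and signature lines are the usual ones and are an assumption, since the original fill code is not shown.

```nasm
[BITS 16]                      ; no ORG: labels are offsets from 0

entry:
    jmp     0x07C0:start       ; far jump: CS = 0x07C0, IP = offset of start
start:
    mov     ax, cs
    mov     ds, ax             ; DS = CS = 0x07C0
    mov     si, message        ; offset of message from the start of the sector
    cld                        ; make LODSB step upwards
putmessage:
    lodsb                      ; AL = [DS:SI], then SI = SI + 1
    or      al, al             ; zero terminator?
    jz      hang
    mov     ah, 0x0E           ; BIOS teletype output
    mov     bx, 0x0007         ; BH = page 0, BL = attribute 0x07
    int     0x10
    jmp     putmessage
hang:
    jmp     hang
message:
    db      'Hello World', 0x00

    times   510-($-$$) db 0    ; pad with zeros up to byte 510
    dw      0xAA55             ; boot signature (bytes 0x55, 0xAA on disk)
```

Because DS is copied from CS after the jump, no segment override is needed on the load.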

Oh and to respond to myself about getting at and manipulating IP, it appears that there is no direct way of doing this, according to Intel, so there won't be any instruction in assembly that will allow you to do it either, presumably. Nor will you be able to generate your own machine code for it by putting raw bytes into the code. However, the "Introduction to the Intel architecture" document published by Intel says the way to get the value of IP, if you really need to, is to make a CALL and then read it off the stack (just like I did in the code in my second post in this thread). You can also modify it on the stack and execute a RET if you need to change it for some reason, although I guess a JMP would also do that.
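
A shorter form of the same CALL trick, as a sketch: call the very next instruction and pop the return address, so no subroutine or RET is needed. It assumes SS:SP already points at usable memory; the label name `next` is made up for this example.

```nasm
[BITS 16]

    call    next               ; pushes the offset of 'next' (the return address)
next:
    pop     bx                 ; BX = value IP had at 'next'; stack is balanced again
    sub     bx, next - $$      ; BX = run-time offset (within CS) of the first byte
                               ; of this code
```

`next - $$` is the assemble-time distance of `next` from the start of the section, so subtracting it leaves the offset at which the code actually starts inside CS.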

Cheers,
Phipple

TessaBonting · Jan 24, 2005

Dear Phipple,

I read your comment and I think your trick works
for the following reasons:

- Normally a program is loaded through the loader in the
operating system, so the offset will be filled in by
that loader just before the program is run.

- By jumping to a far address, CS and IP are loaded with
the segment and offset of that routine.
CS = address/16 and IP = 0 if the routine is placed on
a 16-byte boundary.
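
To make the second point concrete: a real-mode linear address is segment * 16 + offset, so more than one CS:IP pair names the boot sector's load address. The assemble-time check below is only an illustration; the macro names are made up for this sketch and it emits no bytes.

```nasm
; linear address = segment * 16 + offset
%define SEG_A  0x07C0          ; CS:IP after "jmp 0x07C0:start"
%define OFF_A  0x0000
%define SEG_B  0x0000          ; another pair a BIOS may use
%define OFF_B  0x7C00

%assign LINEAR_A (SEG_A * 16 + OFF_A)
%assign LINEAR_B (SEG_B * 16 + OFF_B)

%if LINEAR_A != LINEAR_B
  %error "the two segment:offset pairs name different addresses"
%endif
```

Both pairs work out to 0x07C00, which is why the code can be at the right place in memory while offsets assembled from 0 still miss their data.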

The 80x86 family indeed has no way of addressing data through
the IP register.
They say it isn't necessary if you are using a loader that
adapts your code to make it position independent.
Truly position independent programming is not possible
with the 80x86 processors, though.

Hopefully I didn't confuse you more,

greetings, Tessa

lionelhill · Jan 25, 2005

Have a look at Salem again! Where do you think all those addresses go when you do "int" or "call"? Someone, somewhere, ought to know what ss:sp points to!

The reasons for the segment system were (1) to allow a 16-bit system far more address space than a single 16-bit word can address, and (2) to allow position-independent code and data. All you need is to point the segment registers at the right place, and your code can go anywhere that starts on a multiple of 16 bytes, and the offset registers don't need to know a thing about it.
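
A sketch of what "pointing the segment registers at the right place" could look like when the load address is not known in advance. It assumes the code starts on a 16-byte boundary, that SS:SP is usable, and that the label names are made up for this example; it is one possible approach, not something taken from a particular bootloader.

```nasm
[BITS 16]                      ; no ORG: labels are offsets from 0

entry:
    call    next               ; push the run-time offset of 'next'
next:
    pop     bx
    sub     bx, next - $$      ; BX = run-time offset of 'entry' within CS
    mov     cl, 4
    shr     bx, cl             ; BX = that offset in 16-byte paragraphs
    mov     ax, cs
    add     ax, bx             ; AX = segment whose offset 0 is 'entry'
    mov     ds, ax             ; labels assembled from 0 now work through DS
    mov     al, [message]      ; first byte of message
    mov     ah, 0x0E           ; BIOS teletype output
    mov     bx, 0x0007         ; BH = page 0, BL = attribute 0x07
    int     0x10
hang:
    jmp     hang
message:
    db      'Hello World', 0x00
```

For example, if the code starts at 0x0000:0x7C00, BX becomes 0x7C00, the shift gives 0x07C0, and DS ends up as 0x07C0, so `[message]` resolves inside the loaded sector.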

That's one good reason why you don't need to read the ip register.

Note also the cli/sti instructions around Salem's stack-setting code. Imagine what would happen if there were an interrupt when you've just changed ss but not yet sp... And remember that interrupts happen all the time whether you want them or not, anyway...

Good luck!
