NASM + STABS = ☠

I was shown that apparently GDB fails to properly decode NASM generated STABS debug sections, or NASM can’t produce correct STABS information. So what does a reasonable person with many other more important things on their todo-list do? Sit down and figure out what the issue is. Right??

STABS Debugging Format

The STABS format is a properly ancient format to encode additional debugging information in binaries, traditionally a.out binaries. Since then we entered the realm of Middle-Earth and started producing ELF binaries instead, using DWARF as the debugging format. However, ELF can still use STABS, and especially on 32-bit architectures it is still relatively widespread, despite the Sun having finally set on the company who invented the format ¹ (sobs).

Debugging formats usually encode the name of the file corresponding to the generated object code, plus the specific line for one or more instructions. You can either use the GNU tool addr2line to view this manually, or use a debugger or the like for more bells and whistles.

But, for some reason, the combination of NASM+STABS results in cruel death, total destruction and absolute oblivion. Let’s dive in and save Middle-Earth.

MWE

We use the AMD64 GNU/Linux platform, producing ELF64 binaries with the minimal program using _start as entry point and executing sys_exit(0) immediately. The object code is generated using NASM/GAS respectively, and linked with the same flags.

They produce identical object code. Reproduced here in both common x86 syntaxes²:

If you want to follow along what I’m gonna be doing, you can find the source and binaries I used here. I’ve taken care to list every precise command I used.

GDB

If we ask GDB to print the source code listing of the GAS produced file, everything works fine:

ADDR2LINE

GDB uses the same tooling as the GNU/ADDR2LINE program [citation needed], so let’s check how this program behaves ³:

To conclude, while addr2line at least seems to work for the first source code line (or first assembly instruction), unlike GDB which doesn’t work at all, for the other lines/instructions we are presented with wrong information.

OBJDUMP

GNU/OBJDUMP can also use STABS information to annotate the disassembly. Let’s try this ⁴:

At least objdump and addr2line are consistent in what they are displaying. Let’s do the same with the gas produced file:

STAB Section

With almost no knowledge of .stab, my educated guess is that SLINE in the n_type column refers to a source code line whose address is n_value. So far, so good. Now let’s check the NASM generated binary:

This… doesn’t look right. The first address does match, but the second source code line should not map to address 0x40100a (the address of the last instruction), and 0x401014 isn’t part of our program at all. Also, we have an additional SO entry with n_value 0x0, but this may be something that’s allowed by the (nonexistent) specification.
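To make the mismatch explicit, here is a minimal Python sketch that compares the instruction start addresses with the SLINE values NASM emitted. The addresses are the ones quoted in this post; the function and variable names are made up for illustration. It also makes one pattern visible, which is only an observation and not a diagnosis: each reported offset is twice the expected one.

```python
TEXT_BASE = 0x401000

# Instruction start addresses of the three-instruction program.
instruction_starts = [0x401000, 0x401005, 0x40100A]
# n_value of the three SLINE entries in the NASM generated binary.
nasm_sline_values = [0x401000, 0x40100A, 0x401014]


def compare(expected, reported, base=TEXT_BASE):
    """Return (expected offset, reported offset, matches?) for each source line."""
    return [(want - base, got - base, want == got)
            for want, got in zip(expected, reported)]


for want, got, ok in compare(instruction_starts, nasm_sline_values):
    print(f"expected +{want:#x}, NASM says +{got:#x}, {'ok' if ok else 'WRONG'}")
```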

STABS Binary Representation

We can also look at the binary representation of the files. First we need to figure out the starting offset of the .stab section, and ideally its length as well, so we ask objdump to dump the headers of our executables ⁶:

So the NASM generated file has a .stab section starting at 0x100c with a size of 0x48 bytes, whereas with GAS it’s only 0x3c bytes, with the same starting point. We can now use od(1) to dump this precise data in two-byte hex units ⁷:
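If od's option soup is not to your taste, roughly the same dump can be sketched in a few lines of Python. This is an illustrative helper (dump_words is a made-up name), and it assumes the two-byte units are read little-endian, which is how they come out on x86.

```python
import struct


def dump_words(data: bytes, start: int, length: int) -> list[str]:
    """Format data[start:start+length] as two-byte little-endian hex words, 8 per line."""
    chunk = data[start:start + length]
    lines = []
    for off in range(0, len(chunk), 16):
        row = chunk[off:off + 16]
        if len(row) % 2:
            row += b"\x00"  # pad an odd trailing byte so it forms a full word
        words = struct.unpack("<%dH" % (len(row) // 2), row)
        lines.append("%06x " % (start + off) + " ".join("%04x" % w for w in words))
    return lines
```

For the NASM binary one would read the whole file into a bytes object and call dump_words(data, 0x100c, 0x48).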

Alternatively, we could use objdump again, dumping the full contents of the .stab section with objdump -s -j .stab, yielding almost analogous output (try it yourself!).

Comparing this binary dump with the -G interpreted dump of the .stab section from before, we can guess that the byte 0x1016 (0x05 in NASM, 0x04 in GAS) refers to the n_desc column in the table. The byte at 0x1018 encodes the lowest byte of the n_value. Searching for the other table entries with their respective addresses 0x401000, 0x401005, 0x40100a and 0x401014 from the GAS/NASM files, we deduce that the four bytes following the n_desc byte are the little-endian encoded n_value.

This gives us the following format (on the example of the GAS dump), where T refers to the bytes (likely) identifying n_type, O n_othr, D n_desc, V n_value, and S n_strx:

The String field is stored in the .stabstr section, which we aren’t interested in.
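The layout can be sketched as a small Python parser. Note the assumptions: the post only deduces where n_desc and n_value sit, so the full field order and widths used here (4-byte n_strx, 1-byte n_type, 1-byte n_othr, 2-byte n_desc, 4-byte n_value, 12 bytes per entry) follow the conventional stab entry layout, and the names Stab and parse_stabs are made up for illustration.

```python
import struct
from typing import NamedTuple

# Assumed entry layout: n_strx, n_type, n_othr, n_desc, n_value (little-endian).
STAB_ENTRY = struct.Struct("<IBBHI")

N_SLINE = 0x44  # line number entry
N_SO = 0x64     # source file entry


class Stab(NamedTuple):
    n_strx: int
    n_type: int
    n_othr: int
    n_desc: int
    n_value: int


def parse_stabs(section: bytes) -> list[Stab]:
    """Split the raw .stab section into 12-byte entries and decode each one."""
    return [Stab(*fields) for fields in STAB_ENTRY.iter_unpack(section)]


def sline_table(section: bytes) -> list[tuple[int, int]]:
    """Return (line number, address) pairs for all SLINE entries."""
    return [(s.n_desc, s.n_value) for s in parse_stabs(section) if s.n_type == N_SLINE]
```

With 12-byte entries, the 0x48 bytes of the NASM section hold six entries and the 0x3c bytes of the GAS section hold five.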

Editing The STABS Section

You can use a hex editor such as radare2/rizin and fire it up as writable (-w) in raw mode (-n):

The goal is to edit the SLINE entries with the n_desc values of 6 and 7 to point to the correct n_value. They are stored at the addresses 0x1038–0x103b and 0x1044–0x1047. We will modify the bytes [0x0a,0x10,0x40,0x00] (little-endian encoded 0x40100a) and [0x14,0x10,0x40,0x00] (little-endian encoded 0x401014) to be 0x401005 and 0x40100a respectively.
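For those who would rather script the edit than poke at bytes in a hex editor, the same patch can be sketched in Python. This is not what I did above, just an equivalent under the same assumptions: the offsets and values are the ones from this post, and patch_u32 and fix_sline_addresses are hypothetical helper names. The check against the old value guards against patching the wrong file.

```python
import struct


def patch_u32(image: bytearray, offset: int, old: int, new: int) -> None:
    """Overwrite a little-endian 32-bit value at offset, but only if it currently equals old."""
    (current,) = struct.unpack_from("<I", image, offset)
    if current != old:
        raise ValueError(f"expected {old:#x} at {offset:#x}, found {current:#x}")
    struct.pack_into("<I", image, offset, new)


def fix_sline_addresses(path: str) -> None:
    """Apply both SLINE fixes to the NASM generated executable at path."""
    with open(path, "rb") as f:
        image = bytearray(f.read())
    patch_u32(image, 0x1038, 0x40100A, 0x401005)  # SLINE with n_desc 6
    patch_u32(image, 0x1044, 0x401014, 0x40100A)  # SLINE with n_desc 7
    with open(path, "wb") as f:
        f.write(image)
```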

After saving, a quick objdump -G confirms our changes were successful, and indeed addr2line now seems to work:

But we still have the confusing additional STABS entry left in our .stab section:

What if we could resize the section and thus ignore this bogus entry? That is, instead of having a 0x48 byte sized .stab section, reduce it by the 12 bytes of one entry to get a 0x3c sized section (just as with the GAS produced file)?

from the radare2 shell ⁸. For me, this didn’t alter the file though, so we need to do some more magic, doing it all manually.

Wikipedia has a nice overview of the offsets in the ELF header. The ELF header itself points to the section header table (which may reside almost anywhere in the binary). Specifically, for our ELF64 files the field e_shoff at offset 0x28 of the ELF header contains the file offset at which the section header table starts.
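Reading that field programmatically is a one-liner with struct. A minimal sketch, assuming a little-endian ELF64 file as in this post (section_header_offset is a made-up name):

```python
import struct


def section_header_offset(image: bytes) -> int:
    """Return e_shoff, the file offset of the section header table, from an ELF64 header."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("not a 64-bit ELF file; e_shoff lives elsewhere")
    if image[5] != 1:  # EI_DATA: 1 means little-endian
        raise ValueError("not little-endian")
    (e_shoff,) = struct.unpack_from("<Q", image, 0x28)
    return e_shoff
```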

We then seek to this address and can print a small hexdump again. It is the binary representation of the section header which we pretty-printed earlier using objdump -h.

Again, from Wikipedia we can gather that each entry in the section header table is 64 (0x40) bytes in size. With the default settings of our hex editor, this amounts to four lines for each entry in the table.

The first four lines containing zeroes signify an empty entry. The second four lines are of no interest to us, but the third ones contain the value 0x48 (the “wrong” size of our .stab section!). So the third entry, starting at address 0x11c8, is likely to be our .stab entry. Indeed, Wikipedia documents that sh_size is at offset 0x20, which would mean that the 8 bytes from 0x11e8 really encode our size!
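Instead of eyeballing the hexdump, one could also walk the section header table and patch sh_size in code. The following is a hedged sketch rather than the procedure used here: it assumes a little-endian ELF64 file, identifies the section by its file offset and old size (0x100c and 0x48 for our .stab section) instead of by name, and resize_section is a hypothetical helper.

```python
import struct

SH_OFFSET = 0x18  # sh_offset field within an ELF64 section header entry
SH_SIZE = 0x20    # sh_size field within an ELF64 section header entry


def resize_section(image: bytearray, file_offset: int, old_size: int, new_size: int) -> int:
    """Set sh_size of the section at file_offset to new_size; return the entry's address."""
    (e_shoff,) = struct.unpack_from("<Q", image, 0x28)            # start of the table
    e_shentsize, e_shnum = struct.unpack_from("<HH", image, 0x3A)  # entry size and count
    for index in range(e_shnum):
        entry = e_shoff + index * e_shentsize
        sh_offset, sh_size = struct.unpack_from("<QQ", image, entry + SH_OFFSET)
        if sh_offset == file_offset and sh_size == old_size:
            struct.pack_into("<Q", image, entry + SH_SIZE, new_size)
            return entry
    raise LookupError("no matching section header entry found")
```

On the NASM binary, resize_section(image, 0x100c, 0x48, 0x3c) should land on the entry at 0x11c8 and rewrite the 8 bytes at 0x11e8.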

Checking with objdump -h and objdump -G confirms our change to have been successful. Now firing up GDB:

Success! It does take almost three times as much “user” time, measured with time(1) though, so probably it still chokes a bit on the additional data in the section. Things do look much better now though.

Yet Another Assembler

In the quest of further understanding the weird additional entry, let’s compare the NASM output to the one created by yet another assembler (mostly compatible with NASM input):

We do recognize the final line. In this case, however, it does contain a more meaningful value than 0x0.

So we undo any modifications we did before and instead of deleting the last entry in the STABS table or resizing the STABS section, we simply modify it to also contain this 0x40100c address, as YASM does:
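Where does that last n_value live in the file? Assuming 12-byte entries with n_value in the last four bytes, which matches the two offsets edited earlier, and assuming the trailing SO entry is the sixth and last entry of the 0x48 byte section, the arithmetic can be sketched as:

```python
STAB_START = 0x100C   # file offset of the .stab section
ENTRY_SIZE = 12       # assumed size of one stab entry
N_VALUE_OFFSET = 8    # assumed position of n_value within an entry


def n_value_file_offset(index: int, start: int = STAB_START) -> int:
    """File offset of the n_value field of the stab entry with the given index."""
    return start + index * ENTRY_SIZE + N_VALUE_OFFSET


# Sanity check against the two SLINE offsets edited earlier in the post.
assert n_value_file_offset(3) == 0x1038
assert n_value_file_offset(4) == 0x1044

# The trailing SO entry is assumed to be entry number 5 (counting from zero).
print(hex(n_value_file_offset(5)))  # 0x1050
```

So under these assumptions the four bytes to overwrite with the little-endian encoding of 0x40100c start at 0x1050.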

And oh, indeed now gdb is able to list our code, even with correct line numbers! Except…

Well, while in the source code listing GDB did somehow figure out that the line numbers provided in the STABS couldn’t be right, when setting breakpoints at specific line numbers it still gets confused. So we do need both edits: first, correcting the line number information, and second, either removing the additional SO entry or putting a more meaningful value there.

Summary

Without deeper knowledge it’s hard to judge whether GDB is at fault for not ignoring the last zero entry or whether such a thing is indeed illegal. However, the line number information that NASM computes seems to be wrong in any case, and YASM does simply behave better. Time to switch?

¹ https://en.wikipedia.org/wiki/Sun_Microsystems#Acquisition_by_Oracle
² The Intel syntax was invented by Intel specifically for the x86 microprocessor series, while the AT&T syntax was used by the company of the same name also for other architectures. They differ in mnemonics (opcode names), use of prefixes and suffixes, and even operand order! The GNU objdump program can disassemble (-d) a binary and display the result in either syntax.
³ For those not acquainted with addr2line, we can specify an executable using the -e option, followed by an address in the binary to find out which address maps to which line in the original source code, provided debug information is present.
⁴ In addition to the -d disassembly option, we use the -l option to list the corresponding source code lines.
⁵ A helpful look into the manual of objdump tells us that -G can be used to display the table of the STABS section.
⁶ Using the -h option to print all headers.
⁷ od(1) is the POSIX specified octal dump program. This is why we need so many additional options to force the output to actually be in hex. You can use the hexdump tool instead, if you feel fancy.
⁸ For more details on what binary operations are, in theory, possible, run rabin2 -O help.
⁹ radare2 has a built-in help mechanism, use p? in the shell to get more information on the p command.