                        disassemble.library

General

'disassemble.library' is a shareable AmigaDOS library which is a
disassembler for the MC68000 family of processors. It disassembles code
for the MC68000, MC68010, MC68020 and MC68030 processors, for the MC68851
memory management unit and for the MC68881 and MC68882 floating point
coprocessors. It is capable of symbolic disassembly, will generate labels
at referenced locations, and is highly controllable through a set of
style flags.

The library's single entry point, Disassemble, will attempt to
disassemble one instruction per call. It communicates with its caller
through a passed information vector, which includes pointers to routines
to call to process text output, access symbolic information, record label
locations, etc.

There are two main reasons why I separated this functionality into a
shareable library. One is that I wanted to share the code (which is
fairly bulky) between a file disassembler/dumper and a debugger. The
second is that I plan to write an entire set of such shared libraries,
and this one has given me experience in how to go about it, and some of
the consequences of doing it.

In order to use this library you must copy file 'disassemble.library' to
your LIBS: directory. This is where AmigaDOS looks when it needs to load
the library in response to an OpenLibrary system call. To use the library
with your own programs, you will need a set of interface stubs or
definitions, depending on the language and compiler you use. The needed
information is in the accompanying 'fd' file (disassemble_lib.fd) and in
this document. I have included a defining include file and an interface
library for Draco users.

I have tested the library, as used by my disassembler/dumper, Dis, fairly
extensively. There are bound to be some bugs left, however.
Please let me know at one of the following electronic mail addresses if
you find any:

    Chris Gray
    usenet: {uunet,alberta}!myrias!ami-cg!cg
    CIS: 74007,1165

Sending me physical mail works, but I am VERY slow at answering (up to 6
months on one occasion!). Trying to telephone me can be expensive - you
are more likely to get my modem.


Interfacing to the Library

All communication to and from the library is done through an information
structure, the address of which is passed in register A0. The structure
is declared (in Draco) as follows:

    type DisassemblerState_t = struct {
        proc(/* ulong address(d0) */)uint ds_readWord;
        proc(/* char ch(d0) */)void ds_putChar;
        proc(/* ulong addr(d0) */)*char ds_findLabel;
        proc(/* ulong addr(d0), refAt(d1); *ulong pTrueAddr(a0) */)*char
            ds_findAbsSymbol;
        proc(/* long offset(d0); ulong refAt(d1) */)*char ds_findRelCode;
        proc(/* long offset(d0); ulong refAt(d1);*long pTrueOffset(a0) */)*char
            ds_findRelData;
        proc(/* ulong addr(d0) */)void ds_labelAt;
        proc(/* ulong addr(d0) */)void ds_branchTo;
        proc(/* ulong addr(d0) */)bool ds_isLabel;
        ulong ds_address;
        ulong ds_relativeBase;
        *char ds_errorMessage;
        uint ds_operandColumn;
        uint ds_column;
        uint ds_extraWord;
        bool ds_putPosition;
        bool ds_absoluteAddress;
        bool ds_putErrors;
        bool ds_capExtended;
        bool ds_putAddress;
        bool ds_putRelForm;
        bool ds_extended;
        bool ds_extendedNow;
        bool ds_illegal;
        bool ds_hadExtraWord;
    };

The first few fields are the addresses of functions which the library can
call to perform various needed operations. All such addresses are 32 bit
values. Fields of type 'ulong' are 32 bit unsigned integers. Fields of
type 'uint' are 16 bit unsigned integers. Fields of type 'bool' are 8 bit
1/0 true/false values. In more detail:

    ds_readWord - this function is passed a 32 bit address or offset in
        register D0. It should return the 16 bit contents of that
        location in register D0.
        The addresses passed are all based on the value given in field
        'ds_address', thus they can be real addresses or offsets into a
        buffer or hunk, depending on what the caller does. This routine
        MUST be supplied. The library does not try to reference any
        memory directly - all references will be through this function.

    ds_putChar - this function is passed a character in the low 8 bits of
        register D0. That character is part of the disassembled
        instruction. All output from the library will go through this
        function. If this function is not present (value is nil, a 32
        bit, 0 value), then no output is done. This mode of operation
        runs slightly faster, and can be used to simply check for valid
        instructions or for a pre-scan to find label references.

    ds_findLabel - this function is passed a 32 bit address in register
        D0, and should return nil or the address of a symbol which is a
        symbolic label for that address. If no symbolic information is
        being used, this routine can be omitted. Any pointer returned
        must be valid until this call of Disassemble returns, but not
        beyond.

    ds_findAbsSymbol - this function is used to find symbolic names for
        addresses that are referenced as 32 bit absolute addresses. The
        address in question is passed in register D0. The address or
        offset within the code being disassembled (based on ds_address)
        at which the reference occurs is passed in register D1. This
        information can be used with relocation information supplied in
        AmigaDOS object files. Register A0 contains the address of a 32
        bit value which should be filled in with the true address of the
        symbolic value. The pointer returned in D0 should be nil if no
        appropriate symbol was found or the address of a null-terminated
        string. As an example, suppose that label 'Fred' represents
        offset 0x208 in the code being disassembled, and a call to
        'ds_findAbsSymbol' is made by the library with the following
        parameters:

            D0 - 0x20d
            D1 - 0x32
            A0 - ????
        It would be appropriate to return the string 'Fred', and to store
        the value 0x208 into the region pointed to by A0. The library
        would then show a reference like 'Fred+0x5'. As usual, this
        routine can be omitted if no symbolic information is available.

    ds_findRelCode - this function is used for references that are
        PC-relative, so what it should return are labels within the code.
        This function has no way to return the closest label - most code
        doesn't branch to just past a label.

    ds_findRelData - this function is used for references that are
        relative to register A4. This allows symbolic disassembly of
        small-model data references generated by the Lattice and Aztec C
        compilers.

    ds_labelAt - this function is called when a PC-relative data
        reference is found in the code. A user program would supply a
        function address here if it wanted to keep track of where labels
        should be. A bitmap is a good way of doing this. Keeping track of
        labels this way is generally only of use if a two-pass
        disassembly is going to be used.

    ds_branchTo - this function is similar to 'ds_labelAt' except that it
        is called only for branch and jump targets. In other words, the
        address given must be a code address, since it is a branch
        target.

    ds_isLabel - this function, if present, is called to determine if
        there was a reference to the given address. This is used to know
        whether or not to generate a label in front of the instruction
        being disassembled.

    ds_address - this is the address or offset that disassembly is
        occurring at. It is the value which will be given to
        'ds_readWord' to get the first word of the instruction. The field
        is properly updated as disassembly occurs, so it need only be set
        before the first call to Disassemble. If multi-pass disassembly
        is being used (e.g. to produce labels), it should be reset before
        each pass.

    ds_relativeBase - this is the current base address for disassembly.
        Labels will be relative to this base. E.g.
        if an instruction 8 bytes past this address needed a label, the
        label would be either 'L008' or 'L00000008', depending on label
        size. It will be updated by the library to the current value of
        'ds_address' if 'ds_findLabel' yields a label for the current
        value of 'ds_address'. Even though the field is maintained, it is
        not always used. See the description of 'ds_absoluteAddress'.

    ds_errorMessage - this field is occasionally filled in by the library
        with a specific error message concerning the disassembly. It is
        cleared at the start of each call, so if the field is non-null
        when Disassemble returns, it points to an error message. The
        message is not dynamically allocated, so it should not be freed
        or modified by the caller.

    ds_operandColumn - this 16 bit field should be filled in with the
        column at which the caller wants instruction operands to start.
        Spacing with blanks will be used to pad out to the desired
        column. If the instruction field, etc. already extends past the
        target column, no spacing will be used. A reasonable value for
        this field is 20 if initial addresses are not enabled, or 31 if
        they are.

    ds_column - this 16 bit field is used internally to count columns.

    ds_extraWord - this 16 bit field is used internally to remember a
        second word of an invalid instruction, so that it can be dumped
        in hexadecimal.

    ds_putPosition - this 8 bit flag field controls whether or not the
        library will display hexadecimal addresses at the beginning of
        the output lines. As with the other flag fields, a value of 0 is
        treated as 'false', and any other value as 'true'. The addresses
        will either be 32 bit absolute ones (the value of 'ds_address')
        or will be 16 bit relative ones ('ds_address' -
        'ds_relativeBase') depending on whether or not
        'ds_absoluteAddress' is set.

    ds_absoluteAddress - this flag field controls the form of labels and
        of the position display. If it is set, they are 32 bit values
        taken directly from 'ds_address'.
        If not set, they are 16 bit relative values computed as
        (address - 'ds_relativeBase'). For most purposes, the relative
        form is tidier.

    ds_putErrors - this flag controls whether or not the library will
        output error messages that are returned in 'ds_errorMessage'.
        Tighter formatting control can be obtained if this option is not
        used.

    ds_capExtended - this flag controls whether or not the library will
        capitalize instructions and modes that are not available on the
        MC68000. This is useful to make the non-68000 instructions stand
        out.

    ds_putAddress - this flag controls whether or not the hex address is
        displayed along with a symbolic or label form. It is useful if
        the symbolic or label forms are confused for some reason, and
        would be of value to a debugger, where all addresses are real.

    ds_putRelForm - this flag controls whether or not the relative form
        of PC-relative and A4-relative addressing is displayed along with
        any symbolic or label form. This is useful for those who wish to
        see the actual encoded form of the instructions, or if the
        symbolic or label forms are confused.

    ds_extended - this flag is initially cleared by the library and is
        set whenever a non-68000 instruction or mode is seen. Thus, after
        each call to Disassemble, this flag can be checked to see if a
        non-68000 form was seen.

    ds_extendedNow - this flag is used internally to know whether or not
        output should be capitalized. Note that symbolic names are never
        capitalized.

    ds_illegal - this flag, initially cleared, is set whenever any
        illegal instruction or mode is encountered. There will not always
        be an accompanying error message. Note also that I have not gone
        to the trouble of checking each addressing mode for each
        instruction, thus there are instruction forms which will not
        cause 'ds_illegal' to be set but which the actual processor will
        not execute. Also, the specific 'illegal' instruction, opcode
        0x4afc, will not cause this flag to be set.

    ds_hadExtraWord - this flag is used internally to indicate that an
        illegal instruction encountered had a second or extended opcode
        word that should also be printed in hex.

As an example, here is a simple one-pass disassembly of a small hunk of
code:

    #drinc:disassemble.g

    uint
        R_D0 = 0,
        R_FP = 6,
        OP_MOVEB = 0x1000,
        OP_MOVEL = 0x2000,
        M_DDIR = 0,
        M_DISP = 5;

    proc readWord(/* ulong address */)uint:
        ulong address;

        /* move.l d0,address(a6): copy the register parameter to the local */
        code(
            OP_MOVEL | R_FP << 9 | M_DISP << 6 | M_DDIR << 3 | R_D0,
            address
        );
        pretend(address, *uint)*
    corp;

    proc putChar(/* char ch */)void:
        char ch;

        /* move.b d0,ch(a6): copy the register parameter to the local */
        code(
            OP_MOVEB | R_FP << 9 | M_DISP << 6 | M_DDIR << 3 | R_D0,
            ch
        );
        if ch = '\n' then
            writeln();
        else
            write(ch);
        fi;
    corp;

    proc main()void:
        extern tail()void;
        DisassemblerState_t ds;

        if OpenDisassembleLibrary(0) ~= nil then
            ds.ds_readWord := readWord;
            ds.ds_putChar := putChar;
            ds.ds_findLabel := nil;
            ds.ds_findAbsSymbol := nil;
            ds.ds_findRelCode := nil;
            ds.ds_findRelData := nil;
            ds.ds_labelAt := nil;
            ds.ds_branchTo := nil;
            ds.ds_isLabel := nil;
            /* disassemble this program's own code, from readWord to tail */
            ds.ds_address := pretend(readWord, ulong);
            ds.ds_relativeBase := 0;
            ds.ds_operandColumn := 31;
            ds.ds_putPosition := true;
            ds.ds_absoluteAddress := true;
            ds.ds_putErrors := true;
            ds.ds_capExtended := true;
            ds.ds_putAddress := false;
            ds.ds_putRelForm := false;
            while ds.ds_address < pretend(tail, ulong) do
                ignore Disassemble(&ds);
            od;
            CloseDisassembleLibrary();
        else
            writeln("Can't open Disassemble.library");
        fi;
    corp;

    proc tail()void:
    corp;

Note the use of the 'code' construct to retrieve parameters passed in
registers. Slightly different tricks would be needed to do this in other
languages/compilers.

For an example of using the library for full symbolic disassembly with
label generation, see the source to the 'Dis' file disassembler/dumper,
which is included in this archive.
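
The description of 'ds_labelAt' suggests a bitmap for remembering where
labels are needed. The following is a minimal sketch of such a bitmap in
portable C, not code taken from the library or from Dis. The type
'LabelMap' and the 'label_map_*' functions are hypothetical names invented
for this illustration. They are ordinary C functions, so the glue that
receives the address in register D0 and calls them is compiler-specific
and is not shown. One bit is kept per byte, on the assumption that data
references may fall on odd addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type (not part of disassemble.library):
 * one bit per byte of the region being disassembled. */
typedef struct {
    uint32_t base;    /* address or offset of the first byte covered */
    uint32_t length;  /* number of bytes covered */
    uint8_t *bits;    /* zero-initialised bit array, one bit per byte */
} LabelMap;

/* Allocate an empty map covering [base, base + length).
 * Returns 1 on success, 0 if the allocation failed. */
int label_map_init(LabelMap *map, uint32_t base, uint32_t length)
{
    assert(map != NULL);
    map->base = base;
    map->length = length;
    map->bits = calloc((size_t)length / 8 + 1, 1);
    return map->bits != NULL;
}

/* Release the bit array. */
void label_map_free(LabelMap *map)
{
    free(map->bits);
    map->bits = NULL;
    map->length = 0;
}

/* Pass 1: what a ds_labelAt or ds_branchTo callback could do.
 * Addresses outside the covered region are silently ignored. */
void label_map_mark(LabelMap *map, uint32_t addr)
{
    uint32_t off = addr - map->base;  /* wraps to a large value if addr < base */
    if (off < map->length)
        map->bits[off >> 3] |= (uint8_t)(1u << (off & 7));
}

/* Pass 2: what a ds_isLabel callback could return.
 * Yields 1 if the address was marked during pass 1, else 0. */
int label_map_is_label(const LabelMap *map, uint32_t addr)
{
    uint32_t off = addr - map->base;
    if (off >= map->length)
        return 0;
    return (map->bits[off >> 3] >> (off & 7)) & 1;
}
```

In a two-pass disassembly, the first pass could leave 'ds_putChar' nil and
point 'ds_labelAt' and 'ds_branchTo' at glue around 'label_map_mark'. The
second pass, after 'ds_address' has been reset, could point 'ds_isLabel'
at glue around 'label_map_is_label'.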