Index

A

accumulator - The accumulator is the register most often used for math calculations. It is slightly more efficient at them than the other registers, since some instructions have shorter encodings when the accumulator is an operand. Some people say only EAX is the accumulator. Not true. EAX, AX, and AL are all considered accumulator registers. Notice that even though AH is a component of EAX, AH does not have the advantages of the accumulator.

assemble - To assemble means "to compile assembly code". An assembler is any program that acts as an ASM compiler.

B

base - Inside a memory operand, the base is the second register, the one that doesn't get multiplied. In DWORD PTR DS:[EAX*8+EDX+3029], the base is EDX since it is not being multiplied by a number, unlike EAX.

binary copy & paste - Binary copy and paste is an OllyDbg feature that lets you copy and paste ASM code from the debugger window in pure hexadecimal format. Personally, I think this should be renamed "hexadecimal copy & paste".

binary table file - The file binary_table.dat is the heart of the Doukutsu Assembler. It specifies the majority of the 32-bit x86 binary instruction encodings.

C

comma separator - The comma separator is used to separate operands in many assembly languages. For example, IMUL EAX,ECX,1A has 2 comma separators.

D

Doukutsu Assembler - An assembler used to compile 32-bit x86 assembly code and patch pre-existing executables with that new code. I created this program in order to simplify and enhance Cave Story .exe modification. It is intended that you use the Doukutsu Assembler along with OllyDbg.

E

efficiency - Efficiency measures how fast code can execute (speed-efficiency) or how compact it is (space-efficiency). Despite what the experts say, high-level code compilers almost never produce machine code nearly as efficient as hand-written assembly.

encoding - An encoding is the binary representation of an instruction.
The encoding for ADD (register1),(register2) is:

    0000000<w>11<register2><register1>

To create the instruction ADD EDX,EAX, we have to realize that 000 means EAX and 010 means EDX. We also have to set the w-bit to 1 to show that the width of this instruction is 32 bits. You get:

    0000000111000010 = ADD EDX,EAX

F

G

H

high-level language - Any programming language that tends to completely hide or mask the underlying machine code that the CPU executes. These are often easier to use than low-level languages such as assembly. One good example of a high-level language is Python.

I

immediate data - Any 8-bit, 16-bit, 32-bit, or 64-bit number written directly into an instruction as a series of bytes. 0x4050ABFF and 0x0129 are both examples of immediate data.

index - Inside a memory operand, the index is the register that gets multiplied. In DWORD PTR DS:[EAX*8+EDX+3029], the index is EAX because it is being multiplied by 8.

Intel - A big company well known for its computer chips. Intel invented the x86 family of microprocessors, which are widely used today.

J

K

L

label - Labels are identifiers that mark locations/addresses in a program. In some languages, you can use a goto statement to jump to a label and continue the code execution from that point. Goto statements are considered very bad practice in nearly all programming languages because overusing them creates difficult-to-read code. However, in ASM, labels are convenient because ASM does not have if-statements, while-loops, for-loops, etc. All of those control-flow statements are instead represented with jumps and addresses.

low-level language - A programming language that directly represents machine code. All assembly languages are low-level languages. Assembly is sometimes taught in computer science courses to teach people how computers "really work".

M

machine code - The raw binary form of assembly code that the CPU executes, usually displayed in hexadecimal. Pretty much the same thing as native code.
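To make the link between an encoding and the resulting machine code bytes concrete, here is a minimal Python sketch of the ADD (register1),(register2) pattern from the encoding entry. The function name encode_add_reg_reg and the REG32 table are hypothetical names made up for this illustration. They are not part of the Doukutsu Assembler, and the w-bit is assumed to always be 1 (32-bit registers only).

```python
# Standard 3-bit codes for the 32-bit general-purpose registers.
REG32 = {
    "EAX": 0b000, "ECX": 0b001, "EDX": 0b010, "EBX": 0b011,
    "ESP": 0b100, "EBP": 0b101, "ESI": 0b110, "EDI": 0b111,
}

def encode_add_reg_reg(register1, register2):
    """Encode ADD register1,register2 using the pattern
    0000000<w> 11<register2><register1>, with w fixed to 1 (32-bit)."""
    w = 1
    first_byte = (0b0000000 << 1) | w
    # 11 = register-to-register form, then register2, then register1.
    second_byte = (0b11 << 6) | (REG32[register2] << 3) | REG32[register1]
    return bytes([first_byte, second_byte])

code = encode_add_reg_reg("EDX", "EAX")
print(format(int.from_bytes(code, "big"), "016b"))  # 0000000111000010
print(code.hex().upper())                           # 01C2
```

The two bytes 01 C2 are the hexadecimal view of the same 16 bits, which is the form you would see in a debugger's hex dump for ADD EDX,EAX.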
magic number - A magic number is a unique hexadecimal number that is used to clearly identify something in a program. Usually this is a "readable" hex-string such as 0xDEADBEEF (dead beef) or 0xCAFEBABE (cafe babe). The Doukutsu Assembler, while parsing labels, will use the 32-bit number 0xBADC0DE (bad code) to replace a label in an instruction in order to figure out the instruction's expected size. If you see something like MOV AX,WORD [BADC0DE] in an error message, then you know where that number comes from.

medium-level language - Any programming language that hides or masks the underlying machine code, but still offers some features of low-level ASM. C++ can be considered a medium-level language because it allows the user to directly control pointers.

memory operand - A memory operand refers to an address that holds some data, such as a 32-bit, 16-bit, or 8-bit integer. Also see pointer.

meta-token - This is a term I invented (actually, I looked it up on Google and got some results, so I guess I didn't invent it). Meta-tokens are tokens that have been partially compiled into hex codes. In meta-tokens, jumps and instructions with labels in them have not been compiled yet because label addresses have not been calculated. Also see token.

mnemonic - The mnemonic is the "name" of each assembly instruction. For example, the instruction MOV AL,CH has the mnemonic MOV.

N

native code - The raw binary form of assembly code that the CPU executes, usually displayed in hexadecimal. Pretty much the same thing as machine code.

O

OllyDbg - A 32-bit x86-assembly Windows debugger. If you're on a 64-bit OS, you can only run OllyDbg version 2.00 or greater. If you're on 64-bit Windows 7, I highly recommend version 2.01 alpha 3 because otherwise you can't use the "Copy to Executable" feature.

operand - Operands are the "arguments" of each assembly instruction (sort of). For example, the instruction SUB EAX,200 has two operands: EAX and 200. Operand literally means "operated upon".
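As a rough illustration of how a mnemonic relates to its operands, the following Python sketch splits one instruction into the two parts. The function name split_instruction is hypothetical, and this is only an assumption about how such a split could be done. It is not how the Doukutsu Assembler actually parses code.

```python
def split_instruction(text):
    """Split one instruction into (mnemonic, list of operands).

    The mnemonic is everything before the first run of whitespace.
    The operands are whatever follows, separated by commas.
    """
    parts = text.strip().split(None, 1)
    if not parts:
        raise ValueError("empty instruction")
    mnemonic = parts[0].upper()
    if len(parts) == 1:
        return mnemonic, []          # e.g. RETN has no operands
    operands = [op.strip() for op in parts[1].split(",")]
    return mnemonic, operands

print(split_instruction("SUB EAX,200"))   # ('SUB', ['EAX', '200'])
print(split_instruction("MOV AL,CH"))     # ('MOV', ['AL', 'CH'])
print(split_instruction("RETN"))          # ('RETN', [])
```

Splitting only on the first whitespace keeps size prefixes such as DWORD PTR attached to their operand, so ADD DWORD PTR DS:[499000],0A still yields two operands.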
P

pipe separator - The pipe separator is used in Doukutsu Assembler code in order to write multiple instructions on the same line. For example, to end a function, you can write MOV ESP,EBP | POP EBP | RETN.

pointer - A value that points to an address. In assembly, a pointer dereference is written with square brackets. ADD DWORD PTR DS:[499000],0A means "add 10 to the 32-bit integer variable located at address 499000". Pointer dereferences are also called memory operands.

protected mode - Protected mode is an operating mode of the x86 that allows for features like virtual memory and paging. Pretty much all modern operating systems that use the x86 run in protected mode, and so you'll be writing all your ASM code in protected-mode format. People often say, "if it can be done on the computer, it can be done in assembly". Well yes, that's true, but not for you. In protected mode, you are not allowed to have full control over the computer. The operating system is shielded from you. You can't change the video settings (at least not directly). And no, you can't enter real mode. Programs can do some fairly amazing things. But the operating system almost always reigns supreme. References: [What does protected mode mean?] [PC Assembly Language Manual]

Q

R

real mode - Real mode for the x86 certainly has its limitations, but is also extremely powerful. You have full control over the computer and you can access any memory address, including memory belonging to other programs. Back in the days when DOS was used, people sometimes wrote entire programs in real mode.

S

scale - Inside a memory operand, the scale is the number that a register gets multiplied by. In DWORD PTR DS:[EAX*8+EDX+3029], the scale is the number 8. In binary, the scale is written with only 2 bits, so there are only four possible scales: 1, 2, 4, or 8.

T

tilde operator - The tilde operator (~) is used in Doukutsu Assembler code to transform the decimal number right after it into a hex number.
For example, if you wanted to store the decimal number 529 into EAX and were too lazy to use a hex calculator, just write MOV EAX,~529.

token - A token is a small piece of text that the Doukutsu Assembler uses while parsing ASM code. The instruction LEA EAX,[ECX*4+500] can be split into 3 tokens: LEA, EAX, and [ECX*4+500].

U

V

variation - Another term I invented. A variation is a particular combination of an instruction and its operands. MOV (register),(register) is a different variation of MOV than MOV (memory),(hex number). I believe Intel would call these "encodings".

W

X

x86 assembly (32-bit and 64-bit) - x86 assembly is the ASM language used for Intel's x86 family of processors. The 32-bit version uses mainly 32-bit registers like EAX and ECX. The 64-bit version has 64-bit registers like RAX and RCX. 32-bit applications can still run on the 64-bit version, usually.

x64 assembly - x64 is exactly the same thing as 64-bit x86.

Y

Z

Z80 assembly - (Trivia only) - The Z80 is an 8-bit microprocessor used in modern graphing calculators (especially ones made by Texas Instruments). Processors very similar to the Z80 were also used in the original Game Boy and Game Boy Color. The Z80 ASM language is ridiculously simple and uses mostly 8-bit registers. If you're learning ASM for the first time, this is actually a good place to start understanding the basics.
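The decimal-to-hex rewrite described in the tilde operator entry can be imitated with a few lines of Python. This is only a sketch of the idea. The function name expand_tilde is hypothetical, and the Doukutsu Assembler's real handling (for instance, whether it pads the result with a leading zero) is not described here, so the output format below is an assumption.

```python
import re

def expand_tilde(line):
    """Replace every ~<decimal number> in a line of ASM text
    with the same number written in uppercase hexadecimal."""
    return re.sub(
        r"~(\d+)",
        lambda match: format(int(match.group(1)), "X"),
        line,
    )

print(expand_tilde("MOV EAX,~529"))   # MOV EAX,211
```

Decimal 529 is 2*256 + 1*16 + 1, which is why it comes out as hex 211.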