Chapter 1

 

Table of Contents

  • 1.1  Compiler overview
    • 1.1.1  System requirements
    • 1.1.2  Executing compiler from command line
    • 1.1.3  Compiler messages
    • 1.1.4  Output formats
  • 1.2  Assembly syntax
    • 1.2.1  Instruction syntax
    • 1.2.2  Data definitions
    • 1.2.3  Constants and labels
    • 1.2.4  Numerical expressions
    • 1.2.5  Jumps and calls
    • 1.2.6  Size settings

Introduction

This chapter contains the most important information you need to begin using the flat assembler. If you are an experienced assembly language programmer, you should read at least this chapter before using this compiler.

1.1 Compiler overview

Flat assembler is a fast assembly language compiler for x86 architecture processors. It performs multiple passes to optimize the size of the generated machine code. It is self-compilable, and versions for different operating systems are provided. All versions are designed to be used from the system command line, and they should not differ in behavior.

1.1.1 System requirements

All versions require an x86 architecture 32-bit processor (at least an 80386), although they can also produce programs for x86 architecture 16-bit processors. The DOS version requires an OS compatible with MS-DOS 2.0 and either a true real mode environment or DPMI. The Windows version requires a Win32 console compatible with version 3.1.

1.1.2 Executing compiler from command line

To execute flat assembler from the command line, you need to provide two parameters: the first should be the name of the source file, and the second should be the name of the destination file. If no second parameter is given, the name of the output file is guessed automatically. After displaying short information about the program name and version, the compiler reads the data from the source file and compiles it. When compilation is successful, the compiler writes the generated code to the destination file and displays a summary of the compilation process. Otherwise, it displays information about the error that occurred.

On the command line you can also include the -m option followed by a number, which specifies the maximum number of kilobytes of memory flat assembler should use. In the DOS version this option limits only the use of extended memory. The -p option followed by a number can be used to specify the limit for the number of passes the assembler performs. If code cannot be generated within the specified number of passes, assembly is terminated with an error message. The maximum value of this setting is 65536, while the default limit, used when no such option is included on the command line, is 100.

The source file should be a text file and can be created in any text editor. Both DOS and Unix line breaks are accepted, and tabs are treated as spaces.

There are no command line options that affect the output of the compiler. Flat assembler requires the source code itself to include all the information it needs. For example, the output format is specified with the format directive at the beginning of the source.

1.1.3 Compiler messages

As stated above, after successful compilation the compiler displays a compilation summary. It includes how many passes were done, how much time they took, and how many bytes were written to the destination file. The following is an example of the compilation summary:

  flat assembler  version 1.69 (16384 kilobytes memory)
  38 passes, 5.3 seconds, 77824 bytes.

If an error occurs during compilation, the program displays an error message. For example, when the compiler cannot find the input file, it displays the following message:

  flat assembler  version 1.69 (16384 kilobytes memory)
  error: source file not found.

If the error is connected with a specific part of the source code, the source line that caused the error is also displayed. The position of this line in the source is given as well, to help you find the error. For example:

  flat assembler  version 1.69 (16384 kilobytes memory)
  example.asm [3]:
          mob     ax,1
  error: illegal instruction.

This means that the compiler encountered an unrecognized instruction in the third line of the example.asm file. When the line that caused the error contains a macroinstruction, the line in the macroinstruction definition that generated the erroneous instruction is also displayed:

  flat assembler  version 1.69 (16384 kilobytes memory)
  example.asm [6]:
          stoschar 7
  example.asm [3] stoschar [1]:
          mob     al,char
  error: illegal instruction.

This means that the macroinstruction in the sixth line of the example.asm file generated an unrecognized instruction from the first line of its definition, which is the third line of example.asm.

1.1.4 Output formats

By default, when there is no format directive in the source file, flat assembler simply puts the generated instruction codes into the output, thus creating a flat binary file. By default it generates 16-bit code, but you can always switch between 16-bit and 32-bit mode with the use16 and use32 directives. Some output formats switch to 32-bit mode when selected. More information about the formats you can choose is in 2.4.

Output code always appears in the same order as it was entered in the source file.

1.2 Assembly syntax

The information provided below is intended mainly for assembly programmers who have used other assemblers before. If you are a beginner, you should look for assembly programming tutorials.

By default, flat assembler uses Intel syntax for assembly instructions, although you can customize it using the preprocessor capabilities (macroinstructions and symbolic constants). It also has its own set of directives, which are instructions for the compiler.

All symbols defined inside the sources are case-sensitive.

1.2.1 Instruction syntax

Instructions in assembly language are separated by line breaks, and each instruction is expected to fill one line of text. If a line contains a semicolon outside quoted strings, the rest of the line is a comment and the compiler ignores it. If a line ends with the \ character (optionally followed by a semicolon and a comment), the next line is attached at this point.
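The comment and line-continuation rules above can be modelled in a few lines of Python. This is a minimal sketch, not flat assembler's actual preprocessor. The function names strip_comment and logical_lines are hypothetical, and the sketch ignores everything except semicolons, quotes and a trailing backslash.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first semicolon that is not inside quotes."""
    quote = None
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None  # a doubled quote just closes and reopens the string
        elif ch in "'\"":
            quote = ch
        elif ch == ";":
            return line[:i]
    return line


def logical_lines(text: str) -> list[str]:
    """Split source text into lines, joining each line that ends with a backslash."""
    result, pending = [], ""
    for raw in text.splitlines():  # accepts both DOS (CR LF) and Unix (LF) breaks
        line = strip_comment(raw).rstrip()
        if line.endswith("\\"):
            pending += line[:-1]  # the next line is attached at the backslash
            continue
        result.append(pending + line)
        pending = ""
    if pending:
        result.append(pending)
    return result


source = "db 'a;b' ; comment\r\nmov ax,\\ ; continued\n    bx\n"
print(logical_lines(source))  # prints ["db 'a;b'", 'mov ax,    bx']
```

The semicolon inside 'a;b' survives because it is inside a quoted string, and the continued instruction becomes a single line.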

Each line in the source is a sequence of items of three possible types. The first type is symbol characters. These are special characters that form individual items even when they are not separated from other items by spaces. Any of the characters +-/*=<>()[]{}:,|&~#` is a symbol character. A sequence of other characters, separated from other items by blank spaces or symbol characters, is a symbol. If the first character of a symbol is a single or double quote, the quote integrates any sequence of characters following it, even special ones, into a quoted string. The string should end with the same character it began with (the single or double quote). However, if there are two such characters in a row (with no other character between them), they are integrated into the quoted string as a single character, and the quoted string continues. Symbols other than symbol characters and quoted strings can be used as names, so they are also called name symbols.
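The three kinds of items can be illustrated with a small Python tokenizer. It is a simplified sketch that rests on a few assumptions. The name tokenize is hypothetical, quoted strings are returned as ("string", text) tuples, and comments and line continuation are assumed to have been handled already.

```python
SYMBOL_CHARS = set("+-/*=<>()[]{}:,|&~#`")
BLANKS = " \t"  # tabs are treated as spaces


def tokenize(line: str) -> list:
    """Split one source line into symbol characters, quoted strings and name symbols."""
    items, i, n = [], 0, len(line)
    while i < n:
        ch = line[i]
        if ch in BLANKS:
            i += 1
        elif ch in SYMBOL_CHARS:
            items.append(ch)  # always an item of its own, even without spaces
            i += 1
        elif ch in "'\"":
            # The string ends with the same quote it began with;
            # two quotes in a row stand for one quote character.
            chars, i, closed = [], i + 1, False
            while i < n:
                if line[i] == ch:
                    if i + 1 < n and line[i + 1] == ch:
                        chars.append(ch)
                        i += 2
                        continue
                    i += 1
                    closed = True
                    break
                chars.append(line[i])
                i += 1
            if not closed:
                raise ValueError("missing end quote")
            items.append(("string", "".join(chars)))
        else:
            # A name symbol runs until a blank or a symbol character.
            start = i
            while i < n and line[i] not in BLANKS and line[i] not in SYMBOL_CHARS:
                i += 1
            items.append(line[start:i])
    return items


print(tokenize("mov byte [7],3"))  # ['mov', 'byte', '[', '7', ']', ',', '3']
print(tokenize("db 'it''s',0"))    # ['db', ('string', "it's"), ',', '0']
```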

Every instruction consists of a mnemonic and a varying number of operands, separated with commas. An operand can be a register, an immediate value or data addressed in memory. It can also be preceded by a size operator to define or override its size (table 1.1). The available registers are listed in table 1.2, and their sizes cannot be overridden. An immediate value can be specified by any numerical expression.

When an operand is data in memory, the address of that data should be enclosed in square brackets or preceded by the ptr operator. The address can also be any numerical expression, but it may contain registers. For example, the instruction mov eax,3 puts the immediate value 3 into the EAX register, and the instruction mov eax,[7] puts the 32-bit value from address 7 into EAX. The instruction mov byte [7],3 puts the immediate value 3 into the byte at address 7, and it can also be written as mov byte ptr 7,3. To specify which segment register should be used for addressing, put the segment register name followed by a colon just before the address value (inside the square brackets or after the ptr operator).

Table 1.1 Size operators

Operator Bits Bytes
byte 8 1
word 16 2
dword 32 4
fword 48 6
pword 48 6
qword 64 8
tbyte 80 10
tword 80 10
dqword 128 16
xword 128 16
qqword 256 32
yword 256 32

Table 1.2 Registers

Type      Bits  Registers
General      8  al   cl   dl   bl   ah   ch   dh   bh
            16  ax   cx   dx   bx   sp   bp   si   di
            32  eax  ecx  edx  ebx  esp  ebp  esi  edi
Segment     16  es   cs   ss   ds   fs   gs
Control     32  cr0       cr2  cr3  cr4
Debug       32  dr0  dr1  dr2  dr3            dr6  dr7
FPU         80  st0  st1  st2  st3  st4  st5  st6  st7
MMX         64  mm0  mm1  mm2  mm3  mm4  mm5  mm6  mm7
SSE        128  xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7
AVX        256  ymm0 ymm1 ymm2 ymm3 ymm4 ymm5 ymm6 ymm7

1.2.2 Data definitions

To define data or reserve space for it, use one of the directives listed in table 1.3. A data definition directive should be followed by one or more numerical expressions, separated with commas. These expressions define the values of data cells whose size depends on which directive is used. For example, db 1,2,3 defines three bytes with the values 1, 2 and 3 respectively.
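Because x86 is little-endian, each cell is stored with its least significant byte first. The following Python sketch shows the byte layout produced by db, dw, dd and dq. The helper define is hypothetical. Accepting both the signed and the unsigned range of each cell size is an assumption of this sketch about how out-of-range values are detected.

```python
import struct

# struct codes for little-endian unsigned cells of 1, 2, 4 and 8 bytes
FORMATS = {"db": "<B", "dw": "<H", "dd": "<I", "dq": "<Q"}


def define(directive: str, *values: int) -> bytes:
    """Lay out integer values the way a db/dw/dd/dq line would (little-endian)."""
    fmt = FORMATS[directive]
    bits = struct.calcsize(fmt) * 8
    out = bytearray()
    for v in values:
        # assumption: both signed and unsigned values of the cell size are accepted
        if not -(1 << (bits - 1)) <= v < (1 << bits):
            raise OverflowError(f"value {v} out of range for {directive}")
        out += struct.pack(fmt, v & ((1 << bits) - 1))  # negatives as two's complement
    return bytes(out)


print(define("db", 1, 2, 3).hex(" "))  # 01 02 03
print(define("dw", 0x1234).hex(" "))   # 34 12
```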

The db and du directives also accept quoted string values of any length. With db, a string is converted into a chain of bytes. With du, it is converted into a chain of words with a zeroed high byte. For example, db 'abc' defines three bytes with the values 61h, 62h and 63h.
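A minimal Python model of the two string conversions follows. It assumes 8-bit source characters, and the helper names db_string and du_string are hypothetical.

```python
def db_string(text: str) -> bytes:
    """db: one byte per character (assumes 8-bit source characters)."""
    return text.encode("latin-1")


def du_string(text: str) -> bytes:
    """du: one word per character, with the character in the low byte and a zero high byte."""
    return bytes(b for c in db_string(text) for b in (c, 0))


print(db_string("abc").hex(" "))  # 61 62 63
print(du_string("abc").hex(" "))  # 61 00 62 00 63 00
```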

The dp directive and its synonym df accept values consisting of two numerical expressions separated with a colon. The first value becomes the high word and the second value becomes the low double word of the far pointer value. dd also accepts such pointers, consisting of two word values separated with a colon. dt accepts a word and a quad word value separated with a colon, and the quad word is stored first. The dt directive with a single expression as its parameter accepts only floating point values and creates data in the FPU double extended precision format.
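The byte order of these pointer and dt forms can be checked with Python's struct module. This sketch assumes the little-endian storage used by x86, and all helper names are hypothetical. dt_float builds the 80-bit value from a Python float: a sign bit, a 15-bit exponent biased by 16383, and a 64-bit significand with an explicit integer bit. Because the input is a Python float, the result carries at most double precision.

```python
import math
import struct


def dp(high: int, low: int) -> bytes:
    """dp/df high:low: the low double word is stored first, then the high word."""
    return struct.pack("<IH", low, high)


def dd_pointer(high: int, low: int) -> bytes:
    """dd high:low: two words, the low word first."""
    return struct.pack("<HH", low, high)


def dt_pair(word: int, qword: int) -> bytes:
    """dt word:qword: the quad word is stored first."""
    return struct.pack("<QH", qword, word)


def dt_float(x: float) -> bytes:
    """dt with a float: 80-bit double extended precision (finite values only)."""
    if not math.isfinite(x):
        raise ValueError("only finite values are handled in this sketch")
    sign = 0x8000 if math.copysign(1.0, x) < 0 else 0
    if x == 0:
        return struct.pack("<QH", 0, sign)
    m, e = math.frexp(abs(x))   # abs(x) = m * 2**e with 0.5 <= m < 1
    mantissa = int(m * 2**64)   # the explicit integer bit lands in bit 63
    exponent = (e - 1) + 16383  # value is 1.xxx * 2**(e - 1)
    return struct.pack("<QH", mantissa, sign | exponent)


print(dp(0x1234, 0x5678).hex(" "))  # 78 56 00 00 34 12
print(dt_float(1.0).hex(" "))       # 00 00 00 00 00 00 00 80 ff 3f
```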

Any of the above directives allows the use of the special dup operator to make multiple copies of given values. The count of duplicates should precede this operator, and the value to duplicate should follow it. The value can even be a chain of values separated with commas, but such a set of values must be enclosed in parentheses. For example, db 5 dup (1,2) defines five copies of the given two-byte sequence.
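The dup operator repeats the whole parenthesized list. Below is a one-function Python model. The name dup is hypothetical, and rejecting a negative count is an assumption of this sketch.

```python
def dup(count: int, values: list) -> list:
    """Model of `count dup (values)`: the whole value list repeated count times."""
    if count < 0:
        raise ValueError("count must not be negative")  # assumption of this sketch
    return list(values) * count


cells = dup(5, [1, 2])        # db 5 dup (1,2)
print(bytes(cells).hex(" "))  # 01 02 01 02 01 02 01 02 01 02
```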

The file directive is special, and its syntax is different. This directive includes a chain of bytes from a file. It should be followed by the quoted file name, then optionally a colon and a numerical expression specifying the offset in the file. After that comes, also optionally, a comma and a numerical expression specifying the number of bytes to include. If no count is specified, all data up to the end of the file is included. For example, file 'data.bin' includes the whole file as binary data, and file 'data.bin':10h,4 includes only four bytes starting at offset 10h.
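The offset and count of the file directive behave like a slice of the file contents. Below is a minimal Python model using the hypothetical helper include_file. Treating an offset or count that reaches past the end of the file as an error is an assumption of this sketch.

```python
import tempfile
from pathlib import Path
from typing import Optional


def include_file(path, offset: int = 0, count: Optional[int] = None) -> bytes:
    """Model of `file 'name':offset,count`, where both offset and count are optional."""
    data = Path(path).read_bytes()
    if offset > len(data):
        raise ValueError("offset beyond the end of the file")  # assumed behavior
    end = len(data) if count is None else offset + count
    if end > len(data):
        raise ValueError("count reaches past the end of the file")  # assumed behavior
    return data[offset:end]


# Demo: a 32-byte file whose bytes are 00h, 01h, ..., 1Fh.
data_path = Path(tempfile.mkdtemp()) / "data.bin"
data_path.write_bytes(bytes(range(32)))
print(include_file(data_path, 0x10, 4).hex(" "))  # 10 11 12 13
```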

A data reservation directive should be followed by exactly one numerical expression. Its value defines how many cells of the specified size should be reserved. All data definition directives also accept the ? value, which means that the cell should not be initialized to any value. The effect is the same as using a data reservation directive. Uninitialized data may not be included in the output file, so its values should always be considered unknown.

Table 1.3 Data directives

Size (bytes)  Define data  Reserve data
 1            db, file     rb
 2            dw, du       rw
 4            dd           rd
 6            dp, df       rp, rf
 8            dq           rq
10            dt           rt

1.2.3 Constants and labels

In numerical expressions you can also use constants or labels instead of numbers. To define a constant or label, use the specific directives. Each label can be defined only once, and it is accessible from any place in the source, even before it is defined. A constant can be redefined many times. In that case, however, it is accessible only after it is defined, and it always equals the value from the last definition before the place where it is used. When a constant is defined only once in the source, it is accessible from anywhere, like a label.

A constant definition consists of the name of the constant followed by the = character and a numerical expression, which after calculation becomes the value of the constant. This value is always calculated at the time the constant is defined. For example, you can define the count constant with the directive count = 17 and then use it in assembly instructions such as mov cx,count, which becomes mov cx,17 during compilation.
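The visibility rules for constants can be summarized in a small Python model. Each constant is represented by its (line, value) definitions in source order. The function name constant_value and this representation are hypothetical. The value stored for each definition is the one calculated when that definition was processed.

```python
def constant_value(definitions: list[tuple[int, int]], use_line: int) -> int:
    """definitions: (line, value) pairs of one constant, in source order."""
    if len(definitions) == 1:
        return definitions[0][1]  # defined once: accessible from anywhere
    earlier = [value for line, value in definitions if line < use_line]
    if not earlier:
        raise NameError("redefined constant used before its first definition")
    return earlier[-1]            # value of the last definition before the use


# count = 17 on line 1, then count = count + 1 (calculated as 18) on line 5
count_defs = [(1, 17), (5, 18)]
print(constant_value(count_defs, 3), constant_value(count_defs, 9))  # 17 18
```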

There are different ways to define labels. The simplest is to follow the name of the label with a colon, and this directive can even be followed by another instruction on the same line. It defines a label whose value equals the offset of the point where it is defined. This method is usually used to label places in code. The other way is to follow the name of the label (without a colon) with a data directive. This defines a label whose value equals the offset of the beginning of the defined data. The label is remembered as a label for data with the cell size specified for that data directive in table 1.3.

A label can be treated as a constant whose value equals the offset of the labeled code or data. For example, suppose you define data with the labeled directive char db 224. To put the offset of this data into the BX register, use the mov bx,char instruction. To put the value of the byte addressed by the char label into the DL register, use mov dl,[char] (or mov dl,ptr char). However, assembling mov ax,[char] causes an error, because fasm compares the sizes of the operands, which must be equal. You can force assembly of that instruction with a size override, mov ax,word [char]. Remember, though, that this instruction reads the two bytes beginning at the char address, while char was defined as a single byte.
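The operand size check can be sketched in Python as follows. This is an illustrative model only. The tables are small subsets of tables 1.1 and 1.2, the function check_mov is hypothetical, and the error text is not necessarily fasm's exact message.

```python
from typing import Optional

REGISTER_BYTES = {"al": 1, "bl": 1, "dl": 1, "ax": 2, "bx": 2, "eax": 4}  # subset of table 1.2
SIZE_OPERATORS = {"byte": 1, "word": 2, "dword": 4}                       # subset of table 1.1


def check_mov(register: str, label_bytes: int, override: Optional[str] = None) -> int:
    """Model the size check for `mov reg,[label]`; returns the operand size in bytes."""
    memory = SIZE_OPERATORS[override] if override else label_bytes  # override wins
    if memory != REGISTER_BYTES[register]:
        raise ValueError("operand sizes do not match")
    return memory


# char db 224 gives the label char a cell size of 1 byte
print(check_mov("dl", 1))          # 1  (mov dl,[char])
print(check_mov("ax", 1, "word"))  # 2  (mov ax,word [char])
```

Calling check_mov("ax", 1) without the override raises the error, mirroring mov ax,[char].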

The last and most flexible way to define labels is the label directive. This directive should be followed by the name of the label, then optionally a size operator (which can be preceded by a colon). After that comes, also optionally, the at operator and a numerical expression defining the address at which the label should be defined. For example, label wchar word at char defines a new label for 16-bit data at the address of char. Now the instruction mov ax,[wchar] compiles to the same code as mov ax,word [char]. Likewise, mov [wchar],57568 stores two bytes, while mov [char],224 stores one byte at the same address. If no address is specified, the label directive defines the label at the current offset.

A label whose name begins with a dot is treated as a local label. Its name is attached to the name of the last global label (a label whose name begins with anything but a dot) to form the full name of the local label. You can use the short name (beginning with a dot) anywhere before the next global label is defined. In other places you have to use the full name. Labels beginning with two dots are the exception: they behave like global labels, but they do not become the new prefix for local labels.
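The naming rule for local labels can be modelled as a simple left-to-right scan. In this Python sketch the labels are given in source order, and full_label_names is a hypothetical helper.

```python
def full_label_names(names: list[str]) -> list[str]:
    """Expand label names, given in source order, to their full names."""
    prefix, result = "", []
    for name in names:
        if name.startswith(".."):
            result.append(name)           # global-like, but not a new prefix
        elif name.startswith("."):
            result.append(prefix + name)  # local: attached to the last global label
        else:
            prefix = name                 # ordinary global label becomes the prefix
            result.append(name)
    return result


print(full_label_names(["main", ".loop", "..shared", ".done", "next", ".loop"]))
# ['main', 'main.loop', '..shared', 'main.done', 'next', 'next.loop']
```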

The @@ name defines an anonymous label, and you can define many of them in the source. The symbol @b (or the equivalent @r) references the nearest preceding anonymous label, and the symbol @f references the nearest following anonymous label. These special symbols are case-insensitive.
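Resolving @b, @r and @f amounts to searching backwards or forwards for the nearest @@ label. The Python sketch below works line by line and assumes that each anonymous label and each reference are on separate lines. resolve_anonymous is a hypothetical name.

```python
def resolve_anonymous(lines: list[str]) -> dict[int, int]:
    """Map each line index using @b/@r/@f to the line index of the @@ label it means."""
    anonymous = [i for i, text in enumerate(lines) if text.strip().startswith("@@")]
    refs = {}
    for i, text in enumerate(lines):
        words = text.lower().replace(",", " ").split()  # the symbols are case-insensitive
        if "@b" in words or "@r" in words:
            before = [a for a in anonymous if a < i]
            if not before:
                raise LookupError(f"line {i}: no preceding anonymous label")
            refs[i] = before[-1]  # nearest preceding @@
        elif "@f" in words:
            after = [a for a in anonymous if a > i]
            if not after:
                raise LookupError(f"line {i}: no following anonymous label")
            refs[i] = after[0]    # nearest following @@
    return refs


code = ["@@:", "    dec cx", "    jnz @b", "    jmp @F", "    nop", "@@:", "    ret"]
print(resolve_anonymous(code))  # {2: 0, 3: 5}
```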

1.2.4 Numerical expressions

In the examples above, all numerical expressions were simple numbers, constants or labels. Expressions can be more complex, however, and use arithmetical or logical operators for calculations at compile time. All these operators, with their priority values, are listed in table 1.4. Operations with a higher priority value are calculated first, and you can change this order by putting parts of the expression into parentheses. The +, -, * and / operators are the standard arithmetical operations, and mod calculates the remainder from division. The and, or, xor, shl, shr and not operators perform the same logical operations as the assembly instructions of those names. The rva operator converts an address into a relocatable offset and is specific to some of the output formats (see 2.4).

Numbers in an expression are treated as decimal by default. Binary numbers should end with the letter b, and octal numbers should end with the letter o. Hexadecimal numbers should begin with the 0x characters (as in C) or with the $ character (as in Pascal), or they should end with the letter h. A quoted string encountered in an expression is also converted into a number, and its first character becomes the least significant byte of the number.
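The literal forms described above can be summarized by a small Python parser. It is a sketch, not fasm's implementation. The name parse_number is hypothetical, letters are matched case-insensitively (an assumption), and quoted strings that contain doubled quotes are not handled.

```python
def parse_number(token: str) -> int:
    """Convert one numeric literal or quoted string to an integer."""
    if token[:1] in ("'", '"'):
        # quoted string: the first character becomes the least significant byte
        return int.from_bytes(token[1:-1].encode("latin-1"), "little")
    t = token.lower()
    if t.startswith("0x"):
        return int(t[2:], 16)  # C style
    if t.startswith("$") and len(t) > 1:
        return int(t[1:], 16)  # Pascal style; a lone $ is the current offset
    if not t[:1].isdigit():
        # other forms must start with a digit, so ffh would be a name (write 0ffh)
        raise ValueError(f"{token!r} is not a number")
    if t.endswith("h"):
        return int(t[:-1], 16)
    if t.endswith("b"):
        return int(t[:-1], 2)
    if t.endswith("o"):
        return int(t[:-1], 8)
    return int(t, 10)


print([parse_number(s) for s in ("100", "100b", "17o", "0x1F", "$1F", "1Fh")])
# [100, 4, 15, 31, 31, 31]
print(hex(parse_number("'ab'")))  # 0x6261
```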

A numerical expression used as an address value can also contain any of the general registers used for addressing. These registers can be added and multiplied by appropriate values, as allowed by x86 architecture instructions.

There are also some special symbols that can be used inside numerical expressions. The first is $, which always equals the value of the current offset, while $$ equals the base address of the current addressing space. Another is %, which is the number of the current repetition in parts of code that are repeated using some special directives (see 2.2). There is also the %t symbol, which always equals the current time stamp.

Any numerical expression can also consist of a single floating point value in scientific notation (flat assembler does not allow any floating point operations at compilation time). To be recognized as floating point, such a value can end with the letter f; otherwise it should contain at least one of the . or E characters. So 1.0, 1E0 and 1f define the same floating point value, while a simple 1 defines an integer value.
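The recognition rule for floating point values can be sketched as follows. In this Python model (float_literal is a hypothetical name), hexadecimal literals are excluded first, because a value like 0E0h contains an E but is an integer. The exact order in which fasm resolves such cases is an assumption here.

```python
from typing import Optional


def float_literal(token: str) -> Optional[float]:
    """Return the value of a floating point literal, or None if the token is not one."""
    t = token.lower()
    if not t or not (t[0].isdigit() or t[0] == "."):
        return None  # not a numeric literal at all
    if t.endswith("h") or t.startswith("0x"):
        return None  # hexadecimal integer such as 0E0h
    if t.endswith("f"):
        t = t[:-1]   # 1f is the floating point value 1.0
    elif "." not in t and "e" not in t:
        return None  # plain integer such as 1
    try:
        return float(t)
    except ValueError:
        return None


print([float_literal(s) for s in ("1.0", "1E0", "1f", "1")])  # [1.0, 1.0, 1.0, None]
```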

Table 1.4 Arithmetical and logical operators by priority

Priority Operators
0 + -
1 * /
2 mod
3 and or xor
4 shl shr
5 not
6 rva
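The priorities in table 1.4 can be checked with a small precedence-climbing evaluator. This Python sketch is not fasm's evaluator. It accepts only decimal numbers and treats not as binding tighter than any binary operator, as its priority of 5 implies. Truncating division and unary minus behave as in C, which is an assumption of this sketch. rva is left out because it depends on the output format.

```python
import re

# Binary operators with their priorities from table 1.4 (higher is evaluated first).
PRIORITY = {"+": 0, "-": 0, "*": 1, "/": 1, "mod": 2,
            "and": 3, "or": 3, "xor": 3, "shl": 4, "shr": 4}

TOKEN = re.compile(r"\s*(\d+|[A-Za-z]+|[-+*/()])")


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero (assumed, as in C)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


APPLY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": trunc_div,
    "mod": lambda a, b: a - b * trunc_div(a, b),
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: a << b,
    "shr": lambda a, b: a >> b,
}


def evaluate(expr: str) -> int:
    text = expr.rstrip()
    toks, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        toks.append(m.group(1).lower())
        pos = m.end()
    pos = 0

    def primary() -> int:
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of expression")
        tok = toks[pos]
        pos += 1
        if tok == "(":
            value = binary(0)
            if pos >= len(toks) or toks[pos] != ")":
                raise SyntaxError("missing closing parenthesis")
            pos += 1
            return value
        if tok == "not":  # priority 5: applies before any binary operator
            return ~primary()
        if tok == "-":    # unary minus (assumed to apply to the next primary)
            return -primary()
        return int(tok)

    def binary(min_priority: int) -> int:
        nonlocal pos
        left = primary()
        while pos < len(toks) and PRIORITY.get(toks[pos], -1) >= min_priority:
            op = toks[pos]
            pos += 1
            right = binary(PRIORITY[op] + 1)  # left-associative
            left = APPLY[op](left, right)
        return left

    result = binary(0)
    if pos != len(toks):
        raise SyntaxError(f"unexpected {toks[pos]!r}")
    return result


print(evaluate("1 + 2 shl 3"))  # 17
```

Unlike in C, shl has a higher priority than + here. As a result, 1 + 2 shl 3 evaluates to 17, while (1 + 2) shl 3 gives 24.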

1.2.5 Jumps and calls

The operand of any jump or call instruction can be preceded not only by a size operator but also by one of the operators specifying the type of the jump: short, near or far. For example, when the assembler is in 16-bit mode, the instruction jmp dword [0] becomes a far jump, and when the assembler is in 32-bit mode, it becomes a near jump. To force this instruction to be treated differently, use the jmp near dword [0] or jmp far dword [0] form.

When the operand of a near jump is an immediate value, the assembler generates the shortest variant of this jump instruction if possible. It does not, however, create a 32-bit instruction in 16-bit mode or a 16-bit instruction in 32-bit mode unless a size operator states it. By specifying the jump type, you can force the assembler to always generate the long variant (for example, jmp near 0). You can also force it to always generate the short variant and terminate with an error when that is impossible (for example, jmp short 0).
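The multi-pass optimization mentioned in 1.1 can be pictured with a toy model of 16-bit jmp sizing. In this model the short form (EB rel8) takes 2 bytes and the near form (E9 rel16) takes 3. This is a hypothetical Python sketch, not fasm's algorithm. It starts by assuming every jump is short and recomputes label offsets and jump sizes until nothing changes. It fails once a pass limit is exceeded; the limit is 100 here, mirroring the default of the -p option.

```python
SHORT, NEAR = 2, 3  # 16-bit jmp: EB rel8 (2 bytes) or E9 rel16 (3 bytes)


def size_jumps(items, max_passes=100):
    """items: ("label", name), ("code", size_in_bytes) or ("jmp", target_label).

    Returns (jump sizes by item index, label offsets, number of passes)."""
    sizes = {i: SHORT for i, (kind, _) in enumerate(items) if kind == "jmp"}
    labels = {}
    for passes in range(1, max_passes + 1):
        # Lay out the code using the current guesses for the jump sizes.
        offset, new_labels, jump_starts = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            if kind == "label":
                new_labels[arg] = offset
            elif kind == "code":
                offset += arg
            else:
                jump_starts[i] = offset
                offset += sizes[i]
        # A jump can be short if its displacement, measured from the end of
        # a 2-byte instruction, fits in a signed byte.
        new_sizes = {}
        for i, start in jump_starts.items():
            disp = new_labels[items[i][1]] - (start + SHORT)
            new_sizes[i] = SHORT if -128 <= disp <= 127 else NEAR
        if new_sizes == sizes and new_labels == labels:
            return sizes, labels, passes
        sizes, labels = new_sizes, new_labels
    raise RuntimeError("code cannot be generated within the pass limit")


program = [("jmp", "end"), ("code", 200), ("label", "end")]
print(size_jumps(program))  # ({0: 3}, {'end': 203}, 3)
```

A jump that grows from 2 to 3 bytes shifts every later label, and that shift can push other targets out of range. This is why the sketch loops until the layout is stable.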

1.2.6 Size settings

When an instruction uses memory addressing, the smallest form of the instruction is generated by default, using a short displacement if the address value fits in its range. This can be overridden with the word or dword operator before the address inside the square brackets (or after the ptr operator), which forces a long displacement of the appropriate size. When the address is not relative to any register, these operators also let you choose the appropriate mode of absolute addressing.

The adc, add, and, cmp, or, sbb, sub and xor instructions with a 16-bit or 32-bit first operand are generated by default in the shortened 8-bit form when the second operand is an immediate value that fits in the range of signed 8-bit values. This can also be overridden by putting the word or dword operator before the immediate value. Similar rules apply to the imul instruction when its last operand is an immediate value.
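The choice between the shortened and the full-size immediate can be expressed as a small Python function. This sketch rests on some assumptions. immediate_bytes is a hypothetical name, the value is taken as a signed number, and only the size of the immediate field is modelled, not the complete encoding.

```python
from typing import Optional

OPERATOR_BYTES = {"word": 2, "dword": 4}


def immediate_bytes(value: int, operand_bits: int,
                    size_operator: Optional[str] = None) -> int:
    """Size of the immediate field for adc/add/and/cmp/or/sbb/sub/xor
    with a 16-bit or 32-bit first operand (value taken as signed)."""
    if operand_bits not in (16, 32):
        raise ValueError("the shortened form applies only to 16-bit and 32-bit operands")
    if size_operator is not None:
        forced = OPERATOR_BYTES[size_operator]
        if forced * 8 != operand_bits:
            raise ValueError("size operator does not match the operand size")
        return forced  # word/dword before the value forces the full-size immediate
    if -128 <= value <= 127:
        return 1       # shortened form with a sign-extended 8-bit immediate
    return operand_bits // 8


print(immediate_bytes(5, 16), immediate_bytes(200, 16), immediate_bytes(5, 32, "dword"))  # 1 2 4
```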

An immediate value used as the operand of the push instruction without a size operator is treated by default as a word value if the assembler is in 16-bit mode, and as a double word value if it is in 32-bit mode. The shorter 8-bit form of this instruction is used if possible. A word or dword size operator forces the push instruction to be generated in the longer form for the specified size. The pushw and pushd mnemonics force the assembler to generate 16-bit or 32-bit code without forcing it to use the longer form of the instruction.
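The push rules above can be modelled in Python as a function that returns the operand size and the size of the immediate field. push_immediate is a hypothetical name, and the sketch ignores prefixes and opcodes.

```python
from typing import Optional, Tuple


def push_immediate(value: int, mode_bits: int, mnemonic: str = "push",
                   size_operator: Optional[str] = None) -> Tuple[int, int]:
    """Return (operand size in bits, immediate size in bytes) for pushing a value."""
    forced = size_operator is not None
    if mnemonic == "pushw":
        operand = 16
    elif mnemonic == "pushd":
        operand = 32
    elif forced:
        operand = {"word": 16, "dword": 32}[size_operator]
    else:
        operand = mode_bits  # word in 16-bit mode, double word in 32-bit mode
    if not forced and -128 <= value <= 127:
        return operand, 1    # shorter form with an 8-bit immediate
    return operand, operand // 8  # longer form for the chosen size


print(push_immediate(5, 16))                          # (16, 1)
print(push_immediate(5, 32, size_operator="dword"))   # (32, 4)
print(push_immediate(5, 32, "pushw"))                 # (16, 1)
```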