Chapter 2.2 - Control Directives

Table of Contents

  • 2.2  Control directives
    • 2.2.1  Numerical constants
    • 2.2.2  Conditional assembly
    • 2.2.3  Repeating blocks of instructions
    • 2.2.4  Addressing spaces
    • 2.2.5  Other directives
    • 2.2.6  Multiple passes

 

2.2 Control directives

This section describes the directives that control the assembly process. They are processed during assembly and may cause some blocks of instructions to be assembled differently or not assembled at all.

2.2.1 Numerical constants

The = directive defines a numerical constant. It should be preceded by the name of the constant and followed by a numerical expression providing the value. The value of such a constant can be a number or an address, but, unlike labels, numerical constants are not allowed to hold register-based addresses. Apart from this difference, in their basic variant numerical constants behave very much like labels, and you can even forward-reference them (access their values before they are actually defined).

There is, however, a second variant of numerical constants, which the assembler recognizes when you define a constant under a name for which a numerical constant has already been defined. In such a case the assembler treats that constant as an assembly-time variable and allows it to be assigned a new value, but forbids forward-referencing it (since it would be unclear which of its values should be used). Let's see both variants of numerical constants in one example:

    dd sum
    x = 1
    x = x+2
    sum = x

Here x is an assembly-time variable, and every time it is accessed, the value most recently assigned to it is used. Thus if we tried to access x before it is defined for the first time, for example by writing dd x in place of the dd sum instruction, it would cause an error. When x is redefined with the x = x+2 directive, the previous value of x is used to calculate the new one. So when the sum constant gets defined, x has the value 3, and this value is assigned to sum. Since sum is defined only once in the source, it is a standard numerical constant and can be forward-referenced. So dd sum is assembled as dd 3. To read more about how the assembler is able to resolve this, see section 2.2.6.
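The pass-based resolution behind this example can be sketched as a toy Python model. It is not how flat assembler is implemented. It relies on the rule stated above (a name defined more than once is a variable and cannot be forward-referenced) plus two assumptions of its own: an unknown constant is predicted as 0, and passes repeat until no value changes. The function assemble and the statement tuples are hypothetical.

```python
from collections import Counter

def assemble(program, max_passes=10):
    """Toy model of '=' constants and assembly-time variables.

    program is a list of ("dd", name) or ("set", name, expr) tuples,
    where expr(get) computes the new value using get(name) for lookups.
    """
    # A name assigned more than once is treated as an assembly-time variable.
    counts = Counter(stmt[1] for stmt in program if stmt[0] == "set")
    previous = {}  # values every name had at the end of the previous pass

    for _ in range(max_passes):
        env = {}  # values assigned so far in the current pass

        def get(name):
            if name in env:  # already defined earlier in this pass
                return env[name]
            if counts[name] == 0:
                raise NameError(f"undefined symbol: {name}")
            if counts[name] > 1:  # variables cannot be forward-referenced
                raise NameError(f"{name} is used before its first definition")
            return previous.get(name, 0)  # constant: predicted from last pass

        output = []
        for stmt in program:
            if stmt[0] == "set":
                env[stmt[1]] = stmt[2](get)
            else:
                output.append(get(stmt[1]))
        if env == previous:  # every prediction held, so this pass is final
            return output
        previous = env
    raise RuntimeError("values did not stabilize within the pass limit")


example = [
    ("dd", "sum"),
    ("set", "x", lambda get: 1),
    ("set", "x", lambda get: get("x") + 2),
    ("set", "sum", lambda get: get("x")),
]
print(assemble(example))  # [3]
```

In the first pass dd sum sees the prediction 0; the second pass uses the value 3 recorded at the end of the first pass, and since nothing changes after that, the second pass is final.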

The value of a numerical constant can be preceded by a size operator, which ensures that the value fits in the range for the specified size and can also affect how some of the calculations inside the numerical expression are performed. This example:

    c8 = byte -1
    c32 = dword -1

defines two different constants: the first one fits in 8 bits, the second one in 32 bits.

When you need to define a constant whose value is an address that may be register-based (so a numerical constant cannot be used for this purpose), you can use the extended syntax of the label directive (already described in section 1.2.3), like:

    label myaddr at ebp+4

which declares a label placed at the ebp+4 address. Remember, however, that labels, unlike numerical constants, cannot become assembly-time variables.

2.2.2 Conditional assembly

The if directive causes a block of instructions to be assembled only under a certain condition. It should be followed by a logical expression specifying the condition; the instructions in the next lines are assembled only when this condition is met, and are skipped otherwise. The optional else if directive, followed by a logical expression specifying an additional condition, begins the next block of instructions, which is assembled if the previous conditions were not met and the additional condition is met. The optional else directive begins the block of instructions that is assembled if none of the conditions were met. The end if directive ends the last block of instructions.

Note that the if directive is processed at the assembly stage and therefore does not affect any preprocessor directives, such as the definitions of symbolic constants and macroinstructions: when the assembler recognizes the if directive, all the preprocessing has already been finished.

A logical expression consists of logical values and logical operators. The logical operators are ~ for logical negation, & for logical and, and | for logical or. Negation has the highest priority. A logical value can be a numerical expression; it is false if it is equal to zero and true otherwise. Two numerical expressions can be compared using one of the following operators to make a logical value: = (equal), < (less), > (greater), <= (less or equal), >= (greater or equal), <> (not equal).

The used operator, followed by a symbol name, is a logical value that checks whether the given symbol is used somewhere (it returns the correct result even if the symbol is used only after this check). The defined operator can be followed by any expression, usually just a single symbol name; it checks whether the given expression contains only symbols that are defined in the source and accessible from the current position.

The following simple example uses the count constant, which should be defined somewhere in the source:

    if count>0
        mov cx,count
        rep movsb
    end if

These two instructions are assembled only if the count constant is greater than 0. The next sample shows a more complex conditional structure:

    if count & ~ count mod 4
        mov cx,count/4
        rep movsd
    else if count>4
        mov cx,count/4
        rep movsd
        mov cx,count mod 4
        rep movsb
    else
        mov cx,count
        rep movsb
    end if

The first block of instructions is assembled when count is nonzero and divisible by four. If this condition is not met, the second logical expression, which follows the else if, is evaluated, and if it is true, the second block of instructions is assembled; otherwise the last block of instructions, which follows the line containing only else, is assembled.
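The selection can be mirrored in Python to check which block a given count produces. This sketch assumes count is a non-negative integer and that / on integers truncates like Python's //; the helper names copy_sequence and bytes_copied are hypothetical. It also confirms that every branch moves exactly count bytes in total.

```python
def copy_sequence(count):
    """Return the (instruction, cx) pairs that the if/else if/else sample emits."""
    # "count & ~ count mod 4": count is nonzero and count mod 4 is zero.
    if count != 0 and count % 4 == 0:
        return [("rep movsd", count // 4)]
    elif count > 4:
        return [("rep movsd", count // 4), ("rep movsb", count % 4)]
    else:
        return [("rep movsb", count)]


def bytes_copied(sequence):
    """movsd moves 4 bytes per repetition, movsb moves 1."""
    unit = {"rep movsd": 4, "rep movsb": 1}
    return sum(unit[insn] * cx for insn, cx in sequence)


print(copy_sequence(7))  # [('rep movsd', 1), ('rep movsb', 3)]
```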

There are also operators that compare values consisting of arbitrary chains of symbols. The eq operator checks whether two such values are exactly the same. The in operator checks whether the given value is a member of the list of values following this operator; the list should be enclosed between < and > characters, and its members should be separated with commas. Symbols are considered the same when they have the same meaning for the assembler: for example, pword and fword mean the same to the assembler and thus are not distinguished by these operators. In the same way, 16 eq 10h is a true condition, but 16 eq 10+4 is not.
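A simplified Python model of eq and in compares token lists by meaning rather than by spelling. The tokenization is done by hand, only decimal and h-suffixed hexadecimal numbers are recognized, and the synonym table contains just the pword/fword pair mentioned above; the real assembler knows many more cases. All names here are hypothetical.

```python
# Hypothetical synonym table: only the pair mentioned in the text.
SYNONYMS = {"pword": "fword"}


def meaning(token):
    """Reduce a single symbol to what it means for the assembler (simplified)."""
    if token[0].isdigit():
        try:
            if token[-1] in "hH":
                return ("number", int(token[:-1], 16))  # e.g. 10h
            return ("number", int(token, 10))
        except ValueError:
            pass  # not a number this model understands
    return ("symbol", SYNONYMS.get(token, token))


def eq(left, right):
    """left and right are token lists; compare them symbol by symbol."""
    return [meaning(t) for t in left] == [meaning(t) for t in right]


def in_list(value, candidates):
    """Model of 'value in <a,b,...>' where candidates is a list of token lists."""
    return any(eq(value, c) for c in candidates)


print(eq(["16"], ["10h"]), eq(["16"], ["10", "+", "4"]))  # True False
```

The second comparison fails because 10+4 is a chain of three symbols, while 16 is a single one; no arithmetic is performed.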

The eqtype operator checks whether the two compared values have the same structure and whether the structural elements are of the same type. The distinguished types include numerical expressions, individual quoted strings, floating point numbers, address expressions (the expressions enclosed in square brackets or preceded by the ptr operator), instruction mnemonics, registers, size operators, jump type and code type operators. Each of the special characters that act as separators, like the comma or colon, is a separate type of its own. For example, two values, each consisting of a register name followed by a comma and a numerical expression, are regarded as being of the same type, no matter what register and how complicated a numerical expression is used. The exception is quoted strings and floating point values, which are special kinds of numerical expressions and are treated as different types. Thus the condition eax,16 eqtype fs,3+7 is true, but eax,16 eqtype eax,1.6 is false.
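The structural comparison done by eqtype can be sketched the same way: classify every token, collapse each run of numbers and arithmetic operators into one numerical expression, and compare the resulting type sequences. The register set and the type rules in this Python model are deliberately small assumptions; in particular, any unrecognized token is treated as part of a numerical expression.

```python
# Simplified token classes; the real assembler distinguishes many more types.
REGISTERS = {"eax", "ebx", "ecx", "edx", "ax", "bx", "cx", "dx",
             "cs", "ds", "es", "fs", "gs", "ss"}
OPERATORS = {"+", "-", "*", "/"}
SEPARATORS = {",", ":"}


def token_type(token):
    if token in REGISTERS:
        return "register"
    if token in SEPARATORS:
        return token  # each separator is a type of its own
    if token[0] in "'\"":
        return "string"
    if token[0].isdigit() and "." in token:
        return "float"
    return "numeric"


def structure(tokens):
    """Type sequence, with each run of numbers and operators collapsed."""
    result = []
    for tok in tokens:
        kind = "numeric" if tok in OPERATORS else token_type(tok)
        # consecutive numbers and operators form one numerical expression
        if kind == "numeric" and result and result[-1] == "numeric":
            continue
        result.append(kind)
    return result


def eqtype(left, right):
    return structure(left) == structure(right)


print(eqtype(["eax", ",", "16"], ["fs", ",", "3", "+", "7"]),
      eqtype(["eax", ",", "16"], ["eax", ",", "1.6"]))  # True False
```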

2.2.3 Repeating blocks of instructions

The times directive repeats one instruction a specified number of times. It should be followed by a numerical expression specifying the number of repeats and the instruction to repeat (optionally, a colon can be used to separate the number and the instruction). When the special symbol % is used inside the instruction, it is equal to the number of the current repeat. For example, times 5 db % defines five bytes with values 1, 2, 3, 4, 5. Recursive use of the times directive is also allowed, so times 3 times % db % defines six bytes with values 1, 1, 2, 1, 2, 3.
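The numbering of % in nested times directives can be modeled with a small Python function, where the instruction is a callback that receives the current repeat number and returns the bytes it defines. This is an illustration only; the function name times is hypothetical.

```python
def times(count, instruction):
    """Model of 'times count instruction': % is the 1-based repeat number."""
    output = []
    for percent in range(1, count + 1):
        output.extend(instruction(percent))
    return output


# times 5 db %
print(times(5, lambda p: [p]))  # [1, 2, 3, 4, 5]

# times 3 times % db %: the outer % sets the inner count, db uses the inner %
print(times(3, lambda outer: times(outer, lambda inner: [inner])))  # [1, 1, 2, 1, 2, 3]
```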

The repeat directive repeats a whole block of instructions. It should be followed by a numerical expression specifying the number of repeats. The instructions to repeat are expected in the next lines, ended with the end repeat directive, for example:

    repeat 8
        mov byte [bx],%
        inc bx
    end repeat

The generated code stores the byte values from one to eight in consecutive bytes of memory, starting at the address held in the BX register.

The number of repeats can be zero, in which case the instructions are not assembled at all.
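At assembly time, repeat effectively copies its body once per repeat with % standing for the repeat number. The Python sketch below models this with plain string substitution, which is a simplification (in the assembler % is a symbol evaluated in expressions, not text); expand_repeat is a hypothetical name.

```python
def expand_repeat(count, body):
    """Model of repeat/end repeat: copy the body count times, substituting %."""
    lines = []
    for percent in range(1, count + 1):  # count = 0 yields no lines at all
        lines.extend(line.replace("%", str(percent)) for line in body)
    return lines


body = ["mov byte [bx],%", "inc bx"]
print(expand_repeat(2, body))  # ['mov byte [bx],1', 'inc bx', 'mov byte [bx],2', 'inc bx']
```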

The break directive stops repeating early and continues assembly from the first line after the end repeat. Combined with the if directive, it allows the repetition to stop under some special condition, like:

    s = x/2
    repeat 100
        if x/s = s
            break
        end if
        s = (s+x/s)/2
    end repeat
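The loop above approximates the square root of x with Newton's iteration s = (s+x/s)/2 and uses break once x/s equals s. The following Python transcription assumes integer division and x >= 2 (for smaller x the initial s is 0, so the first division would be by zero). It also shows why the limit of 100 repeats matters: for a non-square such as 8 the value alternates between 2 and 3 and never satisfies the break condition. The name sqrt_repeat is hypothetical.

```python
def sqrt_repeat(x):
    """Model of the repeat/break sample; requires x >= 2."""
    s = x // 2
    for _ in range(100):          # repeat 100
        if x // s == s:           #     if x/s = s
            break                 #         break
        s = (s + x // s) // 2     #     s = (s+x/s)/2
    return s


print(sqrt_repeat(100), sqrt_repeat(10), sqrt_repeat(144))  # 10 3 12
```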

The while directive repeats a block of instructions as long as the condition specified by the logical expression following it is true. The block of instructions to be repeated should end with the end while directive. Before each repetition the logical expression is evaluated, and when its value is false, assembly continues from the first line after the end while. In this case, too, the % symbol holds the number of the current repeat. The break directive can be used to stop this kind of loop in the same way as with the repeat directive. The previous sample can be rewritten to use while instead of repeat this way:

    s = x/2
    while x/s <> s
        s = (s+x/s)/2
        if % = 100
            break
        end if
    end while
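The while form can be transcribed into Python the same way. Under the same assumptions (integer division, x >= 2) it performs the same sequence of updates as the repeat version, so it returns the same values; sqrt_while is a hypothetical name.

```python
def sqrt_while(x):
    """Model of the while/break sample (x >= 2); % counts repeats from 1."""
    s = x // 2
    percent = 0
    while x // s != s:            # while x/s <> s
        percent += 1
        s = (s + x // s) // 2     #     s = (s+x/s)/2
        if percent == 100:        #     if % = 100
            break                 #         break
    return s


print(sqrt_while(100), sqrt_while(8))  # 10 2
```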

The blocks defined with if, repeat and while can be nested in any order; however, they must be properly nested, which means that the block started last has to be closed first. The break directive always stops processing the block that was started last with either the repeat or while directive.

2.2.4 Addressing spaces

The org directive sets the address at which the following code is expected to appear in memory. It should be followed by a numerical expression specifying the address. This directive begins a new addressing space; the following code itself is not moved in any way, but all the labels defined within it and the value of the $ symbol are affected as if it were put at the given address. However, it is the programmer's responsibility to put the code at the correct address at run time.

The load directive defines a constant with a binary value loaded from the already assembled code. This directive should be followed by the name of the constant, then optionally a size operator, then the from operator and a numerical expression specifying a valid address in the current addressing space. The size operator has an unusual meaning in this case: it states how many bytes (up to 8) have to be loaded to form the binary value of the constant. If no size operator is specified, one byte is loaded (thus the value is in the range from 0 to 255). The loaded data cannot exceed the current offset.

The store directive can modify the already generated code by replacing some of the previously generated data with the value defined by the numerical expression that follows it. The expression can be preceded by an optional size operator specifying how large a value the expression defines, and therefore how many bytes are stored; if there is no size operator, a size of one byte is assumed. Then the at operator should follow, together with a numerical expression defining a valid address in the current addressing space, at which the given value is to be stored. This directive is intended for advanced applications and should be used carefully.

Both the load and store directives are limited to places in the current addressing space. The $$ symbol is always equal to the base address of the current addressing space, and the $ symbol is the address of the current position in that addressing space; therefore these two values define the limits of the area where load and store can operate.
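The rules for $$, $, load and store can be summarized in a toy Python model of a single addressing space. It assumes little-endian byte order (as on x86) and, for store, accepts any value that fits the size as either a signed or an unsigned number. The class and its methods are hypothetical and only mimic the checks described above.

```python
class AddressingSpace:
    """Toy model of one addressing space: $$ is base, $ is base + len(data)."""

    def __init__(self, base):
        self.base = base          # $$ (for example set by org)
        self.data = bytearray()

    @property
    def here(self):               # $
        return self.base + len(self.data)

    def emit(self, *values):
        self.data.extend(values)

    def _offset(self, address, size):
        if not 1 <= size <= 8:
            raise ValueError("size must be 1 to 8 bytes")
        if address < self.base or address + size > self.here:
            raise ValueError("area must lie between $$ and $")
        return address - self.base

    def load(self, address, size=1):
        off = self._offset(address, size)
        return int.from_bytes(self.data[off:off + size], "little")

    def store(self, value, address, size=1):
        off = self._offset(address, size)
        bits = 8 * size
        if not -(1 << (bits - 1)) <= value < (1 << bits):
            raise ValueError("value does not fit in the given size")
        self.data[off:off + size] = (value % (1 << bits)).to_bytes(size, "little")


space = AddressingSpace(0x100)    # like org 100h
space.emit(0xB8, 0x34, 0x12)      # three bytes of code
print(hex(space.here))            # 0x103
print(hex(space.load(0x101, 2)))  # 0x1234
space.store(0x5678, 0x101, 2)
print(space.data.hex())           # b87856
```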

Combining the load and store directives allows you to do things like encoding some of the already generated code. For example, to encode the whole code generated in the current addressing space, you can use a block of directives like this:

    repeat $-$$
        load a byte from $$+%-1
        store byte a xor c at $$+%-1
    end repeat

and each byte of code will be XORed with the value defined by the c constant.
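In Python terms, the loop does the following. This is a sketch that assumes c fits in a byte; xor_encode is a hypothetical name. Because XOR with the same value is its own inverse, applying the operation again restores the original bytes, which is what a run-time decoder would rely on.

```python
def xor_encode(code, c):
    """Model of the repeat/load/store loop: XOR every byte of code with c."""
    if not 0 <= c <= 0xFF:
        raise ValueError("c must fit in a byte for 'store byte'")
    for i in range(len(code)):   # repeat $-$$, with i playing the role of %-1
        a = code[i]              # load a byte from $$+%-1
        code[i] = a ^ c          # store byte a xor c at $$+%-1
    return code


print(xor_encode(bytearray(b"\x31\xc0"), 0x5A).hex())  # 6b9a
```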

The virtual directive defines virtual data at the specified address. This data is not included in the output file, but labels defined there can be used in other parts of the source. This directive can be followed by the at operator and a numerical expression specifying the address for the virtual data; otherwise it uses the current address, the same as virtual at $. Instructions defining data are expected in the next lines, ended with the end virtual directive. The block of virtual instructions is itself an independent addressing space; after it ends, the context of the previous addressing space is restored.

The virtual directive can be used to create a union of some variables, for example:

    GDTR dp ?
    virtual at GDTR
        GDT_limit dw ?
        GDT_address dd ?
    end virtual

It defines two labels for the parts of the 48-bit variable at the GDTR address.

It can also be used to define labels for structures addressed by a register, for example:

    virtual at bx
        LDT_limit dw ?
        LDT_address dd ?
    end virtual

With such a definition, the instruction mov ax,[LDT_limit] is assembled as mov ax,[bx].
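Both uses rely on the same rule: inside the virtual block each label gets the current address, and the data definition after it advances that address by its size (2 bytes for dw, 4 for dd). The Python sketch below models only this rule. The GDTR address 2000h is an arbitrary assumption, and virtual_labels is a hypothetical name.

```python
def virtual_labels(base, fields):
    """Give each field the current address, then advance by its size.

    base may be a number (virtual at GDTR) or a register name (virtual at bx).
    """
    labels = {}
    offset = 0
    for name, size in fields:
        if isinstance(base, str):   # register-based address
            labels[name] = base if offset == 0 else f"{base}+{offset}"
        else:
            labels[name] = base + offset
        offset += size
    return labels


print(virtual_labels(0x2000, [("GDT_limit", 2), ("GDT_address", 4)]))
# {'GDT_limit': 8192, 'GDT_address': 8194}
print(virtual_labels("bx", [("LDT_limit", 2), ("LDT_address", 4)]))
# {'LDT_limit': 'bx', 'LDT_address': 'bx+2'}
```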

Declaring initialized data or instructions inside a virtual block can also be useful, because the load directive can be used to load values from the virtually generated code into constants. The load directive should be used after the code it loads but before the virtual block ends, because it can only load values from the same addressing space. For example:

    virtual at 0
        xor eax,eax
        and edx,eax
        load zeroq dword from 0
    end virtual

The above piece of code defines the zeroq constant containing the first four bytes of the machine code of the instructions defined inside the virtual block. This method can also be used to load a binary value from an external file. For example, this code:

    virtual at 0
        file 'a.txt':10h,1
        load char from 0
    end virtual

loads a single byte from offset 10h in the file a.txt into the char constant.
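For comparison, the same byte can be read in Python as shown below. This is only an analogy to make clear which byte the file and load directives select; the assembler performs the read at assembly time, and load_byte_from_file is a hypothetical name.

```python
def load_byte_from_file(path, offset):
    """Counterpart of: virtual at 0 / file path:offset,1 / load char from 0."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(1)
    if len(data) != 1:
        raise ValueError("file is shorter than offset + 1 bytes")
    return data[0]


# Example use: char = load_byte_from_file("a.txt", 0x10)
```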

Any of the section directives described in section 2.4 also begins a new addressing space.

2.2.5 Other directives

The align directive aligns code or data to the specified boundary. It should be followed by a numerical expression specifying the number of bytes to a multiple of which the current address has to be aligned. The boundary value has to be a power of two.

The align directive fills the bytes that have to be skipped to perform the alignment with nop instructions and at the same time marks this area as uninitialized data, so if it is placed among other uninitialized data that does not take space in the output file, the alignment bytes act the same way. If you need to fill the alignment area with some other values, you can combine align with virtual to get the size of the alignment needed and then create the alignment yourself, like:

    virtual
        align 16
        a = $ - $$
    end virtual
    db a dup 0

The a constant is defined as the difference between the address after alignment and the address of the virtual block (see the previous section), so it is equal to the size of the needed alignment space.
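The value of a is the distance from the current address to the next multiple of the boundary. The Python sketch below computes it directly, assuming the address is absolute; alignment_padding is a hypothetical name.

```python
def alignment_padding(address, boundary):
    """Bytes needed to move address up to a multiple of boundary."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    return -address % boundary  # 0 when address is already aligned


print(alignment_padding(0x103, 16))  # 13
```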

The display directive displays a message at assembly time. It should be followed by quoted strings or byte values, separated with commas. It can be used to display the values of some constants, for example:

    bits = 16
    display 'Current offset is 0x'
    repeat bits/4
        d = '0' + $ shr (bits-%*4) and 0Fh
        if d > '9'
            d = d + 'A'-'9'-1
        end if
        display d
    end repeat
    display 13,10

This block of directives calculates the four hexadecimal digits of the 16-bit value of $ and converts them into characters for display. Note that this does not work if the addresses in the current addressing space are relocatable (as may happen with the PE or object output formats), since only absolute values can be used this way. An absolute value may be obtained by calculating a relative address, like $-$$, or with rva $ in the case of the PE format.
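The digit loop can be transcribed into Python to check the conversion. Parentheses make explicit the grouping the original expression relies on: shift first, then mask, then add '0'. The value passed in stands for an absolute $, and hex_digits is a hypothetical name.

```python
def hex_digits(value, bits=16):
    """Model of the display loop: one character per 4-bit group, MSB first."""
    chars = []
    for percent in range(1, bits // 4 + 1):                 # repeat bits/4
        d = ord("0") + ((value >> (bits - percent * 4)) & 0x0F)
        if d > ord("9"):
            d = d + ord("A") - ord("9") - 1  # skip the characters between '9' and 'A'
        chars.append(chr(d))
    return "".join(chars)


print(hex_digits(0x1A2F))  # 1A2F
```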

The err directive immediately terminates the assembly process when it is encountered by the assembler.

2.2.6 Multiple passes

Because the assembler allows some labels or constants to be referenced before they are actually defined, it has to predict the values of such labels. If there is even a suspicion that a prediction failed in at least one case, it does one more pass, assembling the whole source again, this time making better predictions based on the values the labels got in the previous pass.

Changing values of labels can cause some instructions to have encodings of different lengths, and this can change the values of labels again. And since labels and constants can also be used inside the expressions that affect the behavior of control directives, a whole block of source can be processed completely differently during the new pass. Thus the assembler does more and more passes, each time trying to make better predictions, to approach the final solution in which all the values are predicted correctly. It uses various methods for predicting the values, which have been chosen to find, within a few passes, a solution of the smallest possible length for most programs.
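The effect of labels changing instruction sizes can be reproduced with a toy model. It assumes a jump takes 2 bytes when the displacement fits in a signed byte and 5 bytes otherwise (the short and near forms of jmp in 32-bit code). Labels not yet defined in the current pass are predicted from the previous pass, and in the first pass the short form is optimistically assumed. This is a sketch of the idea only, not flat assembler's actual prediction algorithm; resolve_jumps is a hypothetical name.

```python
SHORT, NEAR = 2, 5  # assumed sizes of jmp rel8 and jmp rel32 in 32-bit code


def resolve_jumps(program, max_passes=100):
    """program: list of ("jmp", label), ("data", size) or ("label", name)."""
    predicted = {}  # label values from the previous pass
    for pass_no in range(1, max_passes + 1):
        address = 0
        labels = {}
        for kind, arg in program:
            if kind == "label":
                labels[arg] = address
            elif kind == "data":
                address += arg
            else:  # jmp: value from this pass if known, else the prediction
                target = labels.get(arg, predicted.get(arg, address))
                displacement = target - (address + SHORT)
                address += SHORT if -128 <= displacement <= 127 else NEAR
        if labels == predicted:  # all predictions were correct
            return pass_no, labels
        predicted = labels
    raise RuntimeError("cannot generate code within the pass limit")


print(resolve_jumps([("jmp", "skip"), ("data", 100), ("label", "skip")]))  # (2, {'skip': 102})
print(resolve_jumps([("jmp", "skip"), ("data", 200), ("label", "skip")]))  # (3, {'skip': 205})
```

In the second program the first pass guesses a short jump, the second pass finds the target out of range and switches to the near form, which moves the label by 3 bytes, and only the third pass confirms all the values.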

Some errors, like values not fitting in the required ranges, are not signaled during these intermediate passes, since they may disappear once some of the values are predicted better. However, if the assembler encounters an illegal syntax construction or an unknown instruction, it always stops immediately. Defining a label more than once also causes such an error, because it makes the predictions groundless.

Only the messages created with the display directive during the last performed pass are actually displayed. When the assembly has been stopped due to an error, these messages may reflect predicted values that are not yet resolved correctly.

Sometimes a solution does not exist, and in such cases the assembler never manages to make correct predictions. For this reason there is a limit on the number of passes, and when the assembler reaches this limit, it stops and displays a message that it is not able to generate the correct output. Consider the following example:

    if ~ defined alpha
        alpha:
    end if

The defined operator gives the true value when the expression following it could be calculated at this place, which in this case means that the alpha label is defined somewhere. But the above block causes this label to be defined only when the value given by the defined operator is false, which leads to an antinomy and makes it impossible to resolve such code. When processing the if directive, the assembler has to predict whether the alpha label will be defined somewhere (it would not have to predict only if the label had already been defined earlier in this pass), and whatever the prediction is, the opposite always happens. Thus the assembly fails, unless the alpha label is defined somewhere in the source preceding the above block of instructions; in that case, as already noted, no prediction is needed and the block simply gets skipped.

The above sample might have been written as an attempt to define the label only when it is not yet defined. It fails because the defined operator checks whether the label is defined anywhere, and this includes the definition inside this conditionally processed block. However, adding an additional condition may make it possible to get it resolved:

    if ~ defined alpha | defined @f
        alpha:
        @@:
    end if

The @f symbol always refers to the nearest @@ label following it in the source, so the above sample would mean the same if any unique name were used instead of the anonymous label. When alpha is not defined in any other place in the source, the only possible solution is the one in which this block gets assembled, and this time it does not lead to an antinomy, because the anonymous label makes this block self-establishing. To understand this better, look at a block that has nothing more than this self-establishing condition:

    if defined @f
        @@:
    end if

This is an example of source that may have more than one solution, as the case in which this block gets processed and the case in which it does not are equally correct. Which of those two solutions we get depends on the algorithm of the assembler, in the case of flat assembler on its prediction algorithm. Back to the previous sample: when alpha is not defined anywhere else, the condition for the if block cannot be false, so we are left with only one possible solution, and we can hope the assembler will arrive at it. On the other hand, when alpha is defined in some other place, we again have two possible solutions, but one of them causes alpha to be defined twice, and such an error causes the assembler to abort the assembly immediately, as this is the kind of error that deeply disturbs the process of resolving. So such source may either get correctly resolved or cause an error, and which one we get may depend on the internal choices made by the assembler.

However, some facts about such choices are certain. When the assembler has to check whether a given symbol is defined and it was already defined in the current pass, no prediction is needed, as was already noted above. And when the given symbol has never been defined before, including in all the already finished passes, the assembler predicts it to be not defined. Knowing this, we can expect that the simple self-establishing block shown above will not be assembled at all, and that the previous sample will resolve correctly when alpha is defined somewhere before our conditional block, while it will itself define alpha when it is not defined earlier, thus potentially causing a double-definition error if alpha is also defined somewhere later.
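These cases can be replayed with a toy Python model of passes. Besides the certain fact stated above (a symbol never defined before is predicted as not defined), it assumes that a symbol is predicted defined exactly when it was defined in the previous pass, and that alpha is not defined anywhere outside the block. The function resolve is hypothetical; it returns None when the pass limit is reached.

```python
def resolve(condition, block_symbols, max_passes=100):
    """Toy model of an if block whose condition uses 'defined' checks.

    condition(predicted) receives the set of symbols predicted to be defined
    and says whether the block, which defines block_symbols, is assembled.
    """
    predicted = frozenset()  # nothing was defined before the first pass
    for pass_no in range(1, max_passes + 1):
        defined = frozenset(block_symbols) if condition(predicted) else frozenset()
        if defined == predicted:  # the prediction was confirmed
            return pass_no, sorted(defined)
        predicted = defined       # assumed: next pass predicts this outcome
    return None                   # no consistent solution within the limit


# if ~ defined alpha / alpha: / end if
print(resolve(lambda p: "alpha" not in p, {"alpha"}))  # None
# if defined @f / @@: / end if
print(resolve(lambda p: "@@" in p, {"@@"}))  # (1, [])
# if ~ defined alpha | defined @f / alpha: / @@: / end if
print(resolve(lambda p: "alpha" not in p or "@@" in p, {"alpha", "@@"}))  # (2, ['@@', 'alpha'])
```

The first block flips between defined and undefined on every pass, the self-establishing block is skipped in the very first pass, and the combined block is assembled in the first pass and confirmed in the second.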

The used operator may be expected to behave in a similar manner in analogous cases; however, other kinds of predictions may not be so simple, and you should never rely on them this way.

The err directive, usually used to stop the assembly when some condition is met, stops the assembly immediately, regardless of whether the current pass is final or intermediate. So even when the condition that caused this directive to be interpreted is temporary and would eventually disappear in later passes, the assembly is stopped anyway. If the assembly needs to be stopped only when the condition is permanent, and not just occurring in the intermediate assembly passes, the trick with rb -1 can be used instead. The rb directive does not cause an error when it is given a negative value in the intermediate passes; the error is reported only if the negative value remains in the final pass.