The Intel® C++ Compiler supports Microsoft-style inline assembly with the -use-msasm compiler option. See your Microsoft documentation for the proper syntax.
The Intel® C++ Compiler also supports GNU-style inline assembly. The syntax is as follows:
asm-keyword [ volatile-keyword ] ( asm-template [ asm-interface ] ) ;
The Intel® C++ Compiler also supports mixing GNU-style and Microsoft-style inline assembly statements. Use the __asm__ keyword for GNU-style inline assembly when using the -use-msasm option.
The Intel C++ Compiler supports gcc-style inline ASM if the assembler code uses AT&T* System V/386 syntax.
When compiling an assembly statement on Linux*, the compiler simply emits the asm-template to the assembly file after making any necessary operand substitutions. The compiler then calls the GNU assembler to generate machine code. In contrast, on Windows* the compiler itself must assemble the text contained in the asm-template string into machine code. In essence, the compiler contains a built-in assembler.
The compiler’s built-in assembler supports the GNU .byte directive but does not support other functionality of the GNU assembler, so there are limitations on the contents of the asm-template. The following assembler features are not currently supported:
Directives other than the .byte directive
Symbols: direct symbol references in the asm-template are not supported. To access a C++ object, use the asm-interface with a substitution directive.
Incorrect method for accessing a C++ object:
__asm__("addl $5, _x");
Proper method for accessing a C++ object:
__asm__("addl $5, %0" : "+rm" (x));
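As a minimal sketch of the substitution approach, the function below wraps the statement above so that it can be called and checked. The name add_five and the non-x86 fallback branch are assumptions made for illustration and are not part of the original example.

```c
/* Hypothetical helper: adds 5 to x through the asm-interface rather than
   by naming a symbol inside the asm-template. */
int add_five(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is replaced with the location the compiler picks for x.
       "+rm" marks x as read-write, held in a register or in memory. */
    __asm__("addl $5, %0" : "+rm"(x));
#else
    x += 5; /* plain C fallback (assumption) so the sketch builds elsewhere */
#endif
    return x;
}
```

With this definition, add_five(10) would be expected to return 15. Because the compiler chooses where x lives, the same statement works whether x ends up in a register or on the stack.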
Additionally, there are some restrictions on the usage of labels. The compiler only allows local labels, and only references to labels within the same assembly statement are permitted. A local label has the form “N:”, where N is a non-negative integer. N does not have to be unique, even within the same assembly statement. To reference the most recent definition of label N, use “Nb”. To reference the next definition of label N, use “Nf”. In this context, “b” means backward and “f” means forward. For more information, refer to the GNU assembler documentation.
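The label rules above can be sketched with a small loop. The function name sum_down_from and the non-x86 fallback are hypothetical and exist only for illustration; the statement uses one forward reference ("2f") and one backward reference ("1b"), both to labels inside the same assembly statement.

```c
/* Hypothetical helper: returns n + (n-1) + ... + 1 using local labels. */
unsigned int sum_down_from(unsigned int n)
{
    unsigned int total = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("testl %1, %1; jz 2f; "             /* n == 0: jump forward to the next "2:" */
            "1: addl %1, %0; decl %1; jnz 1b; " /* loop back to the most recent "1:" */
            "2:"
            : "+r"(total), "+r"(n));
#else
    while (n != 0) { total += n; n--; }         /* plain C fallback (assumption) */
#endif
    return total;
}
```

Because the labels are local, the statement can appear more than once in a translation unit (for example, after inlining) without producing duplicate-symbol errors.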
GNU-style inline assembly statements on Windows* use the same assembly instruction format as on Linux*. This means that destination operands are on the right and source operands are on the left. This operand order is the reverse of Intel assembly syntax.
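A non-commutative instruction makes the operand order easy to see. The sketch below is illustrative only: the name subtract_sketch, the early-clobber modifier on the output, and the non-x86 fallback are assumptions, not part of the original text.

```c
/* Hypothetical helper: computes a - b with AT&T operand order. */
int subtract_sketch(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    int result;
    /* movl src, dst : result = a
       subl src, dst : result = result - b
       "=&r" keeps result out of the registers holding a and b, because
       result is written before b has been read. */
    __asm__("movl %1, %0; subl %2, %0"
            : "=&r"(result)
            : "r"(a), "r"(b));
    return result;
#else
    return a - b; /* plain C fallback (assumption) */
#endif
}
```

In Intel syntax the same two instructions would be written with the destination first, for example "mov eax, ecx" followed by "sub eax, edx".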
Due to the limitations of the compiler's built-in assembler, many assembly statements that compile and run on Linux* will not compile on Windows*. On the other hand, assembly statements that compile and run on Windows* should also compile and run on Linux*.
This feature provides a high-performance alternative to Microsoft-style inline assembly statements when portability between Windows*, Linux*, and Mac OS* X is important. Its intended use is in small primitives where high-performance integration with the surrounding C++ code is essential.
#ifdef _WIN64
#define INT64_PRINTF_FORMAT "I64"
#else
#define __int64 long long
#define INT64_PRINTF_FORMAT "ll"
#endif
#include <stdio.h>
typedef struct {
    __int64 lo64;
    __int64 hi64;
} my_i128;
#define ADD128(out, in1, in2) \
    __asm__("addq %2, %0; adcq %3, %1" : \
            "=r"(out.lo64), "=r"(out.hi64) : \
            "emr" (in2.lo64), "emr"(in2.hi64), \
            "0" (in1.lo64), "1" (in1.hi64));
extern int
main()
{
    my_i128 val1, val2, result;
    val1.lo64 = ~0;
    val1.hi64 = 0;
    val2.lo64 = 1;
    val2.hi64 = 65;
    /* result = val1 + val2, with the carry propagated from lo64 into hi64 */
    ADD128(result, val1, val2);
    printf("0x%016" INT64_PRINTF_FORMAT "x%016" INT64_PRINTF_FORMAT "x\n",
           val1.hi64, val1.lo64);
    printf("+0x%016" INT64_PRINTF_FORMAT "x%016" INT64_PRINTF_FORMAT "x\n",
           val2.hi64, val2.lo64);
    printf("------------------------------------\n");
    printf("0x%016" INT64_PRINTF_FORMAT "x%016" INT64_PRINTF_FORMAT "x\n",
           result.hi64, result.lo64);
    return 0;
}
This example, written for Intel® 64 architecture, shows how to use a GNU-style inline assembly statement to add two 128-bit integers. In this example, a 128-bit integer is represented as two __int64 objects in the my_i128 structure. The inline assembly statement that implements the addition is contained in the ADD128 macro, which takes three my_i128 arguments representing three 128-bit integers. The first argument is the output. The next two arguments are the inputs. The example compiles and runs using the Intel® C++ Compiler on Linux* or Windows*, producing the following output.
0x0000000000000000ffffffffffffffff
+0x00000000000000410000000000000001
------------------------------------
0x00000000000000420000000000000000
In the GNU-style inline assembly implementation, the asm-interface specifies all the inputs, outputs, and side effects of the asm statement, enabling the compiler to generate very efficient code. For the example above, the compiler generates the following instructions:
mov r13, 0xffffffffffffffff
mov r12, 0x000000000
add r13, 1
adc r12, 65
Note that when the compiler generates an assembly file on Windows*, it uses Intel syntax even though the assembly statement was written using Linux* (AT&T) assembly syntax.
The compiler moves in1.lo64 into a register to match the constraint of operand 4. Operand 4's constraint of "0" indicates that it must be assigned the same location as output operand 0. Operand 0's constraint is "=r", indicating that it must be assigned an integer register. In this case, the compiler chooses r13. In the same way, the compiler moves in1.hi64 into register r12.
The constraints for input operands 2 and 3 allow the operands to be assigned a register location ("r"), a memory location ("m"), or a constant signed 32-bit integer value ("e"). In this case, the compiler chooses to match operands 2 and 3 with the constant values 1 and 65, enabling the add and adc instructions to utilize the "register-immediate" forms.
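The matching constraint and the "e" alternative can be seen in isolation in a single-instruction sketch. The function name add64_sketch and the fallback for targets other than Intel® 64 are assumptions made for illustration; the constraint strings follow the ones discussed above.

```c
/* Hypothetical helper: 64-bit add with a matching constraint. */
long long add64_sketch(long long a, long long b)
{
#if defined(__x86_64__)
    long long out;
    /* Operand 0: out, in any integer register ("=r").
       Operand 1: b, either a signed 32-bit constant ("e") or a register ("r").
       Operand 2: a, placed in the same register as operand 0 ("0"). */
    __asm__("addq %1, %0"
            : "=r"(out)
            : "er"(b), "0"(a));
    return out;
#else
    return a + b; /* plain C fallback (assumption) for other targets */
#endif
}
```

When b is a compile-time constant that fits in a signed 32-bit value, the compiler is free to select the "e" alternative and emit the register-immediate form; otherwise it falls back to a register.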
The same 128-bit addition is much more expensive using a Microsoft-style inline assembly statement, because the interface between the assembly statement and the surrounding C++ code is entirely through memory. Using Microsoft-style inline assembly, the ADD128 macro might be written as follows.
#define ADD128(out, in1, in2) \
{ \
__asm mov rax, in1.lo64 \
__asm mov rdx, in1.hi64 \
__asm add rax, in2.lo64 \
__asm adc rdx, in2.hi64 \
__asm mov out.lo64, rax \
__asm mov out.hi64, rdx \
}
The compiler must add code before the assembly statement to move the inputs into memory, and it must add code after the assembly statement to retrieve the outputs from memory. This prevents the compiler from exploiting some optimization opportunities. Thus, the following assembly code is produced.
mov QWORD PTR [rsp+32], -1
mov QWORD PTR [rsp+40], 0
mov QWORD PTR [rsp+48], 1
mov QWORD PTR [rsp+56], 65
; Begin ASM
mov rax, QWORD PTR [rsp+32]
mov rdx, QWORD PTR [rsp+40]
add rax, QWORD PTR [rsp+48]
adc rdx, QWORD PTR [rsp+56]
mov QWORD PTR [rsp+64], rax
mov QWORD PTR [rsp+72], rdx
; End ASM
mov rdx, QWORD PTR [rsp+72]
mov r8, QWORD PTR [rsp+64]
The operation that took only 4 instructions and 0 memory references using GNU-style inline assembly takes 12 instructions with 12 memory references using Microsoft-style inline assembly.
Copyright © 1996-2010, Intel Corporation. All rights reserved.