diff options
Diffstat (limited to 'docs/MIRLangRef.rst')
-rw-r--r-- | docs/MIRLangRef.rst | 495 |
1 files changed, 495 insertions, 0 deletions
diff --git a/docs/MIRLangRef.rst b/docs/MIRLangRef.rst new file mode 100644 index 000000000000..a5f8c8c743ab --- /dev/null +++ b/docs/MIRLangRef.rst @@ -0,0 +1,495 @@ +======================================== +Machine IR (MIR) Format Reference Manual +======================================== + +.. contents:: + :local: + +.. warning:: + This is a work in progress. + +Introduction +============ + +This document is a reference manual for the Machine IR (MIR) serialization +format. MIR is a human readable serialization format that is used to represent +LLVM's :ref:`machine specific intermediate representation +<machine code representation>`. + +The MIR serialization format is designed to be used for testing the code +generation passes in LLVM. + +Overview +======== + +The MIR serialization format uses a YAML container. YAML is a standard +data serialization language, and the full YAML language spec can be read at +`yaml.org +<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_. + +A MIR file is split up into a series of `YAML documents`_. The first document +can contain an optional embedded LLVM IR module, and the rest of the documents +contain the serialized machine functions. + +.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132 + +MIR Testing Guide +================= + +You can use the MIR format for testing in two different ways: + +- You can write MIR tests that invoke a single code generation pass using the + ``run-pass`` option in llc. + +- You can use llc's ``stop-after`` option with existing or new LLVM assembly + tests and check the MIR output of a specific code generation pass. + +Testing Individual Code Generation Passes +----------------------------------------- + +The ``run-pass`` option in llc allows you to create MIR tests that invoke +just a single code generation pass. When this option is used, llc will parse +an input MIR file, run the specified code generation pass, and print the +resulting MIR to the standard output stream. + +You can generate an input MIR file for the test by using the ``stop-after`` +option in llc. For example, if you would like to write a test for the +post register allocation pseudo instruction expansion pass, you can specify +the machine copy propagation pass in the ``stop-after`` option, as it runs +just before the pass that we are trying to test: + + ``llc -stop-after machine-cp bug-trigger.ll > test.mir`` + +After generating the input MIR file, you'll have to add a run line that uses +the ``-run-pass`` option to it. In order to test the post register allocation +pseudo instruction expansion pass on X86-64, a run line like the one shown +below can be used: + + ``# RUN: llc -run-pass postrapseudos -march=x86-64 %s -o /dev/null | FileCheck %s`` + +The MIR files are target dependent, so they have to be placed in the target +specific test directories. They also need to specify a target triple or a +target architecture either in the run line or in the embedded LLVM IR module. + +Limitations +----------- + +Currently the MIR format has several limitations in terms of which state it +can serialize: + +- The target-specific state in the target-specific ``MachineFunctionInfo`` + subclasses isn't serialized at the moment. + +- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and + SystemZ backends) aren't serialized at the moment. + +- The ``MCSymbol`` machine operands are only printed, they can't be parsed. + +- A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI + instructions and the variable debug information from MMI is serialized right + now. + +These limitations impose restrictions on what you can test with the MIR format. +For now, tests that would like to test some behaviour that depends on the state +of certain ``MCSymbol`` operands or the exception handling state in MMI, can't +use the MIR format. As well as that, tests that test some behaviour that +depends on the state of the target specific ``MachineFunctionInfo`` or +``MachineConstantPoolValue`` subclasses can't use the MIR format at the moment. + +High Level Structure +==================== + +.. _embedded-module: + +Embedded Module +--------------- + +When the first YAML document contains a `YAML block literal string`_, the MIR +parser will treat this string as an LLVM assembly language string that +represents an embedded LLVM IR module. +Here is an example of a YAML document that contains an LLVM module: + +.. code-block:: llvm + + --- | + define i32 @inc(i32* %x) { + entry: + %0 = load i32, i32* %x + %1 = add i32 %0, 1 + store i32 %1, i32* %x + ret i32 %1 + } + ... + +.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688 + +Machine Functions +----------------- + +The remaining YAML documents contain the machine functions. This is an example +of such YAML document: + +.. code-block:: llvm + + --- + name: inc + tracksRegLiveness: true + liveins: + - { reg: '%rdi' } + body: | + bb.0.entry: + liveins: %rdi + + %eax = MOV32rm %rdi, 1, _, 0, _ + %eax = INC32r killed %eax, implicit-def dead %eflags + MOV32mr killed %rdi, 1, _, 0, _, %eax + RETQ %eax + ... + +The document above consists of attributes that represent the various +properties and data structures in a machine function. + +The attribute ``name`` is required, and its value should be identical to the +name of a function that this machine function is based on. + +The attribute ``body`` is a `YAML block literal string`_. Its value represents +the function's machine basic blocks and their machine instructions. + +Machine Instructions Format Reference +===================================== + +The machine basic blocks and their instructions are represented using a custom, +human readable serialization language. This language is used in the +`YAML block literal string`_ that corresponds to the machine function's body. + +A source string that uses this language contains a list of machine basic +blocks, which are described in the section below. + +Machine Basic Blocks +-------------------- + +A machine basic block is defined in a single block definition source construct +that contains the block's ID. +The example below defines two blocks that have an ID of zero and one: + +.. code-block:: llvm + + bb.0: + <instructions> + bb.1: + <instructions> + +A machine basic block can also have a name. It should be specified after the ID +in the block's definition: + +.. code-block:: llvm + + bb.0.entry: ; This block's name is "entry" + <instructions> + +The block's name should be identical to the name of the IR block that this +machine block is based on. + +Block References +^^^^^^^^^^^^^^^^ + +The machine basic blocks are identified by their ID numbers. Individual +blocks are referenced using the following syntax: + +.. code-block:: llvm + + %bb.<id>[.<name>] + +Examples: + +.. code-block:: llvm + + %bb.0 + %bb.1.then + +Successors +^^^^^^^^^^ + +The machine basic block's successors have to be specified before any of the +instructions: + +.. code-block:: llvm + + bb.0.entry: + successors: %bb.1.then, %bb.2.else + <instructions> + bb.1.then: + <instructions> + bb.2.else: + <instructions> + +The branch weights can be specified in brackets after the successor blocks. +The example below defines a block that has two successors with branch weights +of 32 and 16: + +.. code-block:: llvm + + bb.0.entry: + successors: %bb.1.then(32), %bb.2.else(16) + +.. _bb-liveins: + +Live In Registers +^^^^^^^^^^^^^^^^^ + +The machine basic block's live in registers have to be specified before any of +the instructions: + +.. code-block:: llvm + + bb.0.entry: + liveins: %edi, %esi + +The list of live in registers and successors can be empty. The language also +allows multiple live in register and successor lists - they are combined into +one list by the parser. + +Miscellaneous Attributes +^^^^^^^^^^^^^^^^^^^^^^^^ + +The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be +specified in brackets after the block's definition: + +.. code-block:: llvm + + bb.0.entry (address-taken): + <instructions> + bb.2.else (align 4): + <instructions> + bb.3(landing-pad, align 4): + <instructions> + +.. TODO: Describe the way the reference to an unnamed LLVM IR block can be + preserved. + +Machine Instructions +-------------------- + +A machine instruction is composed of a name, +:ref:`machine operands <machine-operands>`, +:ref:`instruction flags <instruction-flags>`, and machine memory operands. + +The instruction's name is usually specified before the operands. The example +below shows an instance of the X86 ``RETQ`` instruction with a single machine +operand: + +.. code-block:: llvm + + RETQ %eax + +However, if the machine instruction has one or more explicitly defined register +operands, the instruction's name has to be specified after them. The example +below shows an instance of the AArch64 ``LDPXpost`` instruction with three +defined register operands: + +.. code-block:: llvm + + %sp, %fp, %lr = LDPXpost %sp, 2 + +The instruction names are serialized using the exact definitions from the +target's ``*InstrInfo.td`` files, and they are case sensitive. This means that +similar instruction names like ``TSTri`` and ``tSTRi`` represent different +machine instructions. + +.. _instruction-flags: + +Instruction Flags +^^^^^^^^^^^^^^^^^ + +The flag ``frame-setup`` can be specified before the instruction's name: + +.. code-block:: llvm + + %fp = frame-setup ADDXri %sp, 0, 0 + +.. _registers: + +Registers +--------- + +Registers are one of the key primitives in the machine instructions +serialization language. They are primarly used in the +:ref:`register machine operands <register-operands>`, +but they can also be used in a number of other places, like the +:ref:`basic block's live in list <bb-liveins>`. + +The physical registers are identified by their name. They use the following +syntax: + +.. code-block:: llvm + + %<name> + +The example below shows three X86 physical registers: + +.. code-block:: llvm + + %eax + %r15 + %eflags + +The virtual registers are identified by their ID number. They use the following +syntax: + +.. code-block:: llvm + + %<id> + +Example: + +.. code-block:: llvm + + %0 + +The null registers are represented using an underscore ('``_``'). They can also be +represented using a '``%noreg``' named register, although the former syntax +is preferred. + +.. _machine-operands: + +Machine Operands +---------------- + +There are seventeen different kinds of machine operands, and all of them, except +the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are +just printed out - they can't be parsed back yet. + +Immediate Operands +^^^^^^^^^^^^^^^^^^ + +The immediate machine operands are untyped, 64-bit signed integers. The +example below shows an instance of the X86 ``MOV32ri`` instruction that has an +immediate machine operand ``-42``: + +.. code-block:: llvm + + %eax = MOV32ri -42 + +.. TODO: Describe the CIMM (Rare) and FPIMM immediate operands. + +.. _register-operands: + +Register Operands +^^^^^^^^^^^^^^^^^ + +The :ref:`register <registers>` primitive is used to represent the register +machine operands. The register operands can also have optional +:ref:`register flags <register-flags>`, +:ref:`a subregister index <subregister-indices>`, +and a reference to the tied register operand. +The full syntax of a register operand is shown below: + +.. code-block:: llvm + + [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ] + +This example shows an instance of the X86 ``XOR32rr`` instruction that has +5 register operands with different register flags: + +.. code-block:: llvm + + dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al + +.. _register-flags: + +Register Flags +~~~~~~~~~~~~~~ + +The table below shows all of the possible register flags along with the +corresponding internal ``llvm::RegState`` representation: + +.. list-table:: + :header-rows: 1 + + * - Flag + - Internal Value + + * - ``implicit`` + - ``RegState::Implicit`` + + * - ``implicit-def`` + - ``RegState::ImplicitDefine`` + + * - ``def`` + - ``RegState::Define`` + + * - ``dead`` + - ``RegState::Dead`` + + * - ``killed`` + - ``RegState::Kill`` + + * - ``undef`` + - ``RegState::Undef`` + + * - ``internal`` + - ``RegState::InternalRead`` + + * - ``early-clobber`` + - ``RegState::EarlyClobber`` + + * - ``debug-use`` + - ``RegState::Debug`` + +.. _subregister-indices: + +Subregister Indices +~~~~~~~~~~~~~~~~~~~ + +The register machine operands can reference a portion of a register by using +the subregister indices. The example below shows an instance of the ``COPY`` +pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8 +lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1: + +.. code-block:: llvm + + %1 = COPY %0:sub_8bit + +The names of the subregister indices are target specific, and are typically +defined in the target's ``*RegisterInfo.td`` file. + +Global Value Operands +^^^^^^^^^^^^^^^^^^^^^ + +The global value machine operands reference the global values from the +:ref:`embedded LLVM IR module <embedded-module>`. +The example below shows an instance of the X86 ``MOV64rm`` instruction that has +a global value operand named ``G``: + +.. code-block:: llvm + + %rax = MOV64rm %rip, 1, _, @G, _ + +The named global values are represented using an identifier with the '@' prefix. +If the identifier doesn't match the regular expression +`[-a-zA-Z$._][-a-zA-Z$._0-9]*`, then this identifier must be quoted. + +The unnamed global values are represented using an unsigned numeric value with +the '@' prefix, like in the following examples: ``@0``, ``@989``. + +.. TODO: Describe the parsers default behaviour when optional YAML attributes + are missing. +.. TODO: Describe the syntax for the bundled instructions. +.. TODO: Describe the syntax for virtual register YAML definitions. +.. TODO: Describe the machine function's YAML flag attributes. +.. TODO: Describe the syntax for the external symbol and register + mask machine operands. +.. TODO: Describe the frame information YAML mapping. +.. TODO: Describe the syntax of the stack object machine operands and their + YAML definitions. +.. TODO: Describe the syntax of the constant pool machine operands and their + YAML definitions. +.. TODO: Describe the syntax of the jump table machine operands and their + YAML definitions. +.. TODO: Describe the syntax of the block address machine operands. +.. TODO: Describe the syntax of the CFI index machine operands. +.. TODO: Describe the syntax of the metadata machine operands, and the + instructions debug location attribute. +.. TODO: Describe the syntax of the target index machine operands. +.. TODO: Describe the syntax of the register live out machine operands. +.. TODO: Describe the syntax of the machine memory operands. |