Diffstat (limited to 'docs/MIRLangRef.rst')
-rw-r--r-- docs/MIRLangRef.rst 495
1 files changed, 495 insertions, 0 deletions
diff --git a/docs/MIRLangRef.rst b/docs/MIRLangRef.rst
new file mode 100644
index 000000000000..a5f8c8c743ab
--- /dev/null
+++ b/docs/MIRLangRef.rst
@@ -0,0 +1,495 @@
+========================================
+Machine IR (MIR) Format Reference Manual
+========================================
+
+.. contents::
+   :local:
+
+.. warning::
+   This is a work in progress.
+
+Introduction
+============
+
+This document is a reference manual for the Machine IR (MIR) serialization
+format. MIR is a human-readable serialization format that is used to represent
+LLVM's :ref:`machine specific intermediate representation
+<machine code representation>`.
+
+The MIR serialization format is designed to be used for testing the code
+generation passes in LLVM.
+
+Overview
+========
+
+The MIR serialization format uses a YAML container. YAML is a standard
+data serialization language, and the full YAML language spec can be read at
+`yaml.org
+<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.
+
+A MIR file is split up into a series of `YAML documents`_. The first document
+can contain an optional embedded LLVM IR module, and the rest of the documents
+contain the serialized machine functions.
+
+.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132
+
+MIR Testing Guide
+=================
+
+You can use the MIR format for testing in two different ways:
+
+- You can write MIR tests that invoke a single code generation pass using the
+  ``run-pass`` option in llc.
+
+- You can use llc's ``stop-after`` option with existing or new LLVM assembly
+  tests and check the MIR output of a specific code generation pass.
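+
+As a sketch of the second approach, an LLVM assembly test could stop after a
+pass and match the printed MIR with FileCheck. The run line below simply
+combines the ``-stop-after``, ``-march`` and ``-o /dev/null`` options that
+appear elsewhere in this document; the function and the ``CHECK`` patterns are
+illustrative assumptions rather than output copied from llc:
+
+.. code-block:: llvm
+
+   ; RUN: llc -stop-after machine-cp -march=x86-64 %s -o /dev/null | FileCheck %s
+
+   ; CHECK: name: inc
+   ; CHECK: RETQ
+
+   define i32 @inc(i32* %x) {
+   entry:
+     %0 = load i32, i32* %x
+     %1 = add i32 %0, 1
+     store i32 %1, i32* %x
+     ret i32 %1
+   }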
+
+Testing Individual Code Generation Passes
+-----------------------------------------
+
+The ``run-pass`` option in llc allows you to create MIR tests that invoke
+just a single code generation pass. When this option is used, llc will parse
+an input MIR file, run the specified code generation pass, and print the
+resulting MIR to the standard output stream.
+
+You can generate an input MIR file for the test by using the ``stop-after``
+option in llc. For example, if you would like to write a test for the
+post register allocation pseudo instruction expansion pass, you can specify
+the machine copy propagation pass in the ``stop-after`` option, as it runs
+just before the pass that we are trying to test:
+
+ ``llc -stop-after machine-cp bug-trigger.ll > test.mir``
+
+After generating the input MIR file, you'll have to add a run line that uses
+the ``-run-pass`` option to it. In order to test the post register allocation
+pseudo instruction expansion pass on X86-64, a run line like the one shown
+below can be used:
+
+ ``# RUN: llc -run-pass postrapseudos -march=x86-64 %s -o /dev/null | FileCheck %s``
+
+The MIR files are target dependent, so they have to be placed in the target
+specific test directories. They also need to specify a target triple or a
+target architecture either in the run line or in the embedded LLVM IR module.
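+
+Putting these pieces together, a complete test file might look like the sketch
+below. It reuses the ``inc`` function that serves as the running example in
+this document. The ``CHECK`` patterns are illustrative assumptions, and this
+particular input is not claimed to contain a pseudo instruction for the pass
+to expand:
+
+.. code-block:: llvm
+
+   # RUN: llc -run-pass postrapseudos -march=x86-64 %s -o /dev/null | FileCheck %s
+
+   --- |
+     define i32 @inc(i32* %x) {
+     entry:
+       %0 = load i32, i32* %x
+       %1 = add i32 %0, 1
+       store i32 %1, i32* %x
+       ret i32 %1
+     }
+   ...
+   ---
+   # CHECK: name: inc
+   name: inc
+   tracksRegLiveness: true
+   liveins:
+     - { reg: '%rdi' }
+   body: |
+     bb.0.entry:
+       liveins: %rdi
+
+       ; CHECK: %eax = MOV32rm %rdi, 1, _, 0, _
+       ; CHECK: RETQ %eax
+       %eax = MOV32rm %rdi, 1, _, 0, _
+       %eax = INC32r killed %eax, implicit-def dead %eflags
+       MOV32mr killed %rdi, 1, _, 0, _, %eax
+       RETQ %eax
+   ...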
+
+Limitations
+-----------
+
+Currently the MIR format has several limitations in terms of which state it
+can serialize:
+
+- The target-specific state in the target-specific ``MachineFunctionInfo``
+  subclasses isn't serialized at the moment.
+
+- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
+  SystemZ backends) aren't serialized at the moment.
+
+- The ``MCSymbol`` machine operands are only printed; they can't be parsed.
+
+- A lot of the state in ``MachineModuleInfo`` isn't serialized. Only the CFI
+  instructions and the variable debug information from MMI are serialized
+  right now.
+
+These limitations restrict what you can test with the MIR format. For now,
+tests whose behaviour depends on the state of certain ``MCSymbol`` operands or
+on the exception handling state in MMI can't use the MIR format. The same
+applies to tests whose behaviour depends on the state of the target-specific
+``MachineFunctionInfo`` or ``MachineConstantPoolValue`` subclasses.
+
+High Level Structure
+====================
+
+.. _embedded-module:
+
+Embedded Module
+---------------
+
+When the first YAML document contains a `YAML block literal string`_, the MIR
+parser will treat this string as an LLVM assembly language string that
+represents an embedded LLVM IR module.
+Here is an example of a YAML document that contains an LLVM module:
+
+.. code-block:: llvm
+
+   --- |
+     define i32 @inc(i32* %x) {
+     entry:
+       %0 = load i32, i32* %x
+       %1 = add i32 %0, 1
+       store i32 %1, i32* %x
+       ret i32 %1
+     }
+   ...
+
+.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688
+
+Machine Functions
+-----------------
+
+The remaining YAML documents contain the machine functions. This is an example
+of such a YAML document:
+
+.. code-block:: llvm
+
+   ---
+   name: inc
+   tracksRegLiveness: true
+   liveins:
+     - { reg: '%rdi' }
+   body: |
+     bb.0.entry:
+       liveins: %rdi
+
+       %eax = MOV32rm %rdi, 1, _, 0, _
+       %eax = INC32r killed %eax, implicit-def dead %eflags
+       MOV32mr killed %rdi, 1, _, 0, _, %eax
+       RETQ %eax
+   ...
+
+The document above consists of attributes that represent the various
+properties and data structures in a machine function.
+
+The attribute ``name`` is required, and its value should be identical to the
+name of a function that this machine function is based on.
+
+The attribute ``body`` is a `YAML block literal string`_. Its value represents
+the function's machine basic blocks and their machine instructions.
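+
+As a minimal sketch of how these two attributes fit together, the file below
+pairs an embedded module with a machine function whose ``name`` matches the IR
+function. The function ``answer`` is a hypothetical example, and the
+instructions are the X86 ones used elsewhere in this document:
+
+.. code-block:: llvm
+
+   --- |
+     define i32 @answer() {
+     entry:
+       ret i32 -42
+     }
+   ...
+   ---
+   name: answer
+   body: |
+     bb.0.entry:
+       %eax = MOV32ri -42
+       RETQ %eax
+   ...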
+
+Machine Instructions Format Reference
+=====================================
+
+The machine basic blocks and their instructions are represented using a custom,
+human-readable serialization language. This language is used in the
+`YAML block literal string`_ that corresponds to the machine function's body.
+
+A source string that uses this language contains a list of machine basic
+blocks, which are described in the section below.
+
+Machine Basic Blocks
+--------------------
+
+A machine basic block is defined in a single block definition source construct
+that contains the block's ID.
+The example below defines two blocks with IDs of zero and one:
+
+.. code-block:: llvm
+
+   bb.0:
+     <instructions>
+   bb.1:
+     <instructions>
+
+A machine basic block can also have a name. It should be specified after the ID
+in the block's definition:
+
+.. code-block:: llvm
+
+   bb.0.entry:       ; This block's name is "entry"
+     <instructions>
+
+The block's name should be identical to the name of the IR block that this
+machine block is based on.
+
+Block References
+^^^^^^^^^^^^^^^^
+
+The machine basic blocks are identified by their ID numbers. Individual
+blocks are referenced using the following syntax:
+
+.. code-block:: llvm
+
+ %bb.<id>[.<name>]
+
+Examples:
+
+.. code-block:: llvm
+
+ %bb.0
+ %bb.1.then
+
+Successors
+^^^^^^^^^^
+
+The machine basic block's successors have to be specified before any of the
+instructions:
+
+.. code-block:: llvm
+
+   bb.0.entry:
+     successors: %bb.1.then, %bb.2.else
+     <instructions>
+   bb.1.then:
+     <instructions>
+   bb.2.else:
+     <instructions>
+
+The branch weights can be specified in brackets after the successor blocks.
+The example below defines a block that has two successors with branch weights
+of 32 and 16:
+
+.. code-block:: llvm
+
+   bb.0.entry:
+     successors: %bb.1.then(32), %bb.2.else(16)
+
+.. _bb-liveins:
+
+Live In Registers
+^^^^^^^^^^^^^^^^^
+
+The machine basic block's live in registers have to be specified before any of
+the instructions:
+
+.. code-block:: llvm
+
+   bb.0.entry:
+     liveins: %edi, %esi
+
+The list of live in registers and successors can be empty. The language also
+allows multiple live in register and successor lists - they are combined into
+one list by the parser.
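+
+For instance, under the combining rule described above, the two alternative
+definitions below would be expected to describe the same successors and live
+in registers (the block and register names are only illustrative):
+
+.. code-block:: llvm
+
+   ; Alternative 1: one list of each kind.
+   bb.0.entry:
+     successors: %bb.1.then, %bb.2.else
+     liveins: %edi, %esi
+     <instructions>
+
+   ; Alternative 2: the same block written with several lists.
+   bb.0.entry:
+     successors: %bb.1.then
+     successors: %bb.2.else
+     liveins: %edi
+     liveins: %esi
+     <instructions>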
+
+Miscellaneous Attributes
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
+specified in brackets after the block's definition:
+
+.. code-block:: llvm
+
+   bb.0.entry (address-taken):
+     <instructions>
+   bb.2.else (align 4):
+     <instructions>
+   bb.3(landing-pad, align 4):
+     <instructions>
+
+.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
+ preserved.
+
+Machine Instructions
+--------------------
+
+A machine instruction is composed of a name,
+:ref:`machine operands <machine-operands>`,
+:ref:`instruction flags <instruction-flags>`, and machine memory operands.
+
+The instruction's name is usually specified before the operands. The example
+below shows an instance of the X86 ``RETQ`` instruction with a single machine
+operand:
+
+.. code-block:: llvm
+
+ RETQ %eax
+
+However, if the machine instruction has one or more explicitly defined register
+operands, the instruction's name has to be specified after them. The example
+below shows an instance of the AArch64 ``LDPXpost`` instruction with three
+defined register operands:
+
+.. code-block:: llvm
+
+ %sp, %fp, %lr = LDPXpost %sp, 2
+
+The instruction names are serialized using the exact definitions from the
+target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
+similar instruction names like ``TSTri`` and ``tSTRi`` represent different
+machine instructions.
+
+.. _instruction-flags:
+
+Instruction Flags
+^^^^^^^^^^^^^^^^^
+
+The flag ``frame-setup`` can be specified before the instruction's name:
+
+.. code-block:: llvm
+
+ %fp = frame-setup ADDXri %sp, 0, 0
+
+.. _registers:
+
+Registers
+---------
+
+Registers are one of the key primitives in the machine instructions
+serialization language. They are primarily used in the
+:ref:`register machine operands <register-operands>`,
+but they can also be used in a number of other places, like the
+:ref:`basic block's live in list <bb-liveins>`.
+
+The physical registers are identified by their name. They use the following
+syntax:
+
+.. code-block:: llvm
+
+ %<name>
+
+The example below shows three X86 physical registers:
+
+.. code-block:: llvm
+
+ %eax
+ %r15
+ %eflags
+
+The virtual registers are identified by their ID number. They use the following
+syntax:
+
+.. code-block:: llvm
+
+ %<id>
+
+Example:
+
+.. code-block:: llvm
+
+ %0
+
+The null registers are represented using an underscore ('``_``'). They can also be
+represented using a '``%noreg``' named register, although the former syntax
+is preferred.
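+
+To illustrate, the two lines below are intended to be equivalent spellings of
+the same X86 load, once with the underscore and once with ``%noreg`` in the
+null register positions (a sketch based on the ``MOV32rm`` example used
+earlier in this document):
+
+.. code-block:: llvm
+
+   %eax = MOV32rm %rdi, 1, _, 0, _
+   %eax = MOV32rm %rdi, 1, %noreg, 0, %noreg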
+
+.. _machine-operands:
+
+Machine Operands
+----------------
+
+There are seventeen different kinds of machine operands, and all of them, except
+the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are
+just printed out - they can't be parsed back yet.
+
+Immediate Operands
+^^^^^^^^^^^^^^^^^^
+
+The immediate machine operands are untyped, 64-bit signed integers. The
+example below shows an instance of the X86 ``MOV32ri`` instruction that has an
+immediate machine operand ``-42``:
+
+.. code-block:: llvm
+
+ %eax = MOV32ri -42
+
+.. TODO: Describe the CIMM (Rare) and FPIMM immediate operands.
+
+.. _register-operands:
+
+Register Operands
+^^^^^^^^^^^^^^^^^
+
+The :ref:`register <registers>` primitive is used to represent the register
+machine operands. The register operands can also have optional
+:ref:`register flags <register-flags>`,
+:ref:`a subregister index <subregister-indices>`,
+and a reference to the tied register operand.
+The full syntax of a register operand is shown below:
+
+.. code-block:: llvm
+
+ [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]
+
+This example shows an instance of the X86 ``XOR32rr`` instruction that has
+5 register operands with different register flags:
+
+.. code-block:: llvm
+
+ dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al
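+
+As a reading aid, the operands of this instruction can be matched against the
+syntax above as sketched in the comments below. The annotations are an
+informal interpretation of the flags, not parser output:
+
+.. code-block:: llvm
+
+   ; dead %eax                  - explicit register definition, marked dead
+   ; undef %eax                 - first explicit use, marked undef
+   ; undef %eax                 - second explicit use, marked undef
+   ; implicit-def dead %eflags  - implicit definition, marked dead
+   ; implicit-def %al           - implicit definition
+   dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al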
+
+.. _register-flags:
+
+Register Flags
+~~~~~~~~~~~~~~
+
+The table below shows all of the possible register flags along with the
+corresponding internal ``llvm::RegState`` representation:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Flag
+     - Internal Value
+
+   * - ``implicit``
+     - ``RegState::Implicit``
+
+   * - ``implicit-def``
+     - ``RegState::ImplicitDefine``
+
+   * - ``def``
+     - ``RegState::Define``
+
+   * - ``dead``
+     - ``RegState::Dead``
+
+   * - ``killed``
+     - ``RegState::Kill``
+
+   * - ``undef``
+     - ``RegState::Undef``
+
+   * - ``internal``
+     - ``RegState::InternalRead``
+
+   * - ``early-clobber``
+     - ``RegState::EarlyClobber``
+
+   * - ``debug-use``
+     - ``RegState::Debug``
+
+.. _subregister-indices:
+
+Subregister Indices
+~~~~~~~~~~~~~~~~~~~
+
+The register machine operands can reference a portion of a register by using
+the subregister indices. The example below shows an instance of the ``COPY``
+pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8
+lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:
+
+.. code-block:: llvm
+
+ %1 = COPY %0:sub_8bit
+
+The names of the subregister indices are target specific, and are typically
+defined in the target's ``*RegisterInfo.td`` file.
+
+Global Value Operands
+^^^^^^^^^^^^^^^^^^^^^
+
+The global value machine operands reference the global values from the
+:ref:`embedded LLVM IR module <embedded-module>`.
+The example below shows an instance of the X86 ``MOV64rm`` instruction that has
+a global value operand named ``G``:
+
+.. code-block:: llvm
+
+ %rax = MOV64rm %rip, 1, _, @G, _
+
+The named global values are represented using an identifier with the '@' prefix.
+If the identifier doesn't match the regular expression
+``[-a-zA-Z$._][-a-zA-Z$._0-9]*``, then this identifier must be quoted.
+
+The unnamed global values are represented using an unsigned numeric value with
+the '@' prefix, like in the following examples: ``@0``, ``@989``.
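+
+The three forms can be sketched side by side as follows. The global names are
+hypothetical, and the quoted form is assumed to follow the LLVM assembly
+convention of wrapping the name in double quotes after the '@' prefix:
+
+.. code-block:: llvm
+
+   %rax = MOV64rm %rip, 1, _, @G, _
+   %rax = MOV64rm %rip, 1, _, @"name with spaces", _
+   %rax = MOV64rm %rip, 1, _, @0, _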
+
+.. TODO: Describe the parsers default behaviour when optional YAML attributes
+ are missing.
+.. TODO: Describe the syntax for the bundled instructions.
+.. TODO: Describe the syntax for virtual register YAML definitions.
+.. TODO: Describe the machine function's YAML flag attributes.
+.. TODO: Describe the syntax for the external symbol and register
+ mask machine operands.
+.. TODO: Describe the frame information YAML mapping.
+.. TODO: Describe the syntax of the stack object machine operands and their
+ YAML definitions.
+.. TODO: Describe the syntax of the constant pool machine operands and their
+ YAML definitions.
+.. TODO: Describe the syntax of the jump table machine operands and their
+ YAML definitions.
+.. TODO: Describe the syntax of the block address machine operands.
+.. TODO: Describe the syntax of the CFI index machine operands.
+.. TODO: Describe the syntax of the metadata machine operands, and the
+ instructions debug location attribute.
+.. TODO: Describe the syntax of the target index machine operands.
+.. TODO: Describe the syntax of the register live out machine operands.
+.. TODO: Describe the syntax of the machine memory operands.