Diffstat (limited to 'docs')
-rw-r--r--  docs/AMDGPUAsmGFX7.rst | 1255
-rw-r--r--  docs/AMDGPUAsmGFX8.rst | 1672
-rw-r--r--  docs/AMDGPUAsmGFX9.rst | 1906
-rw-r--r--  docs/AMDGPUOperandSyntax.rst | 1055
-rw-r--r--  docs/AMDGPUUsage.rst | 1050
-rw-r--r--  docs/AdvancedBuilds.rst | 2
-rw-r--r--  docs/AliasAnalysis.rst | 33
-rw-r--r--  docs/BitCodeFormat.rst | 20
-rw-r--r--  docs/Bugpoint.rst | 16
-rw-r--r--  docs/CFIVerify.rst | 8
-rw-r--r--  docs/CMake.rst | 32
-rw-r--r--  docs/CodeGenerator.rst | 4
-rw-r--r--  docs/CodingStandards.rst | 94
-rw-r--r--  docs/CommandGuide/FileCheck.rst | 59
-rw-r--r--  docs/CommandGuide/dsymutil.rst | 26
-rw-r--r--  docs/CommandGuide/index.rst | 2
-rw-r--r--  docs/CommandGuide/lit.rst | 4
-rw-r--r--  docs/CommandGuide/llc.rst | 2
-rw-r--r--  docs/CommandGuide/llvm-cov.rst | 21
-rw-r--r--  docs/CommandGuide/llvm-exegesis-analysis.png | bin 0 -> 34817 bytes
-rw-r--r--  docs/CommandGuide/llvm-exegesis.rst | 186
-rw-r--r--  docs/CommandGuide/llvm-mca.rst | 551
-rw-r--r--  docs/CommandGuide/llvm-nm.rst | 4
-rw-r--r--  docs/CommandGuide/opt.rst | 2
-rw-r--r--  docs/CommandGuide/tblgen.rst | 13
-rw-r--r--  docs/CommandLine.rst | 8
-rw-r--r--  docs/CompilerWriterInfo.rst | 2
-rw-r--r--  docs/Contributing.rst | 127
-rw-r--r--  docs/Coroutines.rst | 26
-rw-r--r--  docs/Docker.rst | 58
-rw-r--r--  docs/ExceptionHandling.rst | 65
-rw-r--r--  docs/Extensions.rst | 204
-rw-r--r--  docs/GarbageCollection.rst | 18
-rw-r--r--  docs/GettingStarted.rst | 18
-rw-r--r--  docs/GoldPlugin.rst | 57
-rw-r--r--  docs/HowToSubmitABug.rst | 2
-rw-r--r--  docs/LangRef.rst | 1455
-rw-r--r--  docs/Lexicon.rst | 6
-rw-r--r--  docs/LibFuzzer.rst | 14
-rw-r--r--  docs/MIRLangRef.rst | 81
-rw-r--r--  docs/MemorySSA.rst | 2
-rw-r--r--  docs/OptBisect.rst | 2
-rw-r--r--  docs/PDB/MsfFile.rst | 82
-rw-r--r--  docs/Passes.rst | 38
-rw-r--r--  docs/Phabricator.rst | 18
-rw-r--r--  docs/ProgrammersManual.rst | 53
-rw-r--r--  docs/Proposals/VectorizationPlan.rst | 2
-rw-r--r--  docs/ReleaseNotes.rst | 107
-rw-r--r--  docs/ReleaseProcess.rst | 139
-rw-r--r--  docs/ScudoHardenedAllocator.rst | 72
-rw-r--r--  docs/SourceLevelDebugging.rst | 181
-rw-r--r--  docs/SpeculativeLoadHardening.md | 1099
-rw-r--r--  docs/SystemLibrary.rst | 9
-rw-r--r--  docs/TableGen/BackEnds.rst | 137
-rw-r--r--  docs/TableGen/LangIntro.rst | 100
-rw-r--r--  docs/TableGen/LangRef.rst | 110
-rw-r--r--  docs/TableGen/index.rst | 16
-rw-r--r--  docs/TestingGuide.rst | 5
-rw-r--r--  docs/Vectorizers.rst | 9
-rw-r--r--  docs/XRay.rst | 139
-rw-r--r--  docs/XRayExample.rst | 34
-rw-r--r--  docs/XRayFDRFormat.rst | 12
-rw-r--r--  docs/YamlIO.rst | 2
-rw-r--r--  docs/conf.py | 4
-rw-r--r--  docs/doxygen.cfg.in | 2
-rw-r--r--  docs/index.rst | 6
-rw-r--r--  docs/speculative_load_hardening_microbenchmarks.png | bin 0 -> 112926 bytes
-rw-r--r--  docs/tutorial/BuildingAJIT1.rst | 5
-rw-r--r--  docs/tutorial/BuildingAJIT2.rst | 7
-rw-r--r--  docs/tutorial/BuildingAJIT3.rst | 5
-rw-r--r--  docs/tutorial/LangImpl02.rst | 12
-rw-r--r--  docs/tutorial/LangImpl03.rst | 2
-rw-r--r--  docs/tutorial/LangImpl04.rst | 10
-rw-r--r--  docs/tutorial/LangImpl05.rst | 14
-rw-r--r--  docs/tutorial/LangImpl06.rst | 2
-rw-r--r--  docs/tutorial/LangImpl08.rst | 2
-rw-r--r--  docs/tutorial/OCamlLangImpl1.rst | 2
77 files changed, 11413 insertions, 1186 deletions
diff --git a/docs/AMDGPUAsmGFX7.rst b/docs/AMDGPUAsmGFX7.rst
new file mode 100644
index 000000000000..8973c5035ad6
--- /dev/null
+++ b/docs/AMDGPUAsmGFX7.rst
@@ -0,0 +1,1255 @@
+..
+ **************************************************
+ * *
+ * Automatically generated file, do not edit! *
+ * *
+ **************************************************
+
+===========================
+Syntax of GFX7 Instructions
+===========================
+
+.. contents::
+ :local:
+
+
+DS
+===========================
+
+.. parsed-literal::
+
+ ds_add_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_append dst :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_b32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_b64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_f32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_f64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_f32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_f64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_condxchg32_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_consume dst :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_barrier src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_init src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_br src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_p src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_release_all src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_v src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_f64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_i32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_i64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_f64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_i32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_i64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_f64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_i32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_i64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_f64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_i32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_i64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_f64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_i32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_i64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_f64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_i32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_i64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_b32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_b64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_nop src0
+ ds_or_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_ordered_count dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2_b32 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2_b64 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2st64_b32 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2st64_b64 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b128 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b32 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b64 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b96 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i8 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u8 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_swizzle_b32 dst, src0 :ref:`sw_offset16<amdgpu_synid_sw_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrap_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2_b32 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2_b64 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2st64_b32 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2st64_b64 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b128 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b16 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b8 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b96 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2st64_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2st64_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+
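+For orientation, a hypothetical instance of the syntax above (register
+numbers, offsets and modifiers are illustrative, not taken from the tables):
+
+.. parsed-literal::
+
+ ds_add_u32 v1, v2 offset:16
+ ds_read2_b32 v[0:1], v2 offset0:4 offset1:8
+ ds_write_b32 v1, v0 offset:32
+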
+EXP
+===========================
+
+.. parsed-literal::
+
+ exp dst, src0, src1, src2, src3 :ref:`done<amdgpu_synid_done>` :ref:`compr<amdgpu_synid_compr>` :ref:`vm<amdgpu_synid_vm>`
+
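+A sketch, assuming the usual LLVM convention that dst names an export target
+such as mrt0 (the source registers are illustrative):
+
+.. parsed-literal::
+
+ exp mrt0 v0, v1, v2, v3 done vm
+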
+FLAT
+===========================
+
+.. parsed-literal::
+
+ flat_atomic_add dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_add_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_and dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_and_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_cmpswap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_cmpswap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_dec dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_dec_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_fcmpswap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_fcmpswap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_fmax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_fmax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_fmin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_fmin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_inc dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_inc_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_or dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_or_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_sub dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_sub_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_swap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_swap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_xor dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_xor_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dword dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx2 dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx3 dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx4 dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sbyte dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sshort dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ubyte dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ushort dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_byte src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dword src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx2 src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx3 src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx4 src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_short src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+
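+As an illustration, with an arbitrary 64-bit address pair in v[1:2]; on the
+atomics, glc requests that the pre-operation value be returned in dst:
+
+.. parsed-literal::
+
+ flat_load_dword v0, v[1:2] glc
+ flat_store_dword v[1:2], v0
+ flat_atomic_add v3, v[1:2], v4 glc
+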
+MIMG
+===========================
+
+.. parsed-literal::
+
+ image_atomic_add src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_and src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_cmpswap src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_dec src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_inc src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_or src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_smax src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_smin src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_sub src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_swap src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_umax src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_umin src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_xor src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4 dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_b_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_b_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_b_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_b_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_l_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_lz_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_c_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_l_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_lz_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_get_lod dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_get_resinfo dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_mip dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_mip_pck dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_mip_pck_sgn dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_pck dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_pck_sgn dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c_cd dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c_d dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_c_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_store src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_store_mip src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_store_mip_pck src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_store_pck src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+
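+For illustration, where the trailing scalar operands are the image resource
+and (for sampling) sampler descriptors; all register choices are arbitrary:
+
+.. parsed-literal::
+
+ image_load v[0:3], v[4:7], s[8:15] dmask:0xf unorm
+ image_sample v[0:3], v[4:7], s[8:15], s[16:19] dmask:0xf
+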
+MUBUF
+===========================
+
+.. parsed-literal::
+
+ buffer_atomic_add src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_add_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_and src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_and_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_cmpswap src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_cmpswap_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_dec src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_dec_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_inc src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_inc_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_or src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_or_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smax src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smax_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smin src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smin_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_sub src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_sub_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_swap src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_swap_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umax src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umax_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umin src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umin_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_xor src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_xor_x2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dword dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_dwordx2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dwordx3 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dwordx4 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_x dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_format_xy dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_xyz dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_xyzw dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_sbyte dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_sshort dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_ubyte dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_ushort dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_store_byte src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dword src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx3 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx4 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_x src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xy src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xyz src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xyzw src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_short src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`addr64<amdgpu_synid_addr64>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_wbinvl1
+ buffer_wbinvl1_vol
+
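+A hypothetical example, with the buffer resource descriptor in s[4:7] and a
+scalar offset in s0 (all register choices are illustrative):
+
+.. parsed-literal::
+
+ buffer_load_dword v0, v1, s[4:7], s0 offen offset:64
+ buffer_store_dword v0, v1, s[4:7], s0 idxen
+ buffer_atomic_add v0, v1, s[4:7], s0 offen glc
+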
+SMRD
+===========================
+
+.. parsed-literal::
+
+ s_buffer_load_dword dst, src0, src1
+ s_buffer_load_dwordx16 dst, src0, src1
+ s_buffer_load_dwordx2 dst, src0, src1
+ s_buffer_load_dwordx4 dst, src0, src1
+ s_buffer_load_dwordx8 dst, src0, src1
+ s_dcache_inv
+ s_dcache_inv_vol
+ s_load_dword dst, src0, src1
+ s_load_dwordx16 dst, src0, src1
+ s_load_dwordx2 dst, src0, src1
+ s_load_dwordx4 dst, src0, src1
+ s_load_dwordx8 dst, src0, src1
+ s_memtime dst
+
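+For orientation (base address in s[2:3], buffer descriptor in s[4:7]; all
+operands illustrative):
+
+.. parsed-literal::
+
+ s_load_dword s0, s[2:3], 0x0
+ s_buffer_load_dwordx4 s[8:11], s[4:7], 0x0
+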
+SOP1
+===========================
+
+.. parsed-literal::
+
+ s_abs_i32 dst, src0
+ s_and_saveexec_b64 dst, src0
+ s_andn2_saveexec_b64 dst, src0
+ s_bcnt0_i32_b32 dst, src0
+ s_bcnt0_i32_b64 dst, src0
+ s_bcnt1_i32_b32 dst, src0
+ s_bcnt1_i32_b64 dst, src0
+ s_bitset0_b32 dst, src0
+ s_bitset0_b64 dst, src0
+ s_bitset1_b32 dst, src0
+ s_bitset1_b64 dst, src0
+ s_brev_b32 dst, src0
+ s_brev_b64 dst, src0
+ s_cbranch_join src0
+ s_cmov_b32 dst, src0
+ s_cmov_b64 dst, src0
+ s_ff0_i32_b32 dst, src0
+ s_ff0_i32_b64 dst, src0
+ s_ff1_i32_b32 dst, src0
+ s_ff1_i32_b64 dst, src0
+ s_flbit_i32 dst, src0
+ s_flbit_i32_b32 dst, src0
+ s_flbit_i32_b64 dst, src0
+ s_flbit_i32_i64 dst, src0
+ s_getpc_b64 dst
+ s_mov_b32 dst, src0
+ s_mov_b64 dst, src0
+ s_mov_fed_b32 dst, src0
+ s_movreld_b32 dst, src0
+ s_movreld_b64 dst, src0
+ s_movrels_b32 dst, src0
+ s_movrels_b64 dst, src0
+ s_nand_saveexec_b64 dst, src0
+ s_nor_saveexec_b64 dst, src0
+ s_not_b32 dst, src0
+ s_not_b64 dst, src0
+ s_or_saveexec_b64 dst, src0
+ s_orn2_saveexec_b64 dst, src0
+ s_quadmask_b32 dst, src0
+ s_quadmask_b64 dst, src0
+ s_rfe_b64 src0
+ s_setpc_b64 src0
+ s_sext_i32_i16 dst, src0
+ s_sext_i32_i8 dst, src0
+ s_swappc_b64 dst, src0
+ s_wqm_b32 dst, src0
+ s_wqm_b64 dst, src0
+ s_xnor_saveexec_b64 dst, src0
+ s_xor_saveexec_b64 dst, src0
+
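+Illustrative instances (register choices arbitrary):
+
+.. parsed-literal::
+
+ s_mov_b32 s0, s1
+ s_not_b64 s[0:1], s[2:3]
+ s_and_saveexec_b64 s[2:3], s[4:5]
+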
+SOP2
+===========================
+
+.. parsed-literal::
+
+ s_absdiff_i32 dst, src0, src1
+ s_add_i32 dst, src0, src1
+ s_add_u32 dst, src0, src1
+ s_addc_u32 dst, src0, src1
+ s_and_b32 dst, src0, src1
+ s_and_b64 dst, src0, src1
+ s_andn2_b32 dst, src0, src1
+ s_andn2_b64 dst, src0, src1
+ s_ashr_i32 dst, src0, src1
+ s_ashr_i64 dst, src0, src1
+ s_bfe_i32 dst, src0, src1
+ s_bfe_i64 dst, src0, src1
+ s_bfe_u32 dst, src0, src1
+ s_bfe_u64 dst, src0, src1
+ s_bfm_b32 dst, src0, src1
+ s_bfm_b64 dst, src0, src1
+ s_cbranch_g_fork src0, src1
+ s_cselect_b32 dst, src0, src1
+ s_cselect_b64 dst, src0, src1
+ s_lshl_b32 dst, src0, src1
+ s_lshl_b64 dst, src0, src1
+ s_lshr_b32 dst, src0, src1
+ s_lshr_b64 dst, src0, src1
+ s_max_i32 dst, src0, src1
+ s_max_u32 dst, src0, src1
+ s_min_i32 dst, src0, src1
+ s_min_u32 dst, src0, src1
+ s_mul_i32 dst, src0, src1
+ s_nand_b32 dst, src0, src1
+ s_nand_b64 dst, src0, src1
+ s_nor_b32 dst, src0, src1
+ s_nor_b64 dst, src0, src1
+ s_or_b32 dst, src0, src1
+ s_or_b64 dst, src0, src1
+ s_orn2_b32 dst, src0, src1
+ s_orn2_b64 dst, src0, src1
+ s_sub_i32 dst, src0, src1
+ s_sub_u32 dst, src0, src1
+ s_subb_u32 dst, src0, src1
+ s_xnor_b32 dst, src0, src1
+ s_xnor_b64 dst, src0, src1
+ s_xor_b32 dst, src0, src1
+ s_xor_b64 dst, src0, src1
+
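+Illustrative instances; src1 may also be an inline constant, as in the shift:
+
+.. parsed-literal::
+
+ s_add_u32 s0, s1, s2
+ s_lshl_b32 s0, s1, 2
+ s_cselect_b32 s0, s1, s2
+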
+SOPC
+===========================
+
+.. parsed-literal::
+
+ s_bitcmp0_b32 src0, src1
+ s_bitcmp0_b64 src0, src1
+ s_bitcmp1_b32 src0, src1
+ s_bitcmp1_b64 src0, src1
+ s_cmp_eq_i32 src0, src1
+ s_cmp_eq_u32 src0, src1
+ s_cmp_ge_i32 src0, src1
+ s_cmp_ge_u32 src0, src1
+ s_cmp_gt_i32 src0, src1
+ s_cmp_gt_u32 src0, src1
+ s_cmp_le_i32 src0, src1
+ s_cmp_le_u32 src0, src1
+ s_cmp_lg_i32 src0, src1
+ s_cmp_lg_u32 src0, src1
+ s_cmp_lt_i32 src0, src1
+ s_cmp_lt_u32 src0, src1
+ s_setvskip src0, src1
+
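+The comparisons here write only the scalar condition code. For example
+(operands illustrative):
+
+.. parsed-literal::
+
+ s_cmp_eq_i32 s0, s1
+ s_bitcmp1_b32 s0, 5
+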
+SOPK
+===========================
+
+.. parsed-literal::
+
+ s_addk_i32 dst, src0
+ s_cbranch_i_fork src0, src1
+ s_cmovk_i32 dst, src0
+ s_cmpk_eq_i32 src0, src1
+ s_cmpk_eq_u32 src0, src1
+ s_cmpk_ge_i32 src0, src1
+ s_cmpk_ge_u32 src0, src1
+ s_cmpk_gt_i32 src0, src1
+ s_cmpk_gt_u32 src0, src1
+ s_cmpk_le_i32 src0, src1
+ s_cmpk_le_u32 src0, src1
+ s_cmpk_lg_i32 src0, src1
+ s_cmpk_lg_u32 src0, src1
+ s_cmpk_lt_i32 src0, src1
+ s_cmpk_lt_u32 src0, src1
+ s_getreg_b32 dst, src0
+ s_movk_i32 dst, src0
+ s_mulk_i32 dst, src0
+ s_setreg_b32 dst, src0
+ s_setreg_imm32_b32 dst, src0
+
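+In SOPK forms the final operand is typically a 16-bit immediate. For example
+(values illustrative):
+
+.. parsed-literal::
+
+ s_movk_i32 s0, 0x1234
+ s_addk_i32 s0, 0x10
+ s_cmpk_gt_i32 s0, 0x40
+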
+SOPP
+===========================
+
+.. parsed-literal::
+
+ s_barrier
+ s_branch src0
+ s_cbranch_cdbgsys src0
+ s_cbranch_cdbgsys_and_user src0
+ s_cbranch_cdbgsys_or_user src0
+ s_cbranch_cdbguser src0
+ s_cbranch_execnz src0
+ s_cbranch_execz src0
+ s_cbranch_scc0 src0
+ s_cbranch_scc1 src0
+ s_cbranch_vccnz src0
+ s_cbranch_vccz src0
+ s_decperflevel src0
+ s_endpgm
+ s_icache_inv
+ s_incperflevel src0
+ s_nop src0
+ s_sendmsg src0
+ s_sendmsghalt src0
+ s_sethalt src0
+ s_setkill src0
+ s_setprio src0
+ s_sleep src0
+ s_trap src0
+ s_ttracedata
+ s_waitcnt src0
+
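+For example (the branch forms instead take a label or relative offset as
+src0):
+
+.. parsed-literal::
+
+ s_waitcnt vmcnt(0) lgkmcnt(0)
+ s_nop 0
+ s_endpgm
+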
+VINTRP
+===========================
+
+.. parsed-literal::
+
+ v_interp_mov_f32 dst, src0, src1
+ v_interp_p1_f32 dst, src0, src1
+ v_interp_p2_f32 dst, src0, src1
+
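+A sketch, assuming the usual attribute-channel notation for the attribute
+operand (registers illustrative):
+
+.. parsed-literal::
+
+ v_interp_p1_f32 v0, v1, attr0.x
+ v_interp_p2_f32 v0, v1, attr0.x
+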
+VOP1
+===========================
+
+.. parsed-literal::
+
+ v_bfrev_b32 dst, src0
+ v_ceil_f32 dst, src0
+ v_ceil_f64 dst, src0
+ v_clrexcp
+ v_cos_f32 dst, src0
+ v_cvt_f16_f32 dst, src0
+ v_cvt_f32_f16 dst, src0
+ v_cvt_f32_f64 dst, src0
+ v_cvt_f32_i32 dst, src0
+ v_cvt_f32_u32 dst, src0
+ v_cvt_f32_ubyte0 dst, src0
+ v_cvt_f32_ubyte1 dst, src0
+ v_cvt_f32_ubyte2 dst, src0
+ v_cvt_f32_ubyte3 dst, src0
+ v_cvt_f64_f32 dst, src0
+ v_cvt_f64_i32 dst, src0
+ v_cvt_f64_u32 dst, src0
+ v_cvt_flr_i32_f32 dst, src0
+ v_cvt_i32_f32 dst, src0
+ v_cvt_i32_f64 dst, src0
+ v_cvt_off_f32_i4 dst, src0
+ v_cvt_rpi_i32_f32 dst, src0
+ v_cvt_u32_f32 dst, src0
+ v_cvt_u32_f64 dst, src0
+ v_exp_f32 dst, src0
+ v_exp_legacy_f32 dst, src0
+ v_ffbh_i32 dst, src0
+ v_ffbh_u32 dst, src0
+ v_ffbl_b32 dst, src0
+ v_floor_f32 dst, src0
+ v_floor_f64 dst, src0
+ v_fract_f32 dst, src0
+ v_fract_f64 dst, src0
+ v_frexp_exp_i32_f32 dst, src0
+ v_frexp_exp_i32_f64 dst, src0
+ v_frexp_mant_f32 dst, src0
+ v_frexp_mant_f64 dst, src0
+ v_log_clamp_f32 dst, src0
+ v_log_f32 dst, src0
+ v_log_legacy_f32 dst, src0
+ v_mov_b32 dst, src0
+ v_mov_fed_b32 dst, src0
+ v_movreld_b32 dst, src0
+ v_movrels_b32 dst, src0
+ v_movrelsd_b32 dst, src0
+ v_nop
+ v_not_b32 dst, src0
+ v_rcp_clamp_f32 dst, src0
+ v_rcp_clamp_f64 dst, src0
+ v_rcp_f32 dst, src0
+ v_rcp_f64 dst, src0
+ v_rcp_iflag_f32 dst, src0
+ v_rcp_legacy_f32 dst, src0
+ v_readfirstlane_b32 dst, src0
+ v_rndne_f32 dst, src0
+ v_rndne_f64 dst, src0
+ v_rsq_clamp_f32 dst, src0
+ v_rsq_clamp_f64 dst, src0
+ v_rsq_f32 dst, src0
+ v_rsq_f64 dst, src0
+ v_rsq_legacy_f32 dst, src0
+ v_sin_f32 dst, src0
+ v_sqrt_f32 dst, src0
+ v_sqrt_f64 dst, src0
+ v_trunc_f32 dst, src0
+ v_trunc_f64 dst, src0
+
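+Illustrative instances; src0 may be a vector register, scalar register or
+constant:
+
+.. parsed-literal::
+
+ v_mov_b32 v0, v1
+ v_sqrt_f32 v0, v1
+ v_cvt_f32_i32 v0, s0
+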
+VOP2
+===========================
+
+.. parsed-literal::
+
+ v_add_f32 dst, src0, src1
+ v_add_i32 dst0, dst1, src0, src1
+ v_addc_u32 dst0, dst1, src0, src1, src2
+ v_and_b32 dst, src0, src1
+ v_ashr_i32 dst, src0, src1
+ v_ashrrev_i32 dst, src0, src1
+ v_bcnt_u32_b32 dst, src0, src1
+ v_bfm_b32 dst, src0, src1
+ v_cndmask_b32 dst, src0, src1, src2
+ v_cvt_pk_i16_i32 dst, src0, src1
+ v_cvt_pk_u16_u32 dst, src0, src1
+ v_cvt_pkaccum_u8_f32 dst, src0, src1
+ v_cvt_pknorm_i16_f32 dst, src0, src1
+ v_cvt_pknorm_u16_f32 dst, src0, src1
+ v_cvt_pkrtz_f16_f32 dst, src0, src1
+ v_ldexp_f32 dst, src0, src1
+ v_lshl_b32 dst, src0, src1
+ v_lshlrev_b32 dst, src0, src1
+ v_lshr_b32 dst, src0, src1
+ v_lshrrev_b32 dst, src0, src1
+ v_mac_f32 dst, src0, src1
+ v_mac_legacy_f32 dst, src0, src1
+ v_madak_f32 dst, src0, src1, src2
+ v_madmk_f32 dst, src0, src1, src2
+ v_max_f32 dst, src0, src1
+ v_max_i32 dst, src0, src1
+ v_max_legacy_f32 dst, src0, src1
+ v_max_u32 dst, src0, src1
+ v_mbcnt_hi_u32_b32 dst, src0, src1
+ v_mbcnt_lo_u32_b32 dst, src0, src1
+ v_min_f32 dst, src0, src1
+ v_min_i32 dst, src0, src1
+ v_min_legacy_f32 dst, src0, src1
+ v_min_u32 dst, src0, src1
+ v_mul_f32 dst, src0, src1
+ v_mul_hi_i32_i24 dst, src0, src1
+ v_mul_hi_u32_u24 dst, src0, src1
+ v_mul_i32_i24 dst, src0, src1
+ v_mul_legacy_f32 dst, src0, src1
+ v_mul_u32_u24 dst, src0, src1
+ v_or_b32 dst, src0, src1
+ v_readlane_b32 dst, src0, src1
+ v_sub_f32 dst, src0, src1
+ v_sub_i32 dst0, dst1, src0, src1
+ v_subb_u32 dst0, dst1, src0, src1, src2
+ v_subbrev_u32 dst0, dst1, src0, src1, src2
+ v_subrev_f32 dst, src0, src1
+ v_subrev_i32 dst0, dst1, src0, src1
+ v_writelane_b32 dst, src0, src1
+ v_xor_b32 dst, src0, src1
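+
+A hand-written sketch (arbitrary registers) of the 32-bit integer add
+family, whose carry operands (dst1 and src2 above) must be vcc in this
+encoding:
+
+.. parsed-literal::
+
+    v_add_f32  v0, v1, v2              // v0 = v1 + v2
+    v_add_i32  v3, vcc, v4, v5         // v3 = v4 + v5, carry-out in vcc
+    v_addc_u32 v6, vcc, v7, v8, vcc    // add with carry-in and carry-out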
+
+VOP3
+===========================
+
+.. parsed-literal::
+
+ v_add_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_i32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_addc_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_alignbit_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_alignbyte_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_and_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashr_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashr_i64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashrrev_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bcnt_u32_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bfe_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfe_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfi_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfm_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bfrev_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_clrexcp_e64 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_eq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_eq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_f_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_f_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_ge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_ge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_gt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_gt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_le_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_le_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_lg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_lg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_lt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_lt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_neq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_neq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_ngt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_ngt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nle_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nle_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nlg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nlg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nlt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_nlt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_o_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_o_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_tru_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_tru_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_u_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmps_u_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_eq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_eq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_f_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_f_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_ge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_ge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_gt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_gt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_le_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_le_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_lg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_lg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_lt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_lt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_neq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_neq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_ngt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_ngt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nle_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nle_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nlg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nlg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nlt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_nlt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_o_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_o_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_tru_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_tru_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_u_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpsx_u_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cndmask_b32_e64 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_cos_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubeid_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubema_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubesc_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubetc_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f16_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_i32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_u32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte0_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte1_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte2_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte3_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_i32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_u32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_flr_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i32_f64_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_off_f32_i4_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_i16_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_u16_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_u8_f32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pkaccum_u8_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_i16_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_u16_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pkrtz_f16_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_rpi_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u32_f64_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fmas_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fmas_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_scale_f32 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_div_scale_f64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_exp_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_exp_legacy_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ffbh_i32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ffbh_u32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ffbl_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i32_f64_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_lerp_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_log_clamp_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_log_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_log_legacy_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_lshl_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshl_b64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshlrev_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshr_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshr_b64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshrrev_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mac_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mac_legacy_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i32_i24 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i64_i32 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_mad_legacy_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u32_u24 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u64_u32 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_max3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_max3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_max_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_max_legacy_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mbcnt_hi_u32_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mbcnt_lo_u32_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_med3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_med3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_med3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_min_legacy_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mov_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_mov_fed_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_movreld_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_movrels_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_movrelsd_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_mqsad_pk_u16_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_mqsad_u32_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_msad_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_i32_i24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_u32_u24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_i32_i24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_legacy_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_lo_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_lo_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_u32_u24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mullit_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_nop_e64 :ref:`omod<amdgpu_synid_omod>`
+ v_not_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_or_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_qsad_pk_u16_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_clamp_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_clamp_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_iflag_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_legacy_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_clamp_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_clamp_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_legacy_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_hi_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u16 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_sin_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_i32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_subb_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_subbrev_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_i32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_trig_preop_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_xor_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
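+
+A few hand-written e64 forms (arbitrary registers) showing how source
+modifiers, clamp, omod and an SGPR-pair destination are spelled:
+
+.. parsed-literal::
+
+    v_add_f32_e64    v0, -v1, v2 clamp // negated source, clamped result
+    v_mul_f32_e64    v3, v4, v5 mul:2  // omod: result multiplied by 2
+    v_cmp_lt_f32_e64 s[0:1], v0, v1    // comparison result to an SGPR pair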
+
+VOPC
+===========================
+
+.. parsed-literal::
+
+ v_cmp_class_f32 dst, src0, src1
+ v_cmp_class_f64 dst, src0, src1
+ v_cmp_eq_f32 dst, src0, src1
+ v_cmp_eq_f64 dst, src0, src1
+ v_cmp_eq_i32 dst, src0, src1
+ v_cmp_eq_i64 dst, src0, src1
+ v_cmp_eq_u32 dst, src0, src1
+ v_cmp_eq_u64 dst, src0, src1
+ v_cmp_f_f32 dst, src0, src1
+ v_cmp_f_f64 dst, src0, src1
+ v_cmp_f_i32 dst, src0, src1
+ v_cmp_f_i64 dst, src0, src1
+ v_cmp_f_u32 dst, src0, src1
+ v_cmp_f_u64 dst, src0, src1
+ v_cmp_ge_f32 dst, src0, src1
+ v_cmp_ge_f64 dst, src0, src1
+ v_cmp_ge_i32 dst, src0, src1
+ v_cmp_ge_i64 dst, src0, src1
+ v_cmp_ge_u32 dst, src0, src1
+ v_cmp_ge_u64 dst, src0, src1
+ v_cmp_gt_f32 dst, src0, src1
+ v_cmp_gt_f64 dst, src0, src1
+ v_cmp_gt_i32 dst, src0, src1
+ v_cmp_gt_i64 dst, src0, src1
+ v_cmp_gt_u32 dst, src0, src1
+ v_cmp_gt_u64 dst, src0, src1
+ v_cmp_le_f32 dst, src0, src1
+ v_cmp_le_f64 dst, src0, src1
+ v_cmp_le_i32 dst, src0, src1
+ v_cmp_le_i64 dst, src0, src1
+ v_cmp_le_u32 dst, src0, src1
+ v_cmp_le_u64 dst, src0, src1
+ v_cmp_lg_f32 dst, src0, src1
+ v_cmp_lg_f64 dst, src0, src1
+ v_cmp_lt_f32 dst, src0, src1
+ v_cmp_lt_f64 dst, src0, src1
+ v_cmp_lt_i32 dst, src0, src1
+ v_cmp_lt_i64 dst, src0, src1
+ v_cmp_lt_u32 dst, src0, src1
+ v_cmp_lt_u64 dst, src0, src1
+ v_cmp_ne_i32 dst, src0, src1
+ v_cmp_ne_i64 dst, src0, src1
+ v_cmp_ne_u32 dst, src0, src1
+ v_cmp_ne_u64 dst, src0, src1
+ v_cmp_neq_f32 dst, src0, src1
+ v_cmp_neq_f64 dst, src0, src1
+ v_cmp_nge_f32 dst, src0, src1
+ v_cmp_nge_f64 dst, src0, src1
+ v_cmp_ngt_f32 dst, src0, src1
+ v_cmp_ngt_f64 dst, src0, src1
+ v_cmp_nle_f32 dst, src0, src1
+ v_cmp_nle_f64 dst, src0, src1
+ v_cmp_nlg_f32 dst, src0, src1
+ v_cmp_nlg_f64 dst, src0, src1
+ v_cmp_nlt_f32 dst, src0, src1
+ v_cmp_nlt_f64 dst, src0, src1
+ v_cmp_o_f32 dst, src0, src1
+ v_cmp_o_f64 dst, src0, src1
+ v_cmp_t_i32 dst, src0, src1
+ v_cmp_t_i64 dst, src0, src1
+ v_cmp_t_u32 dst, src0, src1
+ v_cmp_t_u64 dst, src0, src1
+ v_cmp_tru_f32 dst, src0, src1
+ v_cmp_tru_f64 dst, src0, src1
+ v_cmp_u_f32 dst, src0, src1
+ v_cmp_u_f64 dst, src0, src1
+ v_cmps_eq_f32 dst, src0, src1
+ v_cmps_eq_f64 dst, src0, src1
+ v_cmps_f_f32 dst, src0, src1
+ v_cmps_f_f64 dst, src0, src1
+ v_cmps_ge_f32 dst, src0, src1
+ v_cmps_ge_f64 dst, src0, src1
+ v_cmps_gt_f32 dst, src0, src1
+ v_cmps_gt_f64 dst, src0, src1
+ v_cmps_le_f32 dst, src0, src1
+ v_cmps_le_f64 dst, src0, src1
+ v_cmps_lg_f32 dst, src0, src1
+ v_cmps_lg_f64 dst, src0, src1
+ v_cmps_lt_f32 dst, src0, src1
+ v_cmps_lt_f64 dst, src0, src1
+ v_cmps_neq_f32 dst, src0, src1
+ v_cmps_neq_f64 dst, src0, src1
+ v_cmps_nge_f32 dst, src0, src1
+ v_cmps_nge_f64 dst, src0, src1
+ v_cmps_ngt_f32 dst, src0, src1
+ v_cmps_ngt_f64 dst, src0, src1
+ v_cmps_nle_f32 dst, src0, src1
+ v_cmps_nle_f64 dst, src0, src1
+ v_cmps_nlg_f32 dst, src0, src1
+ v_cmps_nlg_f64 dst, src0, src1
+ v_cmps_nlt_f32 dst, src0, src1
+ v_cmps_nlt_f64 dst, src0, src1
+ v_cmps_o_f32 dst, src0, src1
+ v_cmps_o_f64 dst, src0, src1
+ v_cmps_tru_f32 dst, src0, src1
+ v_cmps_tru_f64 dst, src0, src1
+ v_cmps_u_f32 dst, src0, src1
+ v_cmps_u_f64 dst, src0, src1
+ v_cmpsx_eq_f32 dst, src0, src1
+ v_cmpsx_eq_f64 dst, src0, src1
+ v_cmpsx_f_f32 dst, src0, src1
+ v_cmpsx_f_f64 dst, src0, src1
+ v_cmpsx_ge_f32 dst, src0, src1
+ v_cmpsx_ge_f64 dst, src0, src1
+ v_cmpsx_gt_f32 dst, src0, src1
+ v_cmpsx_gt_f64 dst, src0, src1
+ v_cmpsx_le_f32 dst, src0, src1
+ v_cmpsx_le_f64 dst, src0, src1
+ v_cmpsx_lg_f32 dst, src0, src1
+ v_cmpsx_lg_f64 dst, src0, src1
+ v_cmpsx_lt_f32 dst, src0, src1
+ v_cmpsx_lt_f64 dst, src0, src1
+ v_cmpsx_neq_f32 dst, src0, src1
+ v_cmpsx_neq_f64 dst, src0, src1
+ v_cmpsx_nge_f32 dst, src0, src1
+ v_cmpsx_nge_f64 dst, src0, src1
+ v_cmpsx_ngt_f32 dst, src0, src1
+ v_cmpsx_ngt_f64 dst, src0, src1
+ v_cmpsx_nle_f32 dst, src0, src1
+ v_cmpsx_nle_f64 dst, src0, src1
+ v_cmpsx_nlg_f32 dst, src0, src1
+ v_cmpsx_nlg_f64 dst, src0, src1
+ v_cmpsx_nlt_f32 dst, src0, src1
+ v_cmpsx_nlt_f64 dst, src0, src1
+ v_cmpsx_o_f32 dst, src0, src1
+ v_cmpsx_o_f64 dst, src0, src1
+ v_cmpsx_tru_f32 dst, src0, src1
+ v_cmpsx_tru_f64 dst, src0, src1
+ v_cmpsx_u_f32 dst, src0, src1
+ v_cmpsx_u_f64 dst, src0, src1
+ v_cmpx_class_f32 dst, src0, src1
+ v_cmpx_class_f64 dst, src0, src1
+ v_cmpx_eq_f32 dst, src0, src1
+ v_cmpx_eq_f64 dst, src0, src1
+ v_cmpx_eq_i32 dst, src0, src1
+ v_cmpx_eq_i64 dst, src0, src1
+ v_cmpx_eq_u32 dst, src0, src1
+ v_cmpx_eq_u64 dst, src0, src1
+ v_cmpx_f_f32 dst, src0, src1
+ v_cmpx_f_f64 dst, src0, src1
+ v_cmpx_f_i32 dst, src0, src1
+ v_cmpx_f_i64 dst, src0, src1
+ v_cmpx_f_u32 dst, src0, src1
+ v_cmpx_f_u64 dst, src0, src1
+ v_cmpx_ge_f32 dst, src0, src1
+ v_cmpx_ge_f64 dst, src0, src1
+ v_cmpx_ge_i32 dst, src0, src1
+ v_cmpx_ge_i64 dst, src0, src1
+ v_cmpx_ge_u32 dst, src0, src1
+ v_cmpx_ge_u64 dst, src0, src1
+ v_cmpx_gt_f32 dst, src0, src1
+ v_cmpx_gt_f64 dst, src0, src1
+ v_cmpx_gt_i32 dst, src0, src1
+ v_cmpx_gt_i64 dst, src0, src1
+ v_cmpx_gt_u32 dst, src0, src1
+ v_cmpx_gt_u64 dst, src0, src1
+ v_cmpx_le_f32 dst, src0, src1
+ v_cmpx_le_f64 dst, src0, src1
+ v_cmpx_le_i32 dst, src0, src1
+ v_cmpx_le_i64 dst, src0, src1
+ v_cmpx_le_u32 dst, src0, src1
+ v_cmpx_le_u64 dst, src0, src1
+ v_cmpx_lg_f32 dst, src0, src1
+ v_cmpx_lg_f64 dst, src0, src1
+ v_cmpx_lt_f32 dst, src0, src1
+ v_cmpx_lt_f64 dst, src0, src1
+ v_cmpx_lt_i32 dst, src0, src1
+ v_cmpx_lt_i64 dst, src0, src1
+ v_cmpx_lt_u32 dst, src0, src1
+ v_cmpx_lt_u64 dst, src0, src1
+ v_cmpx_ne_i32 dst, src0, src1
+ v_cmpx_ne_i64 dst, src0, src1
+ v_cmpx_ne_u32 dst, src0, src1
+ v_cmpx_ne_u64 dst, src0, src1
+ v_cmpx_neq_f32 dst, src0, src1
+ v_cmpx_neq_f64 dst, src0, src1
+ v_cmpx_nge_f32 dst, src0, src1
+ v_cmpx_nge_f64 dst, src0, src1
+ v_cmpx_ngt_f32 dst, src0, src1
+ v_cmpx_ngt_f64 dst, src0, src1
+ v_cmpx_nle_f32 dst, src0, src1
+ v_cmpx_nle_f64 dst, src0, src1
+ v_cmpx_nlg_f32 dst, src0, src1
+ v_cmpx_nlg_f64 dst, src0, src1
+ v_cmpx_nlt_f32 dst, src0, src1
+ v_cmpx_nlt_f64 dst, src0, src1
+ v_cmpx_o_f32 dst, src0, src1
+ v_cmpx_o_f64 dst, src0, src1
+ v_cmpx_t_i32 dst, src0, src1
+ v_cmpx_t_i64 dst, src0, src1
+ v_cmpx_t_u32 dst, src0, src1
+ v_cmpx_t_u64 dst, src0, src1
+ v_cmpx_tru_f32 dst, src0, src1
+ v_cmpx_tru_f64 dst, src0, src1
+ v_cmpx_u_f32 dst, src0, src1
+ v_cmpx_u_f64 dst, src0, src1
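+
+In this encoding the destination mask is vcc, and the cmpx variants also
+update the exec mask. A hand-written sketch (arbitrary registers):
+
+.. parsed-literal::
+
+    v_cmp_gt_i32  vcc, v0, v1          // vcc = per-lane result of v0 > v1
+    v_cmpx_eq_u32 vcc, v2, v3          // also writes the exec mask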
diff --git a/docs/AMDGPUAsmGFX8.rst b/docs/AMDGPUAsmGFX8.rst
new file mode 100644
index 000000000000..44c48432fda1
--- /dev/null
+++ b/docs/AMDGPUAsmGFX8.rst
@@ -0,0 +1,1672 @@
+..
+ **************************************************
+ * *
+ * Automatically generated file, do not edit! *
+ * *
+ **************************************************
+
+===========================
+Syntax of GFX8 Instructions
+===========================
+
+.. contents::
+ :local:
+
+
+DS
+===========================
+
+.. parsed-literal::
+
+ ds_add_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_append dst :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_bpermute_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>`
+ ds_cmpst_b32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_b64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_f32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_f64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_f32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_f64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_condxchg32_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_consume dst :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_barrier src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_init src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_br src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_p :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_release_all :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_v :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_f64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_i32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_i64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_f64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_i32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_i64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_f64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_i32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_i64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_f64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_i32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_i64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_f64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_i32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_i64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_f64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_i32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_i64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_b32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_b64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_nop
+ ds_or_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_ordered_count dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_permute_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>`
+ ds_read2_b32 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2_b64 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2st64_b32 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2st64_b64 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b128 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b32 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b64 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b96 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i8 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u8 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_swizzle_b32 dst, src0 :ref:`sw_offset16<amdgpu_synid_sw_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrap_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2_b32 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2_b64 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2st64_b32 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2st64_b64 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b128 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b16 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b8 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b96 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2st64_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2st64_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
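+
+For example (arbitrary registers), plain LDS accesses take a byte offset,
+while the read2/write2 forms take two offsets in units of the element size:
+
+.. parsed-literal::
+
+    ds_write_b32 v0, v1 offset:16      // store v1 at LDS address v0+16
+    ds_read_b32  v2, v0 offset:16      // load it back
+    ds_read2_b32 v[4:5], v0 offset0:0 offset1:1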
+
+EXP
+===========================
+
+.. parsed-literal::
+
+ exp dst, src0, src1, src2, src3 :ref:`done<amdgpu_synid_done>` :ref:`compr<amdgpu_synid_compr>` :ref:`vm<amdgpu_synid_vm>`
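+
+For instance, hand-written exports of a color target and a position
+(arbitrary registers):
+
+.. parsed-literal::
+
+    exp mrt0 v0, v1, v2, v3 done vm    // final color export, with valid mask
+    exp pos0 v4, v5, v6, v7 done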
+
+FLAT
+===========================
+
+.. parsed-literal::
+
+ flat_atomic_add dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_add_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_and dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_and_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_cmpswap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_cmpswap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_dec dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_dec_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_inc dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_inc_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_or dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_or_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_sub dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_sub_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_swap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_swap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_xor dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_xor_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dword dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx2 dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx3 dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx4 dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sbyte dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sshort dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ubyte dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ushort dst, src0 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_byte src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dword src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx2 src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx3 src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx4 src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_short src0, src1 :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
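+
+A hand-written sketch (arbitrary registers; v[2:3] holds the 64-bit flat
+address):
+
+.. parsed-literal::
+
+    flat_load_dword  v0, v[2:3] glc    // globally coherent load
+    flat_store_dword v[2:3], v1
+    flat_atomic_add  v4, v[2:3], v5 glc // glc returns the old value in v4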
+
+MIMG
+===========================
+
+.. parsed-literal::
+
+ image_atomic_add dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_and dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_cmpswap dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_dec dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_inc dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_or dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_smax dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_smin dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_sub dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_swap dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_umax dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_umin dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_xor dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4 dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_b_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_b_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_b_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_b_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_l_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_lz_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_cl_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_l_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_lz_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_get_lod dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_get_resinfo dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_load_mip dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_load_mip_pck dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_mip_pck_sgn dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_pck dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_pck_sgn dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c_b_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_store src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_store_mip src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_store_mip_pck src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_store_pck src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+
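+For illustration, a minimal hypothetical use of the image instructions above;
+the data, address, and resource registers (v[0:3], v[4:5], s[8:15]) are
+placeholders, not part of the generated syntax::
+
+    // placeholder operands; dmask:0xf selects all four channels
+    image_load v[0:3], v[4:5], s[8:15] dmask:0xf unorm
+    image_store v[0:3], v[4:5], s[8:15] dmask:0xf unorm glc
+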
+MUBUF
+===========================
+
+.. parsed-literal::
+
+ buffer_atomic_add dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_add_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_and dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_and_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_cmpswap dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_cmpswap_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_dec dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_dec_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_inc dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_inc_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_or dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_or_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smax dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smax_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smin dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smin_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_sub dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_sub_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_swap dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_swap_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umax dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umax_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umin dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umin_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_xor dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_xor_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dword dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_dwordx2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dwordx3 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dwordx4 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_x dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_xy dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_xyz dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_xyzw dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_x dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_format_xy dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_xyz dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_xyzw dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_sbyte dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_sshort dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_ubyte dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_ushort dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_store_byte src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dword src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx3 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx4 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_x src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_xy src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_xyz src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_xyzw src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_x src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xy src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xyz src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xyzw src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_lds_dword src0, src1 :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`lds<amdgpu_synid_lds>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_short src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_wbinvl1
+ buffer_wbinvl1_vol
+
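+As a sketch, a hypothetical MUBUF load/store pair; the vaddr, resource, and
+soffset operands (v1, s[4:7], s0) are illustrative::
+
+    // offen adds the VGPR offset; offset:16 is the immediate buffer offset
+    buffer_load_dword v0, v1, s[4:7], s0 offen offset:16 glc
+    buffer_store_dword v0, v1, s[4:7], s0 offen offset:16
+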
+SMEM
+===========================
+
+.. parsed-literal::
+
+ s_atc_probe src0, src1, src2
+ s_atc_probe_buffer src0, src1, src2
+ s_buffer_load_dword dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx16 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx4 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx8 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_store_dword src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_store_dwordx2 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_store_dwordx4 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_dcache_inv
+ s_dcache_inv_vol
+ s_dcache_wb
+ s_dcache_wb_vol
+ s_load_dword dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx16 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx4 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx8 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_memrealtime dst
+ s_memtime dst
+ s_store_dword src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_store_dwordx2 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_store_dwordx4 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+
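+For example, a hypothetical scalar load of two consecutive dwords; the base
+pair s[2:3] is illustrative, and a zero offset sidesteps target-specific
+offset encodings::
+
+    // placeholder registers; dst must be an aligned 64-bit SGPR pair
+    s_load_dwordx2 s[0:1], s[2:3], 0x0
+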
+SOP1
+===========================
+
+.. parsed-literal::
+
+ s_abs_i32 dst, src0
+ s_and_saveexec_b64 dst, src0
+ s_andn2_saveexec_b64 dst, src0
+ s_bcnt0_i32_b32 dst, src0
+ s_bcnt0_i32_b64 dst, src0
+ s_bcnt1_i32_b32 dst, src0
+ s_bcnt1_i32_b64 dst, src0
+ s_bitset0_b32 dst, src0
+ s_bitset0_b64 dst, src0
+ s_bitset1_b32 dst, src0
+ s_bitset1_b64 dst, src0
+ s_brev_b32 dst, src0
+ s_brev_b64 dst, src0
+ s_cbranch_join src0
+ s_cmov_b32 dst, src0
+ s_cmov_b64 dst, src0
+ s_ff0_i32_b32 dst, src0
+ s_ff0_i32_b64 dst, src0
+ s_ff1_i32_b32 dst, src0
+ s_ff1_i32_b64 dst, src0
+ s_flbit_i32 dst, src0
+ s_flbit_i32_b32 dst, src0
+ s_flbit_i32_b64 dst, src0
+ s_flbit_i32_i64 dst, src0
+ s_getpc_b64 dst
+ s_mov_b32 dst, src0
+ s_mov_b64 dst, src0
+ s_mov_fed_b32 dst, src0
+ s_movreld_b32 dst, src0
+ s_movreld_b64 dst, src0
+ s_movrels_b32 dst, src0
+ s_movrels_b64 dst, src0
+ s_nand_saveexec_b64 dst, src0
+ s_nor_saveexec_b64 dst, src0
+ s_not_b32 dst, src0
+ s_not_b64 dst, src0
+ s_or_saveexec_b64 dst, src0
+ s_orn2_saveexec_b64 dst, src0
+ s_quadmask_b32 dst, src0
+ s_quadmask_b64 dst, src0
+ s_rfe_b64 src0
+ s_set_gpr_idx_idx src0
+ s_setpc_b64 src0
+ s_sext_i32_i16 dst, src0
+ s_sext_i32_i8 dst, src0
+ s_swappc_b64 dst, src0
+ s_wqm_b32 dst, src0
+ s_wqm_b64 dst, src0
+ s_xnor_saveexec_b64 dst, src0
+ s_xor_saveexec_b64 dst, src0
+
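+A hypothetical SOP1 sequence with placeholder registers::
+
+    // each SOP1 instruction takes a single scalar source
+    s_mov_b32 s0, s1
+    s_not_b32 s2, s0
+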
+SOP2
+===========================
+
+.. parsed-literal::
+
+ s_absdiff_i32 dst, src0, src1
+ s_add_i32 dst, src0, src1
+ s_add_u32 dst, src0, src1
+ s_addc_u32 dst, src0, src1
+ s_and_b32 dst, src0, src1
+ s_and_b64 dst, src0, src1
+ s_andn2_b32 dst, src0, src1
+ s_andn2_b64 dst, src0, src1
+ s_ashr_i32 dst, src0, src1
+ s_ashr_i64 dst, src0, src1
+ s_bfe_i32 dst, src0, src1
+ s_bfe_i64 dst, src0, src1
+ s_bfe_u32 dst, src0, src1
+ s_bfe_u64 dst, src0, src1
+ s_bfm_b32 dst, src0, src1
+ s_bfm_b64 dst, src0, src1
+ s_cbranch_g_fork src0, src1
+ s_cselect_b32 dst, src0, src1
+ s_cselect_b64 dst, src0, src1
+ s_lshl_b32 dst, src0, src1
+ s_lshl_b64 dst, src0, src1
+ s_lshr_b32 dst, src0, src1
+ s_lshr_b64 dst, src0, src1
+ s_max_i32 dst, src0, src1
+ s_max_u32 dst, src0, src1
+ s_min_i32 dst, src0, src1
+ s_min_u32 dst, src0, src1
+ s_mul_i32 dst, src0, src1
+ s_nand_b32 dst, src0, src1
+ s_nand_b64 dst, src0, src1
+ s_nor_b32 dst, src0, src1
+ s_nor_b64 dst, src0, src1
+ s_or_b32 dst, src0, src1
+ s_or_b64 dst, src0, src1
+ s_orn2_b32 dst, src0, src1
+ s_orn2_b64 dst, src0, src1
+ s_rfe_restore_b64 src0, src1
+ s_sub_i32 dst, src0, src1
+ s_sub_u32 dst, src0, src1
+ s_subb_u32 dst, src0, src1
+ s_xnor_b32 dst, src0, src1
+ s_xnor_b64 dst, src0, src1
+ s_xor_b32 dst, src0, src1
+ s_xor_b64 dst, src0, src1
+
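+For instance, a hypothetical 64-bit add built from two SOP2 instructions;
+the low add writes its carry to SCC and the high add consumes it::
+
+    // placeholder registers; s[0:1] = s[2:3] + s[4:5]
+    s_add_u32 s0, s2, s4
+    s_addc_u32 s1, s3, s5
+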
+SOPC
+===========================
+
+.. parsed-literal::
+
+ s_bitcmp0_b32 src0, src1
+ s_bitcmp0_b64 src0, src1
+ s_bitcmp1_b32 src0, src1
+ s_bitcmp1_b64 src0, src1
+ s_cmp_eq_i32 src0, src1
+ s_cmp_eq_u32 src0, src1
+ s_cmp_eq_u64 src0, src1
+ s_cmp_ge_i32 src0, src1
+ s_cmp_ge_u32 src0, src1
+ s_cmp_gt_i32 src0, src1
+ s_cmp_gt_u32 src0, src1
+ s_cmp_le_i32 src0, src1
+ s_cmp_le_u32 src0, src1
+ s_cmp_lg_i32 src0, src1
+ s_cmp_lg_u32 src0, src1
+ s_cmp_lg_u64 src0, src1
+ s_cmp_lt_i32 src0, src1
+ s_cmp_lt_u32 src0, src1
+ s_set_gpr_idx_on src0, src1
+ s_setvskip src0, src1
+
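+A hypothetical SOPC compare feeding a conditional branch (the branch itself
+is a SOPP instruction, and the label name is illustrative)::
+
+    // SOPC compares write SCC, which s_cbranch_scc1 tests
+    s_cmp_eq_i32 s0, s1
+    s_cbranch_scc1 equal_path
+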
+SOPK
+===========================
+
+.. parsed-literal::
+
+ s_addk_i32 dst, src0
+ s_cbranch_i_fork src0, src1
+ s_cmovk_i32 dst, src0
+ s_cmpk_eq_i32 src0, src1
+ s_cmpk_eq_u32 src0, src1
+ s_cmpk_ge_i32 src0, src1
+ s_cmpk_ge_u32 src0, src1
+ s_cmpk_gt_i32 src0, src1
+ s_cmpk_gt_u32 src0, src1
+ s_cmpk_le_i32 src0, src1
+ s_cmpk_le_u32 src0, src1
+ s_cmpk_lg_i32 src0, src1
+ s_cmpk_lg_u32 src0, src1
+ s_cmpk_lt_i32 src0, src1
+ s_cmpk_lt_u32 src0, src1
+ s_getreg_b32 dst, src0
+ s_movk_i32 dst, src0
+ s_mulk_i32 dst, src0
+ s_setreg_b32 dst, src0
+ s_setreg_imm32_b32 dst, src0
+
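+For example, a hypothetical SOPK move; SOPK instructions carry a 16-bit
+immediate in the instruction word::
+
+    // placeholder destination; 0x42 is sign-extended to 32 bits
+    s_movk_i32 s0, 0x42
+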
+SOPP
+===========================
+
+.. parsed-literal::
+
+ s_barrier
+ s_branch src0
+ s_cbranch_cdbgsys src0
+ s_cbranch_cdbgsys_and_user src0
+ s_cbranch_cdbgsys_or_user src0
+ s_cbranch_cdbguser src0
+ s_cbranch_execnz src0
+ s_cbranch_execz src0
+ s_cbranch_scc0 src0
+ s_cbranch_scc1 src0
+ s_cbranch_vccnz src0
+ s_cbranch_vccz src0
+ s_decperflevel src0
+ s_endpgm
+ s_endpgm_saved
+ s_icache_inv
+ s_incperflevel src0
+ s_nop src0
+ s_sendmsg src0
+ s_sendmsghalt src0
+ s_set_gpr_idx_mode src0
+ s_set_gpr_idx_off
+ s_sethalt src0
+ s_setkill src0
+ s_setprio src0
+ s_sleep src0
+ s_trap src0
+ s_ttracedata
+ s_waitcnt src0
+ s_wakeup
+
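+A hypothetical wave epilogue using the SOPP instructions above: drain the
+outstanding memory counters, then end the program::
+
+    // wait for all vector-memory and LDS/scalar-memory operations
+    s_waitcnt vmcnt(0) lgkmcnt(0)
+    s_endpgm
+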
+VINTRP
+===========================
+
+.. parsed-literal::
+
+ v_interp_mov_f32 dst, src0, src1
+ v_interp_p1_f32 dst, src0, src1
+ v_interp_p2_f32 dst, src0, src1
+
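+As a sketch, a hypothetical two-stage attribute interpolation; the attribute
+and channel (attr0.x) and the I/J coordinate registers are illustrative::
+
+    // v_interp_p1 consumes the I coordinate, v_interp_p2 the J coordinate
+    v_interp_p1_f32 v0, v1, attr0.x
+    v_interp_p2_f32 v0, v2, attr0.x
+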
+VOP1
+===========================
+
+.. parsed-literal::
+
+ v_bfrev_b32 dst, src0
+ v_bfrev_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_bfrev_b32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ceil_f16 dst, src0
+ v_ceil_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ceil_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ceil_f32 dst, src0
+ v_ceil_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ceil_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ceil_f64 dst, src0
+ v_clrexcp
+ v_cos_f16 dst, src0
+ v_cos_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cos_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cos_f32 dst, src0
+ v_cos_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cos_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f16_f32 dst, src0
+ v_cvt_f16_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f16_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f16_i16 dst, src0
+ v_cvt_f16_i16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f16_i16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f16_u16 dst, src0
+ v_cvt_f16_u16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f16_u16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_f16 dst, src0
+ v_cvt_f32_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_f64 dst, src0
+ v_cvt_f32_i32 dst, src0
+ v_cvt_f32_i32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_i32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_u32 dst, src0
+ v_cvt_f32_u32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_u32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte0 dst, src0
+ v_cvt_f32_ubyte0_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte0_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte1 dst, src0
+ v_cvt_f32_ubyte1_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte1_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte2 dst, src0
+ v_cvt_f32_ubyte2_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte2_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte3 dst, src0
+ v_cvt_f32_ubyte3_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte3_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f64_f32 dst, src0
+ v_cvt_f64_i32 dst, src0
+ v_cvt_f64_u32 dst, src0
+ v_cvt_flr_i32_f32 dst, src0
+ v_cvt_flr_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_flr_i32_f32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_i16_f16 dst, src0
+ v_cvt_i16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_i16_f16_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_i32_f32 dst, src0
+ v_cvt_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_i32_f32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_i32_f64 dst, src0
+ v_cvt_off_f32_i4 dst, src0
+ v_cvt_off_f32_i4_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_off_f32_i4_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_rpi_i32_f32 dst, src0
+ v_cvt_rpi_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_rpi_i32_f32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_u16_f16 dst, src0
+ v_cvt_u16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_u16_f16_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_u32_f32 dst, src0
+ v_cvt_u32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_u32_f32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_u32_f64 dst, src0
+ v_exp_f16 dst, src0
+ v_exp_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_exp_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_exp_f32 dst, src0
+ v_exp_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_exp_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_exp_legacy_f32 dst, src0
+ v_exp_legacy_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_exp_legacy_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ffbh_i32 dst, src0
+ v_ffbh_i32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ffbh_i32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ffbh_u32 dst, src0
+ v_ffbh_u32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ffbh_u32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ffbl_b32 dst, src0
+ v_ffbl_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ffbl_b32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_floor_f16 dst, src0
+ v_floor_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_floor_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_floor_f32 dst, src0
+ v_floor_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_floor_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_floor_f64 dst, src0
+ v_fract_f16 dst, src0
+ v_fract_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_fract_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_fract_f32 dst, src0
+ v_fract_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_fract_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_fract_f64 dst, src0
+ v_frexp_exp_i16_f16 dst, src0
+ v_frexp_exp_i16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_exp_i16_f16_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_exp_i32_f32 dst, src0
+ v_frexp_exp_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_exp_i32_f32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_exp_i32_f64 dst, src0
+ v_frexp_mant_f16 dst, src0
+ v_frexp_mant_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_mant_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_mant_f32 dst, src0
+ v_frexp_mant_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_mant_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_mant_f64 dst, src0
+ v_log_f16 dst, src0
+ v_log_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_log_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_log_f32 dst, src0
+ v_log_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_log_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_log_legacy_f32 dst, src0
+ v_log_legacy_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_log_legacy_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_mov_b32 dst, src0
+ v_mov_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mov_b32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_mov_fed_b32 dst, src0
+ v_mov_fed_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mov_fed_b32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_movreld_b32 dst, src0
+ v_movrels_b32 dst, src0
+ v_movrelsd_b32 dst, src0
+ v_nop
+ v_not_b32 dst, src0
+ v_not_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_not_b32_sdwa dst, src0 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rcp_f16 dst, src0
+ v_rcp_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rcp_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rcp_f32 dst, src0
+ v_rcp_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rcp_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rcp_f64 dst, src0
+ v_rcp_iflag_f32 dst, src0
+ v_rcp_iflag_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rcp_iflag_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_readfirstlane_b32 dst, src0
+ v_rndne_f16 dst, src0
+ v_rndne_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rndne_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rndne_f32 dst, src0
+ v_rndne_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rndne_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rndne_f64 dst, src0
+ v_rsq_f16 dst, src0
+ v_rsq_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rsq_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rsq_f32 dst, src0
+ v_rsq_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rsq_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rsq_f64 dst, src0
+ v_sin_f16 dst, src0
+ v_sin_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sin_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sin_f32 dst, src0
+ v_sin_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sin_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sqrt_f16 dst, src0
+ v_sqrt_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sqrt_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sqrt_f32 dst, src0
+ v_sqrt_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sqrt_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sqrt_f64 dst, src0
+ v_trunc_f16 dst, src0
+ v_trunc_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_trunc_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_trunc_f32 dst, src0
+ v_trunc_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_trunc_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_trunc_f64 dst, src0
+
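+For illustration, the same VOP1 move in its plain, DPP, and SDWA forms;
+every modifier value shown is an arbitrary example::
+
+    v_mov_b32 v0, v1
+    // DPP: shift each row left by one lane, all rows and banks enabled
+    v_mov_b32_dpp v0, v1 row_shl:1 row_mask:0xf bank_mask:0xf
+    // SDWA: write byte 0 of v0 from word 1 of v1, preserving other bytes
+    v_mov_b32_sdwa v0, v1 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:WORD_1
+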
+VOP2
+===========================
+
+.. parsed-literal::
+
+ v_add_f16 dst, src0, src1
+ v_add_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_add_f32 dst, src0, src1
+ v_add_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_add_u16 dst, src0, src1
+ v_add_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_u16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_add_u32 dst0, dst1, src0, src1
+ v_add_u32_dpp dst0, dst1, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_u32_sdwa dst0, dst1, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_addc_u32 dst0, dst1, src0, src1, src2
+ v_addc_u32_dpp dst0, dst1, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_addc_u32_sdwa dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_and_b32 dst, src0, src1
+ v_and_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_and_b32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_ashrrev_i16 dst, src0, src1
+ v_ashrrev_i16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ashrrev_i16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_ashrrev_i32 dst, src0, src1
+ v_ashrrev_i32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ashrrev_i32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cndmask_b32 dst, src0, src1, src2
+ v_cndmask_b32_dpp dst, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cndmask_b32_sdwa dst, src0, src1, src2 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_ldexp_f16 dst, src0, src1
+ v_ldexp_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ldexp_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshlrev_b16 dst, src0, src1
+ v_lshlrev_b16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshlrev_b16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshlrev_b32 dst, src0, src1
+ v_lshlrev_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshlrev_b32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshrrev_b16 dst, src0, src1
+ v_lshrrev_b16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshrrev_b16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshrrev_b32 dst, src0, src1
+ v_lshrrev_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshrrev_b32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mac_f16 dst, src0, src1
+ v_mac_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mac_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mac_f32 dst, src0, src1
+ v_mac_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mac_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_madak_f16 dst, src0, src1, src2
+ v_madak_f32 dst, src0, src1, src2
+ v_madmk_f16 dst, src0, src1, src2
+ v_madmk_f32 dst, src0, src1, src2
+ v_max_f16 dst, src0, src1
+ v_max_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_f32 dst, src0, src1
+ v_max_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_i16 dst, src0, src1
+ v_max_i16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_i16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_i32 dst, src0, src1
+ v_max_i32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_i32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_u16 dst, src0, src1
+ v_max_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_u16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_u32 dst, src0, src1
+ v_max_u32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_u32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_f16 dst, src0, src1
+ v_min_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_f32 dst, src0, src1
+ v_min_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_i16 dst, src0, src1
+ v_min_i16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_i16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_i32 dst, src0, src1
+ v_min_i32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_i32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_u16 dst, src0, src1
+ v_min_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_u16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_u32 dst, src0, src1
+ v_min_u32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_u32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_f16 dst, src0, src1
+ v_mul_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_f32 dst, src0, src1
+ v_mul_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_hi_i32_i24 dst, src0, src1
+ v_mul_hi_i32_i24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_hi_i32_i24_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_hi_u32_u24 dst, src0, src1
+ v_mul_hi_u32_u24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_hi_u32_u24_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_i32_i24 dst, src0, src1
+ v_mul_i32_i24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_i32_i24_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_legacy_f32 dst, src0, src1
+ v_mul_legacy_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_legacy_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_lo_u16 dst, src0, src1
+ v_mul_lo_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_lo_u16_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_u32_u24 dst, src0, src1
+ v_mul_u32_u24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_u32_u24_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_or_b32 dst, src0, src1
+ v_or_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_or_b32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_f16 dst, src0, src1
+ v_sub_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_f32 dst, src0, src1
+ v_sub_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_u16 dst, src0, src1
+ v_sub_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_u16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_u32 dst0, dst1, src0, src1
+ v_sub_u32_dpp dst0, dst1, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_u32_sdwa dst0, dst1, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subb_u32 dst0, dst1, src0, src1, src2
+ v_subb_u32_dpp dst0, dst1, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subb_u32_sdwa dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subbrev_u32 dst0, dst1, src0, src1, src2
+ v_subbrev_u32_dpp dst0, dst1, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subbrev_u32_sdwa dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_f16 dst, src0, src1
+ v_subrev_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_f32 dst, src0, src1
+ v_subrev_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_u16 dst, src0, src1
+ v_subrev_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_u16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_u32 dst0, dst1, src0, src1
+ v_subrev_u32_dpp dst0, dst1, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_u32_sdwa dst0, dst1, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_xor_b32 dst, src0, src1
+ v_xor_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_xor_b32_sdwa dst, src0, src1 :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+
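+As a brief illustration of the modifier syntax in the listing above (the
+register numbers and modifier values below are arbitrary examples, not part
+of the generated listing): DPP modifiers follow the operands as key:value
+pairs, while SDWA modifiers select sub-dword parts of the operands:
+
+.. parsed-literal::
+
+    v_mul_f32_dpp  v0, v1, v2 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf
+    v_mul_f32_sdwa v0, v1, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
+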
+VOP3
+===========================
+
+.. parsed-literal::
+
+ v_add_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_add_u32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_addc_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_alignbit_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_alignbyte_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_and_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashrrev_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashrrev_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashrrev_i64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bcnt_u32_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bfe_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfe_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfi_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfm_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bfrev_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_clrexcp_e64 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cndmask_b32_e64 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_cos_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cos_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubeid_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubema_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubesc_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubetc_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f16_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f16_i16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f16_u16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_i32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_u32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte0_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte1_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte2_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte3_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_i32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_u32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_flr_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i16_f16_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i32_f64_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_off_f32_i4_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_i16_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_u16_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_u8_f32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pkaccum_u8_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_i16_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_u16_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pkrtz_f16_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_rpi_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u16_f16_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u32_f64_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f16 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fmas_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fmas_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_scale_f32 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_div_scale_f64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_exp_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_exp_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_exp_legacy_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ffbh_i32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ffbh_u32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ffbl_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f16 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i16_f16_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i32_f64_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_mov_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p1_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p1ll_f16 dst, src0, src1 :ref:`high<amdgpu_synid_high>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p1lv_f16 dst, src0, src1, src2 :ref:`high<amdgpu_synid_high>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p2_f16 dst, src0, src1, src2 :ref:`high<amdgpu_synid_high>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p2_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f32 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_lerp_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_log_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_log_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_log_legacy_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_lshlrev_b16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshlrev_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshlrev_b64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshrrev_b16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshrrev_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshrrev_b64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mac_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mac_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_f16 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i16 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i32_i24 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i64_i32 dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_legacy_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u16 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u32_u24 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u64_u32 dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_max3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_max_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_max_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_max_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_max_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mbcnt_hi_u32_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mbcnt_lo_u32_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_med3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_med3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_med3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_min_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_min_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_min_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mov_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_mov_fed_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_movreld_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_movrels_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_movrelsd_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_mqsad_pk_u16_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mqsad_u32_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_msad_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_i32_i24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_u32_u24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_i32_i24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_legacy_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_lo_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_lo_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_u32_u24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_nop_e64 :ref:`omod<amdgpu_synid_omod>`
+ v_not_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_or_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_perm_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_qsad_pk_u16_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_iflag_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_readlane_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_hi_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u16 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sin_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sin_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_sub_u32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_subb_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_subbrev_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_u32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_trig_preop_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_writelane_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_xor_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+
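+As a brief illustration of the VOP3 modifiers listed above (operands are
+arbitrary examples, not part of the generated listing): clamp is written as
+a bare keyword, while omod is written as mul:2, mul:4 or div:2:
+
+.. parsed-literal::
+
+    v_add_f32_e64 v0, v1, v2 clamp
+    v_mul_f32_e64 v0, v1, v2 mul:2
+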
+VOPC
+===========================
+
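+In the listing below, dst receives the per-lane comparison mask. As a brief
+illustration (operands are arbitrary examples, not part of the generated
+listing), the 32-bit encodings write VCC, while the _e64 forms in the VOP3
+section above may name an SGPR pair instead:
+
+.. parsed-literal::
+
+    v_cmp_lt_f32     vcc, v1, v2
+    v_cmp_lt_f32_e64 s[0:1], v1, v2
+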
+.. parsed-literal::
+
+ v_cmp_class_f16 dst, src0, src1
+ v_cmp_class_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_class_f32 dst, src0, src1
+ v_cmp_class_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_class_f64 dst, src0, src1
+ v_cmp_eq_f16 dst, src0, src1
+ v_cmp_eq_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_f32 dst, src0, src1
+ v_cmp_eq_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_f64 dst, src0, src1
+ v_cmp_eq_i16 dst, src0, src1
+ v_cmp_eq_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_i32 dst, src0, src1
+ v_cmp_eq_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_i64 dst, src0, src1
+ v_cmp_eq_u16 dst, src0, src1
+ v_cmp_eq_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_u32 dst, src0, src1
+ v_cmp_eq_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_u64 dst, src0, src1
+ v_cmp_f_f16 dst, src0, src1
+ v_cmp_f_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_f32 dst, src0, src1
+ v_cmp_f_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_f64 dst, src0, src1
+ v_cmp_f_i16 dst, src0, src1
+ v_cmp_f_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_i32 dst, src0, src1
+ v_cmp_f_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_i64 dst, src0, src1
+ v_cmp_f_u16 dst, src0, src1
+ v_cmp_f_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_u32 dst, src0, src1
+ v_cmp_f_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_u64 dst, src0, src1
+ v_cmp_ge_f16 dst, src0, src1
+ v_cmp_ge_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_f32 dst, src0, src1
+ v_cmp_ge_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_f64 dst, src0, src1
+ v_cmp_ge_i16 dst, src0, src1
+ v_cmp_ge_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_i32 dst, src0, src1
+ v_cmp_ge_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_i64 dst, src0, src1
+ v_cmp_ge_u16 dst, src0, src1
+ v_cmp_ge_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_u32 dst, src0, src1
+ v_cmp_ge_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_u64 dst, src0, src1
+ v_cmp_gt_f16 dst, src0, src1
+ v_cmp_gt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_f32 dst, src0, src1
+ v_cmp_gt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_f64 dst, src0, src1
+ v_cmp_gt_i16 dst, src0, src1
+ v_cmp_gt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_i32 dst, src0, src1
+ v_cmp_gt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_i64 dst, src0, src1
+ v_cmp_gt_u16 dst, src0, src1
+ v_cmp_gt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_u32 dst, src0, src1
+ v_cmp_gt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_u64 dst, src0, src1
+ v_cmp_le_f16 dst, src0, src1
+ v_cmp_le_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_f32 dst, src0, src1
+ v_cmp_le_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_f64 dst, src0, src1
+ v_cmp_le_i16 dst, src0, src1
+ v_cmp_le_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_i32 dst, src0, src1
+ v_cmp_le_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_i64 dst, src0, src1
+ v_cmp_le_u16 dst, src0, src1
+ v_cmp_le_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_u32 dst, src0, src1
+ v_cmp_le_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_u64 dst, src0, src1
+ v_cmp_lg_f16 dst, src0, src1
+ v_cmp_lg_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lg_f32 dst, src0, src1
+ v_cmp_lg_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lg_f64 dst, src0, src1
+ v_cmp_lt_f16 dst, src0, src1
+ v_cmp_lt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_f32 dst, src0, src1
+ v_cmp_lt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_f64 dst, src0, src1
+ v_cmp_lt_i16 dst, src0, src1
+ v_cmp_lt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_i32 dst, src0, src1
+ v_cmp_lt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_i64 dst, src0, src1
+ v_cmp_lt_u16 dst, src0, src1
+ v_cmp_lt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_u32 dst, src0, src1
+ v_cmp_lt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_u64 dst, src0, src1
+ v_cmp_ne_i16 dst, src0, src1
+ v_cmp_ne_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_i32 dst, src0, src1
+ v_cmp_ne_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_i64 dst, src0, src1
+ v_cmp_ne_u16 dst, src0, src1
+ v_cmp_ne_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_u32 dst, src0, src1
+ v_cmp_ne_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_u64 dst, src0, src1
+ v_cmp_neq_f16 dst, src0, src1
+ v_cmp_neq_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_neq_f32 dst, src0, src1
+ v_cmp_neq_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_neq_f64 dst, src0, src1
+ v_cmp_nge_f16 dst, src0, src1
+ v_cmp_nge_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nge_f32 dst, src0, src1
+ v_cmp_nge_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nge_f64 dst, src0, src1
+ v_cmp_ngt_f16 dst, src0, src1
+ v_cmp_ngt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ngt_f32 dst, src0, src1
+ v_cmp_ngt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ngt_f64 dst, src0, src1
+ v_cmp_nle_f16 dst, src0, src1
+ v_cmp_nle_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nle_f32 dst, src0, src1
+ v_cmp_nle_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nle_f64 dst, src0, src1
+ v_cmp_nlg_f16 dst, src0, src1
+ v_cmp_nlg_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlg_f32 dst, src0, src1
+ v_cmp_nlg_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlg_f64 dst, src0, src1
+ v_cmp_nlt_f16 dst, src0, src1
+ v_cmp_nlt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlt_f32 dst, src0, src1
+ v_cmp_nlt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlt_f64 dst, src0, src1
+ v_cmp_o_f16 dst, src0, src1
+ v_cmp_o_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_o_f32 dst, src0, src1
+ v_cmp_o_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_o_f64 dst, src0, src1
+ v_cmp_t_i16 dst, src0, src1
+ v_cmp_t_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_i32 dst, src0, src1
+ v_cmp_t_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_i64 dst, src0, src1
+ v_cmp_t_u16 dst, src0, src1
+ v_cmp_t_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_u32 dst, src0, src1
+ v_cmp_t_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_u64 dst, src0, src1
+ v_cmp_tru_f16 dst, src0, src1
+ v_cmp_tru_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_tru_f32 dst, src0, src1
+ v_cmp_tru_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_tru_f64 dst, src0, src1
+ v_cmp_u_f16 dst, src0, src1
+ v_cmp_u_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_u_f32 dst, src0, src1
+ v_cmp_u_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_u_f64 dst, src0, src1
+ v_cmpx_class_f16 dst, src0, src1
+ v_cmpx_class_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_class_f32 dst, src0, src1
+ v_cmpx_class_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_class_f64 dst, src0, src1
+ v_cmpx_eq_f16 dst, src0, src1
+ v_cmpx_eq_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_f32 dst, src0, src1
+ v_cmpx_eq_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_f64 dst, src0, src1
+ v_cmpx_eq_i16 dst, src0, src1
+ v_cmpx_eq_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_i32 dst, src0, src1
+ v_cmpx_eq_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_i64 dst, src0, src1
+ v_cmpx_eq_u16 dst, src0, src1
+ v_cmpx_eq_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_u32 dst, src0, src1
+ v_cmpx_eq_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_u64 dst, src0, src1
+ v_cmpx_f_f16 dst, src0, src1
+ v_cmpx_f_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_f32 dst, src0, src1
+ v_cmpx_f_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_f64 dst, src0, src1
+ v_cmpx_f_i16 dst, src0, src1
+ v_cmpx_f_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_i32 dst, src0, src1
+ v_cmpx_f_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_i64 dst, src0, src1
+ v_cmpx_f_u16 dst, src0, src1
+ v_cmpx_f_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_u32 dst, src0, src1
+ v_cmpx_f_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_u64 dst, src0, src1
+ v_cmpx_ge_f16 dst, src0, src1
+ v_cmpx_ge_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_f32 dst, src0, src1
+ v_cmpx_ge_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_f64 dst, src0, src1
+ v_cmpx_ge_i16 dst, src0, src1
+ v_cmpx_ge_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_i32 dst, src0, src1
+ v_cmpx_ge_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_i64 dst, src0, src1
+ v_cmpx_ge_u16 dst, src0, src1
+ v_cmpx_ge_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_u32 dst, src0, src1
+ v_cmpx_ge_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_u64 dst, src0, src1
+ v_cmpx_gt_f16 dst, src0, src1
+ v_cmpx_gt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_f32 dst, src0, src1
+ v_cmpx_gt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_f64 dst, src0, src1
+ v_cmpx_gt_i16 dst, src0, src1
+ v_cmpx_gt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_i32 dst, src0, src1
+ v_cmpx_gt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_i64 dst, src0, src1
+ v_cmpx_gt_u16 dst, src0, src1
+ v_cmpx_gt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_u32 dst, src0, src1
+ v_cmpx_gt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_u64 dst, src0, src1
+ v_cmpx_le_f16 dst, src0, src1
+ v_cmpx_le_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_f32 dst, src0, src1
+ v_cmpx_le_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_f64 dst, src0, src1
+ v_cmpx_le_i16 dst, src0, src1
+ v_cmpx_le_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_i32 dst, src0, src1
+ v_cmpx_le_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_i64 dst, src0, src1
+ v_cmpx_le_u16 dst, src0, src1
+ v_cmpx_le_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_u32 dst, src0, src1
+ v_cmpx_le_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_u64 dst, src0, src1
+ v_cmpx_lg_f16 dst, src0, src1
+ v_cmpx_lg_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lg_f32 dst, src0, src1
+ v_cmpx_lg_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lg_f64 dst, src0, src1
+ v_cmpx_lt_f16 dst, src0, src1
+ v_cmpx_lt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_f32 dst, src0, src1
+ v_cmpx_lt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_f64 dst, src0, src1
+ v_cmpx_lt_i16 dst, src0, src1
+ v_cmpx_lt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_i32 dst, src0, src1
+ v_cmpx_lt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_i64 dst, src0, src1
+ v_cmpx_lt_u16 dst, src0, src1
+ v_cmpx_lt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_u32 dst, src0, src1
+ v_cmpx_lt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_u64 dst, src0, src1
+ v_cmpx_ne_i16 dst, src0, src1
+ v_cmpx_ne_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_i32 dst, src0, src1
+ v_cmpx_ne_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_i64 dst, src0, src1
+ v_cmpx_ne_u16 dst, src0, src1
+ v_cmpx_ne_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_u32 dst, src0, src1
+ v_cmpx_ne_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_u64 dst, src0, src1
+ v_cmpx_neq_f16 dst, src0, src1
+ v_cmpx_neq_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_neq_f32 dst, src0, src1
+ v_cmpx_neq_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_neq_f64 dst, src0, src1
+ v_cmpx_nge_f16 dst, src0, src1
+ v_cmpx_nge_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nge_f32 dst, src0, src1
+ v_cmpx_nge_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nge_f64 dst, src0, src1
+ v_cmpx_ngt_f16 dst, src0, src1
+ v_cmpx_ngt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ngt_f32 dst, src0, src1
+ v_cmpx_ngt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ngt_f64 dst, src0, src1
+ v_cmpx_nle_f16 dst, src0, src1
+ v_cmpx_nle_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nle_f32 dst, src0, src1
+ v_cmpx_nle_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nle_f64 dst, src0, src1
+ v_cmpx_nlg_f16 dst, src0, src1
+ v_cmpx_nlg_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlg_f32 dst, src0, src1
+ v_cmpx_nlg_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlg_f64 dst, src0, src1
+ v_cmpx_nlt_f16 dst, src0, src1
+ v_cmpx_nlt_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlt_f32 dst, src0, src1
+ v_cmpx_nlt_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlt_f64 dst, src0, src1
+ v_cmpx_o_f16 dst, src0, src1
+ v_cmpx_o_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_o_f32 dst, src0, src1
+ v_cmpx_o_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_o_f64 dst, src0, src1
+ v_cmpx_t_i16 dst, src0, src1
+ v_cmpx_t_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_i32 dst, src0, src1
+ v_cmpx_t_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_i64 dst, src0, src1
+ v_cmpx_t_u16 dst, src0, src1
+ v_cmpx_t_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_u32 dst, src0, src1
+ v_cmpx_t_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_u64 dst, src0, src1
+ v_cmpx_tru_f16 dst, src0, src1
+ v_cmpx_tru_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_tru_f32 dst, src0, src1
+ v_cmpx_tru_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_tru_f64 dst, src0, src1
+ v_cmpx_u_f16 dst, src0, src1
+ v_cmpx_u_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_u_f32 dst, src0, src1
+ v_cmpx_u_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_u_f64 dst, src0, src1
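+
+The listing above is machine-generated; the following lines are an
+illustrative sketch only, with arbitrary register choices. A VOPC compare
+writes a per-lane result to VCC (the CMPX forms also update EXEC), and the
+SDWA variants add the sub-dword operand selects listed above:
+
+.. parsed-literal::
+
+    v_cmp_lt_f32 vcc, v0, v1                                      // vcc = per-lane (v0 < v1)
+    v_cmp_eq_u32_sdwa vcc, v2, v3 src0_sel:BYTE_0 src1_sel:DWORD  // compare byte 0 of v2 with v3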
diff --git a/docs/AMDGPUAsmGFX9.rst b/docs/AMDGPUAsmGFX9.rst
new file mode 100644
index 000000000000..97c13f2476bd
--- /dev/null
+++ b/docs/AMDGPUAsmGFX9.rst
@@ -0,0 +1,1906 @@
+..
+ **************************************************
+ * *
+ * Automatically generated file, do not edit! *
+ * *
+ **************************************************
+
+===========================
+Syntax of GFX9 Instructions
+===========================
+
+.. contents::
+ :local:
+
+
+DS
+===========================
+
+.. parsed-literal::
+
+ ds_add_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_add_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_and_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_append dst :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_bpermute_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>`
+ ds_cmpst_b32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_b64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_f32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_f64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_f32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_cmpst_rtn_f64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_condxchg32_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_consume dst :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_dec_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_barrier src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_init src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_br src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_p :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_release_all :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_gws_sema_v :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_inc_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_f64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_i32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_i64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_f64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_i32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_i64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_f64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_i32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_i64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_max_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_f32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_f64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_i32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_i64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_f32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_f64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_i32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_i64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_f32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_f64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_i32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_i64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_min_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_b32 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_b64 src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_mskor_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_nop
+ ds_or_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_or_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_ordered_count dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_permute_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>`
+ ds_read2_b32 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2_b64 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2st64_b32 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read2st64_b64 dst, src0 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b128 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b32 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b64 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_b96 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i8 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i8_d16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_i8_d16_hi dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u16_d16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u16_d16_hi dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u8 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u8_d16 dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_read_u8_d16_hi dst, src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_rsub_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_rtn_u32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_rtn_u64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_src2_u32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_src2_u64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_u32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_sub_u64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_swizzle_b32 dst, src0 :ref:`sw_offset16<amdgpu_synid_sw_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrap_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2_b32 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2_b64 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2st64_b32 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write2st64_b64 src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b128 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b16 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b16_d16_hi src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b8 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b8_d16_hi src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_b96 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_write_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2st64_rtn_b32 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg2st64_rtn_b64 dst, src0, src1, src2 :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`ds_offset8<amdgpu_synid_ds_offset8>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_wrxchg_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_b32 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_b64 src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_rtn_b32 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_rtn_b64 dst, src0, src1 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_src2_b32 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+ ds_xor_src2_b64 src0 :ref:`ds_offset16<amdgpu_synid_ds_offset16>` :ref:`gds<amdgpu_synid_gds>`
+
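+Not part of the generated listing: a brief illustrative sketch of how the
+optional DS modifiers above are spelled in assembly. Offsets are immediates
+written as offset:/offset0:/offset1:, and gds selects the global data share.
+Register and offset values here are arbitrary:
+
+.. parsed-literal::
+
+    ds_write_b32 v1, v2 offset:16                 // store v2 to LDS address v1+16
+    ds_read2_b32 v[4:5], v1 offset0:0 offset1:1   // two dword loads; offsets scaled by 4
+    ds_add_u32 v1, v2 gds                         // atomic add in GDS rather than LDS
+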
+EXP
+===========================
+
+.. parsed-literal::
+
+ exp dst, src0, src1, src2, src3 :ref:`done<amdgpu_synid_done>` :ref:`compr<amdgpu_synid_compr>` :ref:`vm<amdgpu_synid_vm>`
+
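+For illustration only (not generated): the export destination is a named
+target such as mrt0..mrt7, mrtz, null, pos0..pos3 or a param attribute.
+A minimal sketch with arbitrary registers:
+
+.. parsed-literal::
+
+    exp pos0 v0, v1, v2, v3 done              // final position export
+    exp mrt0 v4, v5, v6, v7 done vm           // color export with valid mask
+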
+FLAT
+===========================
+
+.. parsed-literal::
+
+ flat_atomic_add dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_add_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_and dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_and_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_cmpswap dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_cmpswap_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_dec dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_dec_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_inc dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_inc_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_or dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_or_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smax dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smax_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smin dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_smin_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_sub dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_sub_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_swap dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_swap_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umax dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umax_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umin dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_umin_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_xor dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_atomic_xor_x2 dst, src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dword dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx2 dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx3 dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_dwordx4 dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sbyte dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sbyte_d16 dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sbyte_d16_hi dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_short_d16 dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_short_d16_hi dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_sshort dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ubyte dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ubyte_d16 dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ubyte_d16_hi dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_load_ushort dst, src0 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_byte src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_byte_d16_hi src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dword src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx2 src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx3 src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_dwordx4 src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_short src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ flat_store_short_d16_hi src0, src1 :ref:`flat_offset12<amdgpu_synid_flat_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ global_atomic_add dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_add_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_and dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_and_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_cmpswap dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_cmpswap_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_dec dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_dec_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_inc dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_inc_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_or dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_or_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_smax dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_smax_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_smin dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_smin_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_sub dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_sub_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_swap dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_swap_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_umax dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_umax_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_umin dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_umin_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_xor dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_atomic_xor_x2 dst, src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_dword dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_dwordx2 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_dwordx3 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_dwordx4 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_sbyte dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_sbyte_d16 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_sbyte_d16_hi dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_short_d16 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_short_d16_hi dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_sshort dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_ubyte dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_ubyte_d16 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_ubyte_d16_hi dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_load_ushort dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_byte src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_byte_d16_hi src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_dword src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_dwordx2 src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_dwordx3 src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_dwordx4 src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_short src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ global_store_short_d16_hi src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>`
+ scratch_load_dword dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_dwordx2 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_dwordx3 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_dwordx4 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_sbyte dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_sbyte_d16 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_sbyte_d16_hi dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_short_d16 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_short_d16_hi dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_sshort dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_ubyte dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_ubyte_d16 dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_ubyte_d16_hi dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_load_ushort dst, src0, src1 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_byte src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_byte_d16_hi src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_dword src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_dwordx2 src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_dwordx3 src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_dwordx4 src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_short src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ scratch_store_short_d16_hi src0, src1, src2 :ref:`flat_offset13<amdgpu_synid_flat_offset13>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+
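+An illustrative sketch, not part of the generated listing: FLAT instructions
+take their address from a VGPR pair, while the global_*/scratch_* forms take
+an extra address operand that may be an SGPR base or the literal off.
+Registers and offsets below are arbitrary:
+
+.. parsed-literal::
+
+    flat_load_dword v0, v[2:3] glc            // load through the flat aperture
+    global_load_dword v1, v[2:3], off         // 64-bit address held entirely in VGPRs
+    scratch_store_dword off, v4, s2 offset:8  // scratch store, SGPR-relative addressing
+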
+MIMG
+===========================
+
+.. parsed-literal::
+
+ image_atomic_add dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_and dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_cmpswap dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_dec dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_inc dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_or dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_smax dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_smin dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_sub dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_swap dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_umax dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_umin dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_atomic_xor dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_gather4 dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_c_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_lz_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_gather4_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_get_lod dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_get_resinfo dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_load_mip dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_load_mip_pck dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_mip_pck_sgn dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_pck dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_load_pck_sgn dst, src0, src1 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_sample dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_b dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_c_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_cl dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_l dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_lz dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_lz_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_sample_o dst, src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`tfe<amdgpu_synid_tfe>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_store src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_store_mip src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>` :ref:`d16<amdgpu_synid_d16>`
+ image_store_mip_pck src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+ image_store_pck src0, src1, src2 :ref:`dmask<amdgpu_synid_dmask>` :ref:`unorm<amdgpu_synid_unorm>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lwe<amdgpu_synid_lwe>` :ref:`da<amdgpu_synid_da>`
+
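+As an illustrative sketch only, an image load and an image sample from
+the list above might be written as follows; the register ranges and
+dmask value are arbitrary choices, not requirements:
+
+.. parsed-literal::
+
+    image_load v[0:3], v[4:7], s[8:15] dmask:0xf unorm
+    image_sample v[0:3], v[4:7], s[8:15], s[16:19] dmask:0xf
+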
+MUBUF
+===========================
+
+.. parsed-literal::
+
+ buffer_atomic_add dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_add_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_and dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_and_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_cmpswap dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_cmpswap_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_dec dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_dec_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_inc dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_inc_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_or dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_or_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smax dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smax_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smin dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_smin_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_sub dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_sub_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_swap dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_swap_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umax dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umax_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umin dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_umin_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_xor dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_atomic_xor_x2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dword dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_dwordx2 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dwordx3 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_dwordx4 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_hi_x dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_x dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_xy dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_xyz dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_d16_xyzw dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_x dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_format_xy dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_xyz dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_format_xyzw dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_sbyte dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_sbyte_d16 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_sbyte_d16_hi dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_short_d16 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_short_d16_hi dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_sshort dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_ubyte dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_load_ubyte_d16 dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_ubyte_d16_hi dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_load_ushort dst, src0, src1, src2 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_store_byte src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_byte_d16_hi src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dword src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx2 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx3 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_dwordx4 src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_hi_x src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_x src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_xy src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_xyz src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_d16_xyzw src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_x src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xy src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xyz src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_format_xyzw src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_lds_dword src0, src1 :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`lds<amdgpu_synid_lds>`
+ buffer_store_short src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_store_short_d16_hi src0, src1, src2, src3 :ref:`idxen<amdgpu_synid_idxen>` :ref:`offen<amdgpu_synid_offen>` :ref:`buf_offset12<amdgpu_synid_buf_offset12>` :ref:`glc<amdgpu_synid_glc>` :ref:`slc<amdgpu_synid_slc>`
+ buffer_wbinvl1
+ buffer_wbinvl1_vol
+
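+As an illustrative sketch only, typical MUBUF forms with arbitrarily
+chosen registers, offset and cache modifiers:
+
+.. parsed-literal::
+
+    buffer_load_dword v0, v1, s[4:7], s0 offen offset:16 glc
+    buffer_store_dword v0, v1, s[4:7], s0 idxen
+    buffer_atomic_add v0, v1, s[4:7], s0 offen glc
+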
+SMEM
+===========================
+
+.. parsed-literal::
+
+ s_atc_probe src0, src1, src2
+ s_atc_probe_buffer src0, src1, src2
+ s_atomic_add dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_add_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_and dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_and_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_cmpswap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_cmpswap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_dec dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_dec_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_inc dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_inc_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_or dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_or_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_smax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_smax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_smin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_smin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_sub dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_sub_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_swap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_swap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_umax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_umax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_umin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_umin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_xor dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_atomic_xor_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_add dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_add_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_and dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_and_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_cmpswap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_cmpswap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_dec dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_dec_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_inc dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_inc_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_or dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_or_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_smax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_smax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_smin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_smin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_sub dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_sub_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_swap dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_swap_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_umax dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_umax_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_umin dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_umin_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_xor dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_atomic_xor_x2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dword dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx16 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx4 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_load_dwordx8 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_store_dword src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_store_dwordx2 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_buffer_store_dwordx4 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_dcache_discard src0, src1
+ s_dcache_discard_x2 src0, src1
+ s_dcache_inv
+ s_dcache_inv_vol
+ s_dcache_wb
+ s_dcache_wb_vol
+ s_load_dword dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx16 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx4 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_load_dwordx8 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_memrealtime dst
+ s_memtime dst
+ s_scratch_load_dword dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_scratch_load_dwordx2 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_scratch_load_dwordx4 dst, src0, src1 :ref:`glc<amdgpu_synid_glc>`
+ s_scratch_store_dword src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_scratch_store_dwordx2 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_scratch_store_dwordx4 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_store_dword src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_store_dwordx2 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+ s_store_dwordx4 src0, src1, src2 :ref:`glc<amdgpu_synid_glc>`
+
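+As an illustrative sketch only, scalar loads with arbitrarily chosen
+registers and immediate offsets:
+
+.. parsed-literal::
+
+    s_load_dwordx2 s[0:1], s[2:3], 0x10
+    s_buffer_load_dword s0, s[4:7], 0x0 glc
+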
+SOP1
+===========================
+
+.. parsed-literal::
+
+ s_abs_i32 dst, src0
+ s_and_saveexec_b64 dst, src0
+ s_andn1_saveexec_b64 dst, src0
+ s_andn1_wrexec_b64 dst, src0
+ s_andn2_saveexec_b64 dst, src0
+ s_andn2_wrexec_b64 dst, src0
+ s_bcnt0_i32_b32 dst, src0
+ s_bcnt0_i32_b64 dst, src0
+ s_bcnt1_i32_b32 dst, src0
+ s_bcnt1_i32_b64 dst, src0
+ s_bitreplicate_b64_b32 dst, src0
+ s_bitset0_b32 dst, src0
+ s_bitset0_b64 dst, src0
+ s_bitset1_b32 dst, src0
+ s_bitset1_b64 dst, src0
+ s_brev_b32 dst, src0
+ s_brev_b64 dst, src0
+ s_cbranch_join src0
+ s_cmov_b32 dst, src0
+ s_cmov_b64 dst, src0
+ s_ff0_i32_b32 dst, src0
+ s_ff0_i32_b64 dst, src0
+ s_ff1_i32_b32 dst, src0
+ s_ff1_i32_b64 dst, src0
+ s_flbit_i32 dst, src0
+ s_flbit_i32_b32 dst, src0
+ s_flbit_i32_b64 dst, src0
+ s_flbit_i32_i64 dst, src0
+ s_getpc_b64 dst
+ s_mov_b32 dst, src0
+ s_mov_b64 dst, src0
+ s_mov_fed_b32 dst, src0
+ s_movreld_b32 dst, src0
+ s_movreld_b64 dst, src0
+ s_movrels_b32 dst, src0
+ s_movrels_b64 dst, src0
+ s_nand_saveexec_b64 dst, src0
+ s_nor_saveexec_b64 dst, src0
+ s_not_b32 dst, src0
+ s_not_b64 dst, src0
+ s_or_saveexec_b64 dst, src0
+ s_orn1_saveexec_b64 dst, src0
+ s_orn2_saveexec_b64 dst, src0
+ s_quadmask_b32 dst, src0
+ s_quadmask_b64 dst, src0
+ s_rfe_b64 src0
+ s_set_gpr_idx_idx src0
+ s_setpc_b64 src0
+ s_sext_i32_i16 dst, src0
+ s_sext_i32_i8 dst, src0
+ s_swappc_b64 dst, src0
+ s_wqm_b32 dst, src0
+ s_wqm_b64 dst, src0
+ s_xnor_saveexec_b64 dst, src0
+ s_xor_saveexec_b64 dst, src0
+
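+As an illustrative sketch only (register choices are arbitrary):
+
+.. parsed-literal::
+
+    s_mov_b32 s0, s1
+    s_not_b64 s[0:1], s[2:3]
+    s_and_saveexec_b64 s[0:1], vcc
+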
+SOP2
+===========================
+
+.. parsed-literal::
+
+ s_absdiff_i32 dst, src0, src1
+ s_add_i32 dst, src0, src1
+ s_add_u32 dst, src0, src1
+ s_addc_u32 dst, src0, src1
+ s_and_b32 dst, src0, src1
+ s_and_b64 dst, src0, src1
+ s_andn2_b32 dst, src0, src1
+ s_andn2_b64 dst, src0, src1
+ s_ashr_i32 dst, src0, src1
+ s_ashr_i64 dst, src0, src1
+ s_bfe_i32 dst, src0, src1
+ s_bfe_i64 dst, src0, src1
+ s_bfe_u32 dst, src0, src1
+ s_bfe_u64 dst, src0, src1
+ s_bfm_b32 dst, src0, src1
+ s_bfm_b64 dst, src0, src1
+ s_cbranch_g_fork src0, src1
+ s_cselect_b32 dst, src0, src1
+ s_cselect_b64 dst, src0, src1
+ s_lshl1_add_u32 dst, src0, src1
+ s_lshl2_add_u32 dst, src0, src1
+ s_lshl3_add_u32 dst, src0, src1
+ s_lshl4_add_u32 dst, src0, src1
+ s_lshl_b32 dst, src0, src1
+ s_lshl_b64 dst, src0, src1
+ s_lshr_b32 dst, src0, src1
+ s_lshr_b64 dst, src0, src1
+ s_max_i32 dst, src0, src1
+ s_max_u32 dst, src0, src1
+ s_min_i32 dst, src0, src1
+ s_min_u32 dst, src0, src1
+ s_mul_hi_i32 dst, src0, src1
+ s_mul_hi_u32 dst, src0, src1
+ s_mul_i32 dst, src0, src1
+ s_nand_b32 dst, src0, src1
+ s_nand_b64 dst, src0, src1
+ s_nor_b32 dst, src0, src1
+ s_nor_b64 dst, src0, src1
+ s_or_b32 dst, src0, src1
+ s_or_b64 dst, src0, src1
+ s_orn2_b32 dst, src0, src1
+ s_orn2_b64 dst, src0, src1
+ s_pack_hh_b32_b16 dst, src0, src1
+ s_pack_lh_b32_b16 dst, src0, src1
+ s_pack_ll_b32_b16 dst, src0, src1
+ s_rfe_restore_b64 src0, src1
+ s_sub_i32 dst, src0, src1
+ s_sub_u32 dst, src0, src1
+ s_subb_u32 dst, src0, src1
+ s_xnor_b32 dst, src0, src1
+ s_xnor_b64 dst, src0, src1
+ s_xor_b32 dst, src0, src1
+ s_xor_b64 dst, src0, src1
+
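+As an illustrative sketch only (register and immediate choices are
+arbitrary):
+
+.. parsed-literal::
+
+    s_add_u32 s0, s1, s2
+    s_lshl_b32 s0, s1, 2
+    s_and_b64 s[0:1], s[2:3], s[4:5]
+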
+SOPC
+===========================
+
+.. parsed-literal::
+
+ s_bitcmp0_b32 src0, src1
+ s_bitcmp0_b64 src0, src1
+ s_bitcmp1_b32 src0, src1
+ s_bitcmp1_b64 src0, src1
+ s_cmp_eq_i32 src0, src1
+ s_cmp_eq_u32 src0, src1
+ s_cmp_eq_u64 src0, src1
+ s_cmp_ge_i32 src0, src1
+ s_cmp_ge_u32 src0, src1
+ s_cmp_gt_i32 src0, src1
+ s_cmp_gt_u32 src0, src1
+ s_cmp_le_i32 src0, src1
+ s_cmp_le_u32 src0, src1
+ s_cmp_lg_i32 src0, src1
+ s_cmp_lg_u32 src0, src1
+ s_cmp_lg_u64 src0, src1
+ s_cmp_lt_i32 src0, src1
+ s_cmp_lt_u32 src0, src1
+ s_set_gpr_idx_on src0, src1
+ s_setvskip src0, src1
+
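+SOPC comparisons write their result to SCC rather than to a destination
+register. As an illustrative sketch only (operand choices are
+arbitrary):
+
+.. parsed-literal::
+
+    s_cmp_eq_u32 s0, s1
+    s_bitcmp1_b32 s0, 5
+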
+SOPK
+===========================
+
+.. parsed-literal::
+
+ s_addk_i32 dst, src0
+ s_call_b64 dst, src0
+ s_cbranch_i_fork src0, src1
+ s_cmovk_i32 dst, src0
+ s_cmpk_eq_i32 src0, src1
+ s_cmpk_eq_u32 src0, src1
+ s_cmpk_ge_i32 src0, src1
+ s_cmpk_ge_u32 src0, src1
+ s_cmpk_gt_i32 src0, src1
+ s_cmpk_gt_u32 src0, src1
+ s_cmpk_le_i32 src0, src1
+ s_cmpk_le_u32 src0, src1
+ s_cmpk_lg_i32 src0, src1
+ s_cmpk_lg_u32 src0, src1
+ s_cmpk_lt_i32 src0, src1
+ s_cmpk_lt_u32 src0, src1
+ s_getreg_b32 dst, src0
+ s_movk_i32 dst, src0
+ s_mulk_i32 dst, src0
+ s_setreg_b32 dst, src0
+ s_setreg_imm32_b32 dst, src0
+
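+SOPK instructions encode a 16-bit immediate. As an illustrative sketch
+only (register and immediate choices are arbitrary):
+
+.. parsed-literal::
+
+    s_movk_i32 s0, 0x1234
+    s_cmpk_gt_i32 s0, 0x41
+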
+SOPP
+===========================
+
+.. parsed-literal::
+
+ s_barrier
+ s_branch src0
+ s_cbranch_cdbgsys src0
+ s_cbranch_cdbgsys_and_user src0
+ s_cbranch_cdbgsys_or_user src0
+ s_cbranch_cdbguser src0
+ s_cbranch_execnz src0
+ s_cbranch_execz src0
+ s_cbranch_scc0 src0
+ s_cbranch_scc1 src0
+ s_cbranch_vccnz src0
+ s_cbranch_vccz src0
+ s_decperflevel src0
+ s_endpgm
+ s_endpgm_ordered_ps_done
+ s_endpgm_saved
+ s_icache_inv
+ s_incperflevel src0
+ s_nop src0
+ s_sendmsg src0
+ s_sendmsghalt src0
+ s_set_gpr_idx_mode src0
+ s_set_gpr_idx_off
+ s_sethalt src0
+ s_setkill src0
+ s_setprio src0
+ s_sleep src0
+ s_trap src0
+ s_ttracedata
+ s_waitcnt src0
+ s_wakeup
+
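+As an illustrative sketch only:
+
+.. parsed-literal::
+
+    s_waitcnt vmcnt(0) lgkmcnt(0)
+    s_nop 0
+    s_endpgm
+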
+VINTRP
+===========================
+
+.. parsed-literal::
+
+ v_interp_mov_f32 dst, src0, src1
+ v_interp_p1_f32 dst, src0, src1
+ v_interp_p2_f32 dst, src0, src1
+
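+As an illustrative sketch only; the attribute and channel below are
+arbitrary choices:
+
+.. parsed-literal::
+
+    v_interp_mov_f32 v0, p10, attr0.x
+    v_interp_p1_f32 v1, v2, attr0.y
+    v_interp_p2_f32 v1, v3, attr0.y
+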
+VOP1
+===========================
+
+.. parsed-literal::
+
+ v_bfrev_b32 dst, src0
+ v_bfrev_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_bfrev_b32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ceil_f16 dst, src0
+ v_ceil_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ceil_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ceil_f32 dst, src0
+ v_ceil_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ceil_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ceil_f64 dst, src0
+ v_clrexcp
+ v_cos_f16 dst, src0
+ v_cos_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cos_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cos_f32 dst, src0
+ v_cos_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cos_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f16_f32 dst, src0
+ v_cvt_f16_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f16_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f16_i16 dst, src0
+ v_cvt_f16_i16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f16_i16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f16_u16 dst, src0
+ v_cvt_f16_u16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f16_u16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_f16 dst, src0
+ v_cvt_f32_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_f64 dst, src0
+ v_cvt_f32_i32 dst, src0
+ v_cvt_f32_i32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_i32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_u32 dst, src0
+ v_cvt_f32_u32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_u32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte0 dst, src0
+ v_cvt_f32_ubyte0_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte0_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte1 dst, src0
+ v_cvt_f32_ubyte1_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte1_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte2 dst, src0
+ v_cvt_f32_ubyte2_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte2_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f32_ubyte3 dst, src0
+ v_cvt_f32_ubyte3_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_f32_ubyte3_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_f64_f32 dst, src0
+ v_cvt_f64_i32 dst, src0
+ v_cvt_f64_u32 dst, src0
+ v_cvt_flr_i32_f32 dst, src0
+ v_cvt_flr_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_flr_i32_f32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_i16_f16 dst, src0
+ v_cvt_i16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_i16_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_i32_f32 dst, src0
+ v_cvt_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_i32_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_i32_f64 dst, src0
+ v_cvt_norm_i16_f16 dst, src0
+ v_cvt_norm_i16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_norm_i16_f16_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_norm_u16_f16 dst, src0
+ v_cvt_norm_u16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_norm_u16_f16_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_off_f32_i4 dst, src0
+ v_cvt_off_f32_i4_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_off_f32_i4_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_rpi_i32_f32 dst, src0
+ v_cvt_rpi_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_rpi_i32_f32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_u16_f16 dst, src0
+ v_cvt_u16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_u16_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_u32_f32 dst, src0
+ v_cvt_u32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cvt_u32_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_cvt_u32_f64 dst, src0
+ v_exp_f16 dst, src0
+ v_exp_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_exp_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_exp_f32 dst, src0
+ v_exp_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_exp_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_exp_legacy_f32 dst, src0
+ v_exp_legacy_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_exp_legacy_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ffbh_i32 dst, src0
+ v_ffbh_i32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ffbh_i32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ffbh_u32 dst, src0
+ v_ffbh_u32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ffbh_u32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_ffbl_b32 dst, src0
+ v_ffbl_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ffbl_b32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_floor_f16 dst, src0
+ v_floor_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_floor_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_floor_f32 dst, src0
+ v_floor_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_floor_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_floor_f64 dst, src0
+ v_fract_f16 dst, src0
+ v_fract_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_fract_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_fract_f32 dst, src0
+ v_fract_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_fract_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_fract_f64 dst, src0
+ v_frexp_exp_i16_f16 dst, src0
+ v_frexp_exp_i16_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_exp_i16_f16_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_exp_i32_f32 dst, src0
+ v_frexp_exp_i32_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_exp_i32_f32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_exp_i32_f64 dst, src0
+ v_frexp_mant_f16 dst, src0
+ v_frexp_mant_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_mant_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_mant_f32 dst, src0
+ v_frexp_mant_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_frexp_mant_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_frexp_mant_f64 dst, src0
+ v_log_f16 dst, src0
+ v_log_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_log_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_log_f32 dst, src0
+ v_log_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_log_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_log_legacy_f32 dst, src0
+ v_log_legacy_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_log_legacy_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_mov_b32 dst, src0
+ v_mov_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mov_b32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_mov_fed_b32 dst, src0
+ v_mov_fed_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mov_fed_b32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_nop
+ v_not_b32 dst, src0
+ v_not_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_not_b32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rcp_f16 dst, src0
+ v_rcp_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rcp_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rcp_f32 dst, src0
+ v_rcp_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rcp_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rcp_f64 dst, src0
+ v_rcp_iflag_f32 dst, src0
+ v_rcp_iflag_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rcp_iflag_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_readfirstlane_b32 dst, src0
+ v_rndne_f16 dst, src0
+ v_rndne_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rndne_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rndne_f32 dst, src0
+ v_rndne_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rndne_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rndne_f64 dst, src0
+ v_rsq_f16 dst, src0
+ v_rsq_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rsq_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rsq_f32 dst, src0
+ v_rsq_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_rsq_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_rsq_f64 dst, src0
+ v_sat_pk_u8_i16 dst, src0
+ v_sat_pk_u8_i16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sat_pk_u8_i16_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_screen_partition_4se_b32 dst, src0
+ v_screen_partition_4se_b32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_screen_partition_4se_b32_sdwa dst, src0 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sin_f16 dst, src0
+ v_sin_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sin_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sin_f32 dst, src0
+ v_sin_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sin_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sqrt_f16 dst, src0
+ v_sqrt_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sqrt_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sqrt_f32 dst, src0
+ v_sqrt_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sqrt_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_sqrt_f64 dst, src0
+ v_swap_b32 dst, src0
+ v_trunc_f16 dst, src0
+ v_trunc_f16_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_trunc_f16_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_trunc_f32 dst, src0
+ v_trunc_f32_dpp dst, src0 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_trunc_f32_sdwa dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>`
+ v_trunc_f64 dst, src0
+
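+As an illustrative sketch only, the three encodings of a VOP1 move
+(plain, DPP and SDWA), with arbitrarily chosen modifier values:
+
+.. parsed-literal::
+
+    v_mov_b32 v0, v1
+    v_mov_b32_dpp v0, v1 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf
+    v_mov_b32_sdwa v0, v1 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
+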
+VOP2
+===========================
+
+.. parsed-literal::
+
+ v_add_co_u32 dst0, dst1, src0, src1
+ v_add_co_u32_dpp dst0, dst1, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_co_u32_sdwa dst0, dst1, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_add_f16 dst, src0, src1
+ v_add_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_add_f32 dst, src0, src1
+ v_add_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_add_u16 dst, src0, src1
+ v_add_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_u16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_add_u32 dst, src0, src1
+ v_add_u32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_add_u32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_addc_co_u32 dst0, dst1, src0, src1, src2
+ v_addc_co_u32_dpp dst0, dst1, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_addc_co_u32_sdwa dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_and_b32 dst, src0, src1
+ v_and_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_and_b32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_ashrrev_i16 dst, src0, src1
+ v_ashrrev_i16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ashrrev_i16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_ashrrev_i32 dst, src0, src1
+ v_ashrrev_i32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ashrrev_i32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cndmask_b32 dst, src0, src1, src2
+ v_cndmask_b32_dpp dst, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_cndmask_b32_sdwa dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_ldexp_f16 dst, src0, src1
+ v_ldexp_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_ldexp_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshlrev_b16 dst, src0, src1
+ v_lshlrev_b16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshlrev_b16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshlrev_b32 dst, src0, src1
+ v_lshlrev_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshlrev_b32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshrrev_b16 dst, src0, src1
+ v_lshrrev_b16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshrrev_b16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_lshrrev_b32 dst, src0, src1
+ v_lshrrev_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_lshrrev_b32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mac_f16 dst, src0, src1
+ v_mac_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mac_f32 dst, src0, src1
+ v_mac_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_madak_f16 dst, src0, src1, src2
+ v_madak_f32 dst, src0, src1, src2
+ v_madmk_f16 dst, src0, src1, src2
+ v_madmk_f32 dst, src0, src1, src2
+ v_max_f16 dst, src0, src1
+ v_max_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_f32 dst, src0, src1
+ v_max_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_i16 dst, src0, src1
+ v_max_i16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_i16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_i32 dst, src0, src1
+ v_max_i32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_i32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_u16 dst, src0, src1
+ v_max_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_u16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_max_u32 dst, src0, src1
+ v_max_u32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_max_u32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_f16 dst, src0, src1
+ v_min_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_f32 dst, src0, src1
+ v_min_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_i16 dst, src0, src1
+ v_min_i16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_i16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_i32 dst, src0, src1
+ v_min_i32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_i32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_u16 dst, src0, src1
+ v_min_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_u16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_min_u32 dst, src0, src1
+ v_min_u32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_min_u32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_f16 dst, src0, src1
+ v_mul_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_f32 dst, src0, src1
+ v_mul_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_hi_i32_i24 dst, src0, src1
+ v_mul_hi_i32_i24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_hi_i32_i24_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_hi_u32_u24 dst, src0, src1
+ v_mul_hi_u32_u24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_hi_u32_u24_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_i32_i24 dst, src0, src1
+ v_mul_i32_i24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_i32_i24_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_legacy_f32 dst, src0, src1
+ v_mul_legacy_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_legacy_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_lo_u16 dst, src0, src1
+ v_mul_lo_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_lo_u16_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_mul_u32_u24 dst, src0, src1
+ v_mul_u32_u24_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_mul_u32_u24_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_or_b32 dst, src0, src1
+ v_or_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_or_b32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_co_u32 dst0, dst1, src0, src1
+ v_sub_co_u32_dpp dst0, dst1, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_co_u32_sdwa dst0, dst1, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_f16 dst, src0, src1
+ v_sub_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_f32 dst, src0, src1
+ v_sub_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_u16 dst, src0, src1
+ v_sub_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_u16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_sub_u32 dst, src0, src1
+ v_sub_u32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_sub_u32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subb_co_u32 dst0, dst1, src0, src1, src2
+ v_subb_co_u32_dpp dst0, dst1, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subb_co_u32_sdwa dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subbrev_co_u32 dst0, dst1, src0, src1, src2
+ v_subbrev_co_u32_dpp dst0, dst1, src0, src1, src2 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subbrev_co_u32_sdwa dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_co_u32 dst0, dst1, src0, src1
+ v_subrev_co_u32_dpp dst0, dst1, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_co_u32_sdwa dst0, dst1, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_f16 dst, src0, src1
+ v_subrev_f16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_f16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_f32 dst, src0, src1
+ v_subrev_f32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_f32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_u16 dst, src0, src1
+ v_subrev_u16_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_u16_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_subrev_u32 dst, src0, src1
+ v_subrev_u32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_subrev_u32_sdwa dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_xor_b32 dst, src0, src1
+ v_xor_b32_dpp dst, src0, src1 :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` :ref:`row_mask<amdgpu_synid_row_mask>` :ref:`bank_mask<amdgpu_synid_bank_mask>` :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`
+ v_xor_b32_sdwa dst, src0, src1 :ref:`omod<amdgpu_synid_omod>` :ref:`dst_sel<amdgpu_synid_dst_sel>` :ref:`dst_unused<amdgpu_synid_dst_unused>` :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+
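+A minimal illustrative sketch (the registers v0..v2 and modifier values are
+hypothetical; modifier spellings follow the operand references above): an
+SDWA variant selects sub-dword parts of its operands, while a DPP variant
+applies a cross-lane data-parallel primitive such as a row shift:
+
+.. parsed-literal::
+
+    v_and_b32_sdwa v0, v1, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD  // AND byte 0 of v1 with all of v2
+    v_and_b32_dpp v0, v1, v2 row_shl:1 row_mask:0xf bank_mask:0xf                                 // AND with v1 shifted one lane within each row
+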
+VOP3
+===========================
+
+.. parsed-literal::
+
+ v_add3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_add_co_u32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_add_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_i16 dst, src0, src1 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_add_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_add_lshl_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_add_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_add_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_addc_co_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_alignbit_b32 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_alignbyte_b32 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_and_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_and_or_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_ashrrev_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashrrev_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_ashrrev_i64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bcnt_u32_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bfe_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfe_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfi_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_bfm_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_bfrev_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ceil_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_clrexcp_e64 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_class_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_eq_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_f_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ge_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_gt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_le_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_lt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ne_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_neq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_ngt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nle_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_nlt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_o_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_t_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_tru_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmp_u_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_class_f64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_eq_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_f_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ge_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_gt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_le_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_lt_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ne_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_neq_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nge_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_ngt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nle_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlg_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_nlt_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_o_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_i64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_t_u64_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_tru_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cmpx_u_f64_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cndmask_b32_e64 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_cos_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cos_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubeid_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubema_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubesc_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cubetc_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f16_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f16_i16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f16_u16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_i32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_u32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte0_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte1_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte2_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f32_ubyte3_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_i32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_f64_u32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_flr_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i16_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i32_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_i32_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_norm_i16_f16_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_norm_u16_f16_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_off_f32_i4_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_i16_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_u16_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pk_u8_f32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pkaccum_u8_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_i16_f16 dst, src0, src1 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_i16_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_u16_f16 dst, src0, src1 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pknorm_u16_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_pkrtz_f16_f32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_rpi_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u16_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u32_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_cvt_u32_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fixup_legacy_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fmas_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_fmas_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_div_scale_f32 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_div_scale_f64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_exp_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_exp_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_exp_legacy_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ffbh_i32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ffbh_u32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_ffbl_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_floor_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_f64 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fma_legacy_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_fract_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i16_f16_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i32_f32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_exp_i32_f64_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_frexp_mant_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_mov_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p1_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p1ll_f16 dst, src0, src1 :ref:`high<amdgpu_synid_high>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p1lv_f16 dst, src0, src1, src2 :ref:`high<amdgpu_synid_high>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p2_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`high<amdgpu_synid_high>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p2_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_interp_p2_legacy_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`high<amdgpu_synid_high>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f32 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_ldexp_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_lerp_u8 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_log_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_log_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_log_legacy_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_lshl_add_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_lshl_or_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_lshlrev_b16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshlrev_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshlrev_b64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshrrev_b16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshrrev_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_lshrrev_b64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mac_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mac_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i32_i16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i32_i24 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_i64_i32 dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_legacy_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_legacy_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_legacy_i16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_legacy_u16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u32_u16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u32_u24 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mad_u64_u32 dst0, dst1, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_i16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_max3_u16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_max3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_max_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_max_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_max_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_max_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_max_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mbcnt_hi_u32_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mbcnt_lo_u32_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_med3_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_med3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_med3_i16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_med3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_med3_u16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_med3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min3_f16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min3_f32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min3_i16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_min3_i32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min3_u16 dst, src0, src1, src2 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_min3_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_min_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_min_i16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_min_i32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_min_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_min_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mov_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_mov_fed_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_mqsad_pk_u16_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mqsad_u32_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_msad_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_i32_i24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_hi_u32_u24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_i32_i24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_legacy_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_mul_lo_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_lo_u32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_mul_u32_u24_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_nop_e64 :ref:`omod<amdgpu_synid_omod>`
+ v_not_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_or3_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_or_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_pack_b32_f16 dst, src0, src1 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`omod<amdgpu_synid_omod>`
+ v_perm_b32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_qsad_pk_u16_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rcp_iflag_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_readlane_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rndne_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_rsq_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_hi_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u16 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u32 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sad_u8 dst, src0, src1, src2 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sat_pk_u8_i16_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_screen_partition_4se_b32_e64 dst, src0 :ref:`omod<amdgpu_synid_omod>`
+ v_sin_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sin_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sqrt_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_co_u32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_sub_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_i16 dst, src0, src1 :ref:`vop3_op_sel<amdgpu_synid_vop3_op_sel>` :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_sub_i32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_sub_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_sub_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_subb_co_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_subbrev_co_u32_e64 dst0, dst1, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_co_u32_e64 dst0, dst1, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_f16_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_f32_e64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_u16_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_subrev_u32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_trig_preop_f64 dst, src0, src1 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f16_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f32_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_trunc_f64_e64 dst, src0 :ref:`clamp<amdgpu_synid_clamp>` :ref:`omod<amdgpu_synid_omod>`
+ v_writelane_b32 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+ v_xad_u32 dst, src0, src1, src2 :ref:`omod<amdgpu_synid_omod>`
+ v_xor_b32_e64 dst, src0, src1 :ref:`omod<amdgpu_synid_omod>`
+
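+A minimal sketch of the output modifiers shared by these e64 forms (registers
+are hypothetical; ``omod`` is spelled ``mul:2``, ``mul:4``, or ``div:2``):
+
+.. parsed-literal::
+
+    v_mul_f32_e64 v0, v1, v2 mul:2     // omod scales the result by 2
+    v_fma_f32 v0, v1, v2, v3 clamp     // clamp saturates the result to [0.0, 1.0]
+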
+VOP3P
+===========================
+
+.. parsed-literal::
+
+ v_mad_mix_f32 dst, src0, src1, src2 :ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>` :ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_mad_mixhi_f16 dst, src0, src1, src2 :ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>` :ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_mad_mixlo_f16 dst, src0, src1, src2 :ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>` :ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_add_f16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`neg_lo<amdgpu_synid_neg_lo>` :ref:`neg_hi<amdgpu_synid_neg_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_add_i16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_add_u16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_ashrrev_i16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_fma_f16 dst, src0, src1, src2 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`neg_lo<amdgpu_synid_neg_lo>` :ref:`neg_hi<amdgpu_synid_neg_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_lshlrev_b16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_lshrrev_b16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_mad_i16 dst, src0, src1, src2 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_mad_u16 dst, src0, src1, src2 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_max_f16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`neg_lo<amdgpu_synid_neg_lo>` :ref:`neg_hi<amdgpu_synid_neg_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_max_i16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_max_u16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_min_f16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`neg_lo<amdgpu_synid_neg_lo>` :ref:`neg_hi<amdgpu_synid_neg_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_min_i16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_min_u16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_mul_f16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`neg_lo<amdgpu_synid_neg_lo>` :ref:`neg_hi<amdgpu_synid_neg_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_mul_lo_u16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`
+ v_pk_sub_i16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+ v_pk_sub_u16 dst, src0, src1 :ref:`op_sel<amdgpu_synid_op_sel>` :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>` :ref:`clamp<amdgpu_synid_clamp>`
+
+VOPC
+===========================
+
+.. parsed-literal::
+
+ v_cmp_class_f16 dst, src0, src1
+ v_cmp_class_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_class_f32 dst, src0, src1
+ v_cmp_class_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_class_f64 dst, src0, src1
+ v_cmp_eq_f16 dst, src0, src1
+ v_cmp_eq_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_f32 dst, src0, src1
+ v_cmp_eq_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_f64 dst, src0, src1
+ v_cmp_eq_i16 dst, src0, src1
+ v_cmp_eq_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_i32 dst, src0, src1
+ v_cmp_eq_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_i64 dst, src0, src1
+ v_cmp_eq_u16 dst, src0, src1
+ v_cmp_eq_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_u32 dst, src0, src1
+ v_cmp_eq_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_eq_u64 dst, src0, src1
+ v_cmp_f_f16 dst, src0, src1
+ v_cmp_f_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_f32 dst, src0, src1
+ v_cmp_f_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_f64 dst, src0, src1
+ v_cmp_f_i16 dst, src0, src1
+ v_cmp_f_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_i32 dst, src0, src1
+ v_cmp_f_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_i64 dst, src0, src1
+ v_cmp_f_u16 dst, src0, src1
+ v_cmp_f_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_u32 dst, src0, src1
+ v_cmp_f_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_f_u64 dst, src0, src1
+ v_cmp_ge_f16 dst, src0, src1
+ v_cmp_ge_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_f32 dst, src0, src1
+ v_cmp_ge_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_f64 dst, src0, src1
+ v_cmp_ge_i16 dst, src0, src1
+ v_cmp_ge_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_i32 dst, src0, src1
+ v_cmp_ge_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_i64 dst, src0, src1
+ v_cmp_ge_u16 dst, src0, src1
+ v_cmp_ge_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_u32 dst, src0, src1
+ v_cmp_ge_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ge_u64 dst, src0, src1
+ v_cmp_gt_f16 dst, src0, src1
+ v_cmp_gt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_f32 dst, src0, src1
+ v_cmp_gt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_f64 dst, src0, src1
+ v_cmp_gt_i16 dst, src0, src1
+ v_cmp_gt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_i32 dst, src0, src1
+ v_cmp_gt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_i64 dst, src0, src1
+ v_cmp_gt_u16 dst, src0, src1
+ v_cmp_gt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_u32 dst, src0, src1
+ v_cmp_gt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_gt_u64 dst, src0, src1
+ v_cmp_le_f16 dst, src0, src1
+ v_cmp_le_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_f32 dst, src0, src1
+ v_cmp_le_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_f64 dst, src0, src1
+ v_cmp_le_i16 dst, src0, src1
+ v_cmp_le_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_i32 dst, src0, src1
+ v_cmp_le_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_i64 dst, src0, src1
+ v_cmp_le_u16 dst, src0, src1
+ v_cmp_le_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_u32 dst, src0, src1
+ v_cmp_le_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_le_u64 dst, src0, src1
+ v_cmp_lg_f16 dst, src0, src1
+ v_cmp_lg_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lg_f32 dst, src0, src1
+ v_cmp_lg_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lg_f64 dst, src0, src1
+ v_cmp_lt_f16 dst, src0, src1
+ v_cmp_lt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_f32 dst, src0, src1
+ v_cmp_lt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_f64 dst, src0, src1
+ v_cmp_lt_i16 dst, src0, src1
+ v_cmp_lt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_i32 dst, src0, src1
+ v_cmp_lt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_i64 dst, src0, src1
+ v_cmp_lt_u16 dst, src0, src1
+ v_cmp_lt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_u32 dst, src0, src1
+ v_cmp_lt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_lt_u64 dst, src0, src1
+ v_cmp_ne_i16 dst, src0, src1
+ v_cmp_ne_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_i32 dst, src0, src1
+ v_cmp_ne_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_i64 dst, src0, src1
+ v_cmp_ne_u16 dst, src0, src1
+ v_cmp_ne_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_u32 dst, src0, src1
+ v_cmp_ne_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ne_u64 dst, src0, src1
+ v_cmp_neq_f16 dst, src0, src1
+ v_cmp_neq_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_neq_f32 dst, src0, src1
+ v_cmp_neq_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_neq_f64 dst, src0, src1
+ v_cmp_nge_f16 dst, src0, src1
+ v_cmp_nge_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nge_f32 dst, src0, src1
+ v_cmp_nge_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nge_f64 dst, src0, src1
+ v_cmp_ngt_f16 dst, src0, src1
+ v_cmp_ngt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ngt_f32 dst, src0, src1
+ v_cmp_ngt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_ngt_f64 dst, src0, src1
+ v_cmp_nle_f16 dst, src0, src1
+ v_cmp_nle_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nle_f32 dst, src0, src1
+ v_cmp_nle_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nle_f64 dst, src0, src1
+ v_cmp_nlg_f16 dst, src0, src1
+ v_cmp_nlg_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlg_f32 dst, src0, src1
+ v_cmp_nlg_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlg_f64 dst, src0, src1
+ v_cmp_nlt_f16 dst, src0, src1
+ v_cmp_nlt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlt_f32 dst, src0, src1
+ v_cmp_nlt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_nlt_f64 dst, src0, src1
+ v_cmp_o_f16 dst, src0, src1
+ v_cmp_o_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_o_f32 dst, src0, src1
+ v_cmp_o_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_o_f64 dst, src0, src1
+ v_cmp_t_i16 dst, src0, src1
+ v_cmp_t_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_i32 dst, src0, src1
+ v_cmp_t_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_i64 dst, src0, src1
+ v_cmp_t_u16 dst, src0, src1
+ v_cmp_t_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_u32 dst, src0, src1
+ v_cmp_t_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_t_u64 dst, src0, src1
+ v_cmp_tru_f16 dst, src0, src1
+ v_cmp_tru_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_tru_f32 dst, src0, src1
+ v_cmp_tru_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_tru_f64 dst, src0, src1
+ v_cmp_u_f16 dst, src0, src1
+ v_cmp_u_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_u_f32 dst, src0, src1
+ v_cmp_u_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmp_u_f64 dst, src0, src1
+ v_cmpx_class_f16 dst, src0, src1
+ v_cmpx_class_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_class_f32 dst, src0, src1
+ v_cmpx_class_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_class_f64 dst, src0, src1
+ v_cmpx_eq_f16 dst, src0, src1
+ v_cmpx_eq_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_f32 dst, src0, src1
+ v_cmpx_eq_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_f64 dst, src0, src1
+ v_cmpx_eq_i16 dst, src0, src1
+ v_cmpx_eq_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_i32 dst, src0, src1
+ v_cmpx_eq_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_i64 dst, src0, src1
+ v_cmpx_eq_u16 dst, src0, src1
+ v_cmpx_eq_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_u32 dst, src0, src1
+ v_cmpx_eq_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_eq_u64 dst, src0, src1
+ v_cmpx_f_f16 dst, src0, src1
+ v_cmpx_f_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_f32 dst, src0, src1
+ v_cmpx_f_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_f64 dst, src0, src1
+ v_cmpx_f_i16 dst, src0, src1
+ v_cmpx_f_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_i32 dst, src0, src1
+ v_cmpx_f_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_i64 dst, src0, src1
+ v_cmpx_f_u16 dst, src0, src1
+ v_cmpx_f_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_u32 dst, src0, src1
+ v_cmpx_f_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_f_u64 dst, src0, src1
+ v_cmpx_ge_f16 dst, src0, src1
+ v_cmpx_ge_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_f32 dst, src0, src1
+ v_cmpx_ge_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_f64 dst, src0, src1
+ v_cmpx_ge_i16 dst, src0, src1
+ v_cmpx_ge_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_i32 dst, src0, src1
+ v_cmpx_ge_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_i64 dst, src0, src1
+ v_cmpx_ge_u16 dst, src0, src1
+ v_cmpx_ge_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_u32 dst, src0, src1
+ v_cmpx_ge_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ge_u64 dst, src0, src1
+ v_cmpx_gt_f16 dst, src0, src1
+ v_cmpx_gt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_f32 dst, src0, src1
+ v_cmpx_gt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_f64 dst, src0, src1
+ v_cmpx_gt_i16 dst, src0, src1
+ v_cmpx_gt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_i32 dst, src0, src1
+ v_cmpx_gt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_i64 dst, src0, src1
+ v_cmpx_gt_u16 dst, src0, src1
+ v_cmpx_gt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_u32 dst, src0, src1
+ v_cmpx_gt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_gt_u64 dst, src0, src1
+ v_cmpx_le_f16 dst, src0, src1
+ v_cmpx_le_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_f32 dst, src0, src1
+ v_cmpx_le_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_f64 dst, src0, src1
+ v_cmpx_le_i16 dst, src0, src1
+ v_cmpx_le_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_i32 dst, src0, src1
+ v_cmpx_le_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_i64 dst, src0, src1
+ v_cmpx_le_u16 dst, src0, src1
+ v_cmpx_le_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_u32 dst, src0, src1
+ v_cmpx_le_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_le_u64 dst, src0, src1
+ v_cmpx_lg_f16 dst, src0, src1
+ v_cmpx_lg_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lg_f32 dst, src0, src1
+ v_cmpx_lg_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lg_f64 dst, src0, src1
+ v_cmpx_lt_f16 dst, src0, src1
+ v_cmpx_lt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_f32 dst, src0, src1
+ v_cmpx_lt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_f64 dst, src0, src1
+ v_cmpx_lt_i16 dst, src0, src1
+ v_cmpx_lt_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_i32 dst, src0, src1
+ v_cmpx_lt_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_i64 dst, src0, src1
+ v_cmpx_lt_u16 dst, src0, src1
+ v_cmpx_lt_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_u32 dst, src0, src1
+ v_cmpx_lt_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_lt_u64 dst, src0, src1
+ v_cmpx_ne_i16 dst, src0, src1
+ v_cmpx_ne_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_i32 dst, src0, src1
+ v_cmpx_ne_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_i64 dst, src0, src1
+ v_cmpx_ne_u16 dst, src0, src1
+ v_cmpx_ne_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_u32 dst, src0, src1
+ v_cmpx_ne_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ne_u64 dst, src0, src1
+ v_cmpx_neq_f16 dst, src0, src1
+ v_cmpx_neq_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_neq_f32 dst, src0, src1
+ v_cmpx_neq_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_neq_f64 dst, src0, src1
+ v_cmpx_nge_f16 dst, src0, src1
+ v_cmpx_nge_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nge_f32 dst, src0, src1
+ v_cmpx_nge_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nge_f64 dst, src0, src1
+ v_cmpx_ngt_f16 dst, src0, src1
+ v_cmpx_ngt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ngt_f32 dst, src0, src1
+ v_cmpx_ngt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_ngt_f64 dst, src0, src1
+ v_cmpx_nle_f16 dst, src0, src1
+ v_cmpx_nle_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nle_f32 dst, src0, src1
+ v_cmpx_nle_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nle_f64 dst, src0, src1
+ v_cmpx_nlg_f16 dst, src0, src1
+ v_cmpx_nlg_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlg_f32 dst, src0, src1
+ v_cmpx_nlg_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlg_f64 dst, src0, src1
+ v_cmpx_nlt_f16 dst, src0, src1
+ v_cmpx_nlt_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlt_f32 dst, src0, src1
+ v_cmpx_nlt_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_nlt_f64 dst, src0, src1
+ v_cmpx_o_f16 dst, src0, src1
+ v_cmpx_o_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_o_f32 dst, src0, src1
+ v_cmpx_o_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_o_f64 dst, src0, src1
+ v_cmpx_t_i16 dst, src0, src1
+ v_cmpx_t_i16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_i32 dst, src0, src1
+ v_cmpx_t_i32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_i64 dst, src0, src1
+ v_cmpx_t_u16 dst, src0, src1
+ v_cmpx_t_u16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_u32 dst, src0, src1
+ v_cmpx_t_u32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_t_u64 dst, src0, src1
+ v_cmpx_tru_f16 dst, src0, src1
+ v_cmpx_tru_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_tru_f32 dst, src0, src1
+ v_cmpx_tru_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_tru_f64 dst, src0, src1
+ v_cmpx_u_f16 dst, src0, src1
+ v_cmpx_u_f16_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_u_f32 dst, src0, src1
+ v_cmpx_u_f32_sdwa dst, src0, src1 :ref:`src0_sel<amdgpu_synid_src0_sel>` :ref:`src1_sel<amdgpu_synid_src1_sel>`
+ v_cmpx_u_f64 dst, src0, src1
diff --git a/docs/AMDGPUOperandSyntax.rst b/docs/AMDGPUOperandSyntax.rst
new file mode 100644
index 000000000000..4f3536eed40d
--- /dev/null
+++ b/docs/AMDGPUOperandSyntax.rst
@@ -0,0 +1,1055 @@
+=================================================
+Syntax of AMDGPU Assembler Operands and Modifiers
+=================================================
+
+.. contents::
+ :local:
+
+Conventions
+===========
+
+The following conventions are used in syntax description:
+
+ =================== =============================================================
+ Notation Description
+ =================== =============================================================
+ {0..N} Any integer value in the range from 0 to N (inclusive).
+ Unless stated otherwise, this value may be specified as
+                        either a literal or an LLVM expression.
+    <x>                 Syntax and meaning of *<x>* are explained elsewhere.
+ =================== =============================================================
+
+.. _amdgpu_syn_operands:
+
+Operands
+========
+
+TBD
+
+.. _amdgpu_syn_modifiers:
+
+Modifiers
+=========
+
+DS Modifiers
+------------
+
+.. _amdgpu_synid_ds_offset8:
+
+ds_offset8
+~~~~~~~~~~
+
+Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
+
+Used with DS instructions which have 2 addresses.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+    offset:{0..0xFF}                         Specifies an 8-bit offset.
+ ======================================== ================================================
+
+.. _amdgpu_synid_ds_offset16:
+
+ds_offset16
+~~~~~~~~~~~
+
+Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
+
+Used with DS instructions which have 1 address.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ offset:{0..0xFFFF} Specifies a 16-bit offset.
+ ======================================== ================================================
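+
+For example, a single-address LDS load with an immediate offset may be written as
+follows (an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    ds_read_b32 v1, v0 offset:16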
+
+.. _amdgpu_synid_sw_offset16:
+
+sw_offset16
+~~~~~~~~~~~
+
+This is a special modifier which may be used with the *ds_swizzle_b32* instruction only.
+Specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
+
+See AMD documentation for more information.
+
+ ======================================================= ===================================================
+ Syntax Description
+ ======================================================= ===================================================
+ offset:{0..0xFFFF} Specifies a 16-bit swizzle pattern
+ in a numeric form.
+ offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3}) Specifies a quad permute mode pattern; each
+ number is a lane id.
+ offset:swizzle(BITMASK_PERM, "<mask>") Specifies a bitmask permute mode pattern
+ which converts a 5-bit lane id to another
+ lane id with which the lane interacts.
+
+                                                            <mask> is a 5-character sequence which
+ specifies how to transform the bits of the
+ lane id. The following characters are allowed:
+
+ * "0" - set bit to 0.
+
+ * "1" - set bit to 1.
+
+ * "p" - preserve bit.
+
+ * "i" - inverse bit.
+
+ offset:swizzle(BROADCAST,{2..32},{0..N}) Specifies a broadcast mode.
+ Broadcasts the value of any particular lane to
+ all lanes in its group.
+
+ The first numeric parameter is a group
+ size and must be equal to 2, 4, 8, 16 or 32.
+
+                                                            The second numeric parameter is the index of the
+                                                            lane to broadcast. The index must be less than
+                                                            the group size.
+ offset:swizzle(SWAP,{1..16}) Specifies a swap mode.
+ Swaps the neighboring groups of
+ 1, 2, 4, 8 or 16 lanes.
+ offset:swizzle(REVERSE,{2..32}) Specifies a reverse mode. Reverses
+ the lanes for groups of 2, 4, 8, 16 or 32 lanes.
+ ======================================================= ===================================================
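+
+For example, a quad permute and a pairwise lane swap may be expressed as follows
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    ds_swizzle_b32 v5, v1 offset:swizzle(QUAD_PERM,0,1,2,3)
+    ds_swizzle_b32 v5, v1 offset:swizzle(SWAP,2)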
+
+.. _amdgpu_synid_gds:
+
+gds
+~~~
+
+Specifies whether to use GDS or LDS memory (LDS is the default).
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ gds Use GDS memory.
+ ======================================== ================================================
+
+
+EXP Modifiers
+-------------
+
+.. _amdgpu_synid_done:
+
+done
+~~~~
+
+Specifies if this is the last export from the shader to the target. By default, the
+current instruction does not finish an export sequence.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ done Indicates the last export operation.
+ ======================================== ================================================
+
+.. _amdgpu_synid_compr:
+
+compr
+~~~~~
+
+Indicates if the data are compressed (not compressed by default).
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ compr Data are compressed.
+ ======================================== ================================================
+
+.. _amdgpu_synid_vm:
+
+vm
+~~
+
+Specifies valid mask flag state (off by default).
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ vm Set valid mask flag.
+ ======================================== ================================================
+
+FLAT Modifiers
+--------------
+
+.. _amdgpu_synid_flat_offset12:
+
+flat_offset12
+~~~~~~~~~~~~~
+
+Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+
+Cannot be used with *global/scratch* opcodes. GFX9 only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ offset:{0..4095} Specifies a 12-bit unsigned offset.
+ ======================================== ================================================
+
+.. _amdgpu_synid_flat_offset13:
+
+flat_offset13
+~~~~~~~~~~~~~
+
+Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
+
+Can be used with *global/scratch* opcodes only. GFX9 only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ offset:{-4096..+4095} Specifies a 13-bit signed offset.
+ ======================================== ================================================
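+
+For example, a GFX9 global load with a negative immediate offset may look like this
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    global_load_dword v1, v[3:4], off offset:-16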
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+slc
+~~~
+
+See a description :ref:`here<amdgpu_synid_slc>`.
+
+tfe
+~~~
+
+See a description :ref:`here<amdgpu_synid_tfe>`.
+
+nv
+~~
+
+See a description :ref:`here<amdgpu_synid_nv>`.
+
+MIMG Modifiers
+--------------
+
+.. _amdgpu_synid_dmask:
+
+dmask
+~~~~~
+
+Specifies which channels (image components) are used by the operation. By default, no channels
+are used.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ dmask:{0..15} Each bit corresponds to one of 4 image
+                                             components (RGBA). If a bit value is 0, the
+                                             component is not used; if it is 1, the
+                                             component is used.
+ ======================================== ================================================
+
+This modifier has some limitations depending on instruction kind:
+
+ ======================================== ================================================
+ Instruction Kind Valid dmask Values
+ ======================================== ================================================
+ 32-bit atomic cmpswap 0x3
+ other 32-bit atomic instructions 0x1
+ 64-bit atomic cmpswap 0xF
+ other 64-bit atomic instructions 0x3
+ GATHER4 0x1, 0x2, 0x4, 0x8
+ Other instructions any value
+ ======================================== ================================================
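+
+For example, an image load which reads all 4 components may specify *dmask* as follows
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    image_load v[0:3], v[4:7], s[8:15] dmask:0xf unorm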
+
+.. _amdgpu_synid_unorm:
+
+unorm
+~~~~~
+
+Specifies whether the address is normalized (normalized by default).
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ unorm Force address to be un-normalized.
+ ======================================== ================================================
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+slc
+~~~
+
+See a description :ref:`here<amdgpu_synid_slc>`.
+
+.. _amdgpu_synid_r128:
+
+r128
+~~~~
+
+Specifies texture resource size. The default size is 256 bits.
+
+GFX7 and GFX8 only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+    r128                                     Specifies a 128-bit texture resource size.
+ ======================================== ================================================
+
+tfe
+~~~
+
+See a description :ref:`here<amdgpu_synid_tfe>`.
+
+.. _amdgpu_synid_lwe:
+
+lwe
+~~~
+
+Specifies LOD warning status (LOD warning is disabled by default).
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ lwe Enables LOD warning.
+ ======================================== ================================================
+
+.. _amdgpu_synid_da:
+
+da
+~~
+
+Specifies if an array index must be sent to TA. By default, the array index is not sent.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ da Send an array-index to TA.
+ ======================================== ================================================
+
+.. _amdgpu_synid_d16:
+
+d16
+~~~
+
+Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+    d16                                      Enables 16-bit data mode.
+
+ On loads, convert data in memory to 16-bit
+ format before storing it in VGPRs.
+
+ For stores, convert 16-bit data in VGPRs to
+ 32 bits before going to memory.
+
+ Note that 16-bit data are stored in VGPRs
+ unpacked in GFX8.0. In GFX8.1 and GFX9 16-bit
+ data are packed.
+ ======================================== ================================================
+
+.. _amdgpu_synid_a16:
+
+a16
+~~~
+
+Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+    a16                                      Enables 16-bit image address components.
+ ======================================== ================================================
+
+Miscellaneous Modifiers
+-----------------------
+
+.. _amdgpu_synid_glc:
+
+glc
+~~~
+
+This modifier has a different meaning for loads, stores, and atomic operations.
+The default value is off (0).
+
+See AMD documentation for details.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ glc Set glc bit to 1.
+ ======================================== ================================================
+
+.. _amdgpu_synid_slc:
+
+slc
+~~~
+
+Specifies cache policy. The default value is off (0).
+
+See AMD documentation for details.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ slc Set slc bit to 1.
+ ======================================== ================================================
+
+.. _amdgpu_synid_tfe:
+
+tfe
+~~~
+
+Controls access to partially resident textures. The default value is off (0).
+
+See AMD documentation for details.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ tfe Set tfe bit to 1.
+ ======================================== ================================================
+
+.. _amdgpu_synid_nv:
+
+nv
+~~
+
+Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile.
+
+GFX9 only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ nv Indicates that instruction operates on
+ non-volatile memory.
+ ======================================== ================================================
+
+MUBUF/MTBUF Modifiers
+---------------------
+
+.. _amdgpu_synid_idxen:
+
+idxen
+~~~~~
+
+Specifies whether address components include an index. By default, no components are used.
+
+Can be used together with :ref:`offen<amdgpu_synid_offen>`.
+
+Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ idxen Address components include an index.
+ ======================================== ================================================
+
+.. _amdgpu_synid_offen:
+
+offen
+~~~~~
+
+Specifies whether address components include an offset. By default, no components are used.
+
+Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
+
+Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ offen Address components include an offset.
+ ======================================== ================================================
+
+.. _amdgpu_synid_addr64:
+
+addr64
+~~~~~~
+
+Specifies whether a 64-bit address is used. By default, no address is used.
+
+GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
+:ref:`idxen<amdgpu_synid_idxen>` modifiers.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ addr64 A 64-bit address is used.
+ ======================================== ================================================
+
+.. _amdgpu_synid_buf_offset12:
+
+buf_offset12
+~~~~~~~~~~~~
+
+Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ offset:{0..0xFFF} Specifies a 12-bit unsigned offset.
+ ======================================== ================================================
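+
+For example, a buffer load which combines a register offset with an immediate offset
+may be written as follows (an illustrative sketch; the register operands are chosen
+arbitrarily):
+
+.. parsed-literal::
+
+    buffer_load_dword v1, v2, s[4:7], s0 offen offset:16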
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+slc
+~~~
+
+See a description :ref:`here<amdgpu_synid_slc>`.
+
+.. _amdgpu_synid_lds:
+
+lds
+~~~
+
+Specifies where to store the result: VGPRs or LDS (VGPRs by default).
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ lds Store result in LDS.
+ ======================================== ================================================
+
+tfe
+~~~
+
+See a description :ref:`here<amdgpu_synid_tfe>`.
+
+.. _amdgpu_synid_dfmt:
+
+dfmt
+~~~~
+
+TBD
+
+.. _amdgpu_synid_nfmt:
+
+nfmt
+~~~~
+
+TBD
+
+SMRD/SMEM Modifiers
+-------------------
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+nv
+~~
+
+See a description :ref:`here<amdgpu_synid_nv>`.
+
+VINTRP Modifiers
+----------------
+
+.. _amdgpu_synid_high:
+
+high
+~~~~
+
+Specifies which half of the LDS word to use. The low half is used by default.
+GFX9 only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ high Use high half of LDS word.
+ ======================================== ================================================
+
+VOP1/VOP2 DPP Modifiers
+-----------------------
+
+GFX8 and GFX9 only.
+
+.. _amdgpu_synid_dpp_ctrl:
+
+dpp_ctrl
+~~~~~~~~
+
+Specifies how data are shared between threads. This is a mandatory modifier.
+There is no default value.
+
+Note. The lanes of a wavefront are organized in four banks and four rows.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ quad_perm:[{0..3},{0..3},{0..3},{0..3}] Full permute of 4 threads.
+ row_mirror Mirror threads within row.
+ row_half_mirror Mirror threads within 1/2 row (8 threads).
+ row_bcast:15 Broadcast 15th thread of each row to next row.
+ row_bcast:31 Broadcast thread 31 to rows 2 and 3.
+ wave_shl:1 Wavefront left shift by 1 thread.
+ wave_rol:1 Wavefront left rotate by 1 thread.
+ wave_shr:1 Wavefront right shift by 1 thread.
+ wave_ror:1 Wavefront right rotate by 1 thread.
+ row_shl:{1..15} Row shift left by 1-15 threads.
+ row_shr:{1..15} Row shift right by 1-15 threads.
+ row_ror:{1..15} Row rotate right by 1-15 threads.
+ ======================================== ================================================
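+
+For example, a full permute of each group of 4 threads may be expressed as follows,
+together with the :ref:`row_mask<amdgpu_synid_row_mask>` and
+:ref:`bank_mask<amdgpu_synid_bank_mask>` modifiers described below
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    v_mov_b32_dpp v0, v1 quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf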
+
+.. _amdgpu_synid_row_mask:
+
+row_mask
+~~~~~~~~
+
+Controls which rows are enabled for data sharing. By default, all rows are enabled.
+
+Note. The lanes of a wavefront are organized in four banks and four rows.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ row_mask:{0..15} Each of 4 bits in the mask controls one
+ row (0 - disabled, 1 - enabled).
+ ======================================== ================================================
+
+.. _amdgpu_synid_bank_mask:
+
+bank_mask
+~~~~~~~~~
+
+Controls which banks are enabled for data sharing. By default, all banks are enabled.
+
+Note. The lanes of a wavefront are organized in four banks and four rows.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ bank_mask:{0..15} Each of 4 bits in the mask controls one
+ bank (0 - disabled, 1 - enabled).
+ ======================================== ================================================
+
+.. _amdgpu_synid_bound_ctrl:
+
+bound_ctrl
+~~~~~~~~~~
+
+Controls data sharing when accessing an invalid lane. By default, data sharing with
+invalid lanes is disabled.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ bound_ctrl:0 Enables data sharing with invalid lanes.
+ Accessing data from an invalid lane will
+ return zero.
+ ======================================== ================================================
+
+VOP1/VOP2/VOPC SDWA Modifiers
+-----------------------------
+
+GFX8 and GFX9 only.
+
+clamp
+~~~~~
+
+See a description :ref:`here<amdgpu_synid_clamp>`.
+
+omod
+~~~~
+
+See a description :ref:`here<amdgpu_synid_omod>`.
+
+GFX9 only.
+
+.. _amdgpu_synid_dst_sel:
+
+dst_sel
+~~~~~~~
+
+Selects which bits in the destination are affected. By default, all bits are affected.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ dst_sel:DWORD Use bits 31:0.
+ dst_sel:BYTE_0 Use bits 7:0.
+ dst_sel:BYTE_1 Use bits 15:8.
+ dst_sel:BYTE_2 Use bits 23:16.
+ dst_sel:BYTE_3 Use bits 31:24.
+ dst_sel:WORD_0 Use bits 15:0.
+ dst_sel:WORD_1 Use bits 31:16.
+ ======================================== ================================================
+
+
+.. _amdgpu_synid_dst_unused:
+
+dst_unused
+~~~~~~~~~~
+
+Controls what to do with the bits in the destination which are not selected
+by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
+By default, unused bits are preserved.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ dst_unused:UNUSED_PAD Pad with zeros.
+ dst_unused:UNUSED_SEXT Sign-extend upper bits, zero lower bits.
+ dst_unused:UNUSED_PRESERVE Preserve bits.
+ ======================================== ================================================
+
+.. _amdgpu_synid_src0_sel:
+
+src0_sel
+~~~~~~~~
+
+Controls which bits of src0 are used. By default, all bits are used.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ src0_sel:DWORD Use bits 31:0.
+ src0_sel:BYTE_0 Use bits 7:0.
+ src0_sel:BYTE_1 Use bits 15:8.
+ src0_sel:BYTE_2 Use bits 23:16.
+ src0_sel:BYTE_3 Use bits 31:24.
+ src0_sel:WORD_0 Use bits 15:0.
+ src0_sel:WORD_1 Use bits 31:16.
+ ======================================== ================================================
+
+.. _amdgpu_synid_src1_sel:
+
+src1_sel
+~~~~~~~~
+
+Controls which bits of src1 are used. By default, all bits are used.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ src1_sel:DWORD Use bits 31:0.
+ src1_sel:BYTE_0 Use bits 7:0.
+ src1_sel:BYTE_1 Use bits 15:8.
+ src1_sel:BYTE_2 Use bits 23:16.
+ src1_sel:BYTE_3 Use bits 31:24.
+ src1_sel:WORD_0 Use bits 15:0.
+ src1_sel:WORD_1 Use bits 31:16.
+ ======================================== ================================================
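+
+For example, an SDWA move which takes its input from the high word of the source and
+updates only the low byte of the destination may look like this
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    v_mov_b32_sdwa v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:WORD_1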
+
+VOP1/VOP2/VOPC SDWA Operand Modifiers
+-------------------------------------
+
+Operand modifiers are not used separately. They are applied to source operands.
+
+GFX8 and GFX9 only.
+
+abs
+~~~
+
+See a description :ref:`here<amdgpu_synid_abs>`.
+
+neg
+~~~
+
+See a description :ref:`here<amdgpu_synid_neg>`.
+
+.. _amdgpu_synid_sext:
+
+sext
+~~~~
+
+Sign-extends the value of a (sub-dword) operand to fill all 32 bits.
+Has no effect for 32-bit operands.
+
+Valid for integer operands only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ sext(<operand>) Sign-extend operand value.
+ ======================================== ================================================
+
+VOP3 Modifiers
+--------------
+
+.. _amdgpu_synid_vop3_op_sel:
+
+vop3_op_sel
+~~~~~~~~~~~
+
+Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
+By default, low bits are used for all operands.
+
+The number of values specified with the op_sel modifier must match the number of instruction
+operands (both source and destination). The first value controls src0, the second value
+controls src1, and so on; the last value controls the destination.
+The value 0 selects the low bits, while 1 selects the high bits.
+
+Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
+by op_sel must be 0.
+
+GFX9 only.
+
+ ======================================== ============================================================
+ Syntax Description
+ ======================================== ============================================================
+ op_sel:[{0..1},{0..1}] Select operand bits for instructions with 1 source operand.
+ op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
+ op_sel:[{0..1},{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
+ ======================================== ============================================================
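+
+For example, an f16 addition which reads both sources from the high halves and writes
+the high half of the destination may be written as follows
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    v_add_f16_e64 v5, v1, v2 op_sel:[1,1,1]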
+
+.. _amdgpu_synid_clamp:
+
+clamp
+~~~~~
+
+The meaning of the clamp modifier depends on the instruction.
+
+For *v_cmp* instructions, the clamp modifier indicates that the compare signals
+if a floating point exception occurs. By default, signaling is disabled.
+Not supported by GFX7.
+
+For integer operations, the clamp modifier indicates that the result must be clamped
+to the largest and smallest representable values. By default, there is no clamping.
+Integer clamping is not supported by GFX7.
+
+For floating point operations, the clamp modifier indicates that the result must be
+clamped to the range [0.0, 1.0]. By default, there is no clamping.
+
+Note. The clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ clamp Enables clamping (or signaling).
+ ======================================== ================================================
+
+.. _amdgpu_synid_omod:
+
+omod
+~~~~
+
+Specifies whether an output modifier is applied to the result.
+By default, no output modifiers are applied.
+
+Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
+
+Output modifiers are valid for f32 and f64 floating point results only.
+They must not be used with f16.
+
+Note. *v_cvt_f16_f32* is an exception. This instruction produces an f16 result
+but accepts output modifiers.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ mul:2 Multiply the result by 2.
+ mul:4 Multiply the result by 4.
+ div:2 Multiply the result by 0.5.
+ ======================================== ================================================
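+
+For example, clamping and an output modifier may be combined as follows
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    v_add_f32_e64 v0, v1, v2 clamp mul:2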
+
+VOP3 Operand Modifiers
+----------------------
+
+Operand modifiers are not used separately. They are applied to source operands.
+
+.. _amdgpu_synid_abs:
+
+abs
+~~~
+
+Computes the absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
+Valid for floating point operands only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ abs(<operand>) Get absolute value of operand.
+ \|<operand>| The same as above.
+ ======================================== ================================================
+
+.. _amdgpu_synid_neg:
+
+neg
+~~~
+
+Computes the negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
+Valid for floating point operands only.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ neg(<operand>) Get negative value of operand.
+ -<operand> The same as above.
+ ======================================== ================================================
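+
+For example, both operand modifiers may be applied to VOP3 sources as follows
+(an illustrative sketch; the register operands are chosen arbitrarily):
+
+.. parsed-literal::
+
+    v_add_f32_e64 v0, -v1, abs(v2)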
+
+VOP3P Modifiers
+---------------
+
+This section describes modifiers of regular VOP3P instructions.
+*v_mad_mix* modifiers are described :ref:`in a separate section<amdgpu_synid_mad_mix>`.
+
+GFX9 only.
+
+.. _amdgpu_synid_op_sel:
+
+op_sel
+~~~~~~
+
+Selects the low [15:0] or high [31:16] operand bits as input to the operation
+which results in the lower-half of the destination.
+By default, low bits are used for all operands.
+
+The number of values specified with the op_sel modifier must match the number of source
+operands. The first value controls src0, the second value controls src1, and so on.
+The value 0 selects the low bits, while 1 selects the high bits.
+
+ ======================================== =============================================================
+ Syntax Description
+ ======================================== =============================================================
+ op_sel:[{0..1}] Select operand bits for instructions with 1 source operand.
+ op_sel:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
+ op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
+ ======================================== =============================================================
+
+.. _amdgpu_synid_op_sel_hi:
+
+op_sel_hi
+~~~~~~~~~
+
+Selects the low [15:0] or high [31:16] operand bits as input to the operation
+which results in the upper-half of the destination.
+By default, high bits are used for all operands.
+
+The number of values specified with the op_sel_hi modifier must match the number of source
+operands. The first value controls src0, the second value controls src1, and so on.
+The value 0 selects the low bits, while 1 selects the high bits.
+
+ ======================================== =============================================================
+ Syntax Description
+ ======================================== =============================================================
+ op_sel_hi:[{0..1}] Select operand bits for instructions with 1 source operand.
+ op_sel_hi:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
+ op_sel_hi:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
+ ======================================== =============================================================
+
+.. _amdgpu_synid_neg_lo:
+
+neg_lo
+~~~~~~
+
+Specifies whether to change the sign of operand values selected by
+:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
+as input to the operation which results in the lower-half of the destination.
+
+The number of values specified with this modifier must match the number of source
+operands. The first value controls src0, the second value controls src1, and so on.
+
+The value 0 indicates that the corresponding operand value is used unmodified;
+the value 1 indicates that the negative value of the operand must be used.
+
+By default, operand values are used unmodified.
+
+This modifier is valid for floating point operands only.
+
+ ======================================== ==================================================================
+ Syntax Description
+ ======================================== ==================================================================
+ neg_lo:[{0..1}] Select affected operands for instructions with 1 source operand.
+ neg_lo:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
+ neg_lo:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
+ ======================================== ==================================================================
+
+.. _amdgpu_synid_neg_hi:
+
+neg_hi
+~~~~~~
+
+Specifies whether to change the sign of operand values selected by
+:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
+as input to the operation which results in the upper-half of the destination.
+
+The number of values specified with this modifier must match the number of source
+operands. The first value controls src0, the second value controls src1, and so on.
+
+The value 0 indicates that the corresponding operand value is used unmodified;
+the value 1 indicates that the negative value of the operand must be used.
+
+By default, operand values are used unmodified.
+
+This modifier is valid for floating point operands only.
+
+ ======================================== ==================================================================
+ Syntax Description
+ ======================================== ==================================================================
+ neg_hi:[{0..1}] Select affected operands for instructions with 1 source operand.
+ neg_hi:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
+ neg_hi:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
+ ======================================== ==================================================================
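+
+For example, the following sketch (register choices are arbitrary) negates
+the upper-half input taken from src1:
+
+.. code-block:: nasm
+
+  v_pk_mul_f16 v0, v1, v2 neg_hi:[0,1]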
+
+clamp
+~~~~~
+
+See a description :ref:`here<amdgpu_synid_clamp>`.
+
+.. _amdgpu_synid_mad_mix:
+
+VOP3P V_MAD_MIX Modifiers
+-------------------------
+
+These instructions use the VOP3P format but support a different set of modifiers.
+
+GFX9 only.
+
+.. _amdgpu_synid_mad_mix_op_sel:
+
+mad_mix_op_sel
+~~~~~~~~~~~~~~
+
+This modifier has effect only for 16-bit source operands, as indicated by
+:ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
+It selects either the low [15:0] or the high [31:16] operand bits
+as input to the operation.
+
+The value 0 indicates the low 16 bits, the value 1 indicates the high 16 bits.
+By default, the low bits are used for all operands.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ op_sel:[{0..1},{0..1},{0..1}] Select location of each 16-bit source operand.
+ ======================================== ================================================
+
+.. _amdgpu_synid_mad_mix_op_sel_hi:
+
+mad_mix_op_sel_hi
+~~~~~~~~~~~~~~~~~
+
+Selects the size of source operands: either 32 bits or 16 bits.
+By default, 32 bits are used for all source operands.
+
+The value 0 indicates 32 bits, the value 1 indicates 16 bits.
+The location of the 16 bits within the operand may be specified by
+:ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>`.
+
+ ======================================== ================================================
+ Syntax Description
+ ======================================== ================================================
+ op_sel_hi:[{0..1},{0..1},{0..1}] Select size of each source operand.
+ ======================================== ================================================
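+
+For example, the following sketch (register choices are arbitrary) treats all
+three sources as 16-bit operands, taking the low 16 bits of src0 and src1 and
+the high 16 bits of src2:
+
+.. code-block:: nasm
+
+  v_mad_mix_f32 v0, v1, v2, v3 op_sel:[0,0,1] op_sel_hi:[1,1,1]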
+
+abs
+~~~
+
+See a description :ref:`here<amdgpu_synid_abs>`.
+
+neg
+~~~
+
+See a description :ref:`here<amdgpu_synid_neg>`.
+
+clamp
+~~~~~
+
+See a description :ref:`here<amdgpu_synid_clamp>`.
diff --git a/docs/AMDGPUUsage.rst b/docs/AMDGPUUsage.rst
index 439089348fff..15159834f410 100644
--- a/docs/AMDGPUUsage.rst
+++ b/docs/AMDGPUUsage.rst
@@ -64,13 +64,7 @@ specify the target triple:
============ ==============================================================
Environment Description
============ ==============================================================
- *<empty>* Defaults to ``opencl``.
- ``opencl`` OpenCL compute kernel (see :ref:`amdgpu-opencl`).
- ``amdgizcl`` Same as ``opencl`` except a different address space mapping is
- used (see :ref:`amdgpu-address-spaces`).
- ``amdgiz`` Same as ``opencl`` except a different address space mapping is
- used (see :ref:`amdgpu-address-spaces`).
- ``hcc`` AMD HC language compute kernel (see :ref:`amdgpu-hcc`).
+ *<empty>* Default.
============ ==============================================================
.. _amdgpu-processors:
@@ -104,23 +98,23 @@ names from both the *Processor* and *Alternative Processor* can be used.
**Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
-----------------------------------------------------------------------------------
``cedar`` ``r600`` dGPU
+ ``cypress`` ``r600`` dGPU
+ ``juniper`` ``r600`` dGPU
``redwood`` ``r600`` dGPU
``sumo`` ``r600`` dGPU
- ``juniper`` ``r600`` dGPU
- ``cypress`` ``r600`` dGPU
**Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
-----------------------------------------------------------------------------------
``barts`` ``r600`` dGPU
- ``turks`` ``r600`` dGPU
``caicos`` ``r600`` dGPU
``cayman`` ``r600`` dGPU
+ ``turks`` ``r600`` dGPU
**GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
-----------------------------------------------------------------------------------
``gfx600`` - ``tahiti`` ``amdgcn`` dGPU
- ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU
- - ``verde``
+ ``gfx601`` - ``hainan`` ``amdgcn`` dGPU
- ``oland``
- - ``hainan``
+ - ``pitcairn``
+ - ``verde``
**GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
-----------------------------------------------------------------------------------
``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000
@@ -174,8 +168,8 @@ names from both the *Processor* and *Alternative Processor* can be used.
\ ``amdgcn`` APU - xnack - E2-9010
[on] - A6-9210
- A9-9410
- ``gfx802`` - ``tonga`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150
- - ``iceland`` [off] - FirePro S7100
+ ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150
+ - ``tonga`` [off] - FirePro S7100
- FirePro W7100
- Radeon R285
- Radeon R9 380
@@ -204,8 +198,15 @@ names from both the *Processor* and *Alternative Processor* can be used.
- Radeon RX Vega 64
Liquid
- Radeon Instinct MI25
- ``gfx902`` ``amdgcn`` APU - xnack *TBA*
- [on]
+ ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G
+ [on] - Ryzen 5 2400G
+ ``gfx904`` ``amdgcn`` dGPU - xnack *TBA*
+ [off]
+ .. TODO
+ Add product
+ names.
+ ``gfx906`` ``amdgcn`` dGPU - xnack *TBA*
+ [off]
.. TODO
Add product
names.
@@ -274,34 +275,17 @@ LLVM Address Space number is used throughout LLVM (for example, in LLVM IR).
.. table:: Address Space Mapping
:name: amdgpu-address-space-mapping-table
- ================== ================= ================= ================= =================
+ ================== =================
LLVM Address Space Memory Space
- ------------------ -----------------------------------------------------------------------
- \ Current Default amdgiz/amdgizcl hcc Future Default
- ================== ================= ================= ================= =================
- 0 Private (Scratch) Generic (Flat) Generic (Flat) Generic (Flat)
- 1 Global Global Global Global
- 2 Constant Constant Constant Region (GDS)
- 3 Local (group/LDS) Local (group/LDS) Local (group/LDS) Local (group/LDS)
- 4 Generic (Flat) Region (GDS) Region (GDS) Constant
- 5 Region (GDS) Private (Scratch) Private (Scratch) Private (Scratch)
- ================== ================= ================= ================= =================
-
-Current Default
- This is the current default address space mapping used for all languages
- except hcc. This will shortly be deprecated.
-
-amdgiz/amdgizcl
- This is the current address space mapping used when ``amdgiz`` or ``amdgizcl``
- is specified as the target triple environment value.
-
-hcc
- This is the current address space mapping used when ``hcc`` is specified as
- the target triple environment value.This will shortly be deprecated.
-
-Future Default
- This will shortly be the only address space mapping for all languages using
- AMDGPU backend.
+ ================== =================
+ 0 Generic (Flat)
+ 1 Global
+ 2 Region (GDS)
+ 3 Local (group/LDS)
+ 4 Constant
+ 5 Private (Scratch)
+ 6 Constant 32-bit
+ ================== =================
.. _amdgpu-memory-scopes:
@@ -385,13 +369,42 @@ is conservatively correct for OpenCL.
AMDGPU Intrinsics
-----------------
-The AMDGPU backend implements the following intrinsics.
+The AMDGPU backend implements the following LLVM IR intrinsics.
*This section is WIP.*
.. TODO
List AMDGPU intrinsics
+AMDGPU Attributes
+-----------------
+
+The AMDGPU backend supports the following LLVM IR attributes.
+
+ .. table:: AMDGPU LLVM IR Attributes
+ :name: amdgpu-llvm-ir-attributes-table
+
+ ======================================= ==========================================================
+ LLVM Attribute Description
+ ======================================= ==========================================================
+ "amdgpu-flat-work-group-size"="min,max" Specify the minimum and maximum flat work group sizes that
+ will be specified when the kernel is dispatched. Generated
+ by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
+ "amdgpu-implicitarg-num-bytes"="n" Number of kernel argument bytes to add to the kernel
+ argument block size for the implicit arguments. This
+ varies by OS and language (for OpenCL see
+ :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
+ "amdgpu-max-work-group-size"="n" Specify the maximum work-group size that will be specifed
+ when the kernel is dispatched.
+ "amdgpu-num-sgpr"="n" Specifies the number of SGPRs to use. Generated by
+ the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
+ "amdgpu-num-vgpr"="n" Specifies the number of VGPRs to use. Generated by the
+ ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
+ "amdgpu-waves-per-eu"="m,n" Specify the minimum and maximum number of waves per
+ execution unit. Generated by the ``amdgpu_waves_per_eu``
+ CLANG attribute [CLANG-ATTR]_.
+ ======================================= ==========================================================
+
Code Object
===========
@@ -524,6 +537,11 @@ The AMDGPU backend uses the following ELF header:
target feature is
enabled for all code
contained in the code object.
+ If the processor
+ does not support the
+ ``xnack`` target
+ feature then must
+ be 0.
See
:ref:`amdgpu-target-features`.
================================= ========== =============================
@@ -535,38 +553,42 @@ The AMDGPU backend uses the following ELF header:
Name Value Description (see
:ref:`amdgpu-processor-table`)
================================= ========== =============================
- ``EF_AMDGPU_MACH_NONE`` 0 *not specified*
- ``EF_AMDGPU_MACH_R600_R600`` 1 ``r600``
- ``EF_AMDGPU_MACH_R600_R630`` 2 ``r630``
- ``EF_AMDGPU_MACH_R600_RS880`` 3 ``rs880``
- ``EF_AMDGPU_MACH_R600_RV670`` 4 ``rv670``
- ``EF_AMDGPU_MACH_R600_RV710`` 5 ``rv710``
- ``EF_AMDGPU_MACH_R600_RV730`` 6 ``rv730``
- ``EF_AMDGPU_MACH_R600_RV770`` 7 ``rv770``
- ``EF_AMDGPU_MACH_R600_CEDAR`` 8 ``cedar``
- ``EF_AMDGPU_MACH_R600_REDWOOD`` 9 ``redwood``
- ``EF_AMDGPU_MACH_R600_SUMO`` 10 ``sumo``
- ``EF_AMDGPU_MACH_R600_JUNIPER`` 11 ``juniper``
- ``EF_AMDGPU_MACH_R600_CYPRESS`` 12 ``cypress``
- ``EF_AMDGPU_MACH_R600_BARTS`` 13 ``barts``
- ``EF_AMDGPU_MACH_R600_TURKS`` 14 ``turks``
- ``EF_AMDGPU_MACH_R600_CAICOS`` 15 ``caicos``
- ``EF_AMDGPU_MACH_R600_CAYMAN`` 16 ``cayman``
- *reserved* 17-31 Reserved for ``r600``
- architecture processors.
- ``EF_AMDGPU_MACH_AMDGCN_GFX600`` 32 ``gfx600``
- ``EF_AMDGPU_MACH_AMDGCN_GFX601`` 33 ``gfx601``
- ``EF_AMDGPU_MACH_AMDGCN_GFX700`` 34 ``gfx700``
- ``EF_AMDGPU_MACH_AMDGCN_GFX701`` 35 ``gfx701``
- ``EF_AMDGPU_MACH_AMDGCN_GFX702`` 36 ``gfx702``
- ``EF_AMDGPU_MACH_AMDGCN_GFX703`` 37 ``gfx703``
- ``EF_AMDGPU_MACH_AMDGCN_GFX704`` 38 ``gfx704``
- ``EF_AMDGPU_MACH_AMDGCN_GFX801`` 39 ``gfx801``
- ``EF_AMDGPU_MACH_AMDGCN_GFX802`` 40 ``gfx802``
- ``EF_AMDGPU_MACH_AMDGCN_GFX803`` 41 ``gfx803``
- ``EF_AMDGPU_MACH_AMDGCN_GFX810`` 42 ``gfx810``
- ``EF_AMDGPU_MACH_AMDGCN_GFX900`` 43 ``gfx900``
- ``EF_AMDGPU_MACH_AMDGCN_GFX902`` 44 ``gfx902``
+ ``EF_AMDGPU_MACH_NONE`` 0x000 *not specified*
+ ``EF_AMDGPU_MACH_R600_R600`` 0x001 ``r600``
+ ``EF_AMDGPU_MACH_R600_R630`` 0x002 ``r630``
+ ``EF_AMDGPU_MACH_R600_RS880`` 0x003 ``rs880``
+ ``EF_AMDGPU_MACH_R600_RV670`` 0x004 ``rv670``
+ ``EF_AMDGPU_MACH_R600_RV710`` 0x005 ``rv710``
+ ``EF_AMDGPU_MACH_R600_RV730`` 0x006 ``rv730``
+ ``EF_AMDGPU_MACH_R600_RV770`` 0x007 ``rv770``
+ ``EF_AMDGPU_MACH_R600_CEDAR`` 0x008 ``cedar``
+ ``EF_AMDGPU_MACH_R600_CYPRESS`` 0x009 ``cypress``
+ ``EF_AMDGPU_MACH_R600_JUNIPER`` 0x00a ``juniper``
+ ``EF_AMDGPU_MACH_R600_REDWOOD`` 0x00b ``redwood``
+ ``EF_AMDGPU_MACH_R600_SUMO`` 0x00c ``sumo``
+ ``EF_AMDGPU_MACH_R600_BARTS`` 0x00d ``barts``
+ ``EF_AMDGPU_MACH_R600_CAICOS`` 0x00e ``caicos``
+ ``EF_AMDGPU_MACH_R600_CAYMAN`` 0x00f ``cayman``
+ ``EF_AMDGPU_MACH_R600_TURKS`` 0x010 ``turks``
+ *reserved* 0x011 - Reserved for ``r600``
+ 0x01f architecture processors.
+ ``EF_AMDGPU_MACH_AMDGCN_GFX600`` 0x020 ``gfx600``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX601`` 0x021 ``gfx601``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX700`` 0x022 ``gfx700``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX701`` 0x023 ``gfx701``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX702`` 0x024 ``gfx702``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX703`` 0x025 ``gfx703``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX704`` 0x026 ``gfx704``
+ *reserved* 0x027 Reserved.
+ ``EF_AMDGPU_MACH_AMDGCN_GFX801`` 0x028 ``gfx801``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX802`` 0x029 ``gfx802``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX803`` 0x02a ``gfx803``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX810`` 0x02b ``gfx810``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX900`` 0x02c ``gfx900``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX902`` 0x02d ``gfx902``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX904`` 0x02e ``gfx904``
+ ``EF_AMDGPU_MACH_AMDGCN_GFX906`` 0x02f ``gfx906``
+ *reserved* 0x030 Reserved.
================================= ========== =============================
Sections
@@ -674,7 +696,7 @@ Additional note records can be present.
Specifies extensible metadata associated with the code objects executed on HSA
[HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when
the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). See
- :ref:`amdgpu-amdhsa-hsa-code-object-metadata` for the syntax of the code
+ :ref:`amdgpu-amdhsa-code-object-metadata` for the syntax of the code
object metadata string.
.. _amdgpu-symbols:
@@ -693,7 +715,7 @@ Symbols include the following:
*link-name* ``STT_OBJECT`` - ``.data`` Global variable
- ``.rodata``
- ``.bss``
- *link-name*\ ``@kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
+ *link-name*\ ``.kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
*link-name* ``STT_FUNC`` - ``.text`` Kernel entry point
===================== ============== ============= ==================
@@ -773,31 +795,41 @@ The following relocation types are supported:
.. table:: AMDGPU ELF Relocation Records
:name: amdgpu-elf-relocation-records-table
- ========================== ===== ========== ==============================
- Relocation Type Value Field Calculation
- ========================== ===== ========== ==============================
- ``R_AMDGPU_NONE`` 0 *none* *none*
- ``R_AMDGPU_ABS32_LO`` 1 ``word32`` (S + A) & 0xFFFFFFFF
- ``R_AMDGPU_ABS32_HI`` 2 ``word32`` (S + A) >> 32
- ``R_AMDGPU_ABS64`` 3 ``word64`` S + A
- ``R_AMDGPU_REL32`` 4 ``word32`` S + A - P
- ``R_AMDGPU_REL64`` 5 ``word64`` S + A - P
- ``R_AMDGPU_ABS32`` 6 ``word32`` S + A
- ``R_AMDGPU_GOTPCREL`` 7 ``word32`` G + GOT + A - P
- ``R_AMDGPU_GOTPCREL32_LO`` 8 ``word32`` (G + GOT + A - P) & 0xFFFFFFFF
- ``R_AMDGPU_GOTPCREL32_HI`` 9 ``word32`` (G + GOT + A - P) >> 32
- ``R_AMDGPU_REL32_LO`` 10 ``word32`` (S + A - P) & 0xFFFFFFFF
- ``R_AMDGPU_REL32_HI`` 11 ``word32`` (S + A - P) >> 32
- *reserved* 12
- ``R_AMDGPU_RELATIVE64`` 13 ``word64`` B + A
- ========================== ===== ========== ==============================
+ ========================== ======= ===== ========== ==============================
+ Relocation Type Kind Value Field Calculation
+ ========================== ======= ===== ========== ==============================
+ ``R_AMDGPU_NONE`` 0 *none* *none*
+ ``R_AMDGPU_ABS32_LO`` Static, 1 ``word32`` (S + A) & 0xFFFFFFFF
+ Dynamic
+ ``R_AMDGPU_ABS32_HI`` Static, 2 ``word32`` (S + A) >> 32
+ Dynamic
+ ``R_AMDGPU_ABS64`` Static, 3 ``word64`` S + A
+ Dynamic
+ ``R_AMDGPU_REL32`` Static 4 ``word32`` S + A - P
+ ``R_AMDGPU_REL64`` Static 5 ``word64`` S + A - P
+ ``R_AMDGPU_ABS32`` Static, 6 ``word32`` S + A
+ Dynamic
+ ``R_AMDGPU_GOTPCREL`` Static 7 ``word32`` G + GOT + A - P
+ ``R_AMDGPU_GOTPCREL32_LO`` Static 8 ``word32`` (G + GOT + A - P) & 0xFFFFFFFF
+ ``R_AMDGPU_GOTPCREL32_HI`` Static 9 ``word32`` (G + GOT + A - P) >> 32
+ ``R_AMDGPU_REL32_LO`` Static 10 ``word32`` (S + A - P) & 0xFFFFFFFF
+ ``R_AMDGPU_REL32_HI`` Static 11 ``word32`` (S + A - P) >> 32
+ *reserved* 12
+ ``R_AMDGPU_RELATIVE64`` Dynamic 13 ``word64`` B + A
+ ========================== ======= ===== ========== ==============================
+
+``R_AMDGPU_ABS32_LO`` and ``R_AMDGPU_ABS32_HI`` are only supported by
+the ``mesa3d`` OS, which does not support ``R_AMDGPU_ABS64``.
+
+There is currently no OS loader support for 32-bit programs, and so
+``R_AMDGPU_ABS32`` is not used.
.. _amdgpu-dwarf:
DWARF
-----
-Standard DWARF [DWARF]_ Version 2 sections can be generated. These contain
+Standard DWARF [DWARF]_ Version 5 sections can be generated. These contain
information that maps the code object executable code and data to the source
language constructs. It can be used by tools such as debuggers and profilers.
@@ -853,10 +885,60 @@ Register Mapping
Source Text
~~~~~~~~~~~
-*This section is WIP.*
+Source text for online-compiled programs (e.g. those compiled by the OpenCL
+runtime) may be embedded into the DWARF v5 line table using the ``clang
+-gembed-source`` option, described in table :ref:`amdgpu-debug-options`.
-.. TODO
- DWARF extension to include runtime generated source text.
+The two forms of the option are:
+
+``-gembed-source``
+ Enable the embedded source DWARF v5 extension.
+``-gno-embed-source``
+ Disable the embedded source DWARF v5 extension.
+
+ .. table:: AMDGPU Debug Options
+ :name: amdgpu-debug-options
+
+ ==================== ==================================================
+ Debug Flag Description
+ ==================== ==================================================
+ -g[no-]embed-source Enable/disable embedding source text in DWARF
+ debug sections. Useful for environments where
+ source cannot be written to disk, such as
+ when performing online compilation.
+ ==================== ==================================================
+
+This option enables one extended content type in the DWARF v5 Line Number
+Program Header, which is used to encode the embedded source.
+
+ .. table:: AMDGPU DWARF Line Number Program Header Extended Content Types
+ :name: amdgpu-dwarf-extended-content-types
+
+ ============================ ======================
+ Content Type Form
+ ============================ ======================
+ ``DW_LNCT_LLVM_source`` ``DW_FORM_line_strp``
+ ============================ ======================
+
+The source field will contain the UTF-8 encoded, null-terminated source text
+with ``'\n'`` line endings. When the source field is present, consumers can use
+the embedded source instead of attempting to discover the source on disk. When
+the source field is absent, consumers can access the file to get the source
+text.
+
+The above content type appears in the ``file_name_entry_format`` field of the
+line table prologue, and its corresponding value appears in the ``file_names``
+field. The current encoding of the content type is documented in table
+:ref:`amdgpu-dwarf-extended-content-types-encoding`.
+
+ .. table:: AMDGPU DWARF Line Number Program Header Extended Content Types Encoding
+ :name: amdgpu-dwarf-extended-content-types-encoding
+
+ ============================ ====================
+ Content Type Value
+ ============================ ====================
+ ``DW_LNCT_LLVM_source`` 0x2001
+ ============================ ====================
.. _amdgpu-code-conventions:
@@ -872,7 +954,37 @@ AMDHSA
This section provides code conventions used when the target triple OS is
``amdhsa`` (see :ref:`amdgpu-target-triples`).
-.. _amdgpu-amdhsa-hsa-code-object-metadata:
+.. _amdgpu-amdhsa-code-object-target-identification:
+
+Code Object Target Identification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The AMDHSA OS uses the following syntax to specify the code object
+target as a single string:
+
+ ``<Architecture>-<Vendor>-<OS>-<Environment>-<Processor><Target Features>``
+
+Where:
+
+ - ``<Architecture>``, ``<Vendor>``, ``<OS>`` and ``<Environment>``
+ are the same as the *Target Triple* (see
+ :ref:`amdgpu-target-triples`).
+
+ - ``<Processor>`` is the same as the *Processor* (see
+ :ref:`amdgpu-processors`).
+
+ - ``<Target Features>`` is a list of the enabled *Target Features*
+ (see :ref:`amdgpu-target-features`), each prefixed by a plus, that
+ apply to *Processor*. The list must be in the same order as listed
+ in the table :ref:`amdgpu-target-feature-table`. Note that *Target
+ Features* must be included in the list if they are enabled even if
+ that is the default for *Processor*.
+
+For example:
+
+ ``"amdgcn-amd-amdhsa--gfx902+xnack"``
+
+.. _amdgpu-amdhsa-code-object-metadata:
Code Object Metadata
~~~~~~~~~~~~~~~~~~~~
@@ -991,9 +1103,11 @@ non-AMD key names should be prefixed by "*vendor-name*.".
=================== ============== ========= ==============================
String Key Value Type Required? Description
=================== ============== ========= ==============================
- "ReqdWorkGroupSize" sequence of The dispatch work-group size
- 3 integers X, Y, Z must correspond to the
- specified values.
+ "ReqdWorkGroupSize" sequence of If not 0, 0, 0 then all values
+ 3 integers must be >=1 and the dispatch
+ work-group size X, Y, Z must
+ correspond to the specified
+ values. Defaults to 0, 0, 0.
Corresponds to the OpenCL
``reqd_work_group_size``
@@ -1286,19 +1400,9 @@ non-AMD key names should be prefixed by "*vendor-name*.".
supported by the
kernel in work-items.
Must be >=1 and
- consistent with any
- non-0 values in
- FixedWorkGroupSize.
- "FixedWorkGroupSize" sequence of Corresponds to the
- 3 integers dispatch work-group
- size X, Y, Z. If
- omitted, defaults to
- 0, 0, 0. If an
- element is non-0 then
- the kernel must only
- be launched with a
- matching corresponding
- work-group size.
+ consistent with
+ ReqdWorkGroupSize if
+ not 0, 0, 0.
"NumSpilledSGPRs" integer Number of stores from
a scalar register to
a register allocator
@@ -1363,7 +1467,7 @@ CPU host program, or from an HSA kernel executing on a GPU.
such as grid and work-group size, together with information from the code
object about the kernel, such as segment sizes. The ROCm runtime queries on
the kernel symbol can be used to obtain the code object values which are
- recorded in the :ref:`amdgpu-amdhsa-hsa-code-object-metadata`.
+ recorded in the :ref:`amdgpu-amdhsa-code-object-metadata`.
7. CP executes micro-code and is responsible for detecting and setting up the
GPU to execute the wavefronts of a kernel dispatch.
8. CP ensures that when the a wavefront starts executing the kernel machine
@@ -1430,7 +1534,7 @@ address to physical address is:
There are different ways that the wavefront scratch base address is determined
by a wavefront (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). This
memory can be accessed in an interleaved manner using buffer instruction with
-the scratch buffer descriptor and per wave scratch offset, by the scratch
+the scratch buffer descriptor and per wavefront scratch offset, by the scratch
instructions, or by flat instructions. If each lane of a wavefront accesses the
same private address, the interleaving results in adjacent dwords being accessed
and hence requires fewer cache lines to be fetched. Multi-dword access is not
@@ -1497,7 +1601,8 @@ that implements the kernel.
Kernel Descriptor for GFX6-GFX9
+++++++++++++++++++++++++++++++
-CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
+CP microcode requires the Kernel descriptor to be allocated on 64 byte
+alignment.
.. table:: Kernel Descriptor for GFX6-GFX9
:name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table
@@ -1505,7 +1610,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
======= ======= =============================== ============================
Bits Size Field Name Description
======= ======= =============================== ============================
- 31:0 4 bytes GroupSegmentFixedSize The amount of fixed local
+ 31:0 4 bytes GROUP_SEGMENT_FIXED_SIZE The amount of fixed local
address space memory
required for a work-group
in bytes. This does not
@@ -1514,7 +1619,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
space memory that may be
added when the kernel is
dispatched.
- 63:32 4 bytes PrivateSegmentFixedSize The amount of fixed
+ 63:32 4 bytes PRIVATE_SEGMENT_FIXED_SIZE The amount of fixed
private address space
memory required for a
work-item in bytes. If
@@ -1523,54 +1628,31 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
be added to this value for
the call stack.
127:64 8 bytes Reserved, must be 0.
- 191:128 8 bytes KernelCodeEntryByteOffset Byte offset (possibly
+ 191:128 8 bytes KERNEL_CODE_ENTRY_BYTE_OFFSET Byte offset (possibly
negative) from base
address of kernel
descriptor to kernel's
entry point instruction
which must be 256 byte
aligned.
- 223:192 4 bytes MaxFlatWorkGroupSize Maximum flat work-group
- size supported by the
- kernel in work-items. If
- an exact work-group size
- is required then must be
- omitted or 0 and
- ReqdWorkGroupSize* must
- be set to non-0.
- 239:224 2 bytes ReqdWorkGroupSizeX If present and non-0 then
- the kernel
- must be executed with the
- specified work-group size
- for X.
- 255:240 2 bytes ReqdWorkGroupSizeY If present and non-0 then
- the kernel
- must be executed with the
- specified work-group size
- for Y.
- 271:256 2 bytes ReqdWorkGroupSizeZ If present and non-0 then
- the kernel
- must be executed with the
- specified work-group size
- for Z.
- 383:272 14 Reserved, must be 0.
+ 383:192 24 Reserved, must be 0.
bytes
- 415:384 4 bytes ComputePgmRsrc1 Compute Shader (CS)
+ 415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC1``
configuration
register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
- 447:416 4 bytes ComputePgmRsrc2 Compute Shader (CS)
+ 447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC2``
configuration
register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
- 448 1 bit EnableSGPRPrivateSegmentBuffer Enable the setup of the
- SGPR user data registers
+ 448 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the
+ _BUFFER SGPR user data registers
(see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
@@ -1581,21 +1663,15 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.
Any requests beyond 16
will be ignored.
- 449 1 bit EnableSGPRDispatchPtr *see above*
- 450 1 bit EnableSGPRQueuePtr *see above*
- 451 1 bit EnableSGPRKernargSegmentPtr *see above*
- 452 1 bit EnableSGPRDispatchID *see above*
- 453 1 bit EnableSGPRFlatScratchInit *see above*
- 454 1 bit EnableSGPRPrivateSegmentSize *see above*
- 455 1 bit EnableSGPRGridWorkgroupCountX Not implemented in CP and
- should always be 0.
- 456 1 bit EnableSGPRGridWorkgroupCountY Not implemented in CP and
- should always be 0.
- 457 1 bit EnableSGPRGridWorkgroupCountZ Not implemented in CP and
- should always be 0.
- 463:458 6 bits Reserved, must be 0.
- 511:464 6 Reserved, must be 0.
- bytes
+ 449 1 bit ENABLE_SGPR_DISPATCH_PTR *see above*
+ 450 1 bit ENABLE_SGPR_QUEUE_PTR *see above*
+ 451 1 bit ENABLE_SGPR_KERNARG_SEGMENT_PTR *see above*
+ 452 1 bit ENABLE_SGPR_DISPATCH_ID *see above*
+ 453 1 bit ENABLE_SGPR_FLAT_SCRATCH_INIT *see above*
+ 454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT *see above*
+ _SIZE
+ 455 1 bit Reserved, must be 0.
+ 511:456 8 bytes Reserved, must be 0.
512 **Total size 64 bytes.**
======= ====================================================================
@@ -1607,42 +1683,86 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
======= ======= =============================== ===========================================================================
- 5:0 6 bits GRANULATED_WORKITEM_VGPR_COUNT Number of vector registers
- used by each work-item,
+ 5:0 6 bits GRANULATED_WORKITEM_VGPR_COUNT Number of vector register
+ blocks used by each work-item;
granularity is device
specific:
GFX6-GFX9
- - max_vgpr 1..256
- - roundup((max_vgpg + 1)
- / 4) - 1
+ - vgprs_used 0..256
+ - max(0, ceil(vgprs_used / 4) - 1)
+
+ Where vgprs_used is defined
+ as the highest VGPR number
+ explicitly referenced plus
+ one.
Used by CP to set up
``COMPUTE_PGM_RSRC1.VGPRS``.
- 9:6 4 bits GRANULATED_WAVEFRONT_SGPR_COUNT Number of scalar registers
- used by a wavefront,
+
+ The
+ :ref:`amdgpu-assembler`
+ calculates this
+ automatically for the
+ selected processor from
+ values provided to the
+ `.amdhsa_kernel` directive
+ by the
+ `.amdhsa_next_free_vgpr`
+ nested directive (see
+ :ref:`amdhsa-kernel-directives-table`).
+ 9:6 4 bits GRANULATED_WAVEFRONT_SGPR_COUNT Number of scalar register
+ blocks used by a wavefront;
granularity is device
specific:
GFX6-GFX8
- - max_sgpr 1..112
- - roundup((max_sgpg + 1)
- / 8) - 1
+ - sgprs_used 0..112
+ - max(0, ceil(sgprs_used / 8) - 1)
GFX9
- - max_sgpr 1..112
- - roundup((max_sgpg + 1)
- / 16) - 1
-
- Includes the special SGPRs
- for VCC, Flat Scratch (for
- GFX7 onwards) and XNACK
- (for GFX8 onwards). It does
- not include the 16 SGPR
- added if a trap handler is
+ - sgprs_used 0..112
+ - 2 * max(0, ceil(sgprs_used / 16) - 1)
+
+ Where sgprs_used is
+ defined as the highest
+ SGPR number explicitly
+ referenced plus one, plus
+ a target-specific number
+ of additional special
+ SGPRs for VCC,
+ FLAT_SCRATCH (GFX7+) and
+ XNACK_MASK (GFX8+), and
+ any additional
+ target-specific
+ limitations. It does not
+ include the 16 SGPRs added
+ if a trap handler is
enabled.
+ The target-specific
+ limitations and special
+ SGPR layout are defined in
+ the hardware
+ documentation, which can
+ be found in the
+ :ref:`amdgpu-processors`
+ table.
+
Used by CP to set up
``COMPUTE_PGM_RSRC1.SGPRS``.
+
+ The
+ :ref:`amdgpu-assembler`
+ calculates this
+ automatically for the
+ selected processor from
+ values provided to the
+ `.amdhsa_kernel` directive
+ by the
+ `.amdhsa_next_free_sgpr`
+ and `.amdhsa_reserve_*`
+ nested directives (see
+ :ref:`amdhsa-kernel-directives-table`).
11:10 2 bits PRIORITY Must be 0.
Start executing wavefront
@@ -1794,7 +1914,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
Bits Size Field Name Description
======= ======= =============================== ===========================================================================
0 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the
- _WAVE_OFFSET SGPR wave scratch offset
+ _WAVEFRONT_OFFSET SGPR wavefront scratch offset
system register (see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
@@ -1808,17 +1928,13 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
Used by CP to set up
``COMPUTE_PGM_RSRC2.USER_SGPR``.
- 6 1 bit ENABLE_TRAP_HANDLER Set to 1 if code contains a
- TRAP instruction which
- requires a trap handler to
- be enabled.
-
- CP sets
- ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``
- if the runtime has
- installed a trap handler
- regardless of the setting
- of this field.
+ 6 1 bit ENABLE_TRAP_HANDLER Must be 0.
+
+ This bit represents
+ ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``,
+ which is set by the CP if
+ the runtime has installed a
+ trap handler.
7 1 bit ENABLE_SGPR_WORKGROUP_ID_X Enable the setup of the
system SGPR register for
the work-group id in the X
@@ -1881,7 +1997,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
exceptions exceptions
enabled which are generated
when a memory violation has
- occurred for this wave from
+ occurred for this wavefront from
L1 or LDS
(write-to-read-only-memory,
mis-aligned atomic, LDS
@@ -1950,10 +2066,10 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
====================================== ===== ==============================
Enumeration Name Value Description
====================================== ===== ==============================
- AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even
- AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity
- AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity
- AMDGPU_FLOAT_ROUND_MODE_ZERO 3 Round Toward 0
+ FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even
+ FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity
+ FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity
+ FLOAT_ROUND_MODE_ZERO 3 Round Toward 0
====================================== ===== ==============================
..
@@ -1964,11 +2080,11 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
====================================== ===== ==============================
Enumeration Name Value Description
====================================== ===== ==============================
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination
+ FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination
Denorms
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush
+ FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms
+ FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms
+ FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush
====================================== ===== ==============================
..
@@ -1979,13 +2095,13 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
======================================== ===== ============================
Enumeration Name Value Description
======================================== ===== ============================
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension
+ SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension
ID.
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y
+ SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y
dimensions ID.
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z
+ SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z
dimensions ID.
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.
+ SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.
======================================== ===== ============================
.. _amdgpu-amdhsa-initial-kernel-execution-state:
@@ -2005,10 +2121,10 @@ SGPR0, the next enabled register is SGPR1 etc.; disabled registers do not have
an SGPR number.
The initial SGPRs comprise up to 16 User SRGPs that are set by CP and apply to
-all waves of the grid. It is possible to specify more than 16 User SGPRs using
+all wavefronts of the grid. It is possible to specify more than 16 User SGPRs using
the ``enable_sgpr_*`` bit fields, in which case only the first 16 are actually
initialized. These are then immediately followed by the System SGPRs that are
-set up by ADC/SPI and can have different values for each wave of the grid
+set up by ADC/SPI and can have different values for each wavefront of the grid
dispatch.
SGPR register initial state is defined in
@@ -2023,10 +2139,10 @@ SGPR register initial state is defined in
field) SGPRs
========== ========================== ====== ==============================
First Private Segment Buffer 4 V# that can be used, together
- (enable_sgpr_private with Scratch Wave Offset as an
- _segment_buffer) offset, to access the private
- memory space using a segment
- address.
+ (enable_sgpr_private with Scratch Wavefront Offset
+ _segment_buffer) as an offset, to access the
+ private memory space using a
+ segment address.
CP uses the value provided by
the runtime.
@@ -2066,7 +2182,7 @@ SGPR register initial state is defined in
address is
``SH_HIDDEN_PRIVATE_BASE_VIMID``
plus this offset.) The value
- of Scratch Wave Offset must
+ of Scratch Wavefront Offset must
be added to this offset by
the kernel machine code,
right shifted by 8, and
@@ -2076,13 +2192,13 @@ SGPR register initial state is defined in
to SGPRn-4 on GFX7, and
SGPRn-6 on GFX8 (where SGPRn
is the highest numbered SGPR
- allocated to the wave).
+ allocated to the wavefront).
FLAT_SCRATCH_HI is
multiplied by 256 (as it is
in units of 256 bytes) and
added to
``SH_HIDDEN_PRIVATE_BASE_VIMID``
- to calculate the per wave
+ to calculate the per wavefront
FLAT SCRATCH BASE in flat
memory instructions that
access the scratch
@@ -2122,7 +2238,7 @@ SGPR register initial state is defined in
divides it if there are
multiple Shader Arrays each
with its own SPI). The value
- of Scratch Wave Offset must
+ of Scratch Wavefront Offset must
be added by the kernel
machine code and the result
moved to the FLAT_SCRATCH
@@ -2191,12 +2307,12 @@ SGPR register initial state is defined in
then Work-Group Id Z 1 32 bit work-group id in Z
(enable_sgpr_workgroup_id dimension of grid for
_Z) wavefront.
- then Work-Group Info 1 {first_wave, 14'b0000,
+ then Work-Group Info 1 {first_wavefront, 14'b0000,
(enable_sgpr_workgroup ordered_append_term[10:0],
- _info) threadgroup_size_in_waves[5:0]}
- then Scratch Wave Offset 1 32 bit byte offset from base
+ _info) threadgroup_size_in_wavefronts[5:0]}
+ then Scratch Wavefront Offset 1 32 bit byte offset from base
(enable_sgpr_private of scratch base of queue
- _segment_wave_offset) executing the kernel
+ _segment_wavefront_offset) executing the kernel
dispatch. Must be used as an
offset with Private
segment address when using
@@ -2236,14 +2352,14 @@ VGPR register initial state is defined in
> 1) wavefront lane.
========== ========================== ====== ==============================
-The setting of registers is is done by GPU CP/ADC/SPI hardware as follows:
+The setting of registers is done by GPU CP/ADC/SPI hardware as follows:
1. SGPRs before the Work-Group Ids are set by CP using the 16 User Data
registers.
2. Work-group Id registers X, Y, Z are set by ADC which supports any
combination including none.
-3. Scratch Wave Offset is set by SPI in a per wave basis which is why its value
- cannot included with the flat scratch init value which is per queue.
+3. Scratch Wavefront Offset is set by SPI on a per-wavefront basis, which is why
+   its value cannot be included in the flat scratch init value, which is per queue.
4. The VGPRs are set by SPI which only supports specifying either (X), (X, Y)
or (X, Y, Z).
@@ -2291,7 +2407,7 @@ Flat Scratch
If the kernel may use flat operations to access scratch memory, the prolog code
must set up FLAT_SCRATCH register pair (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which
-are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch Wave
+are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch Wavefront
Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
GFX6
@@ -2302,7 +2418,7 @@ GFX7-GFX8
``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
being managed by SPI for the queue executing the kernel dispatch. This is
the same value used in the Scratch Segment Buffer V# base address. The
- prolog must add the value of Scratch Wave Offset to get the wave's byte
+ prolog must add the value of Scratch Wavefront Offset to get the wavefront's byte
scratch backing memory offset from ``SH_HIDDEN_PRIVATE_BASE_VIMID``. Since
FLAT_SCRATCH_LO is in units of 256 bytes, the offset must be right shifted
by 8 before moving into FLAT_SCRATCH_LO.
@@ -2316,7 +2432,7 @@ GFX7-GFX8
GFX9
The Flat Scratch Init is the 64 bit address of the base of scratch backing
memory being managed by SPI for the queue executing the kernel dispatch. The
- prolog must add the value of Scratch Wave Offset and moved to the FLAT_SCRATCH
+  prolog must add the value of Scratch Wavefront Offset and move the result to the FLAT_SCRATCH
pair for use as the flat scratch base in flat memory instructions.
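+
+  A minimal prolog sketch, assuming (hypothetically) that Flat Scratch Init
+  was preloaded into SGPRs s[0:1] and Scratch Wavefront Offset into s2:
+
+  .. code-block:: nasm
+
+    s_add_u32  flat_scratch_lo, s0, s2 // add wavefront offset to 64 bit base
+    s_addc_u32 flat_scratch_hi, s1, 0  // propagate carry into the high half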
.. _amdgpu-amdhsa-memory-model:
@@ -2382,12 +2498,12 @@ For GFX6-GFX9:
global order and involve no caching. Completion is reported to a wavefront in
execution order.
* The LDS memory has multiple request queues shared by the SIMDs of a
- CU. Therefore, the LDS operations performed by different waves of a work-group
+ CU. Therefore, the LDS operations performed by different wavefronts of a work-group
can be reordered relative to each other, which can result in reordering the
visibility of vector memory operations with respect to LDS operations of other
wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to
ensure synchronization between LDS operations and vector memory operations
- between waves of a work-group, but not between operations performed by the
+ between wavefronts of a work-group, but not between operations performed by the
same wavefront.
* The vector memory operations are performed as wavefront wide operations and
completion is reported to a wavefront in execution order. The exception is
@@ -2397,7 +2513,7 @@ For GFX6-GFX9:
* The vector memory operations access a single vector L1 cache shared by all
SIMDs a CU. Therefore, no special action is required for coherence between the
lanes of a single wavefront, or for coherence between wavefronts in the same
- work-group. A ``buffer_wbinvl1_vol`` is required for coherence between waves
+ work-group. A ``buffer_wbinvl1_vol`` is required for coherence between wavefronts
executing in different work-groups as they may be executing on different CUs.
* The scalar memory operations access a scalar L1 cache shared by all wavefronts
on a group of CUs. The scalar and vector L1 caches are not coherent. However,
@@ -2408,7 +2524,7 @@ For GFX6-GFX9:
* The L2 cache has independent channels to service disjoint ranges of virtual
addresses.
* Each CU has a separate request queue per channel. Therefore, the vector and
- scalar memory operations performed by waves executing in different work-groups
+ scalar memory operations performed by wavefronts executing in different work-groups
(which may be executing on different CUs) of an agent can be reordered
relative to each other. A ``s_waitcnt vmcnt(0)`` is required to ensure
synchronization between vector memory operations of different CUs. It ensures a
@@ -2458,7 +2574,7 @@ case the AMDGPU backend ensures the memory location used to spill is never
accessed by vector memory operations at the same time. If scalar writes are used
then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function
return since the locations may be used for vector memory instructions by a
-future wave that uses the same scratch area, or a function call that creates a
+future wavefront that uses the same scratch area, or a function call that creates a
frame at the same address, respectively. There is no need for a ``s_dcache_inv``
as all scalar writes are write-before-read in the same thread.
@@ -3738,16 +3854,154 @@ the ``s_trap`` instruction with the following usage:
``queue_ptr`` terminated and its
associated queue put
into the error state.
- ``llvm.debugtrap`` ``s_trap 0x03`` ``SGPR0-1``: If debugger not
- ``queue_ptr`` installed handled
- same as ``llvm.trap``.
- debugger breakpoint ``s_trap 0x07`` Reserved for debugger
+ ``llvm.debugtrap`` ``s_trap 0x03`` - If debugger not
+ installed then
+ behaves as a
+ no-operation. The
+ trap handler is
+ entered and
+ immediately returns
+ to continue
+ execution of the
+ wavefront.
+ - If the debugger is
+ installed, causes
+ the debug trap to be
+ reported by the
+ debugger and the
+ wavefront is put in
+ the halt state until
+ resumed by the
+ debugger.
+ reserved ``s_trap 0x04`` Reserved.
+ reserved ``s_trap 0x05`` Reserved.
+ reserved ``s_trap 0x06`` Reserved.
+ debugger breakpoint ``s_trap 0x07`` Reserved for debugger
breakpoints.
- debugger ``s_trap 0x08`` Reserved for debugger.
- debugger ``s_trap 0xfe`` Reserved for debugger.
- debugger ``s_trap 0xff`` Reserved for debugger.
+ reserved ``s_trap 0x08`` Reserved.
+ reserved ``s_trap 0xfe`` Reserved.
+ reserved ``s_trap 0xff`` Reserved.
=================== =============== =============== =======================
+AMDPAL
+------
+
+This section provides code conventions used when the target triple OS is
+``amdpal`` (see :ref:`amdgpu-target-triples`) for passing runtime parameters
+from the application/runtime to each invocation of a hardware shader. These
+parameters include both generic, application-controlled parameters called
+*user data* as well as system-generated parameters that are a product of the
+draw or dispatch execution.
+
+User Data
+~~~~~~~~~
+
+Each hardware stage has a set of 32-bit *user data registers* which can be
+written from a command buffer and then loaded into SGPRs when waves are launched
+via a subsequent dispatch or draw operation. This is the way most arguments are
+passed from the application/runtime to a hardware shader.
+
+Compute User Data
+~~~~~~~~~~~~~~~~~
+
+The compute shader user data mapping is simpler than the graphics shader
+mapping, and is fixed.
+
+Note that there are always 10 available *user data entries* in registers;
+entries beyond that limit must be fetched from memory (via the spill table
+pointer) by the shader.
+
+ .. table:: PAL Compute Shader User Data Registers
+ :name: pal-compute-user-data-registers
+
+ ============= ================================
+ User Register Description
+ ============= ================================
+ 0 Global Internal Table (32-bit pointer)
+ 1 Per-Shader Internal Table (32-bit pointer)
+ 2 - 11 Application-Controlled User Data (10 32-bit values)
+ 12 Spill Table (32-bit pointer)
+ 13 - 14 Thread Group Count (64-bit pointer)
+ 15 GDS Range
+ ============= ================================
+
+Graphics User Data
+~~~~~~~~~~~~~~~~~~
+
+Graphics pipelines support a much more flexible user data mapping:
+
+ .. table:: PAL Graphics Shader User Data Registers
+ :name: pal-graphics-user-data-registers
+
+ ============= ================================
+ User Register Description
+ ============= ================================
+ 0 Global Internal Table (32-bit pointer)
+ + Per-Shader Internal Table (32-bit pointer)
+ + 1-15 Application Controlled User Data
+ (1-15 Contiguous 32-bit Values in Registers)
+ + Spill Table (32-bit pointer)
+ + Draw Index (First Stage Only)
+ + Vertex Offset (First Stage Only)
+ + Instance Offset (First Stage Only)
+ ============= ================================
+
+ The placement of the global internal table remains fixed in the first *user
+ data SGPR register*. Otherwise all parameters are optional, and can be mapped
+ to any desired *user data SGPR register*, with the following restrictions:
+
+ * Draw Index, Vertex Offset, and Instance Offset can only be used by the first
+    active hardware stage in a graphics pipeline (i.e. where the API vertex
+ shader runs).
+
+ * Application-controlled user data must be mapped into a contiguous range of
+ user data registers.
+
+ * The application-controlled user data range supports compaction remapping, so
+ only *entries* that are actually consumed by the shader must be assigned to
+ corresponding *registers*. Note that in order to support an efficient runtime
+ implementation, the remapping must pack *registers* in the same order as
+ *entries*, with unused *entries* removed.
+
+.. _pal_global_internal_table:
+
+Global Internal Table
+~~~~~~~~~~~~~~~~~~~~~
+
+The global internal table is a table of *shader resource descriptors* (SRDs) that
+define how certain engine-wide, runtime-managed resources should be accessed
+from a shader. The majority of these resources have HW-defined formats, and it
+is up to the compiler to write/read data as required by the target hardware.
+
+The following table illustrates the required format:
+
+ .. table:: PAL Global Internal Table
+ :name: pal-git-table
+
+ ============= ================================
+ Offset Description
+ ============= ================================
+ 0-3 Graphics Scratch SRD
+ 4-7 Compute Scratch SRD
+ 8-11 ES/GS Ring Output SRD
+ 12-15 ES/GS Ring Input SRD
+ 16-19 GS/VS Ring Output #0
+ 20-23 GS/VS Ring Output #1
+ 24-27 GS/VS Ring Output #2
+ 28-31 GS/VS Ring Output #3
+ 32-35 GS/VS Ring Input SRD
+ 36-39 Tessellation Factor Buffer SRD
+ 40-43 Off-Chip LDS Buffer SRD
+ 44-47 Off-Chip Param Cache Buffer SRD
+ 48-51 Sample Position Buffer SRD
+ 52 vaRange::ShadowDescriptorTable High Bits
+ ============= ================================
+
+ The pointer to the global internal table passed to the shader as user data
+ is a 32-bit pointer. The top 32 bits should be assumed to be the same as
+ the top 32 bits of the pipeline, so the shader may use the program
+ counter's top 32 bits.
+
Unspecified OS
--------------
@@ -3780,34 +4034,42 @@ Source Languages
OpenCL
------
-When generating code for the OpenCL language the target triple environment
-should be ``opencl`` or ``amdgizcl`` (see :ref:`amdgpu-target-triples`).
-
When the language is OpenCL the following differences occur:
1. The OpenCL memory model is used (see :ref:`amdgpu-amdhsa-memory-model`).
-2. The AMDGPU backend adds additional arguments to the kernel.
+2. The AMDGPU backend appends additional arguments to the kernel's explicit
+ arguments for the AMDHSA OS (see
+ :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
3. Additional metadata is generated
- (:ref:`amdgpu-amdhsa-hsa-code-object-metadata`).
-
-.. TODO
- Specify what affect this has. Hidden arguments added. Additional metadata
- generated.
+ (see :ref:`amdgpu-amdhsa-code-object-metadata`).
+
+ .. table:: OpenCL kernel implicit arguments appended for AMDHSA OS
+ :name: opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table
+
+ ======== ==== ========= ===========================================
+ Position Byte Byte Description
+ Size Alignment
+ ======== ==== ========= ===========================================
+ 1 8 8 OpenCL Global Offset X
+ 2 8 8 OpenCL Global Offset Y
+ 3 8 8 OpenCL Global Offset Z
+ 4 8 8 OpenCL address of printf buffer
+ 5 8 8 OpenCL address of virtual queue used by
+ enqueue_kernel.
+ 6 8 8 OpenCL address of AqlWrap struct used by
+ enqueue_kernel.
+ ======== ==== ========= ===========================================
.. _amdgpu-hcc:
HCC
---
-When generating code for the OpenCL language the target triple environment
-should be ``hcc`` (see :ref:`amdgpu-target-triples`).
-
-When the language is OpenCL the following differences occur:
+When the language is HCC the following differences occur:
1. The HSA memory model is used (see :ref:`amdgpu-amdhsa-memory-model`).
-.. TODO
- Specify what affect this has.
+.. _amdgpu-assembler:
Assembler
---------
@@ -3815,15 +4077,35 @@ Assembler
AMDGPU backend has LLVM-MC based assembler which is currently in development.
It supports AMDGCN GFX6-GFX9.
-This section describes general syntax for instructions and operands. For more
-information about instructions, their semantics and supported combinations of
-operands, refer to one of instruction set architecture manuals
-[AMD-GCN-GFX6]_, [AMD-GCN-GFX7]_, [AMD-GCN-GFX8]_ and [AMD-GCN-GFX9]_.
+This section describes the general syntax for instructions and operands.
-An instruction has the following syntax (register operands are normally
-comma-separated while extra operands are space-separated):
+Instructions
+~~~~~~~~~~~~
+
+.. toctree::
+ :hidden:
+
+ AMDGPUAsmGFX7
+ AMDGPUAsmGFX8
+ AMDGPUAsmGFX9
+ AMDGPUOperandSyntax
+
+An instruction has the following syntax:
+
+ *<opcode> <operand0>, <operand1>,... <modifier0> <modifier1>...*
+
+Note that operands are normally comma-separated while modifiers are space-separated.
-*<opcode> <register_operand0>, ... <extra_operand0> ...*
+The order of operands and modifiers is fixed. Most modifiers are optional and may be omitted.
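+
+For example (registers and modifier values are arbitrary):
+
+.. code-block:: nasm
+
+  v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:WORD_1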
+
+See detailed instruction syntax description for :doc:`GFX7<AMDGPUAsmGFX7>`,
+:doc:`GFX8<AMDGPUAsmGFX8>` and :doc:`GFX9<AMDGPUAsmGFX9>`.
+
+Note that features under development are not included in this description.
+
+For more information about instructions, their semantics and supported combinations of
+operands, refer to one of instruction set architecture manuals
+[AMD-GCN-GFX6]_, [AMD-GCN-GFX7]_, [AMD-GCN-GFX8]_ and [AMD-GCN-GFX9]_.
Operands
~~~~~~~~
@@ -3840,34 +4122,16 @@ The following syntax for register operands is supported:
* Register index expressions: v[2*2], s[1-1:2-1]
* 'off' indicates that an operand is not enabled
-The following extra operands are supported:
-
-* offset, offset0, offset1
-* idxen, offen bits
-* glc, slc, tfe bits
-* waitcnt: integer or combination of counter values
-* VOP3 modifiers:
+Modifiers
+~~~~~~~~~
- - abs (\| \|), neg (\-)
-
-* DPP modifiers:
-
- - row_shl, row_shr, row_ror, row_rol
- - row_mirror, row_half_mirror, row_bcast
- - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
- - row_mask, bank_mask, bound_ctrl
-
-* SDWA modifiers:
-
- - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
- - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
- - abs, neg, sext
+Detailed description of modifiers may be found :doc:`here<AMDGPUOperandSyntax>`.
Instruction Examples
~~~~~~~~~~~~~~~~~~~~
DS
-~~
+++
.. code-block:: nasm
@@ -4039,6 +4303,9 @@ VOP_SDWA examples:
For full list of supported instructions, refer to "Vector ALU instructions".
+.. TODO
+ Remove once we switch to code object v3 by default.
+
HSA Code Object Directives
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -4131,6 +4398,209 @@ Here is an example of a minimal amd_kernel_code_t specification:
.Lfunc_end0:
.size hello_world, .Lfunc_end0-hello_world
+Predefined Symbols (-mattr=+code-object-v3)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The AMDGPU assembler defines and updates some symbols automatically. These
+symbols do not affect code generation.
+
+.amdgcn.gfx_generation_number
++++++++++++++++++++++++++++++
+
+Set to the GFX generation number of the target being assembled for. For
+example, when assembling for a "GFX9" target this will be set to the integer
+value "9". The possible GFX generation numbers are presented in
+:ref:`amdgpu-processors`.
+
+.amdgcn.next_free_vgpr
+++++++++++++++++++++++
+
+Set to zero before assembly begins. At each instruction, if the current value
+of this symbol is less than or equal to the maximum VGPR number explicitly
+referenced within that instruction then the symbol value is updated to equal
+that VGPR number plus one.
+
+May be used to set the `.amdhsa_next_free_vgpr` directive in
+:ref:`amdhsa-kernel-directives-table`.
+
+May be set at any time, e.g. manually set to zero at the start of each kernel.
+
+.amdgcn.next_free_sgpr
+++++++++++++++++++++++
+
+Set to zero before assembly begins. At each instruction, if the current value
+of this symbol is less than or equal to the maximum SGPR number explicitly
+referenced within that instruction then the symbol value is updated to equal
+that SGPR number plus one.
+
+May be used to set the `.amdhsa_next_free_sgpr` directive in
+:ref:`amdhsa-kernel-directives-table`.
+
+May be set at any time, e.g. manually set to zero at the start of each kernel.
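+
+For example, after the assembler processes the instruction below, the symbols
+have values of at least 8 and 11 respectively:
+
+.. code-block:: nasm
+
+  v_mov_b32 v7, s10 // .amdgcn.next_free_vgpr >= 8, .amdgcn.next_free_sgpr >= 11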
+
+Code Object Directives (-mattr=+code-object-v3)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Directives which begin with ``.amdgcn`` are valid for all ``amdgcn``
+architecture processors, and are not OS-specific. Directives which begin with
+``.amdhsa`` are specific to ``amdgcn`` architecture processors when the
+``amdhsa`` OS is specified. See :ref:`amdgpu-target-triples` and
+:ref:`amdgpu-processors`.
+
+.amdgcn_target <target>
++++++++++++++++++++++++
+
+Optional directive which declares the target supported by the containing
+assembler source file. Valid values are described in
+:ref:`amdgpu-amdhsa-code-object-target-identification`. Used by the assembler
+to validate command-line options such as ``-triple``, ``-mcpu``, and those
+which specify target features.
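+
+For example:
+
+.. code-block:: nasm
+
+  .amdgcn_target "amdgcn-amd-amdhsa--gfx902+xnack"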
+
+.amdhsa_kernel <name>
++++++++++++++++++++++
+
+Creates a correctly aligned AMDHSA kernel descriptor and a symbol,
+``<name>.kd``, in the current location of the current section. Only valid when
+the OS is ``amdhsa``. ``<name>`` must be a symbol that labels the first
+instruction to execute, and does not need to be previously defined.
+
+Marks the beginning of a list of directives used to generate the bytes of a
+kernel descriptor, as described in :ref:`amdgpu-amdhsa-kernel-descriptor`.
+Directives which may appear in this list are described in
+:ref:`amdhsa-kernel-directives-table`. Directives may appear in any order, must
+be valid for the target being assembled for, and cannot be repeated. Directives
+support the range of values specified by the field they reference in
+:ref:`amdgpu-amdhsa-kernel-descriptor`. If a directive is not specified, it is
+assumed to have its default value, unless it is marked as "Required", in which
+case it is an error to omit the directive. This list of directives is
+terminated by an ``.end_amdhsa_kernel`` directive.
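+
+A minimal sketch (the kernel name is hypothetical, and ``my_kernel`` is
+assumed to label the first instruction to execute elsewhere in the file; the
+two required directives are set from the predefined symbols described above):
+
+.. code-block:: nasm
+
+  .amdhsa_kernel my_kernel
+    .amdhsa_next_free_vgpr .amdgcn.next_free_vgpr
+    .amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
+  .end_amdhsa_kernel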
+
+ .. table:: AMDHSA Kernel Assembler Directives
+ :name: amdhsa-kernel-directives-table
+
+ ======================================================== ================ ============ ===================
+ Directive Default Supported On Description
+ ======================================================== ================ ============ ===================
+ ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX9 Controls GROUP_SEGMENT_FIXED_SIZE in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX9 Controls PRIVATE_SEGMENT_FIXED_SIZE in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX9 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX9 Controls ENABLE_SGPR_DISPATCH_PTR in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX9 Controls ENABLE_SGPR_QUEUE_PTR in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX9 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX9 Controls ENABLE_SGPR_DISPATCH_ID in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX9 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX9 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
+ :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`.
+ ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX9 Controls ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_X in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_Y in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_Z in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_INFO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX9 Controls ENABLE_VGPR_WORKITEM_ID in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ Possible values are defined in
+ :ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`.
+ ``.amdhsa_next_free_vgpr`` Required GFX6-GFX9 Maximum VGPR number explicitly referenced, plus one.
+ Used to calculate GRANULATED_WORKITEM_VGPR_COUNT in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ ``.amdhsa_next_free_sgpr`` Required GFX6-GFX9 Maximum SGPR number explicitly referenced, plus one.
+ Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ ``.amdhsa_reserve_vcc`` 1 GFX6-GFX9 Whether the kernel may use the special VCC SGPR.
+ Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ ``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX9 Whether the kernel may use flat instructions to access
+ scratch memory. Used to calculate
+ GRANULATED_WAVEFRONT_SGPR_COUNT in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ ``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX9 Whether the kernel may trigger XNACK replay.
+ Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
+ Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ (+xnack)
+ ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX9 Controls FLOAT_ROUND_MODE_32 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ Possible values are defined in
+ :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
+ ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX9 Controls FLOAT_ROUND_MODE_16_64 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ Possible values are defined in
+ :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
+ ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX9 Controls FLOAT_DENORM_MODE_32 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ Possible values are defined in
+ :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
+ ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX9 Controls FLOAT_DENORM_MODE_16_64 in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ Possible values are defined in
+ :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
+ ``.amdhsa_dx10_clamp`` 1 GFX6-GFX9 Controls ENABLE_DX10_CLAMP in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ ``.amdhsa_ieee_mode`` 1 GFX6-GFX9 Controls ENABLE_IEEE_MODE in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ ``.amdhsa_fp16_overflow`` 0 GFX9 Controls FP16_OVFL in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
+ ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+ ======================================================== ================ ============ ===================
+
+Example HSA Source Code (-mattr=+code-object-v3)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here is an example of a minimal assembly source file, defining one HSA kernel:
+
+.. code-block:: nasm
+
+ .amdgcn_target "amdgcn-amd-amdhsa--gfx900+xnack" // optional
+
+ .text
+ .globl hello_world
+ .p2align 8
+ .type hello_world,@function
+ hello_world:
+ s_load_dwordx2 s[0:1], s[0:1], 0x0
+ v_mov_b32 v0, 3.14159
+ s_waitcnt lgkmcnt(0)
+ v_mov_b32 v1, s0
+ v_mov_b32 v2, s1
+ flat_store_dword v[1:2], v0
+ s_endpgm
+ .Lfunc_end0:
+ .size hello_world, .Lfunc_end0-hello_world
+
+ .rodata
+ .p2align 6
+ .amdhsa_kernel hello_world
+ .amdhsa_user_sgpr_kernarg_segment_ptr 1
+ .amdhsa_next_free_vgpr .amdgcn.next_free_vgpr
+ .amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
+ .end_amdhsa_kernel
+
+
Additional Documentation
========================
@@ -4142,8 +4612,6 @@ Additional Documentation
.. [AMD-GCN-GFX7] `AMD Sea Islands Series ISA <http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf>`_
.. [AMD-GCN-GFX8] `AMD GCN3 Instruction Set Architecture <http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf>`__
.. [AMD-GCN-GFX9] `AMD "Vega" Instruction Set Architecture <http://developer.amd.com/wordpress/media/2013/12/Vega_Shader_ISA_28July2017.pdf>`__
-.. [AMD-OpenCL_Programming-Guide] `AMD Accelerated Parallel Processing OpenCL Programming Guide <http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf>`_
-.. [AMD-APP-SDK] `AMD Accelerated Parallel Processing APP SDK Documentation <http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/documentation/>`__
.. [AMD-ROCm] `ROCm: Open Platform for Development, Discovery and Education Around GPU Computing <http://gpuopen.com/compute-product/rocm/>`__
.. [AMD-ROCm-github] `ROCm github <http://github.com/RadeonOpenCompute>`__
.. [HSA] `Heterogeneous System Architecture (HSA) Foundation <http://www.hsafoundation.com/>`__
@@ -4152,4 +4620,4 @@ Additional Documentation
.. [YAML] `YAML Ain't Markup Language (YAMLâ„¢) Version 1.2 <http://www.yaml.org/spec/1.2/spec.html>`__
.. [OpenCL] `The OpenCL Specification Version 2.0 <http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf>`__
.. [HRF] `Heterogeneous-race-free Memory Models <http://benedictgaster.org/wp-content/uploads/2014/01/asplos269-FINAL.pdf>`__
-.. [AMD-AMDGPU-Compute-Application-Binary-Interface] `AMDGPU Compute Application Binary Interface <https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc/blob/master/AMDGPU-ABI.md>`__
+.. [CLANG-ATTR] `Attributes in Clang <http://clang.llvm.org/docs/AttributeReference.html>`__
diff --git a/docs/AdvancedBuilds.rst b/docs/AdvancedBuilds.rst
index dc808a0ab83f..c559bdeb2802 100644
--- a/docs/AdvancedBuilds.rst
+++ b/docs/AdvancedBuilds.rst
@@ -151,7 +151,7 @@ The PGO CMake cache generates the following additional targets:
=======================
In the ancient lore of compilers non-determinism is like the multi-headed hydra.
-Whenever it's head pops up, terror and chaos ensue.
+Whenever its head pops up, terror and chaos ensue.
Historically one of the tests to verify that a compiler was deterministic would
be a three stage build. The idea of a three stage build is you take your sources
diff --git a/docs/AliasAnalysis.rst b/docs/AliasAnalysis.rst
index 0a5cb00a48d3..14decfeca6e7 100644
--- a/docs/AliasAnalysis.rst
+++ b/docs/AliasAnalysis.rst
@@ -389,11 +389,6 @@ in its ``getAnalysisUsage`` that it does so. Some passes attempt to use
``AU.addPreserved<AliasAnalysis>``, however this doesn't actually have any
effect.
-``AliasAnalysisCounter`` (``-count-aa``) are implemented as ``ModulePass``
-classes, so if your alias analysis uses ``FunctionPass``, it won't be able to
-use these utilities. If you try to use them, the pass manager will silently
-route alias analysis queries directly to ``BasicAliasAnalysis`` instead.
-
Similarly, the ``opt -p`` option introduces ``ModulePass`` passes between each
pass, which prevents the use of ``FunctionPass`` alias analysis passes.
@@ -408,17 +403,10 @@ before it appears in an alias query. However, popular clients such as ``GVN``
don't support this, and are known to trigger errors when run with the
``AliasAnalysisDebugger``.
-Due to several of the above limitations, the most obvious use for the
-``AliasAnalysisCounter`` utility, collecting stats on all alias queries in a
-compilation, doesn't work, even if the ``AliasAnalysis`` implementations don't
-use ``FunctionPass``. There's no way to set a default, much less a default
-sequence, and there's no way to preserve it.
-
The ``AliasSetTracker`` class (which is used by ``LICM``) makes a
-non-deterministic number of alias queries. This can cause stats collected by
-``AliasAnalysisCounter`` to have fluctuations among identical runs, for
-example. Another consequence is that debugging techniques involving pausing
-execution after a predetermined number of queries can be unreliable.
+non-deterministic number of alias queries. This can cause debugging techniques
+involving pausing execution after a predetermined number of queries to be
+unreliable.
Many alias queries can be reformulated in terms of other alias queries. When
multiple ``AliasAnalysis`` queries are chained together, it would make sense to
@@ -676,21 +664,6 @@ you're using the ``AliasSetTracker`` class. To use it, use something like:
% opt -ds-aa -print-alias-sets -disable-output
-The ``-count-aa`` pass
-^^^^^^^^^^^^^^^^^^^^^^
-
-The ``-count-aa`` pass is useful to see how many queries a particular pass is
-making and what responses are returned by the alias analysis. As an example:
-
-.. code-block:: bash
-
- % opt -basicaa -count-aa -ds-aa -count-aa -licm
-
-will print out how many queries (and what responses are returned) by the
-``-licm`` pass (of the ``-ds-aa`` pass) and how many queries are made of the
-``-basicaa`` pass by the ``-ds-aa`` pass. This can be useful when debugging a
-transformation or an alias analysis implementation.
-
The ``-aa-eval`` pass
^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst
index 429c945e7120..5e1c5cacb439 100644
--- a/docs/BitCodeFormat.rst
+++ b/docs/BitCodeFormat.rst
@@ -62,10 +62,12 @@ understanding the encoding.
Magic Numbers
-------------
-The first two bytes of a bitcode file are 'BC' (``0x42``, ``0x43``). The second
-two bytes are an application-specific magic number. Generic bitcode tools can
-look at only the first two bytes to verify the file is bitcode, while
-application-specific programs will want to look at all four.
+The first four bytes of a bitstream are used as an application-specific magic
+number. Generic bitcode tools may look at the first four bytes to determine
+whether the stream is a known stream type. However, these tools should *not*
+determine whether a bitstream is valid based on its magic number alone. New
+application-specific bitstream formats are being developed all the time; tools
+should not reject them just because they have a hitherto unseen magic number.
.. _primitives:
@@ -496,12 +498,9 @@ LLVM IR Magic Number
The magic number for LLVM IR files is:
:raw-html:`<tt><blockquote>`
-[0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
+['B'\ :sub:`8`, 'C'\ :sub:`8`, 0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
:raw-html:`</blockquote></tt>`
-When combined with the bitcode magic number and viewed as bytes, this is
-``"BC 0xC0DE"``.
-
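+Viewed as bytes, with each pair of 4-bit fields packed low nibble first, this
+is ``'B'``, ``'C'``, ``0xC0``, ``0xDE``, i.e. "BC 0xC0DE". As a rough sketch
+(not an LLVM API), a reader could recognize, though not validate, an LLVM IR
+bitstream like this:
+
+.. code-block:: c++
+
+  #include <cstddef>
+  #include <cstring>
+
+  // Check the first four bytes for the LLVM IR magic "BC 0xC0DE". Matching
+  // the magic only identifies the stream type; it does not prove the rest
+  // of the stream is well-formed bitcode.
+  bool looksLikeLLVMBitcode(const unsigned char *Buf, std::size_t Size) {
+    static const unsigned char Magic[4] = {'B', 'C', 0xC0, 0xDE};
+    return Size >= 4 && std::memcmp(Buf, Magic, sizeof(Magic)) == 0;
+  }
+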
.. _Signed VBRs:
Signed VBRs
@@ -904,7 +903,7 @@ PARAMATTR_CODE_ENTRY Record
The ``ENTRY`` record (code 2) contains a variable number of values describing a
unique set of function parameter attributes. Each *attrgrp* value is used as a
-key with which to look up an entry in the the attribute group table described
+key with which to look up an entry in the attribute group table described
in the ``PARAMATTR_GROUP_BLOCK`` block.
.. _PARAMATTR_CODE_ENTRY_OLD:
@@ -1055,6 +1054,9 @@ The integer codes are mapped to well-known attributes as follows.
* code 53: ``speculatable``
* code 54: ``strictfp``
* code 55: ``sanitize_hwaddress``
+* code 56: ``nocf_check``
+* code 57: ``optforfuzzing``
+* code 58: ``shadowcallstack``
.. note::
The ``allocsize`` attribute has a special encoding for its arguments. Its two
diff --git a/docs/Bugpoint.rst b/docs/Bugpoint.rst
index 27732e0fffbd..f3bb54cffb54 100644
--- a/docs/Bugpoint.rst
+++ b/docs/Bugpoint.rst
@@ -198,14 +198,14 @@ desired ranges. For example:
static int calledCount = 0;
calledCount++;
- DEBUG(if (calledCount < 212) return false);
- DEBUG(if (calledCount > 217) return false);
- DEBUG(if (calledCount == 213) return false);
- DEBUG(if (calledCount == 214) return false);
- DEBUG(if (calledCount == 215) return false);
- DEBUG(if (calledCount == 216) return false);
- DEBUG(dbgs() << "visitXOR calledCount: " << calledCount << "\n");
- DEBUG(dbgs() << "I: "; I->dump());
+ LLVM_DEBUG(if (calledCount < 212) return false);
+ LLVM_DEBUG(if (calledCount > 217) return false);
+ LLVM_DEBUG(if (calledCount == 213) return false);
+ LLVM_DEBUG(if (calledCount == 214) return false);
+ LLVM_DEBUG(if (calledCount == 215) return false);
+ LLVM_DEBUG(if (calledCount == 216) return false);
+ LLVM_DEBUG(dbgs() << "visitXOR calledCount: " << calledCount << "\n");
+ LLVM_DEBUG(dbgs() << "I: "; I->dump());
 could be added to ``visitXOR`` to limit ``visitXOR`` to being applied only to
calls 212 and 217. This is from an actual test case and raises an important
diff --git a/docs/CFIVerify.rst b/docs/CFIVerify.rst
index 7424d01c90b5..64033472f573 100644
--- a/docs/CFIVerify.rst
+++ b/docs/CFIVerify.rst
@@ -25,8 +25,8 @@ This tool will be present as a part of the LLVM toolchain, and will reside in
the "/llvm/tools/llvm-cfi-verify" directory, relative to the LLVM trunk. It will
be tested in two methods:
-- Unit tests to validate code sections, present in "/llvm/unittests/llvm-cfi-
- verify".
+- Unit tests to validate code sections, present in
+ "/llvm/unittests/tools/llvm-cfi-verify".
- Integration tests, present in "/llvm/tools/clang/test/LLVMCFIVerify". These
integration tests are part of clang as part of a continuous integration
framework, ensuring updates to the compiler that reduce CFI coverage on
@@ -86,6 +86,8 @@ Only machine code sections that are marked as executable will be subject to this
analysis. Non-executable sections do not require analysis as any execution
present in these sections has already violated the control flow integrity.
-Suitable extensions may be made at a later date to include anaylsis for indirect
+Suitable extensions may be made at a later date to include analysis for indirect
control flow operations across DSO boundaries. Currently, these CFI features are
only experimental with an unstable ABI, making them unsuitable for analysis.
+
+The tool currently only supports the x86, x86_64, and AArch64 architectures.
diff --git a/docs/CMake.rst b/docs/CMake.rst
index 05edec64da33..cbcadc212498 100644
--- a/docs/CMake.rst
+++ b/docs/CMake.rst
@@ -12,8 +12,8 @@ Introduction
does not build the project, it generates the files needed by your build tool
(GNU make, Visual Studio, etc.) for building LLVM.
-If **you are a new contributor**, please start with the :doc:`GettingStarted`
-page. This page is geared for existing contributors moving from the
+If **you are a new contributor**, please start with the :doc:`GettingStarted`
+page. This page is geared for existing contributors moving from the
legacy configure/make system.
If you are really anxious about getting a functional LLVM build, go to the
@@ -370,6 +370,14 @@ LLVM-specific variables
**LLVM_USE_INTEL_JITEVENTS**:BOOL
Enable building support for Intel JIT Events API. Defaults to OFF.
+**LLVM_ENABLE_LIBPFM**:BOOL
+ Enable building with libpfm to support hardware counter measurements in LLVM
+ tools.
+ Defaults to ON.
+
+**LLVM_USE_PERF**:BOOL
+ Enable building support for the Perf (Linux profiling tool) JIT interface.
+ Defaults to OFF.
+
**LLVM_ENABLE_ZLIB**:BOOL
Enable building with zlib to support compression/uncompression in LLVM tools.
Defaults to ON.
@@ -409,10 +417,10 @@ LLVM-specific variables
**LLVM_BUILD_DOCS**:BOOL
 Adds all *enabled* documentation targets (i.e. Doxygen and Sphinx targets) as
dependencies of the default build targets. This results in all of the (enabled)
- documentation targets being as part of a normal build. If the ``install``
- target is run then this also enables all built documentation targets to be
- installed. Defaults to OFF. To enable a particular documentation target, see
- see LLVM_ENABLE_SPHINX and LLVM_ENABLE_DOXYGEN.
+ documentation targets being built as part of a normal build. If the
+ ``install`` target is run then this also enables all built documentation
+ targets to be installed. Defaults to OFF. To enable a particular
+ documentation target, see LLVM_ENABLE_SPHINX and LLVM_ENABLE_DOXYGEN.
**LLVM_ENABLE_DOXYGEN**:BOOL
Enables the generation of browsable HTML documentation using doxygen.
@@ -509,7 +517,7 @@ LLVM-specific variables
OS X Only: If enabled CMake will generate a target named
'install-xcode-toolchain'. This target will create a directory at
$CMAKE_INSTALL_PREFIX/Toolchains containing an xctoolchain directory which can
- be used to override the default system tools.
+ be used to override the default system tools.
**LLVM_BUILD_LLVM_DYLIB**:BOOL
If enabled, the target for building the libLLVM shared library is added.
@@ -530,7 +538,7 @@ LLVM-specific variables
library (ON) or as a static library (OFF). Its default value is OFF. On
Windows, shared libraries may be used when building with MinGW, including
mingw-w64, but not when building with the Microsoft toolchain.
-
+
.. note:: BUILD_SHARED_LIBS is only recommended for use by LLVM developers.
If you want to build LLVM as a shared library, you should use the
``LLVM_BUILD_LLVM_DYLIB`` option.
@@ -551,6 +559,14 @@ LLVM-specific variables
<http://clang.llvm.org/docs/SourceBasedCodeCoverage.html>`_ instrumentation
is enabled while building llvm.
+**LLVM_CCACHE_BUILD**:BOOL
+ If enabled and the ``ccache`` program is available, then LLVM will be
+ built using ``ccache`` to speed up rebuilds of LLVM and its components.
+ Defaults to OFF. The size and location of the cache maintained
+ by ``ccache`` can be adjusted via the LLVM_CCACHE_MAXSIZE and LLVM_CCACHE_DIR
+ options, whose values are forwarded to the CCACHE_MAXSIZE and CCACHE_DIR
+ environment variables, respectively.
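+
+ A minimal sketch of enabling it (the generator and source path are
+ hypothetical):
+
+ .. code-block:: bash
+
+   $ cmake -G Ninja -DLLVM_CCACHE_BUILD=ON ../llvm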
+
CMake Caches
============
diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst
index 5c0fb064959e..c0e3d53e9698 100644
--- a/docs/CodeGenerator.rst
+++ b/docs/CodeGenerator.rst
@@ -566,7 +566,7 @@ MI bundle support does not change the physical representations of
MachineBasicBlock and MachineInstr. All the MIs (including top level and nested
ones) are stored as sequential list of MIs. The "bundled" MIs are marked with
the 'InsideBundle' flag. A top level MI with the special BUNDLE opcode is used
-to represent the start of a bundle. It's legal to mix BUNDLE MIs with indiviual
+to represent the start of a bundle. It's legal to mix BUNDLE MIs with individual
MIs that are not inside bundles nor represent bundles.
MachineInstr passes should operate on a MI bundle as a single unit. Member
@@ -1584,7 +1584,7 @@ Emitting function stack size information
A section containing metadata on function stack sizes will be emitted when
``TargetLoweringObjectFile::StackSizesSection`` is not null, and
``TargetOptions::EmitStackSizeSection`` is set (-stack-size-section). The
-section will contain an array of pairs of function symbol references (8 byte)
+section will contain an array of pairs of function symbol values (pointer size)
and stack sizes (unsigned LEB128). The stack size values only include the space
allocated in the function prologue. Functions with dynamic stack allocations are
not included.
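+
+As a rough sketch (assuming a little-endian target with 8-byte pointers, and
+with the function and type names invented for illustration), the section
+contents could be decoded like this:
+
+.. code-block:: c++
+
+  #include <cstdint>
+  #include <utility>
+  #include <vector>
+
+  // Decode (symbol value, ULEB128 stack size) pairs from a .stack_sizes
+  // section. Buf and End delimit the raw section contents.
+  std::vector<std::pair<uint64_t, uint64_t>>
+  decodeStackSizes(const uint8_t *Buf, const uint8_t *End) {
+    std::vector<std::pair<uint64_t, uint64_t>> Entries;
+    while (End - Buf >= 9) { // 8-byte symbol value + at least 1 ULEB128 byte
+      uint64_t Sym = 0;
+      for (int I = 0; I != 8; ++I) // little-endian 64-bit symbol value
+        Sym |= uint64_t(Buf[I]) << (8 * I);
+      Buf += 8;
+      uint64_t Size = 0;
+      unsigned Shift = 0;
+      uint8_t Byte;
+      do { // unsigned LEB128: 7 data bits per byte, high bit = continuation
+        Byte = *Buf++;
+        Size |= uint64_t(Byte & 0x7f) << Shift;
+        Shift += 7;
+      } while ((Byte & 0x80) && Buf != End);
+      Entries.emplace_back(Sym, Size);
+    }
+    return Entries;
+  }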
diff --git a/docs/CodingStandards.rst b/docs/CodingStandards.rst
index 231c034be19d..feb3bf0eb03a 100644
--- a/docs/CodingStandards.rst
+++ b/docs/CodingStandards.rst
@@ -91,9 +91,9 @@ guidance below to help you know what to expect.
Each toolchain provides a good reference for what it accepts:
-* Clang: http://clang.llvm.org/cxx_status.html
-* GCC: http://gcc.gnu.org/projects/cxx0x.html
-* MSVC: http://msdn.microsoft.com/en-us/library/hh567368.aspx
+* Clang: https://clang.llvm.org/cxx_status.html
+* GCC: https://gcc.gnu.org/projects/cxx-status.html#cxx11
+* MSVC: https://msdn.microsoft.com/en-us/library/hh567368.aspx
In most cases, the MSVC list will be the dominating factor. Here is a summary
of the features that are expected to work. Features not on this list are
@@ -184,7 +184,7 @@ you hit a type trait which doesn't work we can then add support to LLVM's
traits header to emulate it.
.. _the libstdc++ manual:
- http://gcc.gnu.org/onlinedocs/gcc-4.8.0/libstdc++/manual/manual/status.html#status.iso.2011
+ https://gcc.gnu.org/onlinedocs/gcc-4.8.0/libstdc++/manual/manual/status.html#status.iso.2011
Other Languages
---------------
@@ -591,7 +591,7 @@ understood for formatting nested function calls. Examples:
This formatting scheme also makes it particularly easy to get predictable,
consistent, and automatic formatting with tools like `Clang Format`_.
-.. _Clang Format: http://clang.llvm.org/docs/ClangFormat.html
+.. _Clang Format: https://clang.llvm.org/docs/ClangFormat.html
Language and Compiler Issues
----------------------------
@@ -667,14 +667,14 @@ Do not use Static Constructors
Static constructors and destructors (e.g. global variables whose types have a
constructor or destructor) should not be added to the code base, and should be
removed wherever possible. Besides `well known problems
-<http://yosefk.com/c++fqa/ctors.html#fqa-10.12>`_ where the order of
+<https://yosefk.com/c++fqa/ctors.html#fqa-10.12>`_ where the order of
initialization is undefined between globals in different source files, the
entire concept of static constructors is at odds with the common use case of
LLVM as a library linked into a larger application.
Consider the use of LLVM as a JIT linked into another application (perhaps for
-`OpenGL, custom languages <http://llvm.org/Users.html>`_, `shaders in movies
-<http://llvm.org/devmtg/2010-11/Gritz-OpenShadingLang.pdf>`_, etc). Due to the
+`OpenGL, custom languages <https://llvm.org/Users.html>`_, `shaders in movies
+<https://llvm.org/devmtg/2010-11/Gritz-OpenShadingLang.pdf>`_, etc). Due to the
design of static constructors, they must be executed at startup time of the
entire application, regardless of whether or how LLVM is used in that larger
application. There are two problems with this:
@@ -692,7 +692,7 @@ target or other library into an application, but static constructors violate
this goal.
That said, LLVM unfortunately does contain static constructors. It would be a
-`great project <http://llvm.org/PR11944>`_ for someone to purge all static
+`great project <https://llvm.org/PR11944>`_ for someone to purge all static
constructors from LLVM, and then enable the ``-Wglobal-constructors`` warning
flag (when building with Clang) to ensure we do not regress in the future.
@@ -826,33 +826,71 @@ As a rule of thumb, in case an ordered result is expected, remember to
sort an unordered container before iteration. Or use ordered containers
like vector/MapVector/SetVector if you want to iterate pointer keys.
+Beware of non-deterministic sorting order of equal elements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+std::sort uses a non-stable sorting algorithm in which the order of equal
+elements is not guaranteed to be preserved. Thus using std::sort for a
+container having equal elements may result in non-deterministic behavior.
+To uncover such instances of non-determinism, LLVM has introduced a new
+llvm::sort wrapper function. For an EXPENSIVE_CHECKS build this will randomly
+shuffle the container before sorting. As a rule of thumb, always make sure to
+use llvm::sort instead of std::sort.
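+
+A minimal sketch (the container and element type are hypothetical):
+
+.. code-block:: c++
+
+  #include "llvm/ADT/STLExtras.h"
+
+  #include <vector>
+
+  void sortKeys(std::vector<int> &Keys) {
+    // Drop-in replacement for std::sort. Under EXPENSIVE_CHECKS the range
+    // is shuffled before sorting, so any latent dependence on the order of
+    // equal elements shows up as a test failure.
+    llvm::sort(Keys.begin(), Keys.end());
+  }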
+
Style Issues
============
The High-Level Issues
---------------------
-A Public Header File **is** a Module
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Self-contained Headers
+^^^^^^^^^^^^^^^^^^^^^^
+
+Header files should be self-contained (compile on their own) and end in .h.
+Non-header files that are meant for inclusion should end in .inc and be used
+sparingly.
-C++ doesn't do too well in the modularity department. There is no real
-encapsulation or data hiding (unless you use expensive protocol classes), but it
-is what we have to work with. When you write a public header file (in the LLVM
-source tree, they live in the top level "``include``" directory), you are
-defining a module of functionality.
+All header files should be self-contained. Users and refactoring tools should
+not have to adhere to special conditions to include the header. Specifically, a
+header should have header guards and include all other headers it needs.
-Ideally, modules should be completely independent of each other, and their
-header files should only ``#include`` the absolute minimum number of headers
-possible. A module is not just a class, a function, or a namespace: it's a
-collection of these that defines an interface. This interface may be several
-functions, classes, or data structures, but the important issue is how they work
-together.
+There are rare cases where a file designed to be included is not
+self-contained. These are typically intended to be included at unusual
+locations, such as the middle of another file. They might not use header
+guards, and might not include their prerequisites. Name such files with the
+.inc extension. Use sparingly, and prefer self-contained headers when possible.
-In general, a module should be implemented by one or more ``.cpp`` files. Each
+In general, a header should be implemented by one or more ``.cpp`` files. Each
of these ``.cpp`` files should include the header that defines their interface
-first. This ensures that all of the dependences of the module header have been
-properly added to the module header itself, and are not implicit. System
-headers should be included after user headers for a translation unit.
+first. This ensures that all of the dependences of the header have been
+properly added to the header itself, and are not implicit. System headers
+should be included after user headers for a translation unit.
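+
+As a short sketch (the file and function names are hypothetical):
+
+.. code-block:: c++
+
+  // Foo.h
+  #ifndef LLVM_FOO_H
+  #define LLVM_FOO_H
+
+  #include <string> // the header itself includes everything it needs
+
+  std::string makeFoo(int N);
+
+  #endif // LLVM_FOO_H
+
+  // Foo.cpp
+  #include "Foo.h" // included first, proving Foo.h is self-contained
+
+  #include <algorithm> // system headers follow the user headers
+
+  std::string makeFoo(int N) { return std::string(std::max(N, 0), 'f'); }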
+
+Library Layering
+^^^^^^^^^^^^^^^^
+
+A directory of header files (for example ``include/llvm/Foo``) defines a
+library (``Foo``). Dependencies between libraries are defined by the
+``LLVMBuild.txt`` file in their implementation (``lib/Foo``). One library (both
+its headers and implementation) should only use things from the libraries
+listed in its dependencies.
+
+Some of this constraint can be enforced by classic Unix linkers (Mac & Windows
+linkers, as well as lld, do not enforce this constraint). A Unix linker
+searches left to right through the libraries specified on its command line and
+never revisits a library. In this way, no circular dependencies between
+libraries can exist.
+
+This doesn't fully enforce all inter-library dependencies, and importantly
+doesn't enforce header file circular dependencies created by inline functions.
+A good way to answer the "is this layered correctly?" question is to consider
+whether a Unix linker would succeed at linking the program if all inline
+functions were defined out-of-line, and would do so for all valid orderings
+of dependencies. Since link resolution is linear, some implicit dependencies
+can sneak through: A depends on B and C, so the valid orderings are "C B A"
+and "B C A"; in both, the explicit dependencies come before their use. But in
+the first ordering, B could still link successfully if it implicitly depended
+on C, and the opposite holds in the second.
.. _minimal list of #includes:
@@ -1659,12 +1697,12 @@ A lot of these comments and recommendations have been culled from other sources.
Two particularly important books for our work are:
#. `Effective C++
- <http://www.amazon.com/Effective-Specific-Addison-Wesley-Professional-Computing/dp/0321334876>`_
+ <https://www.amazon.com/Effective-Specific-Addison-Wesley-Professional-Computing/dp/0321334876>`_
by Scott Meyers. Also interesting and useful are "More Effective C++" and
"Effective STL" by the same author.
#. `Large-Scale C++ Software Design
- <http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620/ref=sr_1_1>`_
+ <https://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620>`_
by John Lakos
If you get some free time, and you haven't read them: do so, you might learn
diff --git a/docs/CommandGuide/FileCheck.rst b/docs/CommandGuide/FileCheck.rst
index 9078f65e01c5..b0324f40463d 100644
--- a/docs/CommandGuide/FileCheck.rst
+++ b/docs/CommandGuide/FileCheck.rst
@@ -77,6 +77,10 @@ OPTIONS
-verify``. With this option FileCheck will verify that input does not contain
warnings not covered by any ``CHECK:`` patterns.
+.. option:: --dump-input-on-failure
+
+ When the check fails, dump all of the original input.
+
.. option:: --enable-var-scope
Enables scope for regex variables.
@@ -95,6 +99,23 @@ OPTIONS
Show the version number of this program.
+.. option:: -v
+
+ Print directive pattern matches.
+
+.. option:: -vv
+
+ Print information helpful in diagnosing internal FileCheck issues, such as
+ discarded overlapping ``CHECK-DAG:`` matches, implicit EOF pattern matches,
+ and ``CHECK-NOT:`` patterns that do not have matches. Implies ``-v``.
+
+.. option:: --allow-deprecated-dag-overlap
+
+ Enable overlapping among matches in a group of consecutive ``CHECK-DAG:``
+ directives. This option is deprecated and is only provided for convenience
+ as old tests are migrated to the new non-overlapping ``CHECK-DAG:``
+ implementation.
+
EXIT STATUS
-----------
@@ -241,6 +262,25 @@ For example, the following works like you'd expect:
it and the previous directive. A "``CHECK-SAME:``" cannot be the first
directive in a file.
+The "CHECK-EMPTY:" directive
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you need to check that the next line has nothing on it, not even whitespace,
+you can use the "``CHECK-EMPTY:``" directive.
+
+.. code-block:: llvm
+
+ foo
+
+ bar
+ ; CHECK: foo
+ ; CHECK-EMPTY:
+ ; CHECK-NEXT: bar
+
+Just like "``CHECK-NEXT:``", the directive will fail if there is more than one
+newline before it finds the next blank line, and it cannot be the first
+directive in a file.
+
The "CHECK-NOT:" directive
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -341,6 +381,25 @@ real bugs away.
In those cases, to enforce the order, use a non-DAG directive between DAG-blocks.
+A ``CHECK-DAG:`` directive skips matches that overlap the matches of any
+preceding ``CHECK-DAG:`` directives in the same ``CHECK-DAG:`` block. Not only
+is this non-overlapping behavior consistent with other directives, but it's
+also necessary to handle sets of non-unique strings or patterns. For example,
+the following directives look for unordered log entries for two tasks in a
+parallel program, such as the OpenMP runtime:
+
+.. code-block:: text
+
+ // CHECK-DAG: [[THREAD_ID:[0-9]+]]: task_begin
+ // CHECK-DAG: [[THREAD_ID]]: task_end
+ //
+ // CHECK-DAG: [[THREAD_ID:[0-9]+]]: task_begin
+ // CHECK-DAG: [[THREAD_ID]]: task_end
+
+The second pair of directives is guaranteed not to match the same log entries
+as the first pair, even though the patterns are identical and even if the text
+of the log entries is identical because the same thread ID happens to be
+reused.
+
The "CHECK-LABEL:" directive
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/CommandGuide/dsymutil.rst b/docs/CommandGuide/dsymutil.rst
index a29bc3c295c7..ceaa54019a81 100644
--- a/docs/CommandGuide/dsymutil.rst
+++ b/docs/CommandGuide/dsymutil.rst
@@ -35,6 +35,15 @@ OPTIONS
Produce a flat dSYM file. A ``.dwarf`` extension will be appended to the
executable name unless the output file is specified using the -o option.
+
+.. option:: -z, --minimize
+
+ When creating a dSYM file, this option will suppress the emission of the
+ .debug_inlines, .debug_pubnames, and .debug_pubtypes sections since dsymutil
+ currently has better equivalents: .apple_names and .apple_types. When used in
+ conjunction with the --update option, this option will cause redundant
+ accelerator tables to be removed.
+
.. option:: --no-odr
Do not use ODR (One Definition Rule) for uniquing C++ types.
@@ -61,10 +70,27 @@ OPTIONS
Specifies a ``path`` to prepend to all debug symbol object file paths.
+.. option:: --papertrail
+
+ When running dsymutil as part of your build system, it can be desirable for
+ warnings to be part of the end product, rather than just being emitted to the
+ output stream. When enabled, warnings are embedded in the linked DWARF debug
+ information.
+
.. option:: -s, --symtab
Dumps the symbol table found in *executable* or object file(s) and exits.
+.. option:: --toolchain
+
+ Embed the toolchain in the dSYM bundle's property list.
+
+.. option:: -u, --update
+
+ Update an existing dSYM file to contain the latest accelerator tables and
+ other DWARF optimizations. This option will rebuild the '.apple_names' and
+ '.apple_types' hashed accelerator tables.
+
.. option:: -v, --verbose
Display verbose information when linking.
diff --git a/docs/CommandGuide/index.rst b/docs/CommandGuide/index.rst
index 805df00c1738..95efffdb6569 100644
--- a/docs/CommandGuide/index.rst
+++ b/docs/CommandGuide/index.rst
@@ -31,6 +31,7 @@ Basic Commands
llvm-symbolizer
llvm-dwarfdump
dsymutil
+ llvm-mca
Debugging Tools
~~~~~~~~~~~~~~~
@@ -52,5 +53,6 @@ Developer Tools
tblgen
lit
llvm-build
+ llvm-exegesis
llvm-pdbutil
llvm-readobj
diff --git a/docs/CommandGuide/lit.rst b/docs/CommandGuide/lit.rst
index fbe1a9ab1843..0d39311152d2 100644
--- a/docs/CommandGuide/lit.rst
+++ b/docs/CommandGuide/lit.rst
@@ -85,6 +85,10 @@ OUTPUT OPTIONS
Echo all commands to stdout, as they are being executed.
This can be valuable for debugging test failures, as the last echoed command
will be the one which has failed.
+ :program:`lit` normally inserts a no-op command (``:`` in the case of bash)
+ with argument ``'RUN: at line N'`` before each command pipeline, and this
+ option also causes those no-op commands to be echoed to stdout to help you
+ locate the source line of the failed command.
This option implies ``--verbose``.
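+
+ For example, a failing run line's no-op marker might be echoed roughly as
+ follows (a hypothetical excerpt; the test content is made up):
+
+ .. code-block:: none
+
+   : 'RUN: at line 12'
+   opt -instcombine test.ll | FileCheck test.ll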
.. option:: -a, --show-all
diff --git a/docs/CommandGuide/llc.rst b/docs/CommandGuide/llc.rst
index 95945e68d13f..11dfc902d20c 100644
--- a/docs/CommandGuide/llc.rst
+++ b/docs/CommandGuide/llc.rst
@@ -135,7 +135,7 @@ End-user Options
.. option:: -stack-size-section
Emit the .stack_sizes section which contains stack size metadata. The section
- contains an array of pairs of function symbol references (8 byte) and stack
+ contains an array of pairs of function symbol values (pointer size) and stack
sizes (unsigned LEB128). The stack size values only include the space allocated
in the function prologue. Functions with dynamic stack allocations are not
included.
diff --git a/docs/CommandGuide/llvm-cov.rst b/docs/CommandGuide/llvm-cov.rst
index 478ba0fb15e0..6f1b6e46c48a 100644
--- a/docs/CommandGuide/llvm-cov.rst
+++ b/docs/CommandGuide/llvm-cov.rst
@@ -246,6 +246,10 @@ OPTIONS
Show code coverage only for functions that match the given regular expression.
+.. option:: -ignore-filename-regex=<PATTERN>
+
+ Skip source code files with file paths that match the given regular expression.
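+
+ For example, the following sketch (the binary and profile names are
+ hypothetical) skips everything under a ``unittests`` directory:
+
+ .. code-block:: bash
+
+   $ llvm-cov show ./mybin -instr-profile=default.profdata \
+       -ignore-filename-regex='.*unittests.*'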
+
.. option:: -format=<FORMAT>
Use the specified output format. The supported formats are: "text", "html".
@@ -323,8 +327,8 @@ the binaries *BIN*,... using the profile data *PROFILE*. It can optionally be
filtered to only show the coverage for the files listed in *SOURCES*.
If no source files are provided, a summary line is printed for each file in the
-coverage data. If any files are provided, summaries are shown for each function
-in the listed files instead.
+coverage data. If any files are provided, summaries can be shown for each
+function in the listed files if the ``-show-functions`` option is enabled.
For information on compiling programs for coverage and generating profile data,
see :ref:`llvm-cov-show`.
@@ -351,6 +355,10 @@ OPTIONS
Show statistics for all function instantiations. Defaults to false.
+.. option:: -ignore-filename-regex=<PATTERN>
+
+ Skip source code files with file paths that match the given regular expression.
+
.. program:: llvm-cov export
.. _llvm-cov-export:
@@ -361,14 +369,15 @@ EXPORT COMMAND
SYNOPSIS
^^^^^^^^
-:program:`llvm-cov export` [*options*] -instr-profile *PROFILE* *BIN* [*-object BIN,...*] [[*-object BIN*]]
+:program:`llvm-cov export` [*options*] -instr-profile *PROFILE* *BIN* [*-object BIN,...*] [[*-object BIN*]] [*SOURCES*]
DESCRIPTION
^^^^^^^^^^^
The :program:`llvm-cov export` command exports regions, functions, expansions,
and summaries of the coverage of the binaries *BIN*,... using the profile data
-*PROFILE* as JSON.
+*PROFILE* as JSON. It can optionally be filtered to only export the coverage
+for the files listed in *SOURCES*.
For information on compiling programs for coverage and generating profile data,
see :ref:`llvm-cov-show`.
@@ -389,3 +398,7 @@ OPTIONS
will not export coverage information for smaller units such as individual
+ functions or regions. The result will be the same as produced by the
+ :program:`llvm-cov report` command, but presented in JSON format rather than
+ text.
+
+.. option:: -ignore-filename-regex=<PATTERN>
+
+ Skip source code files with file paths that match the given regular expression.
diff --git a/docs/CommandGuide/llvm-exegesis-analysis.png b/docs/CommandGuide/llvm-exegesis-analysis.png
new file mode 100644
index 000000000000..e232f5f12355
--- /dev/null
+++ b/docs/CommandGuide/llvm-exegesis-analysis.png
Binary files differ
diff --git a/docs/CommandGuide/llvm-exegesis.rst b/docs/CommandGuide/llvm-exegesis.rst
new file mode 100644
index 000000000000..d60434f5d027
--- /dev/null
+++ b/docs/CommandGuide/llvm-exegesis.rst
@@ -0,0 +1,186 @@
+llvm-exegesis - LLVM Machine Instruction Benchmark
+==================================================
+
+SYNOPSIS
+--------
+
+:program:`llvm-exegesis` [*options*]
+
+DESCRIPTION
+-----------
+
+:program:`llvm-exegesis` is a benchmarking tool that uses information available
+in LLVM to measure host machine instruction characteristics like latency or port
+decomposition.
+
+Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
+generates a code snippet that makes execution as serial (resp. as parallel) as
+possible so that we can measure the latency (resp. uop decomposition) of the
+instruction.
+The code snippet is jitted and executed on the host subtarget. The time taken
+(resp. resource usage) is measured using hardware performance counters. The
+result is printed out as YAML to the standard output.
+
+The main goal of this tool is to automatically (in)validate LLVM's TableGen
+scheduling models. To that end, we also provide analysis of the results.
+
+EXAMPLES: benchmarking
+----------------------
+
+Assume you have an X86-64 machine. To measure the latency of a single
+instruction, run:
+
+.. code-block:: bash
+
+ $ llvm-exegesis -mode=latency -opcode-name=ADD64rr
+
+Measuring the uop decomposition of an instruction works similarly:
+
+.. code-block:: bash
+
+ $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
+
+The output is a YAML document (the default is to write to stdout, but you can
+redirect the output to a file using `-benchmarks-file`):
+
+.. code-block:: none
+
+ ---
+ key:
+ opcode_name: ADD64rr
+ mode: latency
+ config: ''
+ cpu_name: haswell
+ llvm_triple: x86_64-unknown-linux-gnu
+ num_repetitions: 10000
+ measurements:
+ - { key: latency, value: 1.0058, debug_string: '' }
+ error: ''
+ info: 'explicit self cycles, selecting one aliasing configuration.
+ Snippet:
+ ADD64rr R8, R8, R10
+ '
+ ...
+
+To measure the latency of all instructions for the host architecture, run:
+
+.. code-block:: bash
+
+ #!/bin/bash
+ readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
+ for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
+ do
+ ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
+ done
+
+FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.
+
+EXAMPLES: analysis
+----------------------
+
+Assuming you have a set of benchmarked instructions (either latency or uops) as
+YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
+following command:
+
+.. code-block:: bash
+
+ $ llvm-exegesis -mode=analysis \
+ -benchmarks-file=/tmp/benchmarks.yaml \
+ -analysis-clusters-output-file=/tmp/clusters.csv \
+ -analysis-inconsistencies-output-file=/tmp/inconsistencies.html
+
+This will group the instructions into clusters with the same performance
+characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
+following format:
+
+.. code-block:: none
+
+ cluster_id,opcode_name,config,sched_class
+ ...
+ 2,ADD32ri8_DB,,WriteALU,1.00
+ 2,ADD32ri_DB,,WriteALU,1.01
+ 2,ADD32rr,,WriteALU,1.01
+ 2,ADD32rr_DB,,WriteALU,1.00
+ 2,ADD32rr_REV,,WriteALU,1.00
+ 2,ADD64i32,,WriteALU,1.01
+ 2,ADD64ri32,,WriteALU,1.01
+ 2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
+ 2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
+ 2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
+ 2,ADD64ri8,,WriteALU,1.00
+ 2,SETBr,,WriteSETCC,1.01
+ ...
+
+:program:`llvm-exegesis` will also analyze the clusters to point out
+inconsistencies in the scheduling information. The output is an HTML file. For
+example, `/tmp/inconsistencies.html` will contain messages like the following:
+
+.. image:: llvm-exegesis-analysis.png
+ :align: center
+
+Note that the scheduling class names will be resolved only when
+:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
+is shown. This does not invalidate any of the analysis results, though.
+
+
+OPTIONS
+-------
+
+.. option:: -help
+
+ Print a summary of command line options.
+
+.. option:: -opcode-index=<LLVM opcode index>
+
+ Specify the opcode to measure, by index.
+ Either `opcode-index` or `opcode-name` must be set.
+
+.. option:: -opcode-name=<LLVM opcode name>
+
+ Specify the opcode to measure, by name.
+ Either `opcode-index` or `opcode-name` must be set.
+
+.. option:: -mode=[latency|uops|analysis]
+
+ Specify the run mode.
+
+.. option:: -num-repetitions=<Number of repetitions>
+
+ Specify the number of repetitions of the asm snippet.
+ Higher values lead to more accurate measurements but lengthen the benchmark.
+
+.. option:: -benchmarks-file=</path/to/file>
+
+ File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
+ results. "-" uses stdin/stdout.
+
+.. option:: -analysis-clusters-output-file=</path/to/file>
+
+ If provided, write the analysis clusters as CSV to this file. "-" prints to
+ stdout.
+
+.. option:: -analysis-inconsistencies-output-file=</path/to/file>
+
+ If non-empty, write inconsistencies found during analysis to this file. `-`
+ prints to stdout.
+
+.. option:: -analysis-numpoints=<dbscan numPoints parameter>
+
+ Specify the numPoints parameter to be used for DBSCAN clustering
+ (`analysis` mode).
+
+.. option:: -analysis-espilon=<dbscan epsilon parameter>
+
+ Specify the epsilon parameter to be used for DBSCAN clustering
+ (`analysis` mode).
+
+.. option:: -ignore-invalid-sched-class=false
+
+ If set, ignore instructions that do not have a sched class (class idx = 0).
+
+
+EXIT STATUS
+-----------
+
+:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
+printed to standard error, and the tool returns a non-zero value.
diff --git a/docs/CommandGuide/llvm-mca.rst b/docs/CommandGuide/llvm-mca.rst
new file mode 100644
index 000000000000..dd2320b15ffb
--- /dev/null
+++ b/docs/CommandGuide/llvm-mca.rst
@@ -0,0 +1,551 @@
+llvm-mca - LLVM Machine Code Analyzer
+=====================================
+
+SYNOPSIS
+--------
+
+:program:`llvm-mca` [*options*] [input]
+
+DESCRIPTION
+-----------
+
+:program:`llvm-mca` is a performance analysis tool that uses information
+available in LLVM (e.g. scheduling models) to statically measure the performance
+of machine code on a specific CPU.
+
+Performance is measured in terms of throughput as well as processor resource
+consumption. The tool currently works for processors with an out-of-order
+backend, for which there is a scheduling model available in LLVM.
+
+The main goal of this tool is not just to predict the performance of the code
+when run on the target, but also to help with diagnosing potential performance
+issues.
+
+Given an assembly code sequence, llvm-mca estimates the Instructions Per Cycle
+(IPC), as well as hardware resource pressure. The analysis and reporting style
+were inspired by the IACA tool from Intel.
+
+:program:`llvm-mca` allows the usage of special code comments to mark regions of
+the assembly code to be analyzed. A comment starting with substring
+``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with
+substring ``LLVM-MCA-END`` marks the end of a code region. For example:
+
+.. code-block:: none
+
+ # LLVM-MCA-BEGIN My Code Region
+ ...
+ # LLVM-MCA-END
+
+Multiple regions can be specified provided that they do not overlap. A code
+region can have an optional description. If no user-defined region is specified,
+then :program:`llvm-mca` assumes a default region which contains every
+instruction in the input file. Every region is analyzed in isolation, and the
+final performance report is the union of all the reports generated for every
+code region.
+
+Inline assembly directives may be used from source code to annotate the
+assembly text:
+
+.. code-block:: c++
+
+ int foo(int a, int b) {
+ __asm volatile("# LLVM-MCA-BEGIN foo");
+ a += 42;
+ __asm volatile("# LLVM-MCA-END");
+ a *= b;
+ return a;
+ }
+
+So for example, you can compile code with clang, output assembly, and pipe it
+directly into llvm-mca for analysis:
+
+.. code-block:: bash
+
+ $ clang foo.c -O2 -target x86_64-unknown-unknown -S -o - | llvm-mca -mcpu=btver2
+
+Or for Intel syntax:
+
+.. code-block:: bash
+
+ $ clang foo.c -O2 -target x86_64-unknown-unknown -mllvm -x86-asm-syntax=intel -S -o - | llvm-mca -mcpu=btver2
+
+OPTIONS
+-------
+
+If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
+input. Otherwise, it will read from the specified filename.
+
+If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
+to standard output if the input is from standard input. If the :option:`-o`
+option specifies "``-``", then the output will also be sent to standard output.
+
+
+.. option:: -help
+
+ Print a summary of command line options.
+
+.. option:: -mtriple=<target triple>
+
+ Specify a target triple string.
+
+.. option:: -march=<arch>
+
+ Specify the architecture for which to analyze the code. It defaults to the
+ host default target.
+
+.. option:: -mcpu=<cpuname>
+
+ Specify the processor for which to analyze the code. By default, the cpu name
+ is autodetected from the host.
+
+.. option:: -output-asm-variant=<variant id>
+
+ Specify the output assembly variant for the report generated by the tool.
+ On x86, possible values are [0, 1]. A value of 0 (resp. 1) for this flag
+ enables the AT&T (resp. Intel) assembly format for the code printed out by
+ the tool in the analysis report.
+
+.. option:: -dispatch=<width>
+
+ Specify a different dispatch width for the processor. The dispatch width
+ defaults to field 'IssueWidth' in the processor scheduling model. If width is
+ zero, then the default dispatch width is used.
+
+.. option:: -register-file-size=<size>
+
+ Specify the size of the register file. When specified, this flag limits how
+ many temporary registers are available for register renaming purposes. A value
+ of zero for this flag means "unlimited number of temporary registers".
+
+.. option:: -iterations=<number of iterations>
+
+ Specify the number of iterations to run. If this flag is set to 0, then the
+ tool sets the number of iterations to a default value (i.e. 100).
+
+.. option:: -noalias=<bool>
+
+ If set, the tool assumes that loads and stores don't alias. This is the
+ default behavior.
+
+.. option:: -lqueue=<load queue size>
+
+ Specify the size of the load queue in the load/store unit emulated by the tool.
+ By default, the tool assumes an unbounded number of entries in the load queue.
+ A value of zero for this flag is ignored, and the default load queue size is
+ used instead.
+
+.. option:: -squeue=<store queue size>
+
+ Specify the size of the store queue in the load/store unit emulated by the
+ tool. By default, the tool assumes an unbounded number of entries in the store
+ queue. A value of zero for this flag is ignored, and the default store queue
+ size is used instead.
+
+.. option:: -timeline
+
+ Enable the timeline view.
+
+.. option:: -timeline-max-iterations=<iterations>
+
+ Limit the number of iterations to print in the timeline view. By default, the
+ timeline view prints information for up to 10 iterations.
+
+.. option:: -timeline-max-cycles=<cycles>
+
+ Limit the number of cycles in the timeline view. By default, the number of
+ cycles is set to 80.
+
+.. option:: -resource-pressure
+
+ Enable the resource pressure view. This is enabled by default.
+
+.. option:: -register-file-stats
+
+ Enable register file usage statistics.
+
+.. option:: -dispatch-stats
+
+ Enable extra dispatch statistics. This view collects and analyzes instruction
+ dispatch events, as well as static/dynamic dispatch stall events. This view
+ is disabled by default.
+
+.. option:: -scheduler-stats
+
+ Enable extra scheduler statistics. This view collects and analyzes instruction
+ issue events. This view is disabled by default.
+
+.. option:: -retire-stats
+
+ Enable extra retire control unit statistics. This view is disabled by default.
+
+.. option:: -instruction-info
+
+ Enable the instruction info view. This is enabled by default.
+
+.. option:: -all-stats
+
+ Print all hardware statistics. This enables extra statistics related to the
+ dispatch logic, the hardware schedulers, the register file(s), and the retire
+ control unit. This option is disabled by default.
+
+.. option:: -all-views
+
+ Enable all the views.
+
+.. option:: -instruction-tables
+
+ Prints resource pressure information based on the static information
+ available from the processor model. This differs from the resource pressure
+ view because it doesn't require the code to be simulated. It instead prints
+ the theoretical uniform distribution of resource pressure for every
+ instruction in sequence.
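+
+ For example, reusing the dot-product input shown later in this document:
+
+ .. code-block:: bash
+
+   $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -instruction-tables dot-product.s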
+
+
+EXIT STATUS
+-----------
+
+:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
+to standard error, and the tool returns 1.
+
+HOW MCA WORKS
+-------------
+
+MCA takes assembly code as input. The assembly code is parsed into a sequence
+of MCInst with the help of the existing LLVM target assembly parsers. The
+parsed sequence of MCInst is then analyzed by a ``Pipeline`` module to generate
+a performance report.
+
+The Pipeline module simulates the execution of the machine code sequence in a
+loop of iterations (default is 100). During this process, the pipeline collects
+a number of execution related statistics. At the end of this process, the
+pipeline generates and prints a report from the collected statistics.
+
+Here is an example of a performance report generated by MCA for a dot-product
+of two packed float vectors of four elements. The analysis is conducted for
+target x86, cpu btver2. This result can be produced via the following command
+using the example located at
+``test/tools/llvm-mca/X86/BtVer2/dot-product.s``:
+
+.. code-block:: bash
+
+ $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=300 dot-product.s
+
+.. code-block:: none
+
+ Iterations: 300
+ Instructions: 900
+ Total Cycles: 610
+ Dispatch Width: 2
+ IPC: 1.48
+ Block RThroughput: 2.0
+
+
+ Instruction Info:
+ [1]: #uOps
+ [2]: Latency
+ [3]: RThroughput
+ [4]: MayLoad
+ [5]: MayStore
+ [6]: HasSideEffects (U)
+
+ [1] [2] [3] [4] [5] [6] Instructions:
+ 1 2 1.00 vmulps %xmm0, %xmm1, %xmm2
+ 1 3 1.00 vhaddps %xmm2, %xmm2, %xmm3
+ 1 3 1.00 vhaddps %xmm3, %xmm3, %xmm4
+
+
+ Resources:
+ [0] - JALU0
+ [1] - JALU1
+ [2] - JDiv
+ [3] - JFPA
+ [4] - JFPM
+ [5] - JFPU0
+ [6] - JFPU1
+ [7] - JLAGU
+ [8] - JMul
+ [9] - JSAGU
+ [10] - JSTC
+ [11] - JVALU0
+ [12] - JVALU1
+ [13] - JVIMUL
+
+
+ Resource pressure per iteration:
+ [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+ - - - 2.00 1.00 2.00 1.00 - - - - - - -
+
+ Resource pressure by instruction:
+ [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+ - - - - 1.00 - 1.00 - - - - - - - vmulps %xmm0, %xmm1, %xmm2
+ - - - 1.00 - 1.00 - - - - - - - - vhaddps %xmm2, %xmm2, %xmm3
+ - - - 1.00 - 1.00 - - - - - - - - vhaddps %xmm3, %xmm3, %xmm4
+
+According to this report, the dot-product kernel has been executed 300 times,
+for a total of 900 dynamically executed instructions.
+
+The report is structured in three main sections. The first section collects a
+few performance numbers; the goal of this section is to give a very quick
+overview of the performance throughput. In this example, the two important
+performance indicators are the predicted total number of cycles, and the IPC.
+IPC is probably the most important throughput indicator. A big delta between
+the Dispatch Width and the computed IPC is an indicator of potential
+performance issues.
+
+The second section of the report shows the latency and reciprocal
+throughput of every instruction in the sequence. That section also reports
+extra information related to the number of micro opcodes, and opcode properties
+(i.e., 'MayLoad', 'MayStore', and 'HasSideEffects').
+
+The third section is the *Resource pressure view*. This view reports
+the average number of resource cycles consumed every iteration by instructions
+for every processor resource unit available on the target. Information is
+structured in two tables. The first table reports the number of resource cycles
+spent on average every iteration. The second table correlates the resource
+cycles to the machine instruction in the sequence. For example, every iteration
+of the instruction vmulps always executes on resource unit [6]
+(JFPU1 - floating point pipeline #1), consuming an average of 1 resource cycle
+per iteration. Note that on AMD Jaguar, vector floating-point multiply can
+only be issued to pipeline JFPU1, while horizontal floating-point additions can
+only be issued to pipeline JFPU0.
+
+The resource pressure view helps with identifying bottlenecks caused by high
+usage of specific hardware resources. Situations with resource pressure mainly
+concentrated on a few resources should, in general, be avoided. Ideally,
+pressure should be uniformly distributed between multiple resources.
+
+Timeline View
+^^^^^^^^^^^^^
+MCA's timeline view produces a detailed report of each instruction's state
+transitions through an instruction pipeline. This view is enabled by the
+command line option ``-timeline``. As instructions transition through the
+various stages of the pipeline, their states are depicted in the view report.
+These states are represented by the following characters:
+
+* D : Instruction dispatched.
+* e : Instruction executing.
+* E : Instruction executed.
+* R : Instruction retired.
+* = : Instruction already dispatched, waiting to be executed.
+* \- : Instruction executed, waiting to be retired.
+
+Below is the timeline view for a subset of the dot-product example located in
+``test/tools/llvm-mca/X86/BtVer2/dot-product.s`` and processed by
+MCA using the following command:
+
+.. code-block:: bash
+
+ $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=3 -timeline dot-product.s
+
+.. code-block:: none
+
+ Timeline view:
+ 012345
+ Index 0123456789
+
+ [0,0] DeeER. . . vmulps %xmm0, %xmm1, %xmm2
+ [0,1] D==eeeER . . vhaddps %xmm2, %xmm2, %xmm3
+ [0,2] .D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
+ [1,0] .DeeE-----R . vmulps %xmm0, %xmm1, %xmm2
+ [1,1] . D=eeeE---R . vhaddps %xmm2, %xmm2, %xmm3
+ [1,2] . D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
+ [2,0] . DeeE-----R . vmulps %xmm0, %xmm1, %xmm2
+ [2,1] . D====eeeER . vhaddps %xmm2, %xmm2, %xmm3
+ [2,2] . D======eeeER vhaddps %xmm3, %xmm3, %xmm4
+
+
+ Average Wait times (based on the timeline view):
+ [0]: Executions
+ [1]: Average time spent waiting in a scheduler's queue
+ [2]: Average time spent waiting in a scheduler's queue while ready
+ [3]: Average time elapsed from WB until retire stage
+
+ [0] [1] [2] [3]
+ 0. 3 1.0 1.0 3.3 vmulps %xmm0, %xmm1, %xmm2
+ 1. 3 3.3 0.7 1.0 vhaddps %xmm2, %xmm2, %xmm3
+ 2. 3 5.7 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4
+
+The timeline view is interesting because it shows instruction state changes
+during execution. It also gives an idea of how MCA processes instructions
+executed on the target, and how their timing information might be calculated.
+
+The timeline view is structured in two tables. The first table shows
+instructions changing state over time (measured in cycles); the second table
+(named *Average Wait times*) reports useful timing statistics, which should
+help diagnose performance bottlenecks caused by long data dependencies and
+sub-optimal usage of hardware resources.
+
+An instruction in the timeline view is identified by a pair of indices, where
+the first index identifies an iteration, and the second index is the
+instruction index (i.e., where it appears in the code sequence). Since this
+example was generated using 3 iterations: ``-iterations=3``, the iteration
+indices range from 0-2 inclusively.
+
+Excluding the first and last column, the remaining columns are in cycles.
+Cycles are numbered sequentially starting from 0.
+
+From the example output above, we know the following:
+
+* Instruction [1,0] was dispatched at cycle 1.
+* Instruction [1,0] started executing at cycle 2.
+* Instruction [1,0] reached the write back stage at cycle 4.
+* Instruction [1,0] was retired at cycle 10.
+
+Instruction [1,0] (i.e., vmulps from iteration #1) does not have to wait in the
+scheduler's queue for the operands to become available. By the time vmulps is
+dispatched, operands are already available, and pipeline JFPU1 is ready to
+serve another instruction. So the instruction can be immediately issued on the
+JFPU1 pipeline. That is demonstrated by the fact that the instruction only
+spent 1cy in the scheduler's queue.
+
+There is a gap of 5 cycles between the write-back stage and the retire event.
+That is because instructions must retire in program order, so [1,0] has to wait
+for [0,2] to be retired first (i.e., it has to wait until cycle 10).
+
+In the example, all instructions are in a RAW (Read After Write) dependency
+chain. Register %xmm2 written by vmulps is immediately used by the first
+vhaddps, and register %xmm3 written by the first vhaddps is used by the second
+vhaddps. Long data dependencies negatively impact the ILP (Instruction Level
+Parallelism).
+
+In the dot-product example, there are anti-dependencies introduced by
+instructions from different iterations. However, those dependencies can be
+removed at the register renaming stage (at the cost of allocating register aliases,
+and therefore consuming temporary registers).
+
+Table *Average Wait times* helps diagnose performance issues that are caused by
+the presence of long latency instructions and potentially long data dependencies
+which may limit the ILP. Note that MCA, by default, assumes at least 1cy
+between the dispatch event and the issue event.
+
+When the performance is limited by data dependencies and/or long latency
+instructions, the number of cycles spent while in the *ready* state is expected
+to be very small when compared with the total number of cycles spent in the
+scheduler's queue. The difference between the two counters is a good indicator
+of how much impact data dependencies had on the execution of the
+instructions. When performance is mostly limited by the lack of hardware
+resources, the delta between the two counters is small. However, the number of
+cycles spent in the queue tends to be larger (i.e., more than 1-3cy),
+especially when compared to other low latency instructions.
+
+Extra Statistics to Further Diagnose Performance Issues
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``-all-stats`` command line option enables extra statistics and performance
+counters for the dispatch logic, the reorder buffer, the retire control unit,
+and the register file.
+
+Below is an example of ``-all-stats`` output generated by MCA for the
+dot-product example discussed in the previous sections.
+
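+This report should be reproducible with the earlier dot-product invocation
+plus the ``-all-stats`` flag, along these lines:
+
+.. code-block:: bash
+
+  $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=300 -all-stats dot-product.s
+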
+.. code-block:: none
+
+ Dynamic Dispatch Stall Cycles:
+ RAT - Register unavailable: 0
+ RCU - Retire tokens unavailable: 0
+ SCHEDQ - Scheduler full: 272
+ LQ - Load queue full: 0
+ SQ - Store queue full: 0
+ GROUP - Static restrictions on the dispatch group: 0
+
+
+ Dispatch Logic - number of cycles where we saw N instructions dispatched:
+ [# dispatched], [# cycles]
+ 0, 24 (3.9%)
+ 1, 272 (44.6%)
+ 2, 314 (51.5%)
+
+
+ Schedulers - number of cycles where we saw N instructions issued:
+ [# issued], [# cycles]
+ 0, 7 (1.1%)
+ 1, 306 (50.2%)
+ 2, 297 (48.7%)
+
+
+ Scheduler's queue usage:
+ JALU01, 0/20
+ JFPU01, 18/18
+ JLSAGU, 0/12
+
+
+ Retire Control Unit - number of cycles where we saw N instructions retired:
+ [# retired], [# cycles]
+ 0, 109 (17.9%)
+ 1, 102 (16.7%)
+ 2, 399 (65.4%)
+
+
+ Register File statistics:
+ Total number of mappings created: 900
+ Max number of mappings used: 35
+
+ * Register File #1 -- JFpuPRF:
+ Number of physical registers: 72
+ Total number of mappings created: 900
+ Max number of mappings used: 35
+
+ * Register File #2 -- JIntegerPRF:
+ Number of physical registers: 64
+ Total number of mappings created: 0
+ Max number of mappings used: 0
+
+If we look at the *Dynamic Dispatch Stall Cycles* table, we see the counter for
+SCHEDQ reports 272 cycles. This counter is incremented every time the dispatch
+logic is unable to dispatch a group of two instructions because the scheduler's
+queue is full.
+
+Looking at the *Dispatch Logic* table, we see that the pipeline was only able
+to dispatch two instructions 51.5% of the time. The dispatch group was limited
+to one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
+dispatch statistics are displayed by either using the command option
+``-all-stats`` or ``-dispatch-stats``.
+
+The next table, *Schedulers*, presents a histogram of how many instructions
+were issued per cycle. In this case, of the 610 simulated cycles, a single
+instruction was issued 306 times (50.2%) and there were 7 cycles where
+no instructions were issued.
+
+The *Scheduler's queue usage* table shows the maximum number of buffer
+entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01
+reached its maximum (18 of 18 queue entries). Note that AMD Jaguar implements
+three schedulers:
+
+* JALU01 - A scheduler for ALU instructions.
+* JFPU01 - A scheduler for floating point operations.
+* JLSAGU - A scheduler for address generation.
+
+The dot-product is a kernel of three floating point instructions (a vector
+multiply followed by two horizontal adds). That explains why only the floating
+point scheduler appears to be used.
+
+A full scheduler queue is either caused by data dependency chains or by a
+sub-optimal usage of hardware resources. Sometimes, resource pressure can be
+mitigated by rewriting the kernel using different instructions that consume
+different scheduler resources. Schedulers with a small queue are less resilient
+to bottlenecks caused by the presence of long data dependencies.
+The scheduler statistics are displayed by
+using the command option ``-all-stats`` or ``-scheduler-stats``.
+
+The next table, *Retire Control Unit*, presents a histogram of how many
+instructions were retired per cycle. In
+this case, of the 610 simulated cycles, two instructions were retired during
+the same cycle 399 times (65.4%) and there were 109 cycles where no
+instructions were retired. The retire statistics are displayed by using the
+command option ``-all-stats`` or ``-retire-stats``.
+
+The last table presented is *Register File statistics*. Each physical register
+file (PRF) used by the pipeline is presented in this table. In the case of AMD
+Jaguar, there are two register files, one for floating-point registers
+(JFpuPRF) and one for integer registers (JIntegerPRF). The table shows that of
+the 900 instructions processed, there were 900 mappings created. Since this
+dot-product example utilized only floating point registers, the JFpuPRF was
+responsible for creating the 900 mappings. However, we see that the pipeline
+only used a maximum of 35 of 72 available register slots at any given time. We
+can conclude that the floating point PRF was the only register file used for
+the example, and that it was never resource constrained. The register file
+statistics are displayed by using the command option ``-all-stats`` or
+``-register-file-stats``.
+
+In this example, we can conclude that the IPC is mostly limited by data
+dependencies, and not by resource pressure.
diff --git a/docs/CommandGuide/llvm-nm.rst b/docs/CommandGuide/llvm-nm.rst
index da7edea4743b..2a2f8275ffc9 100644
--- a/docs/CommandGuide/llvm-nm.rst
+++ b/docs/CommandGuide/llvm-nm.rst
@@ -93,6 +93,10 @@ OPTIONS
Print only symbols whose definitions are external; that is, accessible
from other files.
+.. option:: --no-weak, -W
+
+ Don't print any weak symbols in the output.
+
.. option:: --format=format, -f format
Select an output format; *format* may be *sysv*, *posix*, or *bsd*. The default
diff --git a/docs/CommandGuide/opt.rst b/docs/CommandGuide/opt.rst
index 7b9255d26423..2b2fffa063a0 100644
--- a/docs/CommandGuide/opt.rst
+++ b/docs/CommandGuide/opt.rst
@@ -96,7 +96,7 @@ OPTIONS
.. option:: -debug
If this is a debug build, this option will enable debug printouts from passes
- which use the ``DEBUG()`` macro. See the `LLVM Programmer's Manual
+ which use the ``LLVM_DEBUG()`` macro. See the `LLVM Programmer's Manual
<../ProgrammersManual.html>`_, section ``#DEBUG`` for more information.
.. option:: -load=<plugin>
diff --git a/docs/CommandGuide/tblgen.rst b/docs/CommandGuide/tblgen.rst
index a42b04dbf8be..55b542948469 100644
--- a/docs/CommandGuide/tblgen.rst
+++ b/docs/CommandGuide/tblgen.rst
@@ -57,6 +57,11 @@ OPTIONS
Print all records to standard output (default).
+.. option:: -dump-json
+
+ Print a JSON representation of all records, suitable for further
+ automated processing.
+
.. option:: -print-enums
Print enumeration values for a class.
@@ -109,9 +114,13 @@ OPTIONS
Generate subtarget enumerations.
-.. option:: -gen-intrinsic
+.. option:: -gen-intrinsic-enums
+
+ Generate intrinsic enums.
+
+.. option:: -gen-intrinsic-impl
- Generate intrinsic information.
+ Generate intrinsic implementation.
.. option:: -gen-tgt-intrinsic
diff --git a/docs/CommandLine.rst b/docs/CommandLine.rst
index 5d2a39d45a17..9a6a196b431c 100644
--- a/docs/CommandLine.rst
+++ b/docs/CommandLine.rst
@@ -886,12 +886,12 @@ To do this, set up your .h file with your option, like this for example:
// debug build, then the code specified as the option to the macro will be
// executed. Otherwise it will not be.
#ifdef NDEBUG
- #define DEBUG(X)
+ #define LLVM_DEBUG(X)
#else
- #define DEBUG(X) do { if (DebugFlag) { X; } } while (0)
+ #define LLVM_DEBUG(X) do { if (DebugFlag) { X; } } while (0)
#endif
-This allows clients to blissfully use the ``DEBUG()`` macro, or the
+This allows clients to blissfully use the ``LLVM_DEBUG()`` macro, or the
``DebugFlag`` explicitly if they want to. Now we just need to be able to set
the ``DebugFlag`` boolean when the option is set. To do this, we pass an
additional argument to our command line argument processor, and we specify where
@@ -1716,7 +1716,7 @@ line option outside of the library. In these cases the library does or should
provide an external storage location that is accessible to users of the
library. Examples of this include the ``llvm::DebugFlag`` exported by the
``lib/Support/Debug.cpp`` file and the ``llvm::TimePassesIsEnabled`` flag
-exported by the ``lib/VMCore/PassManager.cpp`` file.
+exported by the ``lib/IR/PassManager.cpp`` file.
.. todo::
diff --git a/docs/CompilerWriterInfo.rst b/docs/CompilerWriterInfo.rst
index 60f102472c63..838c36a4099c 100644
--- a/docs/CompilerWriterInfo.rst
+++ b/docs/CompilerWriterInfo.rst
@@ -40,7 +40,7 @@ Lanai
MIPS
----
-* `MIPS Processor Architecture <https://imgtec.com/mips/architectures/>`_
+* `MIPS Processor Architecture <https://www.mips.com/products/>`_
* `MIPS 64-bit ELF Object File Specification <http://techpubs.sgi.com/library/manuals/4000/007-4658-001/pdf/007-4658-001.pdf>`_
diff --git a/docs/Contributing.rst b/docs/Contributing.rst
new file mode 100644
index 000000000000..1f7951dbc6eb
--- /dev/null
+++ b/docs/Contributing.rst
@@ -0,0 +1,127 @@
+==================================
+Contributing to LLVM
+==================================
+
+
+Thank you for your interest in contributing to LLVM! There are multiple ways to
+contribute, and we appreciate all contributions. In case you
+have questions, you can either use the `Developer's List (llvm-dev)`_
+or the #llvm channel on `irc.oftc.net`_.
+
+If you want to contribute code, please familiarize yourself with the :doc:`DeveloperPolicy`.
+
+.. contents::
+ :local:
+
+
+Ways to Contribute
+==================
+
+Bug Reports
+-----------
+If you are working with LLVM and run into a bug, we definitely want to know
+about it. Please let us know and follow the instructions in
+:doc:`HowToSubmitABug` to create a bug report.
+
+Bug Fixes
+---------
+If you are interested in contributing code to LLVM, bugs labeled with the
+`beginner keyword`_ in the `bug tracker`_ are a good way to get familiar with
+the code base. If you are interested in fixing a bug, please create an account
+for the bug tracker and assign the bug to yourself, to let people know you are
+working on it.
+
+Then try to reproduce and fix the bug with upstream LLVM. Start by building
+LLVM from source as described in :doc:`GettingStarted`
+and use the built binaries to reproduce the failure described in the bug. Use
+a debug build (`-DCMAKE_BUILD_TYPE=Debug`) or a build with assertions
+(`-DLLVM_ENABLE_ASSERTIONS=On`, enabled for Debug builds).
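+
+As a minimal sketch (assuming an out-of-tree build directory and the Ninja
+generator), such a build might be configured like this:
+
+.. code-block:: bash
+
+  $ cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=On ../llvm
+  $ ninja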
+
+Bigger Pieces of Work
+---------------------
+In case you are interested in taking on a bigger piece of work, a list of
+interesting projects is maintained at the `LLVM's Open Projects page`_. If
+you are interested in working on any of these projects, please send a mail to
+the `LLVM Developer's mailing list`_, so that we know the project is being
+worked on.
+
+
+How to Submit a Patch
+=====================
+Once you have a patch ready, it is time to submit it. The patch should:
+
+* include a small unit test
+* conform to the :doc:`CodingStandards`. You can use the `clang-format-diff.py`_ or `git-clang-format`_ tools to automatically format your patch properly; see the example after this list.
+* not contain any unrelated changes
+* be an isolated change. Independent changes should be submitted as separate patches as this makes reviewing easier.
+
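+For instance, the changes in your last commit could be reformatted with a
+command along these lines (a sketch, assuming ``clang-format-diff.py`` is on
+your ``PATH``):
+
+.. code-block:: bash
+
+  $ git diff -U0 --no-color HEAD^ | clang-format-diff.py -i -p1
+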
+To get a patch accepted, it has to be reviewed by the LLVM community. This can
+be done using `LLVM's Phabricator`_ or the llvm-commits mailing list.
+Please follow :ref:`Phabricator#requesting-a-review-via-the-web-interface <phabricator-request-review-web>`
+to request a review using Phabricator.
+
+To make sure the right people see your patch, please select suitable reviewers
+and add them to your patch when requesting a review. Suitable reviewers are the
+code owner (see CODE_OWNERS.txt) and other people doing work in the area your
+patch touches. If you are using Phabricator, add them to the `Reviewers` field
+when creating a review and if you are using `llvm-commits`, add them to the CC of
+your email.
+
+A reviewer may request changes or ask questions during the review. If you are
+uncertain on how to provide test cases, documentation, etc., feel free to ask
+for guidance during the review. Please address the feedback and re-post an
+updated version of your patch. This cycle continues until all requests and comments
+have been addressed and a reviewer accepts the patch with a `Looks good to me` or `LGTM`.
+Once that is done the change can be committed. If you do not have commit
+access, please let people know during the review and someone should commit it
+on your behalf.
+
+If you have received no comments on your patch for a week, you can request a
+review by 'ping'ing the patch: respond to the email thread containing the
+patch, or to the Phabricator review, with "Ping." The common courtesy 'ping' rate
+is once a week. Please remember that you are asking for valuable time from other
+professional developers.
+
+
+Helpful Information About LLVM
+==============================
+:doc:`LLVM's documentation <index>` provides a wealth of information about LLVM's internals as
+well as various user guides. The pages listed below should provide a good overview
+of LLVM's high-level design, as well as its internals:
+
+:doc:`GettingStarted`
+ Discusses how to get up and running quickly with the LLVM infrastructure.
+ Everything from unpacking and compilation of the distribution to execution
+ of some tools.
+
+:doc:`LangRef`
+ Defines the LLVM intermediate representation.
+
+:doc:`ProgrammersManual`
+ Introduction to the general layout of the LLVM sourcebase, important classes
+ and APIs, and some tips & tricks.
+
+:ref:`index-subsystem-docs`
+ A collection of pages documenting various subsystems of LLVM.
+
+`LLVM for Grad Students`__
+ This is an introduction to the LLVM infrastructure by Adrian Sampson. While it
+ has been written for grad students, it provides a good, compact overview of
+ LLVM's architecture, LLVM's IR and how to write a new pass.
+
+ .. __: http://www.cs.cornell.edu/~asampson/blog/llvm.html
+
+`Intro to LLVM`__
+ Book chapter providing a compiler hacker's introduction to LLVM.
+
+ .. __: http://www.aosabook.org/en/llvm.html
+
+.. _Developer's List (llvm-dev): http://lists.llvm.org/mailman/listinfo/llvm-dev
+.. _irc.oftc.net: irc://irc.oftc.net/llvm
+.. _beginner keyword: https://bugs.llvm.org/buglist.cgi?bug_status=NEW&bug_status=REOPENED&keywords=beginner%2C%20&keywords_type=allwords&list_id=130748&query_format=advanced&resolution=---
+.. _bug tracker: https://bugs.llvm.org
+.. _clang-format-diff.py: https://reviews.llvm.org/source/clang/browse/cfe/trunk/tools/clang-format/clang-format-diff.py
+.. _git-clang-format: https://reviews.llvm.org/source/clang/browse/cfe/trunk/tools/clang-format/git-clang-format
+.. _LLVM's Phabricator: https://reviews.llvm.org/
+.. _LLVM's Open Projects page: https://llvm.org/OpenProjects.html#what
+.. _LLVM Developer's mailing list: http://lists.llvm.org/mailman/listinfo/llvm-dev
diff --git a/docs/Coroutines.rst b/docs/Coroutines.rst
index 1bea04ebdd2a..f3667585c6c2 100644
--- a/docs/Coroutines.rst
+++ b/docs/Coroutines.rst
@@ -880,6 +880,32 @@ Example:
%phi = phi i8* [ null, %entry ], [ %alloc, %coro.alloc ]
%frame = call i8* @llvm.coro.begin(token %id, i8* %phi)
+.. _coro.noop:
+
+'llvm.coro.noop' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+ declare i8* @llvm.coro.noop()
+
+Overview:
+"""""""""
+
+The '``llvm.coro.noop``' intrinsic returns the address of the coroutine frame of
+a coroutine that does nothing when resumed or destroyed.
+
+Arguments:
+""""""""""
+
+None
+
+Semantics:
+""""""""""
+
+This intrinsic is lowered to refer to a private constant coroutine frame. The
+resume and destroy handlers for this frame are empty functions that do nothing.
+Note that in different translation units llvm.coro.noop may return different pointers.
+
.. _coro.frame:
'llvm.coro.frame' Intrinsic
diff --git a/docs/Docker.rst b/docs/Docker.rst
index e606e1b71a2c..7862c5ab7a3d 100644
--- a/docs/Docker.rst
+++ b/docs/Docker.rst
@@ -53,24 +53,15 @@ serve as a basis for anyone who wants to create their own Docker image with
LLVM components, compiled from sources. The sources are checked out from the
upstream svn repository when building the image.
-Inside each subfolder we host Dockerfiles for two images:
-
-- ``build/`` image is used to compile LLVM, it installs a system compiler and all
- build dependencies of LLVM. After the build process is finished, the build
- image will have an archive with compiled components at ``/tmp/clang.tar.gz``.
-- ``release/`` image usually only contains LLVM components, compiled by the
- ``build/`` image, and also libstdc++ and binutils to make image minimally
- useful for C++ development. The assumption is that you usually want clang to
- be one of the provided components.
-
-To build both of those images, use ``build_docker_image.sh`` script.
-It will checkout LLVM sources and build clang in the ``build`` container, copy results
-of the build to the local filesystem and then build the ``release`` container using
-those. The ``build_docker_image.sh`` accepts a list of LLVM repositories to
-checkout, and arguments for CMake invocation.
+The resulting image contains only the requested LLVM components and a few extra
+packages to make the image minimally useful for C++ development, e.g. libstdc++
+and binutils.
+
+The interface to run the build is the ``build_docker_image.sh`` script. It accepts a
+list of LLVM repositories to checkout and arguments for CMake invocation.
If you want to write your own docker image, start with an ``example/`` subfolder.
-It provides incomplete Dockerfiles with (very few) FIXMEs explaining the steps
+It provides an incomplete Dockerfile with (very few) FIXMEs explaining the steps
you need to take in order to make your Dockerfiles functional.
Usage
@@ -110,10 +101,10 @@ this command will do that:
-DBOOTSTRAP_CMAKE_BUILD_TYPE=Release \
-DCLANG_ENABLE_BOOTSTRAP=ON -DCLANG_BOOTSTRAP_TARGETS="install-clang;install-clang-headers"
-This will produce two images, a release image ``clang-debian8:staging`` and a
-build image ``clang-debian8-build:staging`` from the latest upstream revision.
-After the image is built you can run bash inside a container based on your
-image like this:
+This will produce a new image ``clang-debian8:staging`` from the latest
+upstream revision.
+After the image is built you can run bash inside a container based on your image
+like this:
.. code-block:: bash
@@ -181,19 +172,14 @@ debian8-based image using the latest ``google/stable`` sources for you:
Minimizing docker image size
============================
-Due to Docker restrictions we use two images (i.e., build and release folders)
-for the release image to be as small as possible. It's much easier to achieve
-that using two images, because Docker would store a filesystem layer for each
-command in the Dockerfile, i.e. if you install some packages in one command,
-then remove those in a separate command, the size of the resulting image will
-still be proportinal to the size of an image with installed packages.
-Therefore, we strive to provide a very simple release image which only copies
-compiled clang and does not do anything else.
-
-Docker 1.13 added a ``--squash`` flag that allows to flatten the layers of the
-image, i.e. remove the parts that were actually deleted. That is an easier way
-to produce the smallest images possible by using just a single image. We do not
-use it because as of today the flag is in experimental stage and not everyone
-may have the latest docker version available. When the flag is out of
-experimental stage, we should investigate replacing two images approach with
-just a single image, built using ``--squash`` flag.
+Due to how Docker's filesystem works, all intermediate writes are persisted in
+the resulting image, even if they are removed in the following commands.
+To minimize the resulting image size we use `multi-stage Docker builds
+<https://docs.docker.com/develop/develop-images/multistage-build/>`_.
+Internally Docker builds two images. The first image does all the work: installs
+build dependencies, checks out LLVM source code, compiles LLVM, etc.
+This first image is only used during the build and does not have a descriptive
+name, i.e. it is only accessible via its hash value after the build is
+finished. The second image is our resulting image. It contains only the built
+binaries and not any build dependencies. It is also accessible via a
+descriptive name (specified by the ``-d`` and ``-t`` flags).
diff --git a/docs/ExceptionHandling.rst b/docs/ExceptionHandling.rst
index ff8b6456c2bc..18ff53cd3b6f 100644
--- a/docs/ExceptionHandling.rst
+++ b/docs/ExceptionHandling.rst
@@ -365,7 +365,7 @@ abstract interface.
When used in the native Windows C++ exception handling implementation, this
intrinsic serves as a placeholder to delimit code before a catch handler is
-outlined. When the handler is is outlined, this intrinsic will be replaced
+outlined. When the handler is outlined, this intrinsic will be replaced
by instructions that retrieve the exception object pointer from the frame
allocation block.
@@ -839,3 +839,66 @@ or ``catchswitch`` to unwind.
Finally, the funclet pads' unwind destinations cannot form a cycle. This
ensures that EH lowering can construct "try regions" with a tree-like
structure, which funclet-based personalities may require.
+
+Exception Handling support on the target
+=================================================
+
+In order to support exception handling on a particular target, a few items
+need to be implemented.
+
+* CFI directives
+
+ First, you have to assign each target register a unique DWARF number.
+ Then in ``TargetFrameLowering``'s ``emitPrologue``, you have to emit `CFI
+ directives <https://sourceware.org/binutils/docs/as/CFI-directives.html>`_
+ to specify how to calculate the CFA (Canonical Frame Address) and how each
+ register is restored from the address the CFA points to, plus an offset. The
+ CFI directives instruct the assembler to build the ``.eh_frame`` section,
+ which is used by the unwinder to unwind the stack during exception handling.
+
+* ``getExceptionPointerRegister`` and ``getExceptionSelectorRegister``
+
+ ``TargetLowering`` must implement both functions. The *personality function*
+ passes the *exception structure* (a pointer) and *selector value* (an integer)
+ to the landing pad through the registers specified by ``getExceptionPointerRegister``
+ and ``getExceptionSelectorRegister`` respectively. On most platforms, they
+ will be GPRs and will be the same as the ones specified in the calling convention.
+
+* ``EH_RETURN``
+
+ The ISD node represents the undocumented GCC extension ``__builtin_eh_return (offset, handler)``,
+ which adjusts the stack by the offset and then jumps to the handler.
+ ``__builtin_eh_return`` is used in the GCC unwinder
+ (`libgcc <https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html>`_), but not in the
+ LLVM unwinder (`libunwind <https://clang.llvm.org/docs/Toolchain.html#unwind-library>`_).
+ If you are building on top of ``libgcc`` and have particular requirements for
+ your target, you have to handle ``EH_RETURN`` in ``TargetLowering``.
+
+If you don't leverage the existing runtime (``libstdc++`` and ``libgcc``),
+you have to take a look at `libc++ <https://libcxx.llvm.org/>`_ and
+`libunwind <https://clang.llvm.org/docs/Toolchain.html#unwind-library>`_
+to see what has to be done there. For ``libunwind``, you have to do the following:
+
+* ``__libunwind_config.h``
+
+ Define macros for your target.
+
+* ``include/libunwind.h``
+
+ Define enum for the target registers.
+
+* ``src/Registers.hpp``
+
+ Define ``Registers`` class for your target, implement setter and getter functions.
+
+* ``src/UnwindCursor.hpp``
+
+ Define ``dwarfEncoding`` and ``stepWithCompactEncoding`` for your ``Registers``
+ class.
+
+* ``src/UnwindRegistersRestore.S``
+
+ Write an assembly function to restore all your target registers from memory.
+
+* ``src/UnwindRegistersSave.S``
+
+ Write an assembly function to save all your target registers to memory.
diff --git a/docs/Extensions.rst b/docs/Extensions.rst
index 32eeadd78ba6..fac2289921ec 100644
--- a/docs/Extensions.rst
+++ b/docs/Extensions.rst
@@ -182,6 +182,30 @@ which gnu as does not support. For gas compatibility, sections with a name
starting with ".debug" are implicitly discardable.
+ARM64/COFF-Dependent
+--------------------
+
+Relocations
+^^^^^^^^^^^
+
+The following additional symbol variants are supported:
+
+**:secrel_lo12:** generates a relocation that corresponds to the COFF relocation
+types ``IMAGE_REL_ARM64_SECREL_LOW12A`` or ``IMAGE_REL_ARM64_SECREL_LOW12L``.
+
+**:secrel_hi12:** generates a relocation that corresponds to the COFF relocation
+type ``IMAGE_REL_ARM64_SECREL_HIGH12A``.
+
+.. code-block:: gas
+
+ add x0, x0, :secrel_hi12:symbol
+ ldr x0, [x0, :secrel_lo12:symbol]
+
+ add x1, x1, :secrel_hi12:symbol
+ add x1, x1, :secrel_lo12:symbol
+ ...
+
+
ELF-Dependent
-------------
@@ -221,6 +245,186 @@ which is equivalent to just
.section .foo,"a",@progbits
.section .bar,"ao",@progbits,.foo
+``.linker-options`` Section (linker options)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to support passing linker options from the frontend to the linker, a
+special section of type ``SHT_LLVM_LINKER_OPTIONS`` is used (usually named
+``.linker-options``, though the name is not significant as the section is
+identified by its type). The contents of this section are a simple pair-wise
+encoding of directives for consideration by the linker. The strings are encoded as standard
+null-terminated UTF-8 strings. They are emitted inline to avoid having the
+linker traverse the object file for retrieving the value. The linker is
+permitted to not honour the option and instead provide a warning/error to the
+user that the requested option was not honoured.
+
+The section has type ``SHT_LLVM_LINKER_OPTIONS`` and has the ``SHF_EXCLUDE``
+flag to ensure that the section is treated as opaque by linkers which do not
+support the feature and will not be emitted into the final linked binary.
+
+This would be equivalent to the following raw assembly:
+
+.. code-block:: gas
+
+ .section ".linker-options","e",@llvm_linker_options
+ .asciz "option 1"
+ .asciz "value 1"
+ .asciz "option 2"
+ .asciz "value 2"
+
+The following directives are specified:
+
+ - lib
+
+ The parameter identifies a library to be linked against. The library will
+ be looked up in the default and any specified library search paths
+ (specified to this point).
+
+ - libpath
+
+ The parameter identifies an additional library search path to be considered
+ when looking up libraries after the inclusion of this option.
+
+``SHT_LLVM_CALL_GRAPH_PROFILE`` Section (Call Graph Profile)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section is used to pass a call graph profile to the linker which can be
+used to optimize the placement of sections. It contains a sequence of
+(from symbol, to symbol, weight) tuples.
+
+It shall have a type of ``SHT_LLVM_CALL_GRAPH_PROFILE`` (0x6fff4c02), shall
+have the ``SHF_EXCLUDE`` flag set, the ``sh_link`` member shall hold the section
+header index of the associated symbol table, and shall have a ``sh_entsize`` of
+16. It should be named ``.llvm.call-graph-profile``.
+
+The contents of the section shall be a sequence of ``Elf_CGProfile`` entries.
+
+.. code-block:: c
+
+ typedef struct {
+ Elf_Word cgp_from;
+ Elf_Word cgp_to;
+ Elf_Xword cgp_weight;
+ } Elf_CGProfile;
+
+cgp_from
+ The symbol index of the source of the edge.
+
+cgp_to
+ The symbol index of the destination of the edge.
+
+cgp_weight
+ The weight of the edge.
+
+This is represented in assembly as:
+
+.. code-block:: gas
+
+ .cg_profile from, to, 42
+
+``.cg_profile`` directives are processed at the end of the file. It is an error
+if either ``from`` or ``to`` are undefined temporary symbols. If either symbol
+is a temporary symbol, then the section symbol is used instead. If either
+symbol is undefined, then that symbol is defined as if ``.weak symbol`` had been
+written at the end of the file. This forces the symbol to show up in the symbol
+table.
+
+``SHT_LLVM_ADDRSIG`` Section (address-significance table)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section is used to mark symbols as address-significant, i.e. the address
+of the symbol is used in a comparison or leaks outside the translation unit. It
+has the same meaning as the absence of the LLVM attributes ``unnamed_addr``
+and ``local_unnamed_addr``.
+
+Any sections referred to by symbols that are not marked as address-significant
+in any object file may be safely merged by a linker without breaking the
+address uniqueness guarantee provided by the C and C++ language standards.
+
+The contents of the section are a sequence of ULEB128-encoded integers
+referring to the symbol table indexes of the address-significant symbols.
+
+There are two associated assembly directives:
+
+.. code-block:: gas
+
+ .addrsig
+
+This instructs the assembler to emit an address-significance table. Without
+this directive, all symbols are considered address-significant.
+
+.. code-block:: gas
+
+ .addrsig_sym sym
+
+This marks ``sym`` as address-significant.
+
+CodeView-Dependent
+------------------
+
+``.cv_file`` Directive
+^^^^^^^^^^^^^^^^^^^^^^
+Syntax:
+ ``.cv_file`` *FileNumber FileName* [ *checksum* ] [ *checksumkind* ]
+
+``.cv_func_id`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^
+Introduces a function ID that can be used with ``.cv_loc``.
+
+Syntax:
+ ``.cv_func_id`` *FunctionId*
+
+``.cv_inline_site_id`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Introduces a function ID that can be used with ``.cv_loc``. Includes
+``inlined at`` source location information for use in the line table of the
+caller, whether the caller is a real function or another inlined call site.
+
+Syntax:
+ ``.cv_inline_site_id`` *FunctionId* ``within`` *Function* ``inlined_at`` *FileNumber Line* [ *Column* ]
+
+``.cv_loc`` Directive
+^^^^^^^^^^^^^^^^^^^^^
+The first number is a file number, which must have been previously assigned
+with a ``.file`` directive; the second number is the line number; and the
+optional third number is a column position (zero if not specified). The remaining
+optional items are ``.loc`` sub-directives.
+
+Syntax:
+ ``.cv_loc`` *FunctionId FileNumber* [ *Line* ] [ *Column* ] [ *prologue_end* ] [ ``is_stmt`` *value* ]
+
+``.cv_linetable`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Syntax:
+ ``.cv_linetable`` *FunctionId* ``,`` *FunctionStart* ``,`` *FunctionEnd*
+
+``.cv_inline_linetable`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Syntax:
+ ``.cv_inline_linetable`` *PrimaryFunctionId* ``,`` *FileNumber Line FunctionStart FunctionEnd*
+
+``.cv_def_range`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The *GapStart* and *GapEnd* options may be repeated as needed.
+
+Syntax:
+ ``.cv_def_range`` *RangeStart RangeEnd* [ *GapStart GapEnd* ] ``,`` *bytes*
+
+``.cv_stringtable`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``.cv_filechecksums`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``.cv_filechecksumoffset`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Syntax:
+ ``.cv_filechecksumoffset`` *FileNumber*
+
+``.cv_fpo_data`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+Syntax:
+ ``.cv_fpo_data`` *procsym*
Target Specific Behaviour
=========================
diff --git a/docs/GarbageCollection.rst b/docs/GarbageCollection.rst
index 4ef174410b78..e4f5802f8879 100644
--- a/docs/GarbageCollection.rst
+++ b/docs/GarbageCollection.rst
@@ -433,7 +433,7 @@ data structure, but there are only 20 lines of meaningful code.)
.. code-block:: c++
- /// @brief The map for a single function's stack frame. One of these is
+ /// The map for a single function's stack frame. One of these is
/// compiled as constant data into the executable for each function.
///
/// Storage of metadata values is elided if the %metadata parameter to
@@ -444,7 +444,7 @@ data structure, but there are only 20 lines of meaningful code.)
const void *Meta[0]; //< Metadata for each root.
};
- /// @brief A link in the dynamic shadow stack. One of these is embedded in
+ /// A link in the dynamic shadow stack. One of these is embedded in
/// the stack frame of each function on the call stack.
struct StackEntry {
StackEntry *Next; //< Link to next stack entry (the caller's).
@@ -452,13 +452,13 @@ data structure, but there are only 20 lines of meaningful code.)
void *Roots[0]; //< Stack roots (in-place array).
};
- /// @brief The head of the singly-linked list of StackEntries. Functions push
+ /// The head of the singly-linked list of StackEntries. Functions push
/// and pop onto this in their prologue and epilogue.
///
/// Since there is only a global list, this technique is not threadsafe.
StackEntry *llvm_gc_root_chain;
- /// @brief Calls Visitor(root, meta) for each GC root on the stack.
+ /// Calls Visitor(root, meta) for each GC root on the stack.
/// root and meta are exactly the values passed to
/// @llvm.gcroot.
///
@@ -1032,7 +1032,7 @@ a realistic example:
// Emit PointCount.
OS.AddComment("safe point count");
- AP.EmitInt32(MD.size());
+ AP.emitInt32(MD.size());
// And each safe point...
for (GCFunctionInfo::iterator PI = MD.begin(),
@@ -1049,18 +1049,18 @@ a realistic example:
// Emit the stack frame size.
OS.AddComment("stack frame size (in words)");
- AP.EmitInt32(MD.getFrameSize() / IntPtrSize);
+ AP.emitInt32(MD.getFrameSize() / IntPtrSize);
// Emit stack arity, i.e. the number of stacked arguments.
unsigned RegisteredArgs = IntPtrSize == 4 ? 5 : 6;
unsigned StackArity = MD.getFunction().arg_size() > RegisteredArgs ?
MD.getFunction().arg_size() - RegisteredArgs : 0;
OS.AddComment("stack arity");
- AP.EmitInt32(StackArity);
+ AP.emitInt32(StackArity);
// Emit the number of live roots in the function.
OS.AddComment("live root count");
- AP.EmitInt32(MD.live_size(PI));
+ AP.emitInt32(MD.live_size(PI));
// And for each live root...
for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI),
@@ -1068,7 +1068,7 @@ a realistic example:
LI != LE; ++LI) {
// Emit live root's offset within the stack frame.
OS.AddComment("stack index (offset / wordsize)");
- AP.EmitInt32(LI->StackOffset);
+ AP.emitInt32(LI->StackOffset);
}
}
}
diff --git a/docs/GettingStarted.rst b/docs/GettingStarted.rst
index ed2e936d1360..7cfd67ce7157 100644
--- a/docs/GettingStarted.rst
+++ b/docs/GettingStarted.rst
@@ -200,7 +200,7 @@ will need about 1-3 GB of space. A full build of LLVM and Clang will need aroun
is so large because of all the debugging information and the fact that the
libraries are statically linked into multiple tools).
-If you you are space-constrained, you can build only selected tools or only
+If you are space-constrained, you can build only selected tools or only
selected targets. The Release build requires considerably less space.
The LLVM suite *may* compile on other platforms, but it is not guaranteed to do
@@ -324,7 +324,7 @@ However, some Linux distributions and some other or older BSDs sometimes have
extremely old versions of GCC. These steps attempt to help you upgrade you
compiler even on such a system. However, if at all possible, we encourage you
to use a recent version of a distribution with a modern system compiler that
-meets these requirements. Note that it is tempting to to install a prior
+meets these requirements. Note that it is tempting to install a prior
version of Clang and libc++ to be the host compiler, however libc++ was not
well tested or set up to build on Linux until relatively recently. As
a consequence, this guide suggests just using libstdc++ and a modern GCC as the
@@ -492,8 +492,16 @@ Git Mirror
Git mirrors are available for a number of LLVM subprojects. These mirrors sync
automatically with each Subversion commit and contain all necessary git-svn
marks (so, you can recreate git-svn metadata locally). Note that right now
-mirrors reflect only ``trunk`` for each project. You can do the read-only Git
-clone of LLVM via:
+mirrors reflect only ``trunk`` for each project.
+
+.. note::
+
+ On Windows, first you will want to do ``git config --global core.autocrlf
+ false`` before you clone. This goes a long way toward ensuring that
+ line-endings will be handled correctly (the LLVM project mostly uses Linux
+ line-endings).
+
+You can do the read-only Git clone of LLVM via:
.. code-block:: console
@@ -912,7 +920,7 @@ where they are built (a Canadian Cross build). To generate build files for
cross-compiling CMake provides a variable ``CMAKE_TOOLCHAIN_FILE`` which can
define compiler flags and variables used during the CMake test operations.
-The result of such a build is executables that are not runnable on on the build
+The result of such a build is executables that are not runnable on the build
host but can be executed on the target. As an example the following CMake
invocation can generate build files targeting iOS. This will work on Mac OS X
with the latest Xcode:
diff --git a/docs/GoldPlugin.rst b/docs/GoldPlugin.rst
index 78d38ccb32bd..b429eadd8ef9 100644
--- a/docs/GoldPlugin.rst
+++ b/docs/GoldPlugin.rst
@@ -6,13 +6,16 @@ Introduction
============
Building with link time optimization requires cooperation from
-the system linker. LTO support on Linux systems requires that you use the
-`gold linker`_ or ld.bfd from binutils >= 2.21.51.0.2, as they support LTO via plugins. This is the same mechanism
+the system linker. LTO support on Linux systems is available via the
+`gold linker`_ which supports LTO via plugins. This is the same mechanism
used by the `GCC LTO`_ project.
The LLVM gold plugin implements the gold plugin interface on top of
:ref:`libLTO`. The same plugin can also be used by other tools such as
-``ar`` and ``nm``.
+``ar`` and ``nm``. Note that ld.bfd from binutils version 2.21.51.0.2
+and above also supports LTO via plugins. However, usage of the LLVM
+gold plugin with ld.bfd is not tested and therefore not officially
+supported or recommended.
.. _`gold linker`: http://sourceware.org/binutils
.. _`GCC LTO`: http://gcc.gnu.org/wiki/LinkTimeOptimization
@@ -23,25 +26,44 @@ The LLVM gold plugin implements the gold plugin interface on top of
How to build it
===============
-Check for plugin support by running ``/usr/bin/ld -plugin``. If it complains
-"missing argument" then you have plugin support. If not, such as an "unknown option"
-error then you will either need to build gold or install a recent version
-of ld.bfd with plugin support and then build gold plugin.
+You need to have gold with plugin support and build the LLVMgold plugin.
+The gold linker is installed as ld.gold. To see whether gold is the default
+on your system, run ``/usr/bin/ld -v``. It will report "GNU gold" if gold is
+the default, or "GNU ld" otherwise. If gold is already installed at
+``/usr/bin/ld.gold``, one option is to simply make that the default by
+backing up your existing ``/usr/bin/ld`` and creating a symbolic link
+with ``ln -s /usr/bin/ld.gold /usr/bin/ld``. Alternatively, you can build
+with clang's ``-fuse-ld=gold`` or add ``-fuse-ld=gold`` to LDFLAGS, which will
+cause the clang driver to invoke ``/usr/bin/ld.gold`` directly.
-* Download, configure and build ld.bfd with plugin support:
+If you have gold installed, check for plugin support by running
+``/usr/bin/ld.gold -plugin``. If it complains "missing argument" then
+you have plugin support. If not, and you get an error such as "unknown option",
+then you will either need to build gold or install a version with plugin
+support.
+
+* Download, configure and build gold with plugin support:
.. code-block:: bash
$ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils
$ mkdir build
$ cd build
- $ ../binutils/configure --disable-werror # ld.bfd includes plugin support by default
- $ make all-ld
+ $ ../binutils/configure --enable-gold --enable-plugins --disable-werror
+ $ make all-gold
- That should leave you with ``build/ld/ld-new`` which supports
+ That should leave you with ``build/gold/ld-new`` which supports
the ``-plugin`` option. Running ``make`` will additionally build
``build/binutils/ar`` and ``nm-new`` binaries supporting plugins.
+ Once you're ready to switch to using gold, backup your existing
+ ``/usr/bin/ld`` then replace it with ``ld-new``. Alternatively, install
+ in ``/usr/bin/ld.gold`` and use ``-fuse-ld=gold`` as described earlier.
+
+ Optionally, add ``--enable-gold=default`` to the above configure invocation
+ to automatically install the newly built gold as the default linker with
+ ``make install``.
+
* Build the LLVMgold plugin. Run CMake with
``-DLLVM_BINUTILS_INCDIR=/path/to/binutils/include``. The correct include
path will contain the file ``plugin-api.h``.
@@ -49,19 +71,12 @@ of ld.bfd with plugin support and then build gold plugin.
Usage
=====
-The linker takes a ``-plugin`` option that points to the path of
-the plugin ``.so`` file. To find out what link command ``gcc``
-would run in a given situation, run ``gcc -v [...]`` and
-look for the line where it runs ``collect2``. Replace that with
-``ld-new -plugin /path/to/LLVMgold.so`` to test it out. Once you're
-ready to switch to using gold, backup your existing ``/usr/bin/ld``
-then replace it with ``ld-new``.
-
You should produce bitcode files from ``clang`` with the option
``-flto``. This flag will also cause ``clang`` to look for the gold plugin in
the ``lib`` directory under its prefix and pass the ``-plugin`` option to
-``ld``. It will not look for an alternate linker, which is why you need
-gold to be the installed system linker in your path.
+``ld``. It will not look for an alternate linker without ``-fuse-ld=gold``,
+which is why you otherwise need gold to be the installed system linker in
+your path.
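+
+As a sketch (the file names are hypothetical), an LTO build driven through the
+plugin might look like this:
+
+.. code-block:: bash
+
+  $ clang -flto -c a.c -o a.o               # a.o contains LLVM bitcode
+  $ clang -flto -fuse-ld=gold a.o -o a.out  # gold loads LLVMgold.so for LTO
+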
``ar`` and ``nm`` also accept the ``-plugin`` option and it's possible
to install ``LLVMgold.so`` to ``/usr/lib/bfd-plugins`` for a seamless setup.
diff --git a/docs/HowToSubmitABug.rst b/docs/HowToSubmitABug.rst
index 25cb2c8c80d3..7881a6e8dcc3 100644
--- a/docs/HowToSubmitABug.rst
+++ b/docs/HowToSubmitABug.rst
@@ -38,7 +38,7 @@ Crashing Bugs
More often than not, bugs in the compiler cause it to crash---often due to
an assertion failure of some sort. The most important piece of the puzzle
-is to figure out if it is crashing in the GCC front-end or if it is one of
+is to figure out if it is crashing in the Clang front-end or if it is one of
the LLVM libraries (e.g. the optimizer or code generator) that has
problems.
diff --git a/docs/LangRef.rst b/docs/LangRef.rst
index 589786255af7..38bed417104d 100644
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -80,7 +80,7 @@ identifiers, for different purposes:
characters may be escaped using ``"\xx"`` where ``xx`` is the ASCII
code for the character in hexadecimal. In this way, any character can
be used in a name value, even quotes themselves. The ``"\01"`` prefix
- can be used on global variables to suppress mangling.
+ can be used on global values to suppress mangling.
#. Unnamed values are represented as an unsigned numeric value with
their prefix. For example, ``%12``, ``@2``, ``%44``.
#. Constants, which are described in the section Constants_ below.
@@ -324,9 +324,9 @@ added in the future:
limitations:
- On *X86-32* only supports up to 4 bit type parameters. No
- floating point types are supported.
+ floating-point types are supported.
- On *X86-64* only supports up to 10 bit type parameters and 6
- floating point parameters.
+ floating-point parameters.
This calling convention supports `tail call
optimization <CodeGenerator.html#id80>`_ but requires both the
@@ -883,8 +883,8 @@ The selection kind must be one of the following:
The linker may choose any COMDAT key but the sections must contain the
same amount of data.
-Note that the Mach-O platform doesn't support COMDATs and ELF only supports
-``any`` as a selection kind.
+Note that the Mach-O platform doesn't support COMDATs, and ELF and WebAssembly
+only support ``any`` as a selection kind.
Here is an example of a COMDAT group where a function will only be selected if
the COMDAT key's section is the largest:
@@ -1048,7 +1048,7 @@ Currently, only the following parameter attributes are defined:
When the call site is reached, the argument allocation must have
been the most recent stack allocation that is still live, or the
- results are undefined. It is possible to allocate additional stack
+ behavior is undefined. It is possible to allocate additional stack
space after an argument allocation and before its call site, but it
must be cleared off with :ref:`llvm.stackrestore
<int_stackrestore>`.
@@ -1064,6 +1064,8 @@ Currently, only the following parameter attributes are defined:
to trap and to be properly aligned. This is not a valid attribute
for return values.
+.. _attr_align:
+
``align <n>``
This indicates that the pointer value may be assumed by the optimizer to
have the specified alignment.
@@ -1120,9 +1122,8 @@ Currently, only the following parameter attributes are defined:
``nonnull``
This indicates that the parameter or return pointer is not null. This
attribute may only be applied to pointer typed parameters. This is not
- checked or enforced by LLVM, the caller must ensure that the pointer
- passed in is non-null, or the callee must ensure that the returned pointer
- is non-null.
+ checked or enforced by LLVM; if the parameter or return pointer is null,
+ the behavior is undefined.
``dereferenceable(<n>)``
This indicates that the parameter or return pointer is dereferenceable. This
@@ -1385,11 +1386,13 @@ example:
``inaccessiblememonly``
This attribute indicates that the function may only access memory that
is not accessible by the module being compiled. This is a weaker form
- of ``readnone``.
+ of ``readnone``. If the function reads or writes other memory, the
+ behavior is undefined.
``inaccessiblemem_or_argmemonly``
This attribute indicates that the function may only access memory that is
either not accessible by the module being compiled, or is pointed to
- by its pointer arguments. This is a weaker form of ``argmemonly``
+ by its pointer arguments. This is a weaker form of ``argmemonly``. If the
+ function reads or writes other memory, the behavior is undefined.
``inlinehint``
This attribute indicates that the source code contained a hint that
inlining this function is desirable (such as the "inline" keyword in
@@ -1432,7 +1435,7 @@ example:
internal linkage and only has one call site, so the original
call is dead after inlining.
``noimplicitfloat``
- This attributes disables implicit floating point instructions.
+ This attribute disables implicit floating-point instructions.
``noinline``
This attribute indicates that the inliner should never inline this
function in any situation. This attribute may not be used together
@@ -1459,6 +1462,17 @@ example:
trap or generate asynchronous exceptions. Exception handling schemes
that are recognized by LLVM to handle asynchronous exceptions, such
as SEH, will still provide their implementation defined semantics.
+``"null-pointer-is-valid"``
+ If ``"null-pointer-is-valid"`` is set to ``"true"``, then ``null`` address
+ in address-space 0 is considered to be a valid address for memory loads and
+ stores. Any analysis or optimization should not treat dereferencing a
+ pointer to ``null`` as undefined behavior in this function.
+ Note: Comparing the address of a global variable to ``null`` may still
+ evaluate to false because of a limitation in querying this attribute inside
+ constant expressions.
+``optforfuzzing``
+ This attribute indicates that this function should be optimized
+ for maximum fuzzing signal.
``optnone``
This function attribute indicates that most optimization passes will skip
this function, with the exception of interprocedural optimization passes.
@@ -1529,6 +1543,10 @@ example:
On an argument, this attribute indicates that the function does not
dereference that pointer argument, even though it may read or write the
memory that the pointer points to if accessed through other pointers.
+
+ If a readnone function reads or writes memory visible to the program, or
+ has other side-effects, the behavior is undefined. If a function reads from
+ or writes to a readnone pointer argument, the behavior is undefined.
``readonly``
On a function, this attribute indicates that the function does not write
through any pointer arguments (including ``byval`` arguments) or otherwise
@@ -1544,6 +1562,10 @@ example:
On an argument, this attribute indicates that the function does not write
through this pointer argument, even though it may write to the memory that
the pointer points to.
+
+ If a readonly function writes memory visible to the program, or
+ has other side-effects, the behavior is undefined. If a function writes to
+ a readonly pointer argument, the behavior is undefined.
``"stack-probe-size"``
This attribute controls the behavior of stack probes: either
the ``"probe-stack"`` attribute, or ABI-required stack probes, if any.
@@ -1559,6 +1581,8 @@ example:
inlined into a function that has no ``"stack-probe-size"`` attribute
at all, the resulting function has the ``"stack-probe-size"`` attribute
of the callee.
+``"no-stack-arg-probe"``
+ This attribute disables ABI-required stack probes, if any.
``writeonly``
On a function, this attribute indicates that the function may write to but
does not read from memory.
@@ -1566,14 +1590,22 @@ example:
On an argument, this attribute indicates that the function may write to but
does not read through this pointer argument (even though it may read from
the memory that the pointer points to).
+
+ If a writeonly function reads memory visible to the program, or
+ has other side-effects, the behavior is undefined. If a function reads
+ from a writeonly pointer argument, the behavior is undefined.
``argmemonly``
This attribute indicates that the only memory accesses inside function are
loads and stores from objects pointed to by its pointer-typed arguments,
with arbitrary offsets. Or in other words, all memory operations in the
function can refer to memory only using pointers based on its function
arguments.
+
Note that ``argmemonly`` can be used together with ``readonly`` attribute
in order to specify that function reads only from its arguments.
+
+ If an argmemonly function reads or writes memory other than the pointer
+ arguments, or has other side-effects, the behavior is undefined.
``returns_twice``
This attribute indicates that this function can return twice. The C
``setjmp`` is an example of such a function. The compiler disables
@@ -1680,9 +1712,9 @@ example:
resulting function will have an ``sspstrong`` attribute.
``strictfp``
This attribute indicates that the function was called from a scope that
- requires strict floating point semantics. LLVM will not attempt any
- optimizations that require assumptions about the floating point rounding
- mode or that might alter the state of floating point status flags that
+ requires strict floating-point semantics. LLVM will not attempt any
+ optimizations that require assumptions about the floating-point rounding
+ mode or that might alter the state of floating-point status flags that
might otherwise be set or cleared by calling this function.
``"thunk"``
This attribute indicates that the function will delegate to some other
@@ -1695,6 +1727,17 @@ example:
show that no exceptions passes by it. This is normally the case for
the ELF x86-64 abi, but it can be disabled for some compilation
units.
+``nocf_check``
+ This attribute indicates that no control-flow check will be performed on
+    the attributed entity. It disables ``-fcf-protection=<>`` for a specific
+    entity, allowing fine-grained control of the hardware control-flow
+    protection mechanism. The flag is target independent and currently
+    appertains to a function or function pointer.
+``shadowcallstack``
+ This attribute indicates that the ShadowCallStack checks are enabled for
+ the function. The instrumentation checks that the return address for the
+    function has not changed between the function prolog and epilog. It is
+ currently x86_64-specific.
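+
+As a hedged illustration of the memory attributes described above
+(``readnone``, ``readonly``, ``writeonly``, ``argmemonly``), here is a
+minimal sketch; the function names are hypothetical, chosen only for this
+example:
+
+.. code-block:: llvm
+
+    ; hypothetical declarations, for illustration only
+    declare i64 @hash(i8* nocapture readonly, i64) readonly  ; may read any memory, never writes
+    declare void @fill(i8* nocapture writeonly, i8, i64) argmemonly ; only touches memory through its pointer argument
+    declare double @my_cos(double) readnone                  ; depends only on its argument value
+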
.. _glattrs:
@@ -1903,13 +1946,22 @@ as follows:
must be a multiple of 8-bits. If omitted, the natural stack
alignment defaults to "unspecified", which does not prevent any
alignment promotions.
+``P<address space>``
+ Specifies the address space that corresponds to program memory.
+    Harvard architectures can use this to specify the space into which
+    LLVM should place entities such as functions. If omitted, the
+ program memory space defaults to the default address space of 0,
+ which corresponds to a Von Neumann architecture that has code
+ and data in the same space.
``A<address space>``
- Specifies the address space of objects created by '``alloca``'.
+ Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
-``p[n]:<size>:<abi>:<pref>``
+``p[n]:<size>:<abi>:<pref>:<idx>``
This specifies the *size* of a pointer and its ``<abi>`` and
- ``<pref>``\erred alignments for address space ``n``. All sizes are in
- bits. The address space, ``n``, is optional, and if not specified,
+ ``<pref>``\erred alignments for address space ``n``. The fourth parameter
+    ``<idx>`` is the size of the index used for address calculation. If not
+ specified, the default index size is equal to the pointer size. All sizes
+ are in bits. The address space, ``n``, is optional, and if not specified,
denotes the default address space 0. The value of ``n`` must be
in the range [1,2^23).
``i<size>:<abi>:<pref>``
@@ -1919,7 +1971,7 @@ as follows:
This specifies the alignment for a vector type of a given bit
``<size>``.
``f<size>:<abi>:<pref>``
- This specifies the alignment for a floating point type of a given bit
+ This specifies the alignment for a floating-point type of a given bit
``<size>``. Only values of ``<size>`` that are supported by the target
will work. 32 (float) and 64 (double) are supported on all targets; 80
or 128 (different flavors of long double) are also supported on some
@@ -1927,17 +1979,22 @@ as follows:
``a:<abi>:<pref>``
This specifies the alignment for an object of aggregate type.
``m:<mangling>``
- If present, specifies that llvm names are mangled in the output. The
+ If present, specifies that llvm names are mangled in the output. Symbols
+ prefixed with the mangling escape character ``\01`` are passed through
+ directly to the assembler without the escape character. The mangling style
options are
* ``e``: ELF mangling: Private symbols get a ``.L`` prefix.
* ``m``: Mips mangling: Private symbols get a ``$`` prefix.
* ``o``: Mach-O mangling: Private symbols get ``L`` prefix. Other
symbols get a ``_`` prefix.
- * ``w``: Windows COFF prefix: Similar to Mach-O, but stdcall and fastcall
- functions also get a suffix based on the frame size.
- * ``x``: Windows x86 COFF prefix: Similar to Windows COFF, but use a ``_``
- prefix for ``__cdecl`` functions.
+ * ``x``: Windows x86 COFF mangling: Private symbols get the usual prefix.
+ Regular C symbols get a ``_`` prefix. Functions with ``__stdcall``,
+ ``__fastcall``, and ``__vectorcall`` have custom mangling that appends
+ ``@N`` where N is the number of bytes used to pass parameters. C++ symbols
+ starting with ``?`` are not mangled in any way.
+ * ``w``: Windows COFF mangling: Similar to ``x``, except that normal C
+ symbols do not receive a ``_`` prefix.
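+
+As a hedged illustration of several of the specifications above (the target
+described is hypothetical, assembled only to show the syntax):
+
+.. code-block:: llvm
+
+    ; little-endian, ELF mangling, program code in address space 1,
+    ; allocas in address space 5, 64-bit pointers with 32-bit indices,
+    ; 64-bit-aligned i64, and native 32/64-bit integer widths.
+    target datalayout = "e-m:e-P1-A5-p:64:64:64:32-i64:64-n32:64"
+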
``n<size1>:<size2>:<size3>...``
This specifies a set of native integer widths for the target CPU in
bits. For example, it might contain ``n32`` for 32-bit PowerPC,
@@ -2271,6 +2328,32 @@ or ``syncscope("<target-scope>")`` *synchronizes with* and participates in the
seq\_cst total orderings of other operations that are not marked
``syncscope("singlethread")`` or ``syncscope("<target-scope>")``.
+.. _floatenv:
+
+Floating-Point Environment
+--------------------------
+
+The default LLVM floating-point environment assumes that floating-point
+instructions do not have side effects. Results assume the round-to-nearest
+rounding mode. No floating-point exception state is maintained in this
+environment. Therefore, there is no attempt to create or preserve invalid
+operation (SNaN) or division-by-zero exceptions in these examples:
+
+.. code-block:: llvm
+
+ %A = fdiv 0x7ff0000000000001, %X ; 64-bit SNaN hex value
+ %B = fdiv %X, 0.0
+ Safe:
+ %A = NaN
+ %B = NaN
+
+The benefit of this exception-free assumption is that floating-point
+operations may be speculated freely without any other fast-math relaxations
+to the floating-point model.
+
+Code that requires different behavior than this should use the
+:ref:`Constrained Floating-Point Intrinsics <constrainedfp>`.
+
.. _fastmath:
Fast-Math Flags
@@ -2279,18 +2362,18 @@ Fast-Math Flags
LLVM IR floating-point operations (:ref:`fadd <i_fadd>`,
:ref:`fsub <i_fsub>`, :ref:`fmul <i_fmul>`, :ref:`fdiv <i_fdiv>`,
:ref:`frem <i_frem>`, :ref:`fcmp <i_fcmp>`) and :ref:`call <i_call>`
-may use the following flags to enable otherwise unsafe
+may use the following flags to enable otherwise unsafe
floating-point transformations.
``nnan``
No NaNs - Allow optimizations to assume the arguments and result are not
- NaN. Such optimizations are required to retain defined behavior over
- NaNs, but the value of the result is undefined.
+ NaN. If an argument is a nan, or the result would be a nan, it produces
+ a :ref:`poison value <poisonvalues>` instead.
``ninf``
No Infs - Allow optimizations to assume the arguments and result are not
- +/-Inf. Such optimizations are required to retain defined behavior over
- +/-Inf, but the value of the result is undefined.
+ +/-Inf. If an argument is +/-Inf, or the result would be +/-Inf, it
+ produces a :ref:`poison value <poisonvalues>` instead.
``nsz``
No Signed Zeros - Allow optimizations to treat the sign of a zero
@@ -2306,12 +2389,12 @@ floating-point transformations.
``afn``
Approximate functions - Allow substitution of approximate calculations for
- functions (sin, log, sqrt, etc). See floating-point intrinsic definitions
- for places where this can apply to LLVM's intrinsic math functions.
+ functions (sin, log, sqrt, etc). See floating-point intrinsic definitions
+ for places where this can apply to LLVM's intrinsic math functions.
``reassoc``
- Allow reassociation transformations for floating-point instructions.
- This may dramatically change results in floating point.
+ Allow reassociation transformations for floating-point instructions.
+    This may dramatically change floating-point results.
``fast``
This flag implies all of the others.
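+
+As a hedged sketch of how these flags appear on instructions (the value
+names are illustrative):
+
+.. code-block:: llvm
+
+    %0 = fadd nnan ninf float %a, %b   ; poison if a NaN or infinity is involved
+    %1 = fmul reassoc nsz float %x, %y ; may be freely reassociated
+    %2 = fdiv fast float %p, %q        ; 'fast' implies all other flags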
@@ -2500,7 +2583,7 @@ Examples:
.. _t_floating:
-Floating Point Types
+Floating-Point Types
""""""""""""""""""""
.. list-table::
@@ -2510,22 +2593,26 @@ Floating Point Types
- Description
* - ``half``
- - 16-bit floating point value
+ - 16-bit floating-point value
* - ``float``
- - 32-bit floating point value
+ - 32-bit floating-point value
* - ``double``
- - 64-bit floating point value
+ - 64-bit floating-point value
* - ``fp128``
- - 128-bit floating point value (112-bit mantissa)
+ - 128-bit floating-point value (112-bit mantissa)
* - ``x86_fp80``
- - 80-bit floating point value (X87)
+ - 80-bit floating-point value (X87)
* - ``ppc_fp128``
- - 128-bit floating point value (two 64-bits)
+ - 128-bit floating-point value (two 64-bits)
+
+The binary formats of half, float, double, and fp128 correspond to the
+IEEE-754-2008 specifications for binary16, binary32, binary64, and binary128
+respectively.
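+
+A hedged sketch of these types in use (the global names are hypothetical):
+
+.. code-block:: llvm
+
+    @h = global half 0xH3C00   ; 16-bit: 1.0 in binary16 hex notation
+    @f = global float 1.25     ; 32-bit
+    @d = global double 4.5e+15 ; 64-bit
+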
X86_mmx Type
""""""""""""
@@ -2600,7 +2687,7 @@ type. Vector types are considered :ref:`first class <t_firstclass>`.
< <# elements> x <elementtype> >
The number of elements is a constant integer value larger than 0;
-elementtype may be any integer, floating point or pointer type. Vectors
+elementtype may be any integer, floating-point or pointer type. Vectors
of size zero are not allowed.
:Examples:
@@ -2711,7 +2798,7 @@ Here are some examples of multidimensional arrays:
+-----------------------------+----------------------------------------------------------+
| ``[3 x [4 x i32]]`` | 3x4 array of 32-bit integer values. |
+-----------------------------+----------------------------------------------------------+
-| ``[12 x [10 x float]]`` | 12x10 array of single precision floating point values. |
+| ``[12 x [10 x float]]`` | 12x10 array of single precision floating-point values. |
+-----------------------------+----------------------------------------------------------+
| ``[2 x [3 x [4 x i16]]]`` | 2x3x4 array of 16-bit integer values. |
+-----------------------------+----------------------------------------------------------+
@@ -2812,14 +2899,14 @@ Simple Constants
Standard integers (such as '4') are constants of the
:ref:`integer <t_integer>` type. Negative numbers may be used with
integer types.
-**Floating point constants**
- Floating point constants use standard decimal notation (e.g.
+**Floating-point constants**
+ Floating-point constants use standard decimal notation (e.g.
123.421), exponential notation (e.g. 1.23421e+2), or a more precise
hexadecimal notation (see below). The assembler requires the exact
decimal value of a floating-point constant. For example, the
assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating
- decimal in binary. Floating point constants must have a :ref:`floating
- point <t_floating>` type.
+ decimal in binary. Floating-point constants must have a
+ :ref:`floating-point <t_floating>` type.
**Null pointer constants**
The identifier '``null``' is recognized as a null pointer constant
and must be of :ref:`pointer type <t_pointer>`.
@@ -2828,12 +2915,12 @@ Simple Constants
and must be of :ref:`token type <t_token>`.
The one non-intuitive notation for constants is the hexadecimal form of
-floating point constants. For example, the form
+floating-point constants. For example, the form
'``double 0x432ff973cafa8000``' is equivalent to (but harder to read
-than) '``double 4.5e+15``'. The only time hexadecimal floating point
+than) '``double 4.5e+15``'. The only time hexadecimal floating-point
constants are required (and the only time that they are generated by the
-disassembler) is when a floating point constant must be emitted but it
-cannot be represented as a decimal floating point number in a reasonable
+disassembler) is when a floating-point constant must be emitted but it
+cannot be represented as a decimal floating-point number in a reasonable
number of digits. For example, NaN's, infinities, and other special
values are represented in their IEEE hexadecimal format so that assembly
and disassembly do not cause any bits to change in the constants.
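+
+As a hedged illustration of such special values (the global names are
+hypothetical):
+
+.. code-block:: llvm
+
+    @pinf = global double 0x7FF0000000000000 ; +infinity
+    @ninf = global double 0xFFF0000000000000 ; -infinity
+    @qnan = global double 0x7FF8000000000000 ; a quiet NaN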
@@ -3024,17 +3111,17 @@ uses with" concept would not hold.
.. code-block:: llvm
- %A = fdiv undef, %X
- %B = fdiv %X, undef
+ %A = sdiv undef, %X
+ %B = sdiv %X, undef
Safe:
- %A = undef
+ %A = 0
b: unreachable
These examples show the crucial difference between an *undefined value*
and *undefined behavior*. An undefined value (like '``undef``') is
allowed to have an arbitrary bit-pattern. This means that the ``%A``
-operation can be constant folded to '``undef``', because the '``undef``'
-could be an SNaN, and ``fdiv`` is not (currently) defined on SNaN's.
+operation can be constant folded to '``0``', because the '``undef``'
+could be zero, and zero divided by any value is zero.
However, in the second example, we can make a more aggressive
assumption: because the ``undef`` is allowed to be an arbitrary value,
we are allowed to assume that it could be zero. Since a divide by zero
@@ -3051,11 +3138,11 @@ optimizer can assume that it occurs in dead code.
a: <deleted>
b: unreachable
-These examples reiterate the ``fdiv`` example: a store *of* an undefined
-value can be assumed to not have any effect; we can assume that the
-value is overwritten with bits that happen to match what was already
-there. However, a store *to* an undefined location could clobber
-arbitrary memory, therefore, it has undefined behavior.
+A store *of* an undefined value can be assumed to not have any effect;
+we can assume that the value is overwritten with bits that happen to
+match what was already there. However, a store *to* an undefined
+location could clobber arbitrary memory, therefore, it has undefined
+behavior.
.. _poisonvalues:
@@ -3201,37 +3288,37 @@ The following is the syntax for constant expressions:
``sext (CST to TYPE)``
Perform the :ref:`sext operation <i_sext>` on constants.
``fptrunc (CST to TYPE)``
- Truncate a floating point constant to another floating point type.
+ Truncate a floating-point constant to another floating-point type.
The size of CST must be larger than the size of TYPE. Both types
- must be floating point.
+ must be floating-point.
``fpext (CST to TYPE)``
- Floating point extend a constant to another type. The size of CST
+ Floating-point extend a constant to another type. The size of CST
must be smaller or equal to the size of TYPE. Both types must be
- floating point.
+ floating-point.
``fptoui (CST to TYPE)``
- Convert a floating point constant to the corresponding unsigned
+ Convert a floating-point constant to the corresponding unsigned
integer constant. TYPE must be a scalar or vector integer type. CST
- must be of scalar or vector floating point type. Both CST and TYPE
+ must be of scalar or vector floating-point type. Both CST and TYPE
must be scalars, or vectors of the same number of elements. If the
- value won't fit in the integer type, the results are undefined.
+ value won't fit in the integer type, the result is a
+ :ref:`poison value <poisonvalues>`.
``fptosi (CST to TYPE)``
- Convert a floating point constant to the corresponding signed
+ Convert a floating-point constant to the corresponding signed
integer constant. TYPE must be a scalar or vector integer type. CST
- must be of scalar or vector floating point type. Both CST and TYPE
+ must be of scalar or vector floating-point type. Both CST and TYPE
must be scalars, or vectors of the same number of elements. If the
- value won't fit in the integer type, the results are undefined.
+ value won't fit in the integer type, the result is a
+ :ref:`poison value <poisonvalues>`.
``uitofp (CST to TYPE)``
- Convert an unsigned integer constant to the corresponding floating
- point constant. TYPE must be a scalar or vector floating point type.
- CST must be of scalar or vector integer type. Both CST and TYPE must
- be scalars, or vectors of the same number of elements. If the value
- won't fit in the floating point type, the results are undefined.
+ Convert an unsigned integer constant to the corresponding
+ floating-point constant. TYPE must be a scalar or vector floating-point
+ type. CST must be of scalar or vector integer type. Both CST and TYPE must
+ be scalars, or vectors of the same number of elements.
``sitofp (CST to TYPE)``
- Convert a signed integer constant to the corresponding floating
- point constant. TYPE must be a scalar or vector floating point type.
+ Convert a signed integer constant to the corresponding floating-point
+ constant. TYPE must be a scalar or vector floating-point type.
CST must be of scalar or vector integer type. Both CST and TYPE must
- be scalars, or vectors of the same number of elements. If the value
- won't fit in the floating point type, the results are undefined.
+ be scalars, or vectors of the same number of elements.
``ptrtoint (CST to TYPE)``
Perform the :ref:`ptrtoint operation <i_ptrtoint>` on constants.
``inttoptr (CST to TYPE)``
@@ -3280,7 +3367,7 @@ The following is the syntax for constant expressions:
may be any of the :ref:`binary <binaryops>` or :ref:`bitwise
binary <bitwiseops>` operations. The constraints on operands are
the same as those for the corresponding instruction (e.g. no bitwise
- operations on floating point values are allowed).
+ operations on floating-point values are allowed).
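+
+A hedged sketch of a few of these constant conversions used in global
+initializers (names illustrative):
+
+.. code-block:: llvm
+
+    @i = global i32 fptosi (double 4.5 to i32)  ; yields i32 4 (rounds toward zero)
+    @u = global i32 fptoui (double 3.0 to i32)  ; yields i32 3
+    @f = global float sitofp (i32 -1 to float)  ; yields float -1.0
+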
Other Values
============
@@ -3650,8 +3737,8 @@ ARM and ARM's Thumb2 mode:
``d0-d31``, or ``q0-q15``.
- ``x``: A 32, 64, or 128-bit floating-point/SIMD register: ``s0-s15``,
``d0-d7``, or ``q0-q3``.
-- ``t``: A floating-point/SIMD register, only supports 32-bit values:
- ``s0-s31``.
+- ``t``: A low floating-point/SIMD register: ``s0-s31``, ``d0-d16``, or
+ ``q0-q8``.
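+
+For instance, a hedged sketch of using the ``t`` constraint from IR; the
+instruction and value names are illustrative:
+
+.. code-block:: llvm
+
+    ; single-precision VFP add constrained to low s-registers
+    %sum = call float asm "vadd.f32 $0, $1, $2", "=t,t,t"(float %a, float %b)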
ARM's Thumb1 mode:
@@ -3670,8 +3757,8 @@ ARM's Thumb1 mode:
``d0-d31``, or ``q0-q15``.
- ``x``: A 32, 64, or 128-bit floating-point/SIMD register: ``s0-s15``,
``d0-d7``, or ``q0-q3``.
-- ``t``: A floating-point/SIMD register, only supports 32-bit values:
- ``s0-s31``.
+- ``t``: A low floating-point/SIMD register: ``s0-s31``, ``d0-d16``, or
+ ``q0-q8``.
Hexagon:
@@ -3748,16 +3835,16 @@ PowerPC:
- ``wc``: An individual CR bit in a CR register.
- ``wa``, ``wd``, ``wf``: Any 128-bit VSX vector register, from the full VSX
register set (overlapping both the floating-point and vector register files).
-- ``ws``: A 32 or 64-bit floating point register, from the full VSX register
+- ``ws``: A 32 or 64-bit floating-point register, from the full VSX register
set.
Sparc:
- ``I``: An immediate 13-bit signed integer.
- ``r``: A 32-bit integer register.
-- ``f``: Any floating-point register on SparcV8, or a floating point
+- ``f``: Any floating-point register on SparcV8, or a floating-point
register in the "low" half of the registers on SparcV9.
-- ``e``: Any floating point register. (Same as ``f`` on SparcV8.)
+- ``e``: Any floating-point register. (Same as ``f`` on SparcV8.)
SystemZ:
@@ -3779,7 +3866,7 @@ SystemZ:
address context evaluates as zero).
- ``h``: A 32-bit value in the high part of a 64-bit data register
(LLVM-specific)
-- ``f``: A 32, 64, or 128-bit floating point register.
+- ``f``: A 32, 64, or 128-bit floating-point register.
X86:
@@ -4307,7 +4394,11 @@ DISubrange
""""""""""
``DISubrange`` nodes are the elements for ``DW_TAG_array_type`` variants of
-:ref:`DICompositeType`. ``count: -1`` indicates an empty array.
+:ref:`DICompositeType`.
+
+- ``count: -1`` indicates an empty array.
+- ``count: !10`` describes the count with a :ref:`DILocalVariable`.
+- ``count: !12`` describes the count with a :ref:`DIGlobalVariable`.
.. code-block:: llvm
@@ -4315,6 +4406,20 @@ DISubrange
!1 = !DISubrange(count: 5, lowerBound: 1) ; array counting from 1
!2 = !DISubrange(count: -1) ; empty array.
+ ; Scopes used in rest of example
+ !6 = !DIFile(filename: "vla.c", directory: "/path/to/file")
+ !7 = distinct !DICompileUnit(language: DW_LANG_C99, ...
+ !8 = distinct !DISubprogram(name: "foo", scope: !7, file: !6, line: 5, ...
+
+ ; Use of local variable as count value
+ !9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+ !10 = !DILocalVariable(name: "count", scope: !8, file: !6, line: 42, type: !9)
+    !11 = !DISubrange(count: !10, lowerBound: 0)
+
+ ; Use of global variable as count value
+ !12 = !DIGlobalVariable(name: "count", scope: !8, file: !6, line: 22, type: !9)
+    !13 = !DISubrange(count: !12, lowerBound: 0)
+
.. _DIEnumerator:
DIEnumerator
@@ -4362,6 +4467,8 @@ DINamespace
!0 = !DINamespace(name: "myawesomeproject", scope: !1, file: !2, line: 7)
+.. _DIGlobalVariable:
+
DIGlobalVariable
""""""""""""""""
@@ -4494,7 +4601,7 @@ The current supported vocabulary is limited:
- ``DW_OP_plus_uconst, 93`` adds ``93`` to the working expression.
- ``DW_OP_LLVM_fragment, 16, 8`` specifies the offset and size (``16`` and ``8``
here, respectively) of the variable fragment from the working expression. Note
- that contrary to DW_OP_bit_piece, the offset is describing the the location
+ that contrary to DW_OP_bit_piece, the offset is describing the location
within the described source variable.
- ``DW_OP_swap`` swaps top two stack entries.
- ``DW_OP_xderef`` provides extended dereference mechanism. The entry at the top
@@ -4660,7 +4767,7 @@ As a concrete example, the type descriptor graph for the following program
void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) {
outer->f = 0; // tag0: (OuterStructTy, FloatScalarTy, 0)
outer->inner_a.i = 0; // tag1: (OuterStructTy, IntScalarTy, 12)
- outer->inner_a.f = 0.0; // tag2: (OuterStructTy, IntScalarTy, 16)
+ outer->inner_a.f = 0.0; // tag2: (OuterStructTy, FloatScalarTy, 16)
*f = 0.0; // tag3: (FloatScalarTy, FloatScalarTy, 0)
}
@@ -4825,7 +4932,7 @@ For example,
'``fpmath``' Metadata
^^^^^^^^^^^^^^^^^^^^^
-``fpmath`` metadata may be attached to any instruction of floating point
+``fpmath`` metadata may be attached to any instruction of floating-point
type. It can be used to express the maximum acceptable error in the
result of that instruction, in ULPs, thus potentially allowing the
compiler to use a more efficient but less accurate method of computing
@@ -4851,10 +4958,11 @@ representing the maximum relative error, for example:
``range`` metadata may be attached only to ``load``, ``call`` and ``invoke`` of
integer types. It expresses the possible ranges the loaded value or the value
-returned by the called function at this call site is in. The ranges are
-represented with a flattened list of integers. The loaded value or the value
-returned is known to be in the union of the ranges defined by each consecutive
-pair. Each pair has the following properties:
+returned by the called function at this call site is in. If the loaded or
+returned value is not in the specified range, the behavior is undefined. The
+ranges are represented with a flattened list of integers. The loaded value or
+the value returned is known to be in the union of the ranges defined by each
+consecutive pair. Each pair has the following properties:
- The type must match the type loaded by the instruction.
- The pair ``a,b`` represents the range ``[a,b)``.
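+
+As a hedged example of the documented form (the values are illustrative):
+
+.. code-block:: llvm
+
+    %val = load i8, i8* %p, !range !0
+    !0 = !{i8 0, i8 2, i8 64, i8 128} ; %val is in [0,2) or [64,128)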
@@ -5089,6 +5197,59 @@ For example:
!0 = !{!"llvm.loop.unroll.full"}
+'``llvm.loop.unroll_and_jam``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata is treated very similarly to the ``llvm.loop.unroll`` metadata
+above, but affects the unroll and jam pass instead. In addition, any loop with
+``llvm.loop.unroll`` metadata but no ``llvm.loop.unroll_and_jam`` metadata will
+disable unroll and jam (so ``llvm.loop.unroll`` metadata will be left to the
+unroller, and ``llvm.loop.unroll.disable`` metadata will disable unroll and jam
+too).
+
+The metadata for unroll and jam otherwise is the same as for ``unroll``.
+``llvm.loop.unroll_and_jam.enable``, ``llvm.loop.unroll_and_jam.disable`` and
+``llvm.loop.unroll_and_jam.count`` do the same as for unroll.
+``llvm.loop.unroll_and_jam.full`` is not supported. Again these are only hints
+and the normal safety checks will still be performed.
+
+'``llvm.loop.unroll_and_jam.count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata suggests an unroll and jam factor to use, similarly to
+``llvm.loop.unroll.count``. The first operand is the string
+``llvm.loop.unroll_and_jam.count`` and the second operand is a positive integer
+specifying the unroll factor. For example:
+
+.. code-block:: llvm
+
+ !0 = !{!"llvm.loop.unroll_and_jam.count", i32 4}
+
+If the trip count of the loop is less than the unroll count, the loop
+will be partially unrolled and jammed.
+
+'``llvm.loop.unroll_and_jam.disable``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata disables loop unroll and jamming. The metadata has a single
+operand which is the string ``llvm.loop.unroll_and_jam.disable``. For example:
+
+.. code-block:: llvm
+
+ !0 = !{!"llvm.loop.unroll_and_jam.disable"}
+
+'``llvm.loop.unroll_and_jam.enable``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata suggests that the loop should be fully unrolled and jammed if the
+trip count is known at compile time and partially unrolled and jammed if the
+trip count is not known at compile time. The metadata has a single operand which is the
+string ``llvm.loop.unroll_and_jam.enable``. For example:
+
+.. code-block:: llvm
+
+ !0 = !{!"llvm.loop.unroll_and_jam.enable"}
+
'``llvm.loop.licm_versioning.disable``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -5231,11 +5392,12 @@ Irreducible loop header weights are typically based on profile data.
'``invariant.group``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The ``invariant.group`` metadata may be attached to ``load``/``store`` instructions.
+The experimental ``invariant.group`` metadata may be attached to
+``load``/``store`` instructions; it references a single metadata node with no
+entries.
The existence of the ``invariant.group`` metadata on the instruction tells
the optimizer that every ``load`` and ``store`` to the same pointer operand
-within the same invariant group can be assumed to load or store the same
-value (but see the ``llvm.invariant.group.barrier`` intrinsic which affects
+can be assumed to load or store the same
+value (but see the ``llvm.launder.invariant.group`` intrinsic which affects
when two pointers are considered the same). Pointers returned by bitcast or
getelementptr with only zero indices are considered the same.
@@ -5251,7 +5413,6 @@ Examples:
%a = load i8, i8* %ptr, !invariant.group !0 ; Can assume that value under %ptr didn't change
call void @foo(i8* %ptr)
- %b = load i8, i8* %ptr, !invariant.group !1 ; Can't assume anything, because group changed
%newPtr = call i8* @getPointer(i8* %ptr)
%c = load i8, i8* %newPtr, !invariant.group !0 ; Can't assume anything, because we only have information about %ptr
@@ -5260,16 +5421,15 @@ Examples:
store i8 %unknownValue, i8* %ptr, !invariant.group !0 ; Can assume that %unknownValue == 42
call void @foo(i8* %ptr)
- %newPtr2 = call i8* @llvm.invariant.group.barrier(i8* %ptr)
- %d = load i8, i8* %newPtr2, !invariant.group !0 ; Can't step through invariant.group.barrier to get value of %ptr
+ %newPtr2 = call i8* @llvm.launder.invariant.group(i8* %ptr)
+ %d = load i8, i8* %newPtr2, !invariant.group !0 ; Can't step through launder.invariant.group to get value of %ptr
...
declare void @foo(i8*)
declare i8* @getPointer(i8*)
- declare i8* @llvm.invariant.group.barrier(i8*)
+ declare i8* @llvm.launder.invariant.group(i8*)
- !0 = !{!"magic ptr"}
- !1 = !{!"other ptr"}
+ !0 = !{}
The invariant.group metadata must be dropped when replacing one pointer by
another based on aliasing information. This is because invariant.group is tied
@@ -5281,6 +5441,8 @@ to the SSA value of the pointer operand.
; if %x mustalias %y then we can replace the above instruction with
%v = load i8, i8* %y
+Note that this is an experimental feature, which means that its semantics might
+change in the future.
'``type``' Metadata
^^^^^^^^^^^^^^^^^^^
@@ -5617,6 +5779,310 @@ Each individual option is required to be either a valid option for the target's
linker, or an option that is reserved by the target specific assembly writer or
object file emitter. No other aspect of these options is defined by the IR.
+.. _summary:
+
+ThinLTO Summary
+===============
+
+Compiling with `ThinLTO <https://clang.llvm.org/docs/ThinLTO.html>`_
+causes the building of a compact summary of the module that is emitted into
+the bitcode. The summary is emitted into the LLVM assembly and identified
+in syntax by a caret ('``^``').
+
+*Note that the summary entries are temporarily skipped when parsing the
+assembly, although the parsing support is actively being implemented. The
+following describes when the summary entries will be parsed once implemented.*
+The summary will be parsed into a ModuleSummaryIndex object under the
+same conditions where a summary index is currently built from bitcode:
+specifically, in tools that test the Thin Link portion of a ThinLTO compile
+(i.e. llvm-lto and llvm-lto2), and when parsing a combined index
+for a distributed ThinLTO backend via clang's "``-fthinlto-index=<>``" flag.
+Additionally, it will be parsed into a bitcode output, along with the Module
+IR, via the "``llvm-as``" tool. Tools that parse the Module IR for the purposes
+of optimization (e.g. "``clang -x ir``" and "``opt``") will ignore the
+summary entries (just as they currently ignore summary entries in a bitcode
+input file).
+
+There are currently 3 types of summary entries in the LLVM assembly:
+:ref:`module paths<module_path_summary>`,
+:ref:`global values<gv_summary>`, and
+:ref:`type identifiers<typeid_summary>`.
+
+.. _module_path_summary:
+
+Module Path Summary Entry
+-------------------------
+
+Each module path summary entry lists a module containing global values included
+in the summary. For a single IR module there will be one such entry, but
+in a combined summary index produced during the thin link, there will be
+one module path entry per linked module with summary.
+
+Example:
+
+.. code-block:: llvm
+
+ ^0 = module: (path: "/path/to/file.o", hash: (2468601609, 1329373163, 1565878005, 638838075, 3148790418))
+
+The ``path`` field is a string path to the bitcode file, and the ``hash``
+field is the 160-bit SHA-1 hash of the IR bitcode contents, used for
+incremental builds and caching.
+
+.. _gv_summary:
+
+Global Value Summary Entry
+--------------------------
+
+Each global value summary entry corresponds to a global value defined or
+referenced by a summarized module.
+
+Example:
+
+.. code-block:: llvm
+
+ ^4 = gv: (name: "f"[, summaries: (Summary)[, (Summary)]*]?) ; guid = 14740650423002898831
+
+For declarations, there will not be a summary list. For definitions, a
+global value will contain a list of summaries, one per module containing
+a definition. There can be multiple entries in a combined summary index
+for symbols with weak linkage.
+
+Each ``Summary`` format will depend on whether the global value is a
+:ref:`function<function_summary>`, :ref:`variable<variable_summary>`, or
+:ref:`alias<alias_summary>`.
+
+.. _function_summary:
+
+Function Summary
+^^^^^^^^^^^^^^^^
+
+If the global value is a function, the ``Summary`` entry will look like:
+
+.. code-block:: llvm
+
+    function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 2[, FuncFlags]?[, Calls]?[, TypeIdInfo]?[, Refs]?)
+
+The ``module`` field includes the summary entry id for the module containing
+this definition, and the ``flags`` field contains information such as
+the linkage type, a flag indicating whether it is legal to import the
+definition, whether it is globally live and whether the linker resolved it
+to a local definition (the latter two are populated during the thin link).
+The ``insts`` field contains the number of IR instructions in the function.
+Finally, there are several optional fields: :ref:`FuncFlags<funcflags_summary>`,
+:ref:`Calls<calls_summary>`, :ref:`TypeIdInfo<typeidinfo_summary>`,
+:ref:`Refs<refs_summary>`.
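+
+For instance, a hedged, hand-written sketch of a global value entry carrying
+a function summary (the name, ids, and instruction count are illustrative):
+
+.. code-block:: llvm
+
+    ^1 = gv: (name: "main", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 5)))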
+
+.. _variable_summary:
+
+Global Variable Summary
+^^^^^^^^^^^^^^^^^^^^^^^
+
+If the global value is a variable, the ``Summary`` entry will look like:
+
+.. code-block:: llvm
+
+    variable: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0)[, Refs]?)
+
+The variable entry contains a subset of the fields in a
+:ref:`function summary <function_summary>`, see the descriptions there.
+
+.. _alias_summary:
+
+Alias Summary
+^^^^^^^^^^^^^
+
+If the global value is an alias, the ``Summary`` entry will look like:
+
+.. code-block:: llvm
+
+ alias: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), aliasee: ^2)
+
+The ``module`` and ``flags`` fields are as described for a
+:ref:`function summary <function_summary>`. The ``aliasee`` field
+contains a reference to the global value summary entry of the aliasee.
+
+.. _funcflags_summary:
+
+Function Flags
+^^^^^^^^^^^^^^
+
+The optional ``FuncFlags`` field looks like:
+
+.. code-block:: llvm
+
+ funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0)
+
+If unspecified, flags are assumed to hold the conservative ``false`` value of
+``0``.
+
+.. _calls_summary:
+
+Calls
+^^^^^
+
+The optional ``Calls`` field looks like:
+
+.. code-block:: llvm
+
+ calls: ((Callee)[, (Callee)]*)
+
+where each ``Callee`` looks like:
+
+.. code-block:: llvm
+
+ callee: ^1[, hotness: None]?[, relbf: 0]?
+
+The ``callee`` refers to the summary entry id of the callee. At most one
+of ``hotness`` (which can take the values ``Unknown``, ``Cold``, ``None``,
+``Hot``, and ``Critical``) and ``relbf`` (which holds the integer
+branch frequency relative to the entry frequency, scaled down by 2^8)
+may be specified. The defaults are ``Unknown`` and ``0``, respectively.
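+
+For instance, a hedged sketch with illustrative summary ids:
+
+.. code-block:: llvm
+
+    calls: ((callee: ^2, hotness: Hot), (callee: ^3))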
+
+.. _refs_summary:
+
+Refs
+^^^^
+
+The optional ``Refs`` field looks like:
+
+.. code-block:: llvm
+
+ refs: ((Ref)[, (Ref)]*)
+
+where each ``Ref`` contains a reference to the summary id of the referenced
+value (e.g. ``^1``).
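+
+For instance, a hedged sketch with illustrative summary ids:
+
+.. code-block:: llvm
+
+    refs: (^1, ^4)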
+
+.. _typeidinfo_summary:
+
+TypeIdInfo
+^^^^^^^^^^
+
+The optional ``TypeIdInfo`` field, used for
+`Control Flow Integrity <http://clang.llvm.org/docs/ControlFlowIntegrity.html>`_,
+looks like:
+
+.. code-block:: llvm
+
+ typeIdInfo: [(TypeTests)]?[, (TypeTestAssumeVCalls)]?[, (TypeCheckedLoadVCalls)]?[, (TypeTestAssumeConstVCalls)]?[, (TypeCheckedLoadConstVCalls)]?
+
+These optional fields have the following forms:
+
+TypeTests
+"""""""""
+
+.. code-block:: llvm
+
+ typeTests: (TypeIdRef[, TypeIdRef]*)
+
+Where each ``TypeIdRef`` refers to a :ref:`type id<typeid_summary>`
+by summary id or ``GUID``.
+
+TypeTestAssumeVCalls
+""""""""""""""""""""
+
+.. code-block:: llvm
+
+ typeTestAssumeVCalls: (VFuncId[, VFuncId]*)
+
+Where each VFuncId has the format:
+
+.. code-block:: llvm
+
+ vFuncId: (TypeIdRef, offset: 16)
+
+Where each ``TypeIdRef`` refers to a :ref:`type id<typeid_summary>`
+by summary id or ``GUID`` preceded by a ``guid:`` tag.
+
+TypeCheckedLoadVCalls
+"""""""""""""""""""""
+
+.. code-block:: llvm
+
+ typeCheckedLoadVCalls: (VFuncId[, VFuncId]*)
+
+Where each VFuncId has the format described for ``TypeTestAssumeVCalls``.
+
+TypeTestAssumeConstVCalls
+"""""""""""""""""""""""""
+
+.. code-block:: llvm
+
+ typeTestAssumeConstVCalls: (ConstVCall[, ConstVCall]*)
+
+Where each ConstVCall has the format:
+
+.. code-block:: llvm
+
+ VFuncId, args: (Arg[, Arg]*)
+
+and where each VFuncId has the format described for ``TypeTestAssumeVCalls``,
+and each Arg is an integer argument number.
+
+TypeCheckedLoadConstVCalls
+""""""""""""""""""""""""""
+
+.. code-block:: llvm
+
+ typeCheckedLoadConstVCalls: (ConstVCall[, ConstVCall]*)
+
+Where each ConstVCall has the format described for
+``TypeTestAssumeConstVCalls``.
+
+.. _typeid_summary:
+
+Type ID Summary Entry
+---------------------
+
+Each type id summary entry corresponds to a type identifier resolution
+which is generated during the LTO link portion of the compile when building
+with `Control Flow Integrity <http://clang.llvm.org/docs/ControlFlowIntegrity.html>`_,
+so these are only present in a combined summary index.
+
+Example:
+
+.. code-block:: llvm
+
+ ^4 = typeid: (name: "_ZTS1A", summary: (typeTestRes: (kind: allOnes, sizeM1BitWidth: 7[, alignLog2: 0]?[, sizeM1: 0]?[, bitMask: 0]?[, inlineBits: 0]?)[, WpdResolutions]?)) ; guid = 7004155349499253778
+
+The ``typeTestRes`` gives the type test resolution ``kind`` (which may
+be ``unsat``, ``byteArray``, ``inline``, ``single``, or ``allOnes``), and
+the ``size-1`` bit width. It is followed by optional flags, which default to 0,
+and an optional WpdResolutions (whole program devirtualization resolution)
+field that looks like:
+
+.. code-block:: llvm
+
+    wpdResolutions: ((offset: 0, WpdRes)[, (offset: 1, WpdRes)]*)
+
+where each entry is a mapping from the given byte offset to the whole-program
+devirtualization resolution WpdRes, that has one of the following formats:
+
+.. code-block:: llvm
+
+ wpdRes: (kind: branchFunnel)
+ wpdRes: (kind: singleImpl, singleImplName: "_ZN1A1nEi")
+ wpdRes: (kind: indir)
+
+Additionally, each wpdRes has an optional ``resByArg`` field, which
+describes the resolutions for calls with all constant integer arguments:
+
+.. code-block:: llvm
+
+ resByArg: (ResByArg[, ResByArg]*)
+
+where ResByArg is:
+
+.. code-block:: llvm
+
+ args: (Arg[, Arg]*), byArg: (kind: UniformRetVal[, info: 0][, byte: 0][, bit: 0])
+
+Where the ``kind`` can be ``Indir``, ``UniformRetVal``, ``UniqueRetVal``
+or ``VirtualConstProp``. The ``info`` field is only used if the kind
+is ``UniformRetVal`` (indicates the uniform return value) or
+``UniqueRetVal`` (holds the return value associated with the unique vtable
+(0 or 1)). The ``byte`` and ``bit`` fields are only used if the target does
+not support the use of absolute symbols to store constants.
+
.. _intrinsicglobalvariables:
Intrinsic Global Variables
@@ -6363,17 +6829,19 @@ The '``fadd``' instruction returns the sum of its two operands.
Arguments:
""""""""""
-The two arguments to the '``fadd``' instruction must be :ref:`floating
-point <t_floating>` or :ref:`vector <t_vector>` of floating point values.
-Both arguments must have identical types.
+The two arguments to the '``fadd``' instruction must be
+:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
+floating-point values. Both arguments must have identical types.
Semantics:
""""""""""
-The value produced is the floating point sum of the two operands. This
-instruction can also take any number of :ref:`fast-math flags <fastmath>`,
-which are optimization hints to enable otherwise unsafe floating point
-optimizations:
+The value produced is the floating-point sum of the two operands.
+This instruction is assumed to execute in the default :ref:`floating-point
+environment <floatenv>`.
+This instruction can also take any number of :ref:`fast-math
+flags <fastmath>`, which are optimization hints to enable otherwise
+unsafe floating-point optimizations:
Example:
""""""""
@@ -6458,17 +6926,19 @@ instruction present in most other intermediate representations.
Arguments:
""""""""""
-The two arguments to the '``fsub``' instruction must be :ref:`floating
-point <t_floating>` or :ref:`vector <t_vector>` of floating point values.
-Both arguments must have identical types.
+The two arguments to the '``fsub``' instruction must be
+:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
+floating-point values. Both arguments must have identical types.
Semantics:
""""""""""
-The value produced is the floating point difference of the two operands.
+The value produced is the floating-point difference of the two operands.
+This instruction is assumed to execute in the default :ref:`floating-point
+environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
-unsafe floating point optimizations:
+unsafe floating-point optimizations:
Example:
""""""""
@@ -6551,17 +7021,19 @@ The '``fmul``' instruction returns the product of its two operands.
Arguments:
""""""""""
-The two arguments to the '``fmul``' instruction must be :ref:`floating
-point <t_floating>` or :ref:`vector <t_vector>` of floating point values.
-Both arguments must have identical types.
+The two arguments to the '``fmul``' instruction must be
+:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
+floating-point values. Both arguments must have identical types.
Semantics:
""""""""""
-The value produced is the floating point product of the two operands.
+The value produced is the floating-point product of the two operands.
+This instruction is assumed to execute in the default :ref:`floating-point
+environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
-unsafe floating point optimizations:
+unsafe floating-point optimizations:
Example:
""""""""
@@ -6683,17 +7155,19 @@ The '``fdiv``' instruction returns the quotient of its two operands.
Arguments:
""""""""""
-The two arguments to the '``fdiv``' instruction must be :ref:`floating
-point <t_floating>` or :ref:`vector <t_vector>` of floating point values.
-Both arguments must have identical types.
+The two arguments to the '``fdiv``' instruction must be
+:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
+floating-point values. Both arguments must have identical types.
Semantics:
""""""""""
-The value produced is the floating point quotient of the two operands.
+The value produced is the floating-point quotient of the two operands.
+This instruction is assumed to execute in the default :ref:`floating-point
+environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
-unsafe floating point optimizations:
+unsafe floating-point optimizations:
Example:
""""""""
@@ -6824,19 +7298,22 @@ its two operands.
Arguments:
""""""""""
-The two arguments to the '``frem``' instruction must be :ref:`floating
-point <t_floating>` or :ref:`vector <t_vector>` of floating point values.
-Both arguments must have identical types.
+The two arguments to the '``frem``' instruction must be
+:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
+floating-point values. Both arguments must have identical types.
Semantics:
""""""""""
-Return the same value as a libm '``fmod``' function but without trapping or
-setting ``errno``.
-
-The remainder has the same sign as the dividend. This instruction can also
-take any number of :ref:`fast-math flags <fastmath>`, which are optimization
-hints to enable otherwise unsafe floating-point optimizations:
+The value produced is the floating-point remainder of the two operands.
+This is the same output as a libm '``fmod``' function, but without any
+possibility of setting ``errno``. The remainder has the same sign as the
+dividend.
+This instruction is assumed to execute in the default :ref:`floating-point
+environment <floatenv>`.
+This instruction can also take any number of :ref:`fast-math
+flags <fastmath>`, which are optimization hints to enable otherwise
+unsafe floating-point optimizations:
Example:
""""""""
@@ -6895,7 +7372,7 @@ by the corresponding shift amount in ``op2``.
If the ``nuw`` keyword is present, then the shift produces a poison
value if it shifts out any non-zero bits.
If the ``nsw`` keyword is present, then the shift produces a poison
-value it shifts out any bits that disagree with the resultant sign bit.
+value if it shifts out any bits that disagree with the resultant sign bit.
Example:
""""""""
@@ -7197,7 +7674,8 @@ Semantics:
The result is a scalar of the same type as the element type of ``val``.
Its value is the value at position ``idx`` of ``val``. If ``idx``
-exceeds the length of ``val``, the results are undefined.
+exceeds the length of ``val``, the result is a
+:ref:`poison value <poisonvalues>`.
Example:
""""""""
@@ -7238,8 +7716,8 @@ Semantics:
The result is a vector of the same type as ``val``. Its element values
are those of ``val`` except at position ``idx``, where it gets the value
-``elt``. If ``idx`` exceeds the length of ``val``, the results are
-undefined.
+``elt``. If ``idx`` exceeds the length of ``val``, the result
+is a :ref:`poison value <poisonvalues>`.
Example:
""""""""
@@ -7455,9 +7933,9 @@ memory is automatically released when the function returns. The
'``alloca``' instruction is commonly used to represent automatic
variables that must have an address available. When the function returns
(either with the ``ret`` or ``resume`` instructions), the memory is
-reclaimed. Allocating zero bytes is legal, but the result is undefined.
-The order in which memory is allocated (ie., which way the stack grows)
-is not specified.
+reclaimed. Allocating zero bytes is legal, but the returned pointer may not
+be unique. The order in which memory is allocated (ie., which way the stack
+grows) is not specified.
Example:
""""""""
@@ -7538,18 +8016,20 @@ metadata name ``<index>`` corresponding to a metadata node with no
entries. If a load instruction tagged with the ``!invariant.load``
metadata is executed, the optimizer may assume the memory location
referenced by the load contains the same value at all points in the
-program where the memory location is known to be dereferenceable.
+program where the memory location is known to be dereferenceable;
+otherwise, the behavior is undefined.
The optional ``!invariant.group`` metadata must reference a single metadata name
- ``<index>`` corresponding to a metadata node. See ``invariant.group`` metadata.
+ ``<index>`` corresponding to a metadata node with no entries.
+ See ``invariant.group`` metadata.
The optional ``!nonnull`` metadata must reference a single
metadata name ``<index>`` corresponding to a metadata node with no
entries. The existence of the ``!nonnull`` metadata on the
instruction tells the optimizer that the value loaded is known to
-never be null. This is analogous to the ``nonnull`` attribute
-on parameters and return values. This metadata can only be applied
-to loads of a pointer type.
+never be null. If the value is null at runtime, the behavior is undefined.
+This is analogous to the ``nonnull`` attribute on parameters and return
+values. This metadata can only be applied to loads of a pointer type.
The optional ``!dereferenceable`` metadata must reference a single metadata
name ``<deref_bytes_node>`` corresponding to a metadata node with one ``i64``
@@ -7576,7 +8056,8 @@ The existence of the ``!align`` metadata on the instruction tells the
optimizer that the value loaded is known to be aligned to a boundary specified
by the integer value in the metadata node. The alignment must be a power of 2.
This is analogous to the ''align'' attribute on parameters and return values.
-This metadata can only be applied to loads of a pointer type.
+This metadata can only be applied to loads of a pointer type. If the returned
+value is not appropriately aligned at runtime, the behavior is undefined.
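+
+As a hedged sketch of these load metadata attachments (the names and values
+are illustrative):
+
+.. code-block:: llvm
+
+    %ptr = load i32*, i32** %pp, !nonnull !0, !align !1
+    !0 = !{}       ; empty node: the loaded pointer is non-null
+    !1 = !{i64 8}  ; the loaded pointer is 8-byte aligned
+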
Semantics:
""""""""""
@@ -8273,8 +8754,8 @@ The '``fptrunc``' instruction truncates ``value`` to type ``ty2``.
Arguments:
""""""""""
-The '``fptrunc``' instruction takes a :ref:`floating point <t_floating>`
-value to cast and a :ref:`floating point <t_floating>` type to cast it to.
+The '``fptrunc``' instruction takes a :ref:`floating-point <t_floating>`
+value to cast and a :ref:`floating-point <t_floating>` type to cast it to.
The size of ``value`` must be larger than the size of ``ty2``. This
implies that ``fptrunc`` cannot be used to make a *no-op cast*.
@@ -8282,19 +8763,18 @@ Semantics:
""""""""""
The '``fptrunc``' instruction casts a ``value`` from a larger
-:ref:`floating point <t_floating>` type to a smaller :ref:`floating
-point <t_floating>` type. If the value cannot fit (i.e. overflows) within the
-destination type, ``ty2``, then the results are undefined. If the cast produces
-an inexact result, how rounding is performed (e.g. truncation, also known as
-round to zero) is undefined.
+:ref:`floating-point <t_floating>` type to a smaller :ref:`floating-point
+<t_floating>` type.
+This instruction is assumed to execute in the default :ref:`floating-point
+environment <floatenv>`.
Example:
""""""""
.. code-block:: llvm
- %X = fptrunc double 123.0 to float ; yields float:123.0
- %Y = fptrunc double 1.0E+300 to float ; yields undefined
+ %X = fptrunc double 16777217.0 to float ; yields float:16777216.0
+ %Y = fptrunc double 1.0E+300 to half ; yields half:+infinity
'``fpext .. to``' Instruction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -8309,24 +8789,24 @@ Syntax:
Overview:
"""""""""
-The '``fpext``' extends a floating point ``value`` to a larger floating
-point value.
+The '``fpext``' extends a floating-point ``value`` to a larger floating-point
+value.
Arguments:
""""""""""
-The '``fpext``' instruction takes a :ref:`floating point <t_floating>`
-``value`` to cast, and a :ref:`floating point <t_floating>` type to cast it
+The '``fpext``' instruction takes a :ref:`floating-point <t_floating>`
+``value`` to cast, and a :ref:`floating-point <t_floating>` type to cast it
to. The source type must be smaller than the destination type.
Semantics:
""""""""""
The '``fpext``' instruction extends the ``value`` from a smaller
-:ref:`floating point <t_floating>` type to a larger :ref:`floating
-point <t_floating>` type. The ``fpext`` cannot be used to make a
+:ref:`floating-point <t_floating>` type to a larger :ref:`floating-point
+<t_floating>` type. The ``fpext`` cannot be used to make a
*no-op cast* because it always changes bits. Use ``bitcast`` to make a
-*no-op cast* for a floating point cast.
+*no-op cast* for a floating-point cast.
Example:
""""""""
@@ -8349,25 +8829,25 @@ Syntax:
Overview:
"""""""""
-The '``fptoui``' converts a floating point ``value`` to its unsigned
+The '``fptoui``' converts a floating-point ``value`` to its unsigned
integer equivalent of type ``ty2``.
Arguments:
""""""""""
The '``fptoui``' instruction takes a value to cast, which must be a
-scalar or vector :ref:`floating point <t_floating>` value, and a type to
+scalar or vector :ref:`floating-point <t_floating>` value, and a type to
cast it to ``ty2``, which must be an :ref:`integer <t_integer>` type. If
-``ty`` is a vector floating point type, ``ty2`` must be a vector integer
+``ty`` is a vector floating-point type, ``ty2`` must be a vector integer
type with the same number of elements as ``ty``
Semantics:
""""""""""
-The '``fptoui``' instruction converts its :ref:`floating
-point <t_floating>` operand into the nearest (rounding towards zero)
-unsigned integer value. If the value cannot fit in ``ty2``, the results
-are undefined.
+The '``fptoui``' instruction converts its :ref:`floating-point
+<t_floating>` operand into the nearest (rounding towards zero)
+unsigned integer value. If the value cannot fit in ``ty2``, the result
+is a :ref:`poison value <poisonvalues>`.
Example:
""""""""
@@ -8391,25 +8871,25 @@ Syntax:
Overview:
"""""""""
-The '``fptosi``' instruction converts :ref:`floating point <t_floating>`
+The '``fptosi``' instruction converts :ref:`floating-point <t_floating>`
``value`` to type ``ty2``.
Arguments:
""""""""""
The '``fptosi``' instruction takes a value to cast, which must be a
-scalar or vector :ref:`floating point <t_floating>` value, and a type to
+scalar or vector :ref:`floating-point <t_floating>` value, and a type to
cast it to ``ty2``, which must be an :ref:`integer <t_integer>` type. If
-``ty`` is a vector floating point type, ``ty2`` must be a vector integer
+``ty`` is a vector floating-point type, ``ty2`` must be a vector integer
type with the same number of elements as ``ty``
Semantics:
""""""""""
-The '``fptosi``' instruction converts its :ref:`floating
-point <t_floating>` operand into the nearest (rounding towards zero)
-signed integer value. If the value cannot fit in ``ty2``, the results
-are undefined.
+The '``fptosi``' instruction converts its :ref:`floating-point
+<t_floating>` operand into the nearest (rounding towards zero)
+signed integer value. If the value cannot fit in ``ty2``, the result
+is a :ref:`poison value <poisonvalues>`.
Example:
""""""""
@@ -8441,17 +8921,18 @@ Arguments:
The '``uitofp``' instruction takes a value to cast, which must be a
scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to
-``ty2``, which must be an :ref:`floating point <t_floating>` type. If
-``ty`` is a vector integer type, ``ty2`` must be a vector floating point
+``ty2``, which must be an :ref:`floating-point <t_floating>` type. If
+``ty`` is a vector integer type, ``ty2`` must be a vector floating-point
type with the same number of elements as ``ty``
Semantics:
""""""""""
The '``uitofp``' instruction interprets its operand as an unsigned
-integer quantity and converts it to the corresponding floating point
-value. If the value cannot fit in the floating point value, the results
-are undefined.
+integer quantity and converts it to the corresponding floating-point
+value. If the value cannot be exactly represented, it is rounded using
+the default rounding mode.
+
Example:
""""""""
@@ -8482,17 +8963,17 @@ Arguments:
The '``sitofp``' instruction takes a value to cast, which must be a
scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to
-``ty2``, which must be an :ref:`floating point <t_floating>` type. If
-``ty`` is a vector integer type, ``ty2`` must be a vector floating point
+``ty2``, which must be an :ref:`floating-point <t_floating>` type. If
+``ty`` is a vector integer type, ``ty2`` must be a vector floating-point
type with the same number of elements as ``ty``
Semantics:
""""""""""
The '``sitofp``' instruction interprets its operand as a signed integer
-quantity and converts it to the corresponding floating point value. If
-the value cannot fit in the floating point value, the results are
-undefined.
+quantity and converts it to the corresponding floating-point value. If the
+value cannot be exactly represented, it is rounded using the default rounding
+mode.
Example:
""""""""
@@ -8804,10 +9285,10 @@ Overview:
The '``fcmp``' instruction returns a boolean value or vector of boolean
values based on comparison of its operands.
-If the operands are floating point scalars, then the result type is a
+If the operands are floating-point scalars, then the result type is a
boolean (:ref:`i1 <t_integer>`).
-If the operands are floating point vectors, then the result type is a
+If the operands are floating-point vectors, then the result type is a
vector of boolean with the same number of elements as the operands being
compared.
@@ -8838,9 +9319,9 @@ not a value, just a keyword. The possible condition codes are:
*Ordered* means that neither operand is a QNAN while *unordered* means
that either operand may be a QNAN.
-Each of ``val1`` and ``val2`` arguments must be either a :ref:`floating
-point <t_floating>` type or a :ref:`vector <t_vector>` of floating point
-type. They must have identical types.
+Each of ``val1`` and ``val2`` arguments must be either a :ref:`floating-point
+<t_floating>` type or a :ref:`vector <t_vector>` of floating-point type.
+They must have identical types.
Semantics:
""""""""""
@@ -8881,12 +9362,12 @@ always yields an :ref:`i1 <t_integer>` result, as follows:
The ``fcmp`` instruction can also optionally take any number of
:ref:`fast-math flags <fastmath>`, which are optimization hints to enable
-otherwise unsafe floating point optimizations.
+otherwise unsafe floating-point optimizations.
Any set of fast-math flags are legal on an ``fcmp`` instruction, but the
only flags that have any effect on its semantics are those that allow
assumptions to be made about the values of input arguments; namely
-``nnan``, ``ninf``, and ``nsz``. See :ref:`fastmath` for more information.
+``nnan``, ``ninf``, and ``reassoc``. See :ref:`fastmath` for more information.
Example:
""""""""
@@ -9033,9 +9514,11 @@ This instruction requires several arguments:
#. Arguments with the :ref:`inalloca <attr_inalloca>` attribute are
forwarded in place.
- Both markers imply that the callee does not access allocas or varargs from
- the caller. Calls marked ``musttail`` must obey the following additional
- rules:
+ Both markers imply that the callee does not access allocas from the caller.
+ The ``tail`` marker additionally implies that the callee does not access
+ varargs from the caller, while ``musttail`` implies that varargs from the
+ caller are passed to the callee. Calls marked ``musttail`` must obey the
+ following additional rules:
- The call must immediately precede a :ref:`ret <i_ret>` instruction,
or a pointer bitcast followed by a ret instruction.
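+
+A hedged sketch of a conforming ``musttail`` call (the function names are
+hypothetical):
+
+.. code-block:: llvm
+
+    declare i32 @callee(i32)
+
+    define i32 @caller(i32 %x) {
+      ; the musttail call immediately precedes the ret, with matching prototypes
+      %r = musttail call i32 @callee(i32 %x)
+      ret i32 %r
+    }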
@@ -10165,7 +10648,7 @@ time library.
This intrinsic does *not* empty the instruction pipeline. Modifications
of the current function are outside the scope of the intrinsic.
-'``llvm.instrprof_increment``' Intrinsic
+'``llvm.instrprof.increment``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -10173,13 +10656,13 @@ Syntax:
::
- declare void @llvm.instrprof_increment(i8* <name>, i64 <hash>,
+ declare void @llvm.instrprof.increment(i8* <name>, i64 <hash>,
i32 <num-counters>, i32 <index>)
Overview:
"""""""""
-The '``llvm.instrprof_increment``' intrinsic can be emitted by a
+The '``llvm.instrprof.increment``' intrinsic can be emitted by a
frontend for use with instrumentation based profiling. These will be
lowered by the ``-instrprof`` pass to generate execution counts of a
program at runtime.
@@ -10195,7 +10678,7 @@ The second argument is a hash value that can be used by the consumer
of the profile data to detect changes to the instrumented source, and
the third is the number of counters associated with ``name``. It is an
error if ``hash`` or ``num-counters`` differ between two instances of
-``instrprof_increment`` that refer to the same name.
+``instrprof.increment`` that refer to the same name.
The last argument refers to which of the counters for ``name`` should
be incremented. It should be a value between 0 and ``num-counters``.
@@ -10209,7 +10692,7 @@ structures and the code to increment the appropriate value, in a
format that can be written out by a compiler runtime and consumed via
the ``llvm-profdata`` tool.
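+
+For illustration, a call might look as follows (a sketch only; the
+``@__profn_foo`` name variable and the hash value are hypothetical):
+
+.. code-block:: llvm
+
+ ; increment counter 0 of 2 for the function named by @__profn_foo
+ call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_foo, i32 0, i32 0), i64 1234, i32 2, i32 0)
+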
-'``llvm.instrprof_increment_step``' Intrinsic
+'``llvm.instrprof.increment.step``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -10217,30 +10700,30 @@ Syntax:
::
- declare void @llvm.instrprof_increment_step(i8* <name>, i64 <hash>,
+ declare void @llvm.instrprof.increment.step(i8* <name>, i64 <hash>,
i32 <num-counters>,
i32 <index>, i64 <step>)
Overview:
"""""""""
-The '``llvm.instrprof_increment_step``' intrinsic is an extension to
-the '``llvm.instrprof_increment``' intrinsic with an additional fifth
+The '``llvm.instrprof.increment.step``' intrinsic is an extension to
+the '``llvm.instrprof.increment``' intrinsic with an additional fifth
argument to specify the step of the increment.
Arguments:
""""""""""
-The first four arguments are the same as '``llvm.instrprof_increment``'
+The first four arguments are the same as '``llvm.instrprof.increment``'
intrinsic.
The last argument specifies the value of the increment of the counter variable.
Semantics:
""""""""""
-See description of '``llvm.instrprof_increment``' instrinsic.
+See the description of the '``llvm.instrprof.increment``' intrinsic.
-'``llvm.instrprof_value_profile``' Intrinsic
+'``llvm.instrprof.value.profile``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -10248,14 +10731,14 @@ Syntax:
::
- declare void @llvm.instrprof_value_profile(i8* <name>, i64 <hash>,
+ declare void @llvm.instrprof.value.profile(i8* <name>, i64 <hash>,
i64 <value>, i32 <value_kind>,
i32 <index>)
Overview:
"""""""""
-The '``llvm.instrprof_value_profile``' intrinsic can be emitted by a
+The '``llvm.instrprof.value.profile``' intrinsic can be emitted by a
frontend for use with instrumentation based profiling. This will be
lowered by the ``-instrprof`` pass to find out the target values that
instrumented expressions take in a program at runtime.
@@ -10270,7 +10753,7 @@ name of the entity being instrumented. ``name`` should generally be the
The second argument is a hash value that can be used by the consumer
of the profile data to detect changes to the instrumented source. It
is an error if ``hash`` differs between two instances of
-``llvm.instrprof_*`` that refer to the same name.
+``llvm.instrprof.*`` that refer to the same name.
The third argument is the value of the expression being profiled. The profiled
expression's value should be representable as an unsigned 64-bit value. The
@@ -10286,7 +10769,7 @@ Semantics:
This intrinsic represents the point where a call to a runtime routine
should be inserted for value profiling of target expressions. ``-instrprof``
pass will generate the appropriate data structures and replace the
-``llvm.instrprof_value_profile`` intrinsic with the call to the profile
+``llvm.instrprof.value.profile`` intrinsic with the call to the profile
runtime library with proper arguments.
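+
+As a hypothetical example, profiling the target of an indirect call (the
+name variable, hash, and value kind ``0`` below are illustrative
+assumptions):
+
+.. code-block:: llvm
+
+ %target = ptrtoint void ()* %fptr to i64
+ call void @llvm.instrprof.value.profile(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_foo, i32 0, i32 0), i64 1234, i64 %target, i32 0, i32 0)
+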
'``llvm.thread.pointer``' Intrinsic
@@ -10339,9 +10822,9 @@ support all bit widths however.
::
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
- i32 <len>, i32 <align>, i1 <isvolatile>)
+ i32 <len>, i1 <isvolatile>)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
- i64 <len>, i32 <align>, i1 <isvolatile>)
+ i64 <len>, i1 <isvolatile>)
Overview:
"""""""""
@@ -10350,7 +10833,7 @@ The '``llvm.memcpy.*``' intrinsics copy a block of memory from the
source location to the destination location.
Note that, unlike the standard libc function, the ``llvm.memcpy.*``
-intrinsics do not return a value, takes extra alignment/isvolatile
+intrinsics do not return a value, take an extra isvolatile
argument, and the pointers can be in specified address spaces.
Arguments:
@@ -10358,13 +10841,11 @@ Arguments:
The first argument is a pointer to the destination, the second is a
pointer to the source. The third argument is an integer argument
-specifying the number of bytes to copy, the fourth argument is the
-alignment of the source and destination locations, and the fifth is a
+specifying the number of bytes to copy, and the fourth is a
boolean indicating a volatile access.
-If the call to this intrinsic has an alignment value that is not 0 or 1,
-then the caller guarantees that both the source and destination pointers
-are aligned to that boundary.
+The :ref:`align <attr_align>` parameter attribute can be provided
+for the first and second arguments.
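+
+For example, a call that guarantees 4-byte alignment of both pointers under
+this form might be written as follows (a sketch; ``%dst`` and ``%src`` are
+assumed to be in scope):
+
+.. code-block:: llvm
+
+ call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dst, i8* align 4 %src, i32 16, i1 false)
+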
If the ``isvolatile`` parameter is ``true``, the ``llvm.memcpy`` call is
a :ref:`volatile operation <volatile>`. The detailed access behavior is not
@@ -10394,9 +10875,9 @@ bit widths however.
::
declare void @llvm.memmove.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
- i32 <len>, i32 <align>, i1 <isvolatile>)
+ i32 <len>, i1 <isvolatile>)
declare void @llvm.memmove.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
- i64 <len>, i32 <align>, i1 <isvolatile>)
+ i64 <len>, i1 <isvolatile>)
Overview:
"""""""""
@@ -10407,21 +10888,19 @@ source location to the destination location. It is similar to the
overlap.
Note that, unlike the standard libc function, the ``llvm.memmove.*``
-intrinsics do not return a value, takes extra alignment/isvolatile
-arguments and the pointers can be in specified address spaces.
+intrinsics do not return a value, take an extra isvolatile
+argument, and the pointers can be in specified address spaces.
Arguments:
""""""""""
The first argument is a pointer to the destination, the second is a
pointer to the source. The third argument is an integer argument
-specifying the number of bytes to copy, the fourth argument is the
-alignment of the source and destination locations, and the fifth is a
+specifying the number of bytes to copy, and the fourth is a
boolean indicating a volatile access.
-If the call to this intrinsic has an alignment value that is not 0 or 1,
-then the caller guarantees that the source and destination pointers are
-aligned to that boundary.
+The :ref:`align <attr_align>` parameter attribute can be provided
+for the first and second arguments.
If the ``isvolatile`` parameter is ``true``, the ``llvm.memmove`` call
is a :ref:`volatile operation <volatile>`. The detailed access behavior is
@@ -10451,9 +10930,9 @@ support all bit widths.
::
declare void @llvm.memset.p0i8.i32(i8* <dest>, i8 <val>,
- i32 <len>, i32 <align>, i1 <isvolatile>)
+ i32 <len>, i1 <isvolatile>)
declare void @llvm.memset.p0i8.i64(i8* <dest>, i8 <val>,
- i64 <len>, i32 <align>, i1 <isvolatile>)
+ i64 <len>, i1 <isvolatile>)
Overview:
"""""""""
@@ -10462,8 +10941,8 @@ The '``llvm.memset.*``' intrinsics fill a block of memory with a
particular byte value.
Note that, unlike the standard libc function, the ``llvm.memset``
-intrinsic does not return a value and takes extra alignment/volatile
-arguments. Also, the destination can be in an arbitrary address space.
+intrinsic does not return a value and takes an extra volatile
+argument. Also, the destination can be in an arbitrary address space.
Arguments:
""""""""""
@@ -10471,11 +10950,10 @@ Arguments:
The first argument is a pointer to the destination to fill, the second
is the byte value with which to fill it, the third argument is an
integer argument specifying the number of bytes to fill, and the fourth
-argument is the known alignment of the destination location.
+is a boolean indicating a volatile access.
-If the call to this intrinsic has an alignment value that is not 0 or 1,
-then the caller guarantees that the destination pointer is aligned to
-that boundary.
+The :ref:`align <attr_align>` parameter attribute can be provided
+for the first argument.
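+
+For instance, zero-filling 32 bytes at an 8-byte-aligned destination could
+look like this (a sketch; ``%dst`` is assumed to be in scope):
+
+.. code-block:: llvm
+
+ call void @llvm.memset.p0i8.i32(i8* align 8 %dst, i8 0, i32 32, i1 false)
+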
If the ``isvolatile`` parameter is ``true``, the ``llvm.memset`` call is
a :ref:`volatile operation <volatile>`. The detailed access behavior is not
@@ -10485,9 +10963,7 @@ Semantics:
""""""""""
The '``llvm.memset.*``' intrinsics fill "len" bytes of memory starting
-at the destination location. If the argument is known to be aligned to
-some boundary, this can be specified as the fourth argument, otherwise
-it should be set to 0 or 1 (both meaning no alignment).
+at the destination location.
'``llvm.sqrt.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -10521,10 +10997,10 @@ Semantics:
""""""""""
Return the same value as a corresponding libm '``sqrt``' function but without
-trapping or setting ``errno``. For types specified by IEEE-754, the result
+trapping or setting ``errno``. For types specified by IEEE-754, the result
matches a conforming libm implementation.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.powi.*``' Intrinsic
@@ -10534,7 +11010,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.powi`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -10550,7 +11026,7 @@ Overview:
The '``llvm.powi.*``' intrinsics return the first operand raised to the
specified (positive or negative) power. The order of evaluation of
-multiplications is not defined. When a vector of floating point type is
+multiplications is not defined. When a vector of floating-point type is
used, the second argument remains a scalar integer value.
Arguments:
@@ -10599,7 +11075,7 @@ Semantics:
Return the same value as a corresponding libm '``sin``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.cos.*``' Intrinsic
@@ -10636,7 +11112,7 @@ Semantics:
Return the same value as a corresponding libm '``cos``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.pow.*``' Intrinsic
@@ -10674,7 +11150,7 @@ Semantics:
Return the same value as a corresponding libm '``pow``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.exp.*``' Intrinsic
@@ -10712,7 +11188,7 @@ Semantics:
Return the same value as a corresponding libm '``exp``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.exp2.*``' Intrinsic
@@ -10750,7 +11226,7 @@ Semantics:
Return the same value as a corresponding libm '``exp2``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.log.*``' Intrinsic
@@ -10788,7 +11264,7 @@ Semantics:
Return the same value as a corresponding libm '``log``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.log10.*``' Intrinsic
@@ -10826,7 +11302,7 @@ Semantics:
Return the same value as a corresponding libm '``log10``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.log2.*``' Intrinsic
@@ -10864,7 +11340,7 @@ Semantics:
Return the same value as a corresponding libm '``log2``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.fma.*``' Intrinsic
@@ -10901,7 +11377,7 @@ Semantics:
Return the same value as a corresponding libm '``fma``' function but without
trapping or setting ``errno``.
-When specified with the fast-math-flag 'afn', the result may be approximated
+When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
'``llvm.fabs.*``' Intrinsic
@@ -10911,7 +11387,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.fabs`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -10931,7 +11407,7 @@ operand.
Arguments:
""""""""""
-The argument and return value are floating point numbers of the same
+The argument and return value are floating-point numbers of the same
type.
Semantics:
@@ -10947,7 +11423,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.minnum`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -10968,7 +11444,7 @@ arguments.
Arguments:
""""""""""
-The arguments and return value are floating point numbers of the same
+The arguments and return value are floating-point numbers of the same
type.
Semantics:
@@ -10989,7 +11465,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.maxnum`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11010,7 +11486,7 @@ arguments.
Arguments:
""""""""""
-The arguments and return value are floating point numbers of the same
+The arguments and return value are floating-point numbers of the same
type.
Semantics:
@@ -11030,7 +11506,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.copysign`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11050,7 +11526,7 @@ first operand and the sign of the second operand.
Arguments:
""""""""""
-The arguments and return value are floating point numbers of the same
+The arguments and return value are floating-point numbers of the same
type.
Semantics:
@@ -11066,7 +11542,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.floor`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11085,7 +11561,7 @@ The '``llvm.floor.*``' intrinsics return the floor of the operand.
Arguments:
""""""""""
-The argument and return value are floating point numbers of the same
+The argument and return value are floating-point numbers of the same
type.
Semantics:
@@ -11101,7 +11577,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.ceil`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11120,7 +11596,7 @@ The '``llvm.ceil.*``' intrinsics return the ceiling of the operand.
Arguments:
""""""""""
-The argument and return value are floating point numbers of the same
+The argument and return value are floating-point numbers of the same
type.
Semantics:
@@ -11136,7 +11612,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.trunc`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11156,7 +11632,7 @@ nearest integer not larger in magnitude than the operand.
Arguments:
""""""""""
-The argument and return value are floating point numbers of the same
+The argument and return value are floating-point numbers of the same
type.
Semantics:
@@ -11172,7 +11648,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.rint`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11193,7 +11669,7 @@ operand isn't an integer.
Arguments:
""""""""""
-The argument and return value are floating point numbers of the same
+The argument and return value are floating-point numbers of the same
type.
Semantics:
@@ -11209,7 +11685,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.nearbyint`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11229,7 +11705,7 @@ nearest integer.
Arguments:
""""""""""
-The argument and return value are floating point numbers of the same
+The argument and return value are floating-point numbers of the same
type.
Semantics:
@@ -11245,7 +11721,7 @@ Syntax:
"""""""
This is an overloaded intrinsic. You can use ``llvm.round`` on any
-floating point or vector of floating point type. Not all targets support
+floating-point or vector of floating-point type. Not all targets support
all types however.
::
@@ -11265,7 +11741,7 @@ nearest integer.
Arguments:
""""""""""
-The argument and return value are floating point numbers of the same
+The argument and return value are floating-point numbers of the same
type.
Semantics:
@@ -11477,6 +11953,98 @@ then the result is the size in bits of the type of ``src`` if
.. _int_overflow:
+'``llvm.fshl.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.fshl`` on any
+integer bit width or any vector of integer elements. Not all targets
+support all bit widths or vector types, however.
+
+::
+
+ declare i8 @llvm.fshl.i8 (i8 %a, i8 %b, i8 %c)
+ declare i67 @llvm.fshl.i67(i67 %a, i67 %b, i67 %c)
+ declare <2 x i32> @llvm.fshl.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)
+
+Overview:
+"""""""""
+
+The '``llvm.fshl``' family of intrinsic functions performs a funnel shift left:
+the first two values are concatenated as { %a : %b } (%a is the most significant
+bits of the wide value), the combined value is shifted left, and the most
+significant bits are extracted to produce a result that is the same size as the
+original arguments. If the first 2 arguments are identical, this is equivalent
+to a rotate left operation. For vector types, the operation occurs for each
+element of the vector. The shift argument is treated as an unsigned amount
+modulo the element size of the arguments.
+
+Arguments:
+""""""""""
+
+The first two arguments are the values to be concatenated. The third
+argument is the shift amount. The arguments may be any integer type or a
+vector with integer element type. All arguments and the return value must
+have the same type.
+
+Example:
+""""""""
+
+.. code-block:: text
+
+ %r = call i8 @llvm.fshl.i8(i8 %x, i8 %y, i8 %z) ; %r = i8: msb_extract((concat(x, y) << (z % 8)), 8)
+ %r = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 15) ; %r = i8: 128 (0b10000000)
+ %r = call i8 @llvm.fshl.i8(i8 15, i8 15, i8 11) ; %r = i8: 120 (0b01111000)
+ %r = call i8 @llvm.fshl.i8(i8 0, i8 255, i8 8) ; %r = i8: 0 (0b00000000)
+
+'``llvm.fshr.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.fshr`` on any
+integer bit width or any vector of integer elements. Not all targets
+support all bit widths or vector types, however.
+
+::
+
+ declare i8 @llvm.fshr.i8 (i8 %a, i8 %b, i8 %c)
+ declare i67 @llvm.fshr.i67(i67 %a, i67 %b, i67 %c)
+ declare <2 x i32> @llvm.fshr.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)
+
+Overview:
+"""""""""
+
+The '``llvm.fshr``' family of intrinsic functions performs a funnel shift right:
+the first two values are concatenated as { %a : %b } (%a is the most significant
+bits of the wide value), the combined value is shifted right, and the least
+significant bits are extracted to produce a result that is the same size as the
+original arguments. If the first 2 arguments are identical, this is equivalent
+to a rotate right operation. For vector types, the operation occurs for each
+element of the vector. The shift argument is treated as an unsigned amount
+modulo the element size of the arguments.
+
+Arguments:
+""""""""""
+
+The first two arguments are the values to be concatenated. The third
+argument is the shift amount. The arguments may be any integer type or a
+vector with integer element type. All arguments and the return value must
+have the same type.
+
+Example:
+""""""""
+
+.. code-block:: text
+
+ %r = call i8 @llvm.fshr.i8(i8 %x, i8 %y, i8 %z) ; %r = i8: lsb_extract((concat(x, y) >> (z % 8)), 8)
+ %r = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 15) ; %r = i8: 254 (0b11111110)
+ %r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11) ; %r = i8: 225 (0b11100001)
+ %r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8) ; %r = i8: 255 (0b11111111)
+
Arithmetic with Overflow Intrinsics
-----------------------------------
@@ -11818,7 +12386,7 @@ Overview:
"""""""""
The '``llvm.canonicalize.*``' intrinsic returns the platform specific canonical
-encoding of a floating point number. This canonicalization is useful for
+encoding of a floating-point number. This canonicalization is useful for
implementing certain numeric primitives such as frexp. The canonical encoding is
defined by IEEE-754-2008 to be:
@@ -11836,7 +12404,7 @@ Examples of non-canonical encodings:
- x87 pseudo denormals, pseudo NaNs, pseudo Infinity, Unnormals. These are
converted to a canonical representation per hardware-specific protocol.
-- Many normal decimal floating point numbers have non-canonical alternative
+- Many normal decimal floating-point numbers have non-canonical alternative
encodings.
- Some machines, like GPUs or ARMv7 NEON, do not support subnormal values.
These are treated as non-canonical encodings of zero and will be flushed to
@@ -11870,7 +12438,7 @@ The canonicalization operation may be optimized away if:
- The input is known to be canonical. For example, it was produced by a
floating-point operation that is required by the standard to be canonical.
- The result is consumed only by (or fused with) other floating-point
- operations. That is, the bits of the floating point value are not examined.
+ operations. That is, the bits of the floating-point value are not examined.
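+
+For reference, a call has the following shape (a sketch; ``%x`` is assumed to
+be in scope, and the canonical result is target dependent):
+
+.. code-block:: llvm
+
+ %canon = call float @llvm.canonicalize.f32(float %x)
+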
'``llvm.fmuladd.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -11967,7 +12535,7 @@ Syntax:
Overview:
"""""""""
-The '``llvm.experimental.vector.reduce.fadd.*``' intrinsics do a floating point
+The '``llvm.experimental.vector.reduce.fadd.*``' intrinsics do a floating-point
``ADD`` reduction of a vector, returning the result as a scalar. The return type
matches the element-type of the vector input.
@@ -11983,7 +12551,7 @@ The first argument to this intrinsic is a scalar accumulator value, which is
only used when there are no fast-math flags attached. This argument may be undef
when fast-math flags are used.
-The second argument must be a vector of floating point values.
+The second argument must be a vector of floating-point values.
Examples:
"""""""""
@@ -12030,7 +12598,7 @@ Syntax:
Overview:
"""""""""
-The '``llvm.experimental.vector.reduce.fmul.*``' intrinsics do a floating point
+The '``llvm.experimental.vector.reduce.fmul.*``' intrinsics do a floating-point
``MUL`` reduction of a vector, returning the result as a scalar. The return type
matches the element-type of the vector input.
@@ -12046,7 +12614,7 @@ The first argument to this intrinsic is a scalar accumulator value, which is
only used when there are no fast-math flags attached. This argument may be undef
when fast-math flags are used.
-The second argument must be a vector of floating point values.
+The second argument must be a vector of floating-point values.
Examples:
"""""""""
@@ -12217,7 +12785,7 @@ Syntax:
Overview:
"""""""""
-The '``llvm.experimental.vector.reduce.fmax.*``' intrinsics do a floating point
+The '``llvm.experimental.vector.reduce.fmax.*``' intrinsics do a floating-point
``MAX`` reduction of a vector, returning the result as a scalar. The return type
matches the element-type of the vector input.
@@ -12226,7 +12794,7 @@ assume that NaNs are not present in the input vector.
Arguments:
""""""""""
-The argument to this intrinsic must be a vector of floating point values.
+The argument to this intrinsic must be a vector of floating-point values.
'``llvm.experimental.vector.reduce.fmin.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -12242,7 +12810,7 @@ Syntax:
Overview:
"""""""""
-The '``llvm.experimental.vector.reduce.fmin.*``' intrinsics do a floating point
+The '``llvm.experimental.vector.reduce.fmin.*``' intrinsics do a floating-point
``MIN`` reduction of a vector, returning the result as a scalar. The return type
matches the element-type of the vector input.
@@ -12251,16 +12819,16 @@ assume that NaNs are not present in the input vector.
Arguments:
""""""""""
-The argument to this intrinsic must be a vector of floating point values.
+The argument to this intrinsic must be a vector of floating-point values.
-Half Precision Floating Point Intrinsics
+Half Precision Floating-Point Intrinsics
----------------------------------------
-For most target platforms, half precision floating point is a
+For most target platforms, half precision floating-point is a
storage-only format. This means that it is a dense encoding (in memory)
but does not support computation in the format.
-This means that code must first load the half-precision floating point
+This means that code must first load the half-precision floating-point
value as an i16, then convert it to float with
:ref:`llvm.convert.from.fp16 <int_convert_from_fp16>`. Computation can
then be performed on the float value (including extending to double
@@ -12286,7 +12854,7 @@ Overview:
"""""""""
The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
-conventional floating point type to half precision floating point format.
+conventional floating-point type to half precision floating-point format.
Arguments:
""""""""""
@@ -12298,7 +12866,7 @@ Semantics:
""""""""""
The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
-conventional floating point format to half precision floating point format. The
+conventional floating-point format to half precision floating-point format. The
return value is an ``i16`` which contains the converted number.
Examples:
@@ -12326,8 +12894,8 @@ Overview:
"""""""""
The '``llvm.convert.from.fp16``' intrinsic function performs a
-conversion from half precision floating point format to single precision
-floating point format.
+conversion from half precision floating-point format to single precision
+floating-point format.
Arguments:
""""""""""
@@ -12339,8 +12907,8 @@ Semantics:
""""""""""
The '``llvm.convert.from.fp16``' intrinsic function performs a
-conversion from half single precision floating point format to single
-precision floating point format. The input half-float value is
+conversion from half precision floating-point format to single
+precision floating-point format. The input half-float value is
represented by an ``i16`` value.
Examples:
@@ -12491,7 +13059,7 @@ LLVM provides intrinsics for predicated vector load and store operations. The pr
Syntax:
"""""""
-This is an overloaded intrinsic. The loaded data is a vector of any integer, floating point or pointer data type.
+This is an overloaded intrinsic. The loaded data is a vector of any integer, floating-point or pointer data type.
::
@@ -12536,7 +13104,7 @@ The result of this operation is equivalent to a regular vector load instruction
Syntax:
"""""""
-This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating point or pointer data type.
+This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type.
::
@@ -12586,7 +13154,7 @@ LLVM provides intrinsics for vector gather and scatter operations. They are simi
Syntax:
"""""""
-This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer, floating point or pointer data type gathered together into one vector.
+This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer, floating-point or pointer data type gathered together into one vector.
::
@@ -12640,7 +13208,7 @@ The semantics of this operation are equivalent to a sequence of conditional scal
Syntax:
"""""""
-This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
+This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
::
@@ -12685,6 +13253,126 @@ The '``llvm.masked.scatter``' intrinsics is designed for writing selected vector
store i32 %val7, i32* %ptr7, align 4
+Masked Vector Expanding Load and Compressing Store Intrinsics
+-------------------------------------------------------------
+
+LLVM provides intrinsics for expanding load and compressing store operations. Data selected from a vector according to a mask is stored in consecutive memory addresses (compressed store), and vice versa (expanding load). These operations effectively map to "if (cond.i) a[j++] = v.i" and "if (cond.i) v.i = a[j++]" patterns, respectively. Note that when the mask starts with '1' bits followed by '0' bits, these operations are identical to :ref:`llvm.masked.store <int_mstore>` and :ref:`llvm.masked.load <int_mload>`.
+
+.. _int_expandload:
+
+'``llvm.masked.expandload.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. Several values of integer, floating-point or pointer data type are loaded from consecutive memory addresses and stored into the elements of a vector according to the mask.
+
+::
+
+ declare <16 x float> @llvm.masked.expandload.v16f32 (float* <ptr>, <16 x i1> <mask>, <16 x float> <passthru>)
+ declare <2 x i64> @llvm.masked.expandload.v2i64 (i64* <ptr>, <2 x i1> <mask>, <2 x i64> <passthru>)
+
+Overview:
+"""""""""
+
+Reads a number of scalar values sequentially from the memory location provided in '``ptr``' and spreads them into a vector. The '``mask``' holds a bit for each vector lane. The number of elements read from memory is equal to the number of '1' bits in the mask. The loaded elements are positioned in the destination vector according to the sequence of '1' and '0' bits in the mask. E.g., if the mask vector is '10010001', "expandload" reads 3 values from memory addresses ptr, ptr+1, ptr+2 and places them in lanes 0, 3 and 7 accordingly. The masked-off lanes are filled by elements from the corresponding lanes of the '``passthru``' operand.
+
+
+Arguments:
+""""""""""
+
+The first operand is the base pointer for the load. It has the same underlying type as the element of the returned vector. The second operand, mask, is a vector of boolean values with the same number of elements as the return type. The third is a pass-through value that is used to fill the masked-off lanes of the result. The return type and the type of the '``passthru``' operand have the same vector type.
+
+Semantics:
+""""""""""
+
+The '``llvm.masked.expandload``' intrinsic is designed for reading multiple scalar values from adjacent memory addresses into possibly non-adjacent vector lanes. It is useful for targets that support vector expanding loads and allows vectorizing loops with a cross-iteration dependency such as the one in the following example:
+
+.. code-block:: c
+
+ // In this loop we load from B and spread the elements into array A.
+ double *A, *B; int *C;
+ int j = 0;
+ for (int i = 0; i < size; ++i) {
+ if (C[i] != 0)
+ A[i] = B[j++];
+ }
+
+
+.. code-block:: llvm
+
+ ; Load several elements from array B and expand them in a vector.
+ ; The number of loaded elements is equal to the number of '1' elements in the Mask.
+ %Tmp = call <8 x double> @llvm.masked.expandload.v8f64(double* %Bptr, <8 x i1> %Mask, <8 x double> undef)
+ ; Store the result in A
+ call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> %Tmp, <8 x double>* %Aptr, i32 8, <8 x i1> %Mask)
+
+ ; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
+ %MaskI = bitcast <8 x i1> %Mask to i8
+ %MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
+ %MaskI64 = zext i8 %MaskIPopcnt to i64
+ %BNextInd = add i64 %BInd, %MaskI64
+
+
+Other targets may support this intrinsic differently, for example, by lowering it into a sequence of conditional scalar load operations and shuffles.
+If all mask elements are '1', the intrinsic behavior is equivalent to the regular unmasked vector load.
+
+.. _int_compressstore:
+
+'``llvm.masked.compressstore.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. A number of scalar values of integer, floating-point or pointer data type are collected from an input vector and stored into adjacent memory addresses. A mask defines which elements to collect from the vector.
+
+::
+
+ declare void @llvm.masked.compressstore.v8i32 (<8 x i32> <value>, i32* <ptr>, <8 x i1> <mask>)
+ declare void @llvm.masked.compressstore.v16f32 (<16 x float> <value>, float* <ptr>, <16 x i1> <mask>)
+
+Overview:
+"""""""""
+
+Selects elements from the input vector '``value``' according to the '``mask``'. All selected elements are written into adjacent memory addresses starting at address '``ptr``', from lower to higher. The mask holds a bit for each vector lane, and is used to select elements to be stored. The number of elements to be stored is equal to the number of active bits in the mask.
+
+Arguments:
+""""""""""
+
+The first operand is the input vector, from which elements are collected and written to memory. The second operand is the base pointer for the store; it has the same underlying type as the element of the input vector operand. The third operand is the mask, a vector of boolean values. The mask and the input vector must have the same number of vector elements.
+
+
+Semantics:
+""""""""""
+
+The '``llvm.masked.compressstore``' intrinsic is designed for compressing data in memory. It allows collecting elements from possibly non-adjacent lanes of a vector and storing them contiguously in memory in one IR operation. It is useful for targets that support compressing store operations and allows vectorizing loops with cross-iteration dependences such as the one in the following example:
+
+.. code-block:: c
+
+ // In this loop we load elements from A and store them consecutively in B
+ double *A, *B; int *C;
+ int j = 0;
+ for (int i = 0; i < size; ++i) {
+ if (C[i] != 0)
+ B[j++] = A[i];
+ }
+
+
+.. code-block:: llvm
+
+ ; Load elements from A.
+ %Tmp = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* %Aptr, i32 8, <8 x i1> %Mask, <8 x double> undef)
+ ; Store all selected elements consecutively in array B
+ call void @llvm.masked.compressstore.v8f64(<8 x double> %Tmp, double* %Bptr, <8 x i1> %Mask)
+
+ ; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
+ %MaskI = bitcast <8 x i1> %Mask to i8
+ %MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
+ %MaskI64 = zext i8 %MaskIPopcnt to i64
+ %BNextInd = add i64 %BInd, %MaskI64
+
+
+Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.
+
+
Memory Use Markers
------------------
@@ -12818,7 +13506,7 @@ Semantics:
This intrinsic indicates that the memory is mutable again.
-'``llvm.invariant.group.barrier``' Intrinsic
+'``llvm.launder.invariant.group``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -12829,40 +13517,82 @@ argument.
::
- declare i8* @llvm.invariant.group.barrier.p0i8(i8* <ptr>)
+ declare i8* @llvm.launder.invariant.group.p0i8(i8* <ptr>)
Overview:
"""""""""
-The '``llvm.invariant.group.barrier``' intrinsic can be used when an invariant
-established by invariant.group metadata no longer holds, to obtain a new pointer
-value that does not carry the invariant information.
+The '``llvm.launder.invariant.group``' intrinsic can be used when an invariant
+established by ``invariant.group`` metadata no longer holds, to obtain a new
+pointer value that carries fresh invariant group information. It is an
+experimental intrinsic, which means that its semantics might change in the
+future.
Arguments:
""""""""""
-The ``llvm.invariant.group.barrier`` takes only one argument, which is
-the pointer to the memory for which the ``invariant.group`` no longer holds.
+The ``llvm.launder.invariant.group`` takes only one argument, which is a pointer
+to the memory.
Semantics:
""""""""""
Returns another pointer that aliases its argument but which is considered different
for the purposes of ``load``/``store`` ``invariant.group`` metadata.
+It does not read any accessible memory and its execution can be speculated.
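+
+A typical use launders a pointer after the invariant may have been broken,
+e.g. by placement new changing an object's dynamic type (a sketch; ``%p`` is
+assumed to be in scope):
+
+.. code-block:: llvm
+
+ %p.laundered = call i8* @llvm.launder.invariant.group.p0i8(i8* %p)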
+
+'``llvm.strip.invariant.group``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Constrained Floating Point Intrinsics
+Syntax:
+"""""""
+This is an overloaded intrinsic. The memory object can belong to any address
+space. The returned pointer must belong to the same address space as the
+argument.
+
+::
+
+ declare i8* @llvm.strip.invariant.group.p0i8(i8* <ptr>)
+
+Overview:
+"""""""""
+
+The '``llvm.strip.invariant.group``' intrinsic can be used when an invariant
+established by ``invariant.group`` metadata no longer holds, to obtain a new pointer
+value that does not carry the invariant information. It is an experimental
+intrinsic, which means that its semantics might change in the future.
+
+
+Arguments:
+""""""""""
+
+The ``llvm.strip.invariant.group`` takes only one argument, which is a pointer
+to the memory.
+
+Semantics:
+""""""""""
+
+Returns another pointer that aliases its argument but which has no associated
+``invariant.group`` metadata.
+It does not read any memory and can be speculated.
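+
+Correspondingly, a sketch of a call (``%p`` is assumed to be in scope):
+
+.. code-block:: llvm
+
+ %p.stripped = call i8* @llvm.strip.invariant.group.p0i8(i8* %p)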
+
+
+
+.. _constrainedfp:
+
+Constrained Floating-Point Intrinsics
-------------------------------------
-These intrinsics are used to provide special handling of floating point
-operations when specific rounding mode or floating point exception behavior is
+These intrinsics are used to provide special handling of floating-point
+operations when specific rounding mode or floating-point exception behavior is
required. By default, LLVM optimization passes assume that the rounding mode is
-round-to-nearest and that floating point exceptions will not be monitored.
+round-to-nearest and that floating-point exceptions will not be monitored.
Constrained FP intrinsics are used to support non-default rounding modes and
accurately preserve exception behavior without compromising LLVM's ability to
optimize FP code when the default behavior is used.
-Each of these intrinsics corresponds to a normal floating point operation. The
+Each of these intrinsics corresponds to a normal floating-point operation. The
first two arguments and the return value are the same as the corresponding FP
operation.
@@ -12897,7 +13627,7 @@ the specified rounding mode, but this is not guaranteed. Using a specific
non-dynamic rounding mode which does not match the actual rounding mode at
runtime results in undefined behavior.
-The fourth argument to the constrained floating point intrinsics specifies the
+The fourth argument to the constrained floating-point intrinsics specifies the
required exception behavior. This argument must be one of the following
strings:
@@ -12908,7 +13638,7 @@ strings:
"fpexcept.strict"
If this argument is "fpexcept.ignore" optimization passes may assume that the
-exception status flags will not be read and that floating point exceptions will
+exception status flags will not be read and that floating-point exceptions will
be masked. This allows transformations to be performed that may change the
exception semantics of the original code. For example, FP operations may be
speculatively executed in this case whereas they must not be for either of the
@@ -12922,7 +13652,7 @@ original code. For example, exceptions may be potentially hidden by constant
folding.
If the exception behavior argument is "fpexcept.strict" all transformations must
-strictly preserve the floating point exception semantics of the original code.
+strictly preserve the floating-point exception semantics of the original code.
Any FP exception that would have been raised by the original code must be raised
by the transformed code, and the transformed code must not raise any FP
exceptions that would not have been raised by the original code. This is the
@@ -12930,7 +13660,7 @@ exception behavior argument that will be used if the code being compiled reads
the FP exception status flags, but this mode can also be used with code that
unmasks FP exceptions.
-The number and order of floating point exceptions is NOT guaranteed. For
+The number and order of floating-point exceptions are NOT guaranteed. For
example, a series of FP operations that each may raise exceptions may be
vectorized into a single instruction that raises each unique exception a single
time.
@@ -12960,8 +13690,8 @@ Arguments:
""""""""""
The first two arguments to the '``llvm.experimental.constrained.fadd``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector <t_vector>`
-of floating point values. Both arguments must have identical types.
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
+of floating-point values. Both arguments must have identical types.
The third and fourth arguments specify the rounding mode and exception
behavior as described above.
@@ -12969,7 +13699,7 @@ behavior as described above.
Semantics:
""""""""""
-The value produced is the floating point sum of the two value operands and has
+The value produced is the floating-point sum of the two value operands and has
the same type as the operands.
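+
+For example, an addition that must honor the dynamic rounding mode and
+strictly preserve exception behavior might be written as follows (a sketch;
+``%a`` and ``%b`` are assumed to be in scope):
+
+.. code-block:: llvm
+
+ %sum = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict")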
@@ -12997,8 +13727,8 @@ Arguments:
""""""""""
The first two arguments to the '``llvm.experimental.constrained.fsub``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector <t_vector>`
-of floating point values. Both arguments must have identical types.
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
+of floating-point values. Both arguments must have identical types.
The third and fourth arguments specify the rounding mode and exception
behavior as described above.
@@ -13006,7 +13736,7 @@ behavior as described above.
Semantics:
""""""""""
-The value produced is the floating point difference of the two value operands
+The value produced is the floating-point difference of the two value operands
and has the same type as the operands.
@@ -13034,8 +13764,8 @@ Arguments:
""""""""""
The first two arguments to the '``llvm.experimental.constrained.fmul``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector <t_vector>`
-of floating point values. Both arguments must have identical types.
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
+of floating-point values. Both arguments must have identical types.
The third and fourth arguments specify the rounding mode and exception
behavior as described above.
@@ -13043,7 +13773,7 @@ behavior as described above.
Semantics:
""""""""""
-The value produced is the floating point product of the two value operands and
+The value produced is the floating-point product of the two value operands and
has the same type as the operands.
@@ -13071,8 +13801,8 @@ Arguments:
""""""""""
The first two arguments to the '``llvm.experimental.constrained.fdiv``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector <t_vector>`
-of floating point values. Both arguments must have identical types.
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
+of floating-point values. Both arguments must have identical types.
The third and fourth arguments specify the rounding mode and exception
behavior as described above.
@@ -13080,7 +13810,7 @@ behavior as described above.
Semantics:
""""""""""
-The value produced is the floating point quotient of the two value operands and
+The value produced is the floating-point quotient of the two value operands and
has the same type as the operands.
@@ -13108,18 +13838,18 @@ Arguments:
""""""""""
The first two arguments to the '``llvm.experimental.constrained.frem``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector <t_vector>`
-of floating point values. Both arguments must have identical types.
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
+of floating-point values. Both arguments must have identical types.
The third and fourth arguments specify the rounding mode and exception
behavior as described above. The rounding mode argument has no effect, since
the result of frem is never rounded, but the argument is included for
-consistency with the other constrained floating point intrinsics.
+consistency with the other constrained floating-point intrinsics.
Semantics:
""""""""""
-The value produced is the floating point remainder from the division of the two
+The value produced is the floating-point remainder from the division of the two
value operands and has the same type as the operands. The remainder has the
same sign as the dividend.
@@ -13146,8 +13876,8 @@ Arguments:
""""""""""
The first three arguments to the '``llvm.experimental.constrained.fma``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
-<t_vector>` of floating point values. All arguments must have identical types.
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
+<t_vector>` of floating-point values. All arguments must have identical types.
The fourth and fifth arguments specify the rounding mode and exception behavior
as described above.
@@ -13162,15 +13892,15 @@ precision.
Constrained libm-equivalent Intrinsics
--------------------------------------
-In addition to the basic floating point operations for which constrained
+In addition to the basic floating-point operations for which constrained
intrinsics are described above, there are constrained versions of various
operations which provide equivalent behavior to a corresponding libm function.
These intrinsics allow the precise behavior of these operations with respect to
rounding mode and exception behavior to be controlled.
-As with the basic constrained floating point intrinsics, the rounding mode
+As with the basic constrained floating-point intrinsics, the rounding mode
and exception behavior arguments only control the behavior of the optimizer.
-They do not change the runtime floating point environment.
+They do not change the runtime floating-point environment.
'``llvm.experimental.constrained.sqrt``' Intrinsic
@@ -13196,7 +13926,7 @@ functions would, but without setting ``errno``.
Arguments:
""""""""""
-The first argument and the return type are floating point numbers of the same
+The first argument and the return type are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13206,8 +13936,8 @@ Semantics:
""""""""""
This function returns the nonnegative square root of the specified value.
-If the value is less than negative zero, a floating point exception occurs
-and the the return value is architecture specific.
+If the value is less than negative zero, a floating-point exception occurs
+and the return value is architecture specific.
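+
+For instance, a square root evaluated under round-to-nearest with exceptions
+ignored might look like this (a sketch; ``%x`` is assumed to be in scope):
+
+.. code-block:: llvm
+
+ %root = call double @llvm.experimental.constrained.sqrt.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+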
'``llvm.experimental.constrained.pow``' Intrinsic
@@ -13232,7 +13962,7 @@ raised to the (positive or negative) power specified by the second operand.
Arguments:
""""""""""
-The first two arguments and the return value are floating point numbers of the
+The first two arguments and the return value are floating-point numbers of the
same type. The second argument specifies the power to which the first argument
should be raised.
@@ -13265,14 +13995,14 @@ Overview:
The '``llvm.experimental.constrained.powi``' intrinsic returns the first operand
raised to the (positive or negative) power specified by the second operand. The
-order of evaluation of multiplications is not defined. When a vector of floating
-point type is used, the second argument remains a scalar integer value.
+order of evaluation of multiplications is not defined. When a vector of
+floating-point type is used, the second argument remains a scalar integer value.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type. The second argument is a 32-bit signed integer specifying the power to
which the first argument should be raised.
@@ -13308,7 +14038,7 @@ first operand.
Arguments:
""""""""""
-The first argument and the return type are floating point numbers of the same
+The first argument and the return type are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13344,7 +14074,7 @@ first operand.
Arguments:
""""""""""
-The first argument and the return type are floating point numbers of the same
+The first argument and the return type are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13380,7 +14110,7 @@ exponential of the specified value.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13416,7 +14146,7 @@ exponential of the specified value.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13451,7 +14181,7 @@ logarithm of the specified value.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13487,7 +14217,7 @@ logarithm of the specified value.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13522,7 +14252,7 @@ logarithm of the specified value.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13552,13 +14282,13 @@ Overview:
"""""""""
The '``llvm.experimental.constrained.rint``' intrinsic returns the first
-operand rounded to the nearest integer. It may raise an inexact floating point
+operand rounded to the nearest integer. It may raise an inexact floating-point
exception if the operand is not an integer.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13570,7 +14300,7 @@ Semantics:
This function returns the same values as the libm ``rint`` functions
would, and handles error conditions in the same way. The rounding mode is
described, not determined, by the rounding mode argument. The actual rounding
-mode is determined by the runtime floating point environment. The rounding
+mode is determined by the runtime floating-point environment. The rounding
mode argument is only intended as information to the compiler.
@@ -13591,14 +14321,14 @@ Overview:
"""""""""
The '``llvm.experimental.constrained.nearbyint``' intrinsic returns the first
-operand rounded to the nearest integer. It will not raise an inexact floating
-point exception if the operand is not an integer.
+operand rounded to the nearest integer. It will not raise an inexact
+floating-point exception if the operand is not an integer.
Arguments:
""""""""""
-The first argument and the return value are floating point numbers of the same
+The first argument and the return value are floating-point numbers of the same
type.
The second and third arguments specify the rounding mode and exception
@@ -13610,7 +14340,7 @@ Semantics:
This function returns the same values as the libm ``nearbyint`` functions
would, and handles error conditions in the same way. The rounding mode is
described, not determined, by the rounding mode argument. The actual rounding
-mode is determined by the runtime floating point environment. The rounding
+mode is determined by the runtime floating-point environment. The rounding
mode argument is only intended as information to the compiler.
@@ -13902,10 +14632,10 @@ The ``llvm.objectsize`` intrinsic takes three arguments. The first argument is
a pointer to or into the ``object``. The second argument determines whether
``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size
is unknown. The third argument controls how ``llvm.objectsize`` acts when
-``null`` is used as its pointer argument. If it's true and the pointer is in
-address space 0, ``null`` is treated as an opaque value with an unknown number
-of bytes. Otherwise, ``llvm.objectsize`` reports 0 bytes available when given
-``null``.
+``null`` in address space 0 is used as its pointer argument. If the third
+argument is ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``. Otherwise, if the ``null`` is in a non-zero address space or the
+third argument is ``true``, the size is assumed to be unknown.
The second and third arguments only accept constants.
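+
+For example, a query that evaluates to -1 when the size is unknown and folds
+``null`` (in address space 0) to 0 bytes might be (a sketch; ``%obj`` is
+assumed to be in scope):
+
+.. code-block:: llvm
+
+ %size = call i64 @llvm.objectsize.i64.p0i8(i8* %obj, i1 false, i1 false)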
@@ -14541,4 +15271,3 @@ lowered to a call to the symbol ``__llvm_memset_element_unordered_atomic_*``. Wh
is replaced with an actual element size.
The optimizer is allowed to inline the memory assignment when it's profitable to do so.
-
diff --git a/docs/Lexicon.rst b/docs/Lexicon.rst
index 0021bf8e00b1..981aaea961a4 100644
--- a/docs/Lexicon.rst
+++ b/docs/Lexicon.rst
@@ -133,6 +133,12 @@ H
I
-
+**ICE**
+ Internal Compiler Error. This abbreviation is used to describe errors
+ that occur in LLVM or Clang as they are compiling source code. For example,
+ if a valid C++ source program were to trigger an assert in Clang when
+ compiled, that could be referred to as an "ICE".
+
**IPA**
Inter-Procedural Analysis. Refers to any variety of code analysis that
occurs between procedures, functions or compilation units (modules).
diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst
index 7a105e5ed129..0737fbbcd930 100644
--- a/docs/LibFuzzer.rst
+++ b/docs/LibFuzzer.rst
@@ -75,11 +75,13 @@ Recent versions of Clang (starting from 6.0) include libFuzzer, and no extra ins
In order to build your fuzzer binary, use the `-fsanitize=fuzzer` flag during the
compilation and linking. In most cases you may want to combine libFuzzer with
-AddressSanitizer_ (ASAN), UndefinedBehaviorSanitizer_ (UBSAN), or both::
+AddressSanitizer_ (ASAN), UndefinedBehaviorSanitizer_ (UBSAN), or both. You can
+also build with MemorySanitizer_ (MSAN), but support is experimental::
clang -g -O1 -fsanitize=fuzzer mytarget.c # Builds the fuzz target w/o sanitizers
clang -g -O1 -fsanitize=fuzzer,address mytarget.c # Builds the fuzz target with ASAN
clang -g -O1 -fsanitize=fuzzer,signed-integer-overflow mytarget.c # Builds the fuzz target with a part of UBSAN
+ clang -g -O1 -fsanitize=fuzzer,memory mytarget.c # Builds the fuzz target with MSAN
This will perform the necessary instrumentation, as well as linking with the libFuzzer library.
Note that ``-fsanitize=fuzzer`` links in the libFuzzer's ``main()`` symbol.
@@ -93,10 +95,6 @@ instrumentation without linking::
Then libFuzzer can be linked to the desired driver by passing in
``-fsanitize=fuzzer`` during the linking stage.
-Using MemorySanitizer_ (MSAN) with libFuzzer is possible too, but tricky.
-The exact details are out of scope, we expect to simplify this in future
-versions.
-
.. _libfuzzer-corpus:
Corpus
@@ -369,14 +367,16 @@ possible event codes are:
Each output line also reports the following statistics (when non-zero):
``cov:``
- Total number of code blocks or edges covered by the executing the current
- corpus.
+ Total number of code blocks or edges covered by executing the current corpus.
``ft:``
libFuzzer uses different signals to evaluate the code coverage:
edge coverage, edge counters, value profiles, indirect caller/callee pairs, etc.
These signals combined are called *features* (`ft:`).
``corp:``
Number of entries in the current in-memory test corpus and its size in bytes.
+``lim:``
+ Current limit on the length of new entries in the corpus. Increases over time
+ until the max length (``-max_len``) is reached.
``exec/s:``
Number of fuzzer iterations per second.
``rss:``
diff --git a/docs/MIRLangRef.rst b/docs/MIRLangRef.rst
index 1176435c8761..9d65a5279e15 100644
--- a/docs/MIRLangRef.rst
+++ b/docs/MIRLangRef.rst
@@ -185,15 +185,15 @@ of such YAML document:
name: inc
tracksRegLiveness: true
liveins:
- - { reg: '%rdi' }
+ - { reg: '$rdi' }
body: |
bb.0.entry:
- liveins: %rdi
+ liveins: $rdi
- %eax = MOV32rm %rdi, 1, _, 0, _
- %eax = INC32r killed %eax, implicit-def dead %eflags
- MOV32mr killed %rdi, 1, _, 0, _, %eax
- RETQ %eax
+ $eax = MOV32rm $rdi, 1, _, 0, _
+ $eax = INC32r killed $eax, implicit-def dead $eflags
+ MOV32mr killed $rdi, 1, _, 0, _, $eax
+ RETQ $eax
...
The document above consists of attributes that represent the various
@@ -307,7 +307,7 @@ the instructions:
.. code-block:: text
bb.0.entry:
- liveins: %edi, %esi
+ liveins: $edi, $esi
The list of live in registers and successors can be empty. The language also
allows multiple live in register and successor lists - they are combined into
@@ -344,7 +344,7 @@ operand:
.. code-block:: text
- RETQ %eax
+ RETQ $eax
However, if the machine instruction has one or more explicitly defined register
operands, the instruction's name has to be specified after them. The example
@@ -353,7 +353,7 @@ defined register operands:
.. code-block:: text
- %sp, %fp, %lr = LDPXpost %sp, 2
+ $sp, $fp, $lr = LDPXpost $sp, 2
The instruction names are serialized using the exact definitions from the
target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
@@ -365,40 +365,60 @@ machine instructions.
Instruction Flags
^^^^^^^^^^^^^^^^^
-The flag ``frame-setup`` can be specified before the instruction's name:
+The flag ``frame-setup`` or ``frame-destroy`` can be specified before the
+instruction's name:
.. code-block:: text
- %fp = frame-setup ADDXri %sp, 0, 0
+ $fp = frame-setup ADDXri $sp, 0, 0
+
+.. code-block:: text
+
+ $x21, $x20 = frame-destroy LDPXi $sp
.. _registers:
+Bundled Instructions
+^^^^^^^^^^^^^^^^^^^^
+
+The syntax for bundled instructions is the following:
+
+.. code-block:: text
+
+ BUNDLE implicit-def $r0, implicit-def $r1, implicit $r2 {
+ $r0 = SOME_OP $r2
+ $r1 = ANOTHER_OP internal $r0
+ }
+
+The first instruction is often a bundle header. The instructions between ``{``
+and ``}`` are bundled with the first instruction.
+
Registers
---------
Registers are one of the key primitives in the machine instructions
-serialization language. They are primarly used in the
+serialization language. They are primarily used in the
:ref:`register machine operands <register-operands>`,
but they can also be used in a number of other places, like the
:ref:`basic block's live in list <bb-liveins>`.
-The physical registers are identified by their name. They use the following
-syntax:
+The physical registers are identified by their name and by the '$' prefix sigil.
+They use the following syntax:
.. code-block:: text
- %<name>
+ $<name>
The example below shows three X86 physical registers:
.. code-block:: text
- %eax
- %r15
- %eflags
+ $eax
+ $r15
+ $eflags
-The virtual registers are identified by their ID number. They use the following
-syntax:
+The virtual registers are identified by their ID number and by the '%' sigil.
+They use the following syntax:
.. code-block:: text
@@ -411,7 +431,7 @@ Example:
%0
The null registers are represented using an underscore ('``_``'). They can also be
-represented using a '``%noreg``' named register, although the former syntax
+represented using a '``$noreg``' named register, although the former syntax
is preferred.
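+
+For example, the following two instructions are equivalent (the operands are
+taken from the load shown earlier):
+
+.. code-block:: text
+
+  $eax = MOV32rm $rdi, 1, _, 0, _
+  $eax = MOV32rm $rdi, 1, $noreg, 0, $noreg
+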
.. _machine-operands:
@@ -432,7 +452,7 @@ immediate machine operand ``-42``:
.. code-block:: text
- %eax = MOV32ri -42
+ $eax = MOV32ri -42
An immediate operand is also used to represent a subregister index when the
machine instruction has one of the following opcodes:
@@ -490,7 +510,7 @@ This example shows an instance of the X86 ``XOR32rr`` instruction that has
.. code-block:: text
- dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al
+ dead $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $al
.. _register-flags:
@@ -610,7 +630,7 @@ a global value operand named ``G``:
.. code-block:: text
- %rax = MOV64rm %rip, 1, _, @G, _
+ $rax = MOV64rm $rip, 1, _, @G, _
The named global values are represented using an identifier with the '@' prefix.
If the identifier doesn't match the regular expression
@@ -632,7 +652,7 @@ and the offset 8:
.. code-block:: text
- %sgpr2 = S_ADD_U32 _, target-index(amdgpu-constdata-start) + 8, implicit-def _, implicit-def _
+ $sgpr2 = S_ADD_U32 _, target-index(amdgpu-constdata-start) + 8, implicit-def _, implicit-def _
Jump-table Index Operands
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -641,7 +661,7 @@ A jump-table index operand with the index 0 is printed as following:
.. code-block:: text
- tBR_JTr killed %r0, %jump-table.0
+ tBR_JTr killed $r0, %jump-table.0
A machine jump-table entry contains a list of ``MachineBasicBlocks``. When serializing all the function's jump-table entries, the following format is used:
@@ -670,7 +690,7 @@ Example:
External Symbol Operands
^^^^^^^^^^^^^^^^^^^^^^^^^
-An external symbol operand is represented using an identifier with the ``$``
+An external symbol operand is represented using an identifier with the ``&``
prefix. The identifier is surrounded with ""'s and escaped if it has any
special non-printable characters in it.
@@ -678,7 +698,7 @@ Example:
.. code-block:: text
- CALL64pcrel32 $__stack_chk_fail, csr_64, implicit %rsp, implicit-def %rsp
+ CALL64pcrel32 &__stack_chk_fail, csr_64, implicit $rsp, implicit-def $rsp
MCSymbol Operands
^^^^^^^^^^^^^^^^^
@@ -705,7 +725,7 @@ The syntax is:
.. code-block:: text
- CFI_INSTRUCTION offset %w30, -16
+ CFI_INSTRUCTION offset $w30, -16
which may be emitted later in the MC layer as:
@@ -722,7 +742,7 @@ The syntax for the ``returnaddress`` intrinsic is:
.. code-block:: text
- %x0 = COPY intrinsic(@llvm.returnaddress)
+ $x0 = COPY intrinsic(@llvm.returnaddress)
Predicate Operands
^^^^^^^^^^^^^^^^^^
@@ -738,7 +758,6 @@ For an int eq predicate ``ICMP_EQ``, the syntax is:
.. TODO: Describe the parsers default behaviour when optional YAML attributes
are missing.
-.. TODO: Describe the syntax for the bundled instructions.
.. TODO: Describe the syntax for virtual register YAML definitions.
.. TODO: Describe the machine function's YAML flag attributes.
.. TODO: Describe the syntax for the register mask machine operands.
diff --git a/docs/MemorySSA.rst b/docs/MemorySSA.rst
index 0249e702c037..1669117fcf56 100644
--- a/docs/MemorySSA.rst
+++ b/docs/MemorySSA.rst
@@ -79,7 +79,7 @@ viewing this example, it may be helpful to view it in terms of clobbers. The
operands of a given ``MemoryAccess`` are all (potential) clobbers of said
MemoryAccess, and the value produced by a ``MemoryAccess`` can act as a clobber
for other ``MemoryAccess``\ es. Another useful way of looking at it is in
-terms of heap versions. In that view, operands of of a given
+terms of heap versions. In that view, operands of a given
``MemoryAccess`` are the version of the heap before the operation, and
if the access produces a value, the value is the new version of the heap
after the operation.
diff --git a/docs/OptBisect.rst b/docs/OptBisect.rst
index 5a216d419a64..4dcbb76ccebe 100644
--- a/docs/OptBisect.rst
+++ b/docs/OptBisect.rst
@@ -166,7 +166,7 @@ A MachineFunctionPass should use FunctionPass::skipFunction() as such:
 bool MyMachineFunctionPass::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
- return false;
+ return false;
// Otherwise, run the pass normally.
}
diff --git a/docs/PDB/MsfFile.rst b/docs/PDB/MsfFile.rst
index bdceca3aeb39..dfbbf9ded7fb 100644
--- a/docs/PDB/MsfFile.rst
+++ b/docs/PDB/MsfFile.rst
@@ -5,6 +5,44 @@ The MSF File Format
.. contents::
:local:
+.. _msf_layout:
+
+File Layout
+===========
+
+The MSF file format consists of the following components:
+
+1. :ref:`msf_superblock`
+2. :ref:`msf_freeblockmap` (also known as the Free Page Map, or FPM)
+3. Data
+
+Each component is stored as an indexed block, the length of which is specified
+in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
+following pattern (sometimes referred to as an "interval"):
+
+1. 1 block of data
+2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
+3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
+4. ``SuperBlock::BlockSize - 3`` blocks of data
+
+In the first interval, the first data block is used to store
+:ref:`msf_superblock`.
+
+The following diagram demonstrates the general layout of the file (\| denotes
+the end of an interval, and is for visualization purposes only):
+
++-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
+| Block Index | 0 | 1 | 2 | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
++=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
+| Meaning | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data | \| | Data | FPM1 | FPM2 | Data | \| | ... |
++-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
+
+The file may end after any block, including immediately after an FPM1.
+
+.. note::
+ LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
+ variant), so the rest of this document will assume a block size of 4096.
+
.. _msf_superblock:
The Superblock
@@ -32,14 +70,9 @@ follows:
sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
a bitfield representing the set of all blocks within the file which are "free"
- (i.e. the data within that block is not used). This bitfield is spread across
- the MSF file at ``BlockSize`` intervals.
- **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
- is designed to support incremental and atomic updates of the underlying MSF
- file. While writing to an MSF file, if the value of this field is `1`, you
- can write your new modified bitfield to page 2, and vice versa. Only when
- you commit the file to disk do you need to swap the value in the SuperBlock
- to point to the new ``FreeBlockMapBlock``.
+ (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
+ more information.
+ **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
@@ -53,7 +86,32 @@ follows:
contains the list of blocks that the stream directory occupies, and the stream
directory itself can be stitched together accordingly. The number of
``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
-
+
+.. _msf_freeblockmap:
+
+The Free Block Map
+==================
+
+The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
+series of blocks which contains a bit flag for every block in the file. The
+flag will be set to 0 if the block is in use, and 1 if the block is unused.
+
+Each file contains two FPMs, one of which is active at any given time. This
+feature is designed to support incremental and atomic updates of the underlying
+MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
+write your new modified bitfield to FPM2, and vice versa. Only when you commit
+the file to disk do you need to swap the value in the SuperBlock to point to
+the new ``FreeBlockMapBlock``.
+
+The Free Block Maps are stored as a series of single blocks throughout the file
+at intervals of ``BlockSize``. Because each FPM block is ``BlockSize`` bytes,
+it contains 8 times as many bits as an interval has blocks. This means that
+the first block of each FPM refers to the first 8 intervals of the file (the
+first 32768 blocks), the second block of each FPM refers to the next 8
+intervals, and so on. This results in far more FPM blocks being present than
+are required, but in order to maintain backwards compatibility the format must
+stay this way.
+
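+As a rough sketch (helper names are hypothetical; assuming the 4096-byte
+"BigMsf" block size), the FPM block and bit covering a given file block can be
+located as follows:
+
+.. code-block:: c++
+
+  #include <cstdint>
+
+  const uint32_t BlockSize = 4096;              // "BigMsf" block size
+
+  // Index of the FPM block whose bits describe file block `Block`.
+  // `Fpm` is 1 or 2, matching SuperBlock::FreeBlockMapBlock; FPM blocks
+  // recur every BlockSize blocks.
+  uint32_t fpmBlockIndex(uint32_t Block, uint32_t Fpm) {
+    uint32_t BlocksPerFpmBlock = BlockSize * 8; // one bit per block: 32768
+    return (Block / BlocksPerFpmBlock) * BlockSize + Fpm;
+  }
+
+  // Bit offset within that FPM block (0 = in use, 1 = free).
+  uint32_t fpmBitOffset(uint32_t Block) {
+    return Block % (BlockSize * 8);
+  }
+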
The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
@@ -66,10 +124,10 @@ file. Beginning at byte 0 of the stream directory is the following structure:
ulittle32_t StreamSizes[NumStreams];
ulittle32_t StreamBlocks[NumStreams][];
};
-
+
And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
-that the second array is jagged.
+that the second array is jagged.
**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
@@ -97,7 +155,7 @@ like:
{10, 15, 12}
};
};
-
+
In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
diff --git a/docs/Passes.rst b/docs/Passes.rst
index 77461f3c52d9..9ab214984d2d 100644
--- a/docs/Passes.rst
+++ b/docs/Passes.rst
@@ -83,6 +83,8 @@ Yet to be written.
A pass which can be used to count how many alias queries are being made and how
the alias analysis implementation being used responds.
+.. _passes-da:
+
``-da``: Dependence Analysis
----------------------------
@@ -641,6 +643,21 @@ not library calls are simplified is controlled by the
:ref:`-functionattrs <passes-functionattrs>` pass and LLVM's knowledge of
library calls on different targets.
+.. _passes-aggressive-instcombine:
+
+``-aggressive-instcombine``: Combine expression patterns
+--------------------------------------------------------
+
+Combine expression patterns to form expressions with fewer, simple instructions.
+This pass does not modify the CFG.
+
+For example, this pass reduces the width of expressions post-dominated by a
+TruncInst to a smaller width when applicable.
+
+It differs from the instcombine pass in that it contains pattern optimizations
+whose complexity is higher than O(1); thus, it should run less often than
+instcombine.
+
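+For example (a hand-written illustration of the kind of rewrite performed), an
+expression that is only consumed through a ``trunc`` can be evaluated in the
+narrower type:
+
+.. code-block:: llvm
+
+  ; Before: the addition is performed in i64 although only 32 bits are used.
+  %za  = zext i32 %a to i64
+  %zb  = zext i32 %b to i64
+  %add = add i64 %za, %zb
+  %r   = trunc i64 %add to i32
+
+  ; After: the same value computed directly at the narrower width.
+  %r   = add i32 %a, %b
+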
``-internalize``: Internalize Global Symbols
--------------------------------------------
@@ -810,6 +827,27 @@ This pass implements a simple loop unroller. It works best when loops have
been canonicalized by the :ref:`indvars <passes-indvars>` pass, allowing it to
determine the trip counts of loops easily.
+``-loop-unroll-and-jam``: Unroll and Jam loops
+----------------------------------------------
+
+This pass implements the classical unroll-and-jam loop optimization.
+It transforms loops from:
+
+.. code-block:: c++
+
+  for i.. i+= 1                    for i.. i+= 4
+    for j..                          for j..
+      code(i, j)                       code(i, j)
+                                       code(i+1, j)
+                                       code(i+2, j)
+                                       code(i+3, j)
+  remainder loop
+
+This can be seen as unrolling the outer loop and "jamming" (fusing) the inner
+loops into one. When variables or loads can be shared in the new inner loop, this
+can lead to significant performance improvements. It uses
+:ref:`Dependence Analysis <passes-da>` for proving the transformations are safe.
+
``-loop-unswitch``: Unswitch loops
----------------------------------
diff --git a/docs/Phabricator.rst b/docs/Phabricator.rst
index cc8484cc1e3e..53cb3b5980a9 100644
--- a/docs/Phabricator.rst
+++ b/docs/Phabricator.rst
@@ -38,6 +38,8 @@ the command line. To get you set up, follow the
You can learn more about how to use arc to interact with
Phabricator in the `Arcanist User Guide`_.
+.. _phabricator-request-review-web:
+
Requesting a review via the web interface
-----------------------------------------
@@ -63,15 +65,16 @@ To upload a new patch:
* Click *Differential*.
* Click *+ Create Diff*.
* Paste the text diff or browse to the patch file. Click *Create Diff*.
-* Leave the Repository field blank.
+* Leave this first Repository field blank. (We'll fill in the Repository
+ later, when sending the review.)
* Leave the drop down on *Create a new Revision...* and click *Continue*.
* Enter a descriptive title and summary. The title and summary are usually
in the form of a :ref:`commit message <commit messages>`.
-* Add reviewers (see below for advice) and subscribe mailing
- lists that you want to be included in the review. If your patch is
- for LLVM, add llvm-commits as a Subscriber; if your patch is for Clang,
- add cfe-commits.
-* Leave the Repository and Project fields blank.
+* Add reviewers (see below for advice). (If you set the Repository field
+ correctly, llvm-commits or cfe-commits will be subscribed automatically;
+ otherwise, you will have to manually subscribe them.)
+* In the Repository field, enter the name of the project (LLVM, Clang,
+ etc.) to which the review should be sent.
* Click *Save*.
To submit an updated patch:
@@ -81,7 +84,8 @@ To submit an updated patch:
* Paste the updated diff or browse to the updated patch file. Click *Create Diff*.
* Select the review you want to from the *Attach To* dropdown and click
*Continue*.
-* Leave the Repository and Project fields blank.
+* Leave the Repository field blank. (We previously filled out the Repository
+ for the review request.)
* Add comments about the changes in the new diff. Click *Save*.
Choosing reviewers: You typically pick one or two people as initial reviewers.
diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst
index 07048a52319e..d8016184c744 100644
--- a/docs/ProgrammersManual.rst
+++ b/docs/ProgrammersManual.rst
@@ -1020,8 +1020,8 @@ be passed by value.
.. _DEBUG:
-The ``DEBUG()`` macro and ``-debug`` option
--------------------------------------------
+The ``LLVM_DEBUG()`` macro and ``-debug`` option
+------------------------------------------------
Often when working on your pass you will put a bunch of debugging printouts and
other code into your pass. After you get it working, you want to remove it, but
@@ -1033,14 +1033,14 @@ them out, allowing you to enable them if you need them in the future.
The ``llvm/Support/Debug.h`` (`doxygen
<http://llvm.org/doxygen/Debug_8h_source.html>`__) file provides a macro named
-``DEBUG()`` that is a much nicer solution to this problem. Basically, you can
-put arbitrary code into the argument of the ``DEBUG`` macro, and it is only
+``LLVM_DEBUG()`` that is a much nicer solution to this problem. Basically, you can
+put arbitrary code into the argument of the ``LLVM_DEBUG`` macro, and it is only
executed if '``opt``' (or any other tool) is run with the '``-debug``' command
line argument:
.. code-block:: c++
- DEBUG(dbgs() << "I am here!\n");
+ LLVM_DEBUG(dbgs() << "I am here!\n");
Then you can run your pass like this:
@@ -1051,13 +1051,13 @@ Then you can run your pass like this:
$ opt < a.bc > /dev/null -mypass -debug
I am here!
-Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not
+Using the ``LLVM_DEBUG()`` macro instead of a home-brewed solution allows you to not
have to create "yet another" command line option for the debug output for your
-pass. Note that ``DEBUG()`` macros are disabled for non-asserts builds, so they
+pass. Note that ``LLVM_DEBUG()`` macros are disabled for non-asserts builds, so they
do not cause a performance impact at all (for the same reason, they should also
not contain side-effects!).
-One additional nice thing about the ``DEBUG()`` macro is that you can enable or
+One additional nice thing about the ``LLVM_DEBUG()`` macro is that you can enable or
disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set
DebugFlag=1``" from the gdb if the program is running. If the program hasn't
been started yet, you can always just run it with ``-debug``.
@@ -1076,10 +1076,10 @@ follows:
.. code-block:: c++
#define DEBUG_TYPE "foo"
- DEBUG(dbgs() << "'foo' debug type\n");
+ LLVM_DEBUG(dbgs() << "'foo' debug type\n");
#undef DEBUG_TYPE
#define DEBUG_TYPE "bar"
- DEBUG(dbgs() << "'bar' debug type\n");
+ LLVM_DEBUG(dbgs() << "'bar' debug type\n");
#undef DEBUG_TYPE
Then you can run your pass like this:
@@ -1435,7 +1435,7 @@ order (so you can do pointer arithmetic between elements), supports efficient
push_back/pop_back operations, supports efficient random access to its elements,
etc.
-The advantage of SmallVector is that it allocates space for some number of
+The main advantage of SmallVector is that it allocates space for some number of
elements (N) **in the object itself**. Because of this, if the SmallVector is
dynamically smaller than N, no malloc is performed. This can be a big win in
cases where the malloc/free call is far more expensive than the code that
@@ -1450,6 +1450,21 @@ SmallVectors are most useful when on the stack.
SmallVector also provides a nice portable and efficient replacement for
``alloca``.
+SmallVector has grown a few other minor advantages over std::vector, causing
+``SmallVector<Type, 0>`` to be preferred over ``std::vector<Type>``.
+
+#. std::vector is exception-safe, and some implementations have pessimizations
+ that copy elements when SmallVector would move them.
+
+#. SmallVector understands ``isPodLike<Type>`` and uses realloc aggressively.
+
+#. Many LLVM APIs take a SmallVectorImpl as an out parameter (see the note
+ below).
+
+#. SmallVector with N equal to 0 is smaller than std::vector on 64-bit
+ platforms, since it uses ``unsigned`` (instead of ``void*``) for its size
+ and capacity.
+
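+A minimal sketch of the resulting guidance (the element type is illustrative):
+
+.. code-block:: c++
+
+  #include "llvm/ADT/SmallVector.h"
+  #include <vector>
+
+  // Prefer this in LLVM code (no inline elements, but smaller and faster
+  // than std::vector for the reasons listed above)...
+  llvm::SmallVector<int, 0> Preferred;
+  // ...over this.
+  std::vector<int> Legacy;
+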
.. note::
Prefer to use ``SmallVectorImpl<T>`` as a parameter type.
@@ -1482,12 +1497,10 @@ SmallVector also provides a nice portable and efficient replacement for
<vector>
^^^^^^^^
-``std::vector`` is well loved and respected. It is useful when SmallVector
-isn't: when the size of the vector is often large (thus the small optimization
-will rarely be a benefit) or if you will be allocating many instances of the
-vector itself (which would waste space for elements that aren't in the
-container). vector is also useful when interfacing with code that expects
-vectors :).
+``std::vector<T>`` is well loved and respected. However, ``SmallVector<T, 0>``
+is often a better option due to the advantages listed above. std::vector is
+still useful when you need to store more than ``UINT32_MAX`` elements or when
+interfacing with code that expects vectors :).
One worthwhile note about std::vector: avoid code like this:
@@ -1832,7 +1845,7 @@ A sorted 'vector'
^^^^^^^^^^^^^^^^^
If you intend to insert a lot of elements, then do a lot of queries, a great
-approach is to use a vector (or other sequential container) with
+approach is to use an std::vector (or other sequential container) with
std::sort+std::unique to remove duplicates. This approach works really well if
your usage pattern has these two distinct phases (insert then query), and can be
coupled with a good choice of :ref:`sequential container <ds_sequential>`.
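+
+A rough sketch of this two-phase pattern (the names and element type are
+illustrative):
+
+.. code-block:: c++
+
+  #include <algorithm>
+  #include <vector>
+
+  std::vector<int> Values = {3, 1, 3, 2, 1};
+
+  void sortAndUnique() {
+    // Phase 1 has finished inserting; sort once and drop duplicates.
+    std::sort(Values.begin(), Values.end());
+    Values.erase(std::unique(Values.begin(), Values.end()), Values.end());
+  }
+
+  bool contains(int Key) {
+    // Phase 2: queries via binary search on the sorted, deduplicated vector.
+    return std::binary_search(Values.begin(), Values.end(), Key);
+  }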
@@ -2984,7 +2997,7 @@ Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity
in-memory IR belongs to an ``LLVMContext``. Entities in different contexts
*cannot* interact with each other: ``Module``\ s in different contexts cannot be
linked together, ``Function``\ s cannot be added to ``Module``\ s in different
-contexts, etc. What this means is that is is safe to compile on multiple
+contexts, etc. What this means is that it is safe to compile on multiple
threads simultaneously, as long as no two threads operate on entities within the
same context.
@@ -3721,7 +3734,7 @@ Important Subclasses of the ``Instruction`` class
* ``CmpInst``
- This subclass respresents the two comparison instructions,
+ This subclass represents the two comparison instructions,
   `ICmpInst <LangRef.html#i_icmp>`_ (integer operands), and
`FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands).
diff --git a/docs/Proposals/VectorizationPlan.rst b/docs/Proposals/VectorizationPlan.rst
index f9700d177d23..6d6a38890c06 100644
--- a/docs/Proposals/VectorizationPlan.rst
+++ b/docs/Proposals/VectorizationPlan.rst
@@ -212,7 +212,7 @@ Related LLVM components
Polly [7]_.
3. Loop Vectorizer: the Vectorization Plan aims to upgrade the infrastructure of
- the Loop Vectorizer and extend it to handle outer loops [8,9]_.
+ the Loop Vectorizer and extend it to handle outer loops [8]_, [9]_.
References
----------
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index 41b9cf92d767..a6942c019141 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -1,12 +1,12 @@
========================
-LLVM 6.0.0 Release Notes
+LLVM 7.0.0 Release Notes
========================
.. contents::
:local:
.. warning::
- These are in-progress notes for the upcoming LLVM 6 release.
+ These are in-progress notes for the upcoming LLVM 7 release.
Release notes for previous releases can be found on
`the Download Page <http://releases.llvm.org/download.html>`_.
@@ -15,7 +15,7 @@ Introduction
============
This document contains the release notes for the LLVM Compiler Infrastructure,
-release 5.0.0. Here we describe the status of LLVM, including major improvements
+release 7.0.0. Here we describe the status of LLVM, including major improvements
from the previous release, improvements in various subprojects of LLVM, and
some of the current users of the code. All LLVM releases may be downloaded
from the `LLVM releases web site <http://llvm.org/releases/>`_.
@@ -40,19 +40,74 @@ Non-comprehensive list of changes in this release
functionality, or simply have a lot to talk about), see the `NOTE` below
for adding a new subsection.
-* The ``Redirects`` argument of ``llvm::sys::ExecuteAndWait`` and
- ``llvm::sys::ExecuteNoWait`` was changed to an ``ArrayRef`` of optional
- ``StringRef``'s to make it safer and more convenient to use.
+* Libraries have been renamed from 7.0 to 7. This change also impacts
+ downstream libraries like lldb.
-* The backend name was added to the Target Registry to allow run-time
- information to be fed back into TableGen. Out-of-tree targets will need to add
- the name used in the `def X : Target` definition to the call to
- `RegisterTarget`.
+* The LoopInstSimplify pass (-loop-instsimplify) has been removed.
-* The ``Debugify`` pass was added to ``opt`` to facilitate testing of debug
- info preservation. This pass attaches synthetic ``DILocations`` and
- ``DIVariables`` to the instructions in a ``Module``. The ``CheckDebugify``
- pass determines how much of the metadata is lost.
+* Symbols starting with ``?`` are no longer mangled by LLVM when using the
+ Windows ``x`` or ``w`` IR mangling schemes.
+
+* A new tool named :doc:`llvm-exegesis <CommandGuide/llvm-exegesis>` has been
+ added. :program:`llvm-exegesis` automatically measures instruction scheduling
+ properties (latency/uops) and provides a principled way to edit scheduling
+ models.
+
+* A new tool named :doc:`llvm-mca <CommandGuide/llvm-mca>` has been added.
+ :program:`llvm-mca` is a static performance analysis tool that uses
+ information available in LLVM to statically predict the performance of
+ machine code for a specific CPU.
+
+* The optimization flag to merge constants (-fmerge-all-constants) is no longer
+ applied by default.
+
+* Optimization of floating-point casts is improved. This may cause surprising
+ results for code that is relying on the undefined behavior of overflowing
+ casts. The optimization can be disabled by specifying a function attribute:
+ "strict-float-cast-overflow"="false". This attribute may be created by the
+ clang option :option:`-fno-strict-float-cast-overflow`.
+ Code sanitizers can be used to detect affected patterns. The option for
+ detecting this problem alone is "-fsanitize=float-cast-overflow":
+
+.. code-block:: c
+
+ #include <stdio.h>
+
+ int main() {
+ float x = 4294967296.0f;
+ x = (float)((int)x);
+ printf("junk in the ftrunc: %f\n", x);
+ return 0;
+ }
+
+.. code-block:: bash
+
+ clang -O1 ftrunc.c -fsanitize=float-cast-overflow ; ./a.out
+ ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int'
+ junk in the ftrunc: 0.000000
+
+* ``LLVM_ON_WIN32`` is no longer set by ``llvm/Config/config.h`` and
+ ``llvm/Config/llvm-config.h``. If you used this macro, use the compiler-set
+ ``_WIN32`` instead which is set exactly when ``LLVM_ON_WIN32`` used to be set.
+
+* The ``DEBUG`` macro has been renamed to ``LLVM_DEBUG``; the interface remains
+  the same. If you used this macro, you need to migrate to the new one.
+ You should also clang-format your code to make it easier to integrate future
+ changes locally. This can be done with the following bash commands:
+
+.. code-block:: bash
+
+ git grep -l 'DEBUG' | xargs perl -pi -e 's/\bDEBUG\s?\(/LLVM_DEBUG(/g'
+ git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
+
+* Early support for UBSan, X-Ray instrumentation and libFuzzer (x86 and
+  x86_64) for OpenBSD. Support for MSan (x86_64), X-Ray instrumentation and
+  libFuzzer (x86 and x86_64) for FreeBSD.
+
+* ``SmallVector<T, 0>`` shrank from ``sizeof(void*) * 4 + sizeof(T)`` to
+ ``sizeof(void*) + sizeof(unsigned) * 2``, smaller than ``std::vector<T>`` on
+ 64-bit platforms. The maximum capacity is now restricted to ``UINT32_MAX``.
+ Since SmallVector doesn't have the exception-safety pessimizations some
+ implementations saddle std::vector with and is better at using ``realloc``,
+ it's now a better choice even on the heap (although when TinyPtrVector works,
+ it's even smaller).
* Note..
@@ -69,6 +124,14 @@ Non-comprehensive list of changes in this release
Changes to the LLVM IR
----------------------
+* The signatures for the builtins @llvm.memcpy, @llvm.memmove, and @llvm.memset
+  have changed. Alignment is no longer an argument, and is instead conveyed as
+  a parameter attribute (see the example at the end of this list).
+
+* invariant.group.barrier has been renamed to launder.invariant.group.
+
+* invariant.group metadata can now refer only to empty metadata nodes.
+
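+For example (a sketch using the ``i64`` length overload of ``@llvm.memcpy``),
+alignment that was previously a trailing integer argument is now written as an
+``align`` attribute on the pointer parameters:
+
+.. code-block:: llvm
+
+  ; Old form (LLVM 6 and earlier):
+  ;   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 32, i32 4, i1 false)
+  ; New form:
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src, i64 32, i1 false)
+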
Changes to the ARM Backend
--------------------------
@@ -104,16 +167,26 @@ Changes to the AVR Target
Changes to the OCaml bindings
-----------------------------
- During this release ...
+* Remove ``add_bb_vectorize``.
Changes to the C API
--------------------
- During this release ...
+* Remove ``LLVMAddBBVectorizePass``. The implementation was removed and the C
+ interface was made a deprecated no-op in LLVM 5. Use
+ ``LLVMAddSLPVectorizePass`` instead to get the supported SLP vectorizer.
+
+Changes to the DAG infrastructure
+---------------------------------
+* ADDC/ADDE/SUBC/SUBE are now deprecated and will default to expand. Backends
+  that wish to continue to use these opcodes should explicitly request to do so
+  using ``setOperationAction`` in their ``TargetLowering`` (see the sketch
+  after this list). New backends should use UADDO/ADDCARRY/USUBO/SUBCARRY
+  instead of the deprecated opcodes.
+* The SETCCE opcode has now been removed in favor of SETCCCARRY.
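+
+A sketch of how a backend might explicitly keep the deprecated opcodes (the
+surrounding target class is hypothetical):
+
+.. code-block:: c++
+
+  // In the constructor of a hypothetical MyTargetLowering (a subclass of
+  // TargetLowering):
+  setOperationAction(ISD::ADDC, MVT::i32, Legal);
+  setOperationAction(ISD::ADDE, MVT::i32, Legal);
+  setOperationAction(ISD::SUBC, MVT::i32, Legal);
+  setOperationAction(ISD::SUBE, MVT::i32, Legal);
+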
-External Open Source Projects Using LLVM 6
+External Open Source Projects Using LLVM 7
==========================================
* A project...
diff --git a/docs/ReleaseProcess.rst b/docs/ReleaseProcess.rst
index d7f703126019..5822360cd1df 100644
--- a/docs/ReleaseProcess.rst
+++ b/docs/ReleaseProcess.rst
@@ -9,9 +9,9 @@ How To Validate a New Release
Introduction
============
-This document contains information about testing the release candidates that will
-ultimately be the next LLVM release. For more information on how to manage the
-actual release, please refer to :doc:`HowToReleaseLLVM`.
+This document contains information about testing the release candidates that
+will ultimately be the next LLVM release. For more information on how to
+manage the actual release, please refer to :doc:`HowToReleaseLLVM`.
Overview of the Release Process
-------------------------------
@@ -21,26 +21,28 @@ and it'll be the role of each volunteer to:
* Test and benchmark the previous release
-* Test and benchmark each release candidate, comparing to the previous release and candidates
+* Test and benchmark each release candidate, comparing to the previous release
+ and candidates
* Identify, reduce and report every regression found during tests and benchmarks
* Make sure the critical bugs get fixed and merged to the next release candidate
Not all bugs or regressions are show-stoppers and it's a bit of a grey area what
-should be fixed before the next candidate and what can wait until the next release.
+should be fixed before the next candidate and what can wait until the next
+release.
It'll depend on:
-* The severity of the bug, how many people it affects and if it's a regression or a
- known bug. Known bugs are "unsupported features" and some bugs can be disabled if
- they have been implemented recently.
+* The severity of the bug, how many people it affects and if it's a regression
+ or a known bug. Known bugs are "unsupported features" and some bugs can be
+ disabled if they have been implemented recently.
-* The stage in the release. Less critical bugs should be considered to be fixed between
- RC1 and RC2, but not so much at the end of it.
+* The stage in the release. Less critical bugs should be considered to be
+ fixed between RC1 and RC2, but not so much at the end of it.
-* If it's a correctness or a performance regression. Performance regression tends to be
- taken more lightly than correctness.
+* If it's a correctness or a performance regression. Performance regression
+ tends to be taken more lightly than correctness.
.. _scripts:
@@ -52,10 +54,12 @@ The scripts are in the ``utils/release`` directory.
test-release.sh
---------------
-This script will check-out, configure and compile LLVM+Clang (+ most add-ons, like ``compiler-rt``,
-``libcxx``, ``libomp`` and ``clang-extra-tools``) in three stages, and will test the final stage.
-It'll have installed the final binaries on the Phase3/Releasei(+Asserts) directory, and
-that's the one you should use for the test-suite and other external tests.
+This script will check-out, configure and compile LLVM+Clang (+ most add-ons,
+like ``compiler-rt``, ``libcxx``, ``libomp`` and ``clang-extra-tools``) in
+three stages, and will test the final stage.
+It'll have installed the final binaries on the Phase3/Release(+Asserts)
+directory, and that's the one you should use for the test-suite and other
+external tests.
To run the script on a specific release candidate run::
@@ -66,25 +70,32 @@ To run the script on a specific release candidate run::
-test-asserts \
-no-compare-files
-Each system will require different options. For instance, x86_64 will obviously not need
-``-no-64bit`` while 32-bit systems will, or the script will fail.
+Each system will require different options. For instance, x86_64 will
+obviously not need ``-no-64bit`` while 32-bit systems will, or the script will
+fail.
The important flags to get right are:
-* On the pre-release, you should change ``-rc 1`` to ``-final``. On RC2, change it to ``-rc 2`` and so on.
+* On the pre-release, you should change ``-rc 1`` to ``-final``. On RC2,
+ change it to ``-rc 2`` and so on.
-* On non-release testing, you can use ``-final`` in conjunction with ``-no-checkout``, but you'll have to
- create the ``final`` directory by hand and link the correct source dir to ``final/llvm.src``.
+* On non-release testing, you can use ``-final`` in conjunction with
+ ``-no-checkout``, but you'll have to create the ``final`` directory by hand
+ and link the correct source dir to ``final/llvm.src``.
-* For release candidates, you need ``-test-asserts``, or it won't create a "Release+Asserts" directory,
- which is needed for release testing and benchmarking. This will take twice as long.
+* For release candidates, you need ``-test-asserts``, or it won't create a
+ "Release+Asserts" directory, which is needed for release testing and
+ benchmarking. This will take twice as long.
-* On the final candidate you just need Release builds, and that's the binary directory you'll have to pack.
+* On the final candidate you just need Release builds, and that's the binary
+ directory you'll have to pack.
-This script builds three phases of Clang+LLVM twice each (Release and Release+Asserts), so use
-screen or nohup to avoid headaches, since it'll take a long time.
+This script builds three phases of Clang+LLVM twice each (Release and
+Release+Asserts), so use screen or nohup to avoid headaches, since it'll take
+a long time.
-Use the ``--help`` option to see all the options and chose it according to your needs.
+Use the ``--help`` option to see all the options and choose them according to
+your needs.
findRegressions-nightly.py
@@ -100,9 +111,12 @@ Test Suite
.. contents::
:local:
-Follow the `LNT Quick Start Guide <http://llvm.org/docs/lnt/quickstart.html>`__ link on how to set-up the test-suite
+Follow the `LNT Quick Start Guide
+<http://llvm.org/docs/lnt/quickstart.html>`__ link on how to set up the
+test-suite.
-The binary location you'll have to use for testing is inside the ``rcN/Phase3/Release+Asserts/llvmCore-REL-RC.install``.
+The binary location you'll have to use for testing is inside the
+``rcN/Phase3/Release+Asserts/llvmCore-REL-RC.install``.
Link that directory to an easier location and run the test-suite.
An example on the run command line, assuming you created a link from the correct
@@ -116,13 +130,16 @@ install directory to ``~/devel/llvm/install``::
--cc ~/devel/llvm/install/bin/clang \
--cxx ~/devel/llvm/install/bin/clang++
-It should have no new regressions, compared to the previous release or release candidate. You don't need to fix
-all the bugs in the test-suite, since they're not necessarily meant to pass on all architectures all the time. This is
-due to the nature of the result checking, which relies on direct comparison, and most of the time, the failures are
-related to bad output checking, rather than bad code generation.
+It should have no new regressions, compared to the previous release or release
+candidate. You don't need to fix all the bugs in the test-suite, since they're
+not necessarily meant to pass on all architectures all the time. This is
+due to the nature of the result checking, which relies on direct comparison,
+and most of the time, the failures are related to bad output checking, rather
+than bad code generation.
-If the errors are in LLVM itself, please report every single regression found as blocker, and all the other bugs
-as important, but not necessarily blocking the release to proceed. They can be set as "known failures" and to be
+If the errors are in LLVM itself, please report every single regression found
+as a blocker, and all the other bugs as important, but not necessarily blocking
+the release to proceed. They can be set as "known failures" to be
 fixed at a future date.
.. _pre-release-process:
@@ -134,23 +151,26 @@ Pre-Release Process
:local:
When the release process is announced on the mailing list, you should prepare
-for the testing, by applying the same testing you'll do on the release candidates,
-on the previous release.
+for the testing, by applying the same testing you'll do on the release
+candidates, on the previous release.
You should:
-* Download the previous release sources from http://llvm.org/releases/download.html.
+* Download the previous release sources from
+ http://llvm.org/releases/download.html.
-* Run the test-release.sh script on ``final`` mode (change ``-rc 1`` to ``-final``).
+* Run the test-release.sh script on ``final`` mode (change ``-rc 1`` to
+ ``-final``).
* Once all three stages are done, it'll test the final stage.
-* Using the ``Phase3/Release+Asserts/llvmCore-MAJ.MIN-final.install`` base, run the test-suite.
+* Using the ``Phase3/Release+Asserts/llvmCore-MAJ.MIN-final.install`` base,
+ run the test-suite.
-If the final phase's ``make check-all`` failed, it's a good idea to also test the
-intermediate stages by going on the obj directory and running ``make check-all`` to find
-if there's at least one stage that passes (helps when reducing the error for bug report
-purposes).
+If the final phase's ``make check-all`` failed, it's a good idea to also test
+the intermediate stages by going on the obj directory and running
+``make check-all`` to find if there's at least one stage that passes (helps
+when reducing the error for bug report purposes).
.. _release-process:
@@ -166,22 +186,23 @@ to them), and run the release test as above.
You should:
-* Download the current candidate sources from where the release manager points you
- (ex. http://llvm.org/pre-releases/3.3/rc1/).
+* Download the current candidate sources from where the release manager points
+ you (ex. http://llvm.org/pre-releases/3.3/rc1/).
-* Repeat the steps above with ``-rc 1``, ``-rc 2`` etc modes and run the test-suite
- the same way.
+* Repeat the steps above with ``-rc 1``, ``-rc 2`` etc modes and run the
+ test-suite the same way.
* Compare the results, report all errors on Bugzilla and publish the binary blob
where the release manager can grab it.
-Once the release manages announces that the latest candidate is the good one, you
-have to pack the ``Release`` (no Asserts) install directory on ``Phase3`` and that
-will be the official binary.
+Once the release manager announces that the latest candidate is the good one,
+you have to pack the ``Release`` (no Asserts) install directory on ``Phase3``
+and that will be the official binary.
* Rename (or link) ``clang+llvm-REL-ARCH-ENV`` to the .install directory
-* Tar that into the same name with ``.tar.gz`` extensioan from outside the directory
+* Tar that into the same name with ``.tar.gz`` extension from outside the
+ directory
* Make it available for the release manager to download
@@ -196,15 +217,15 @@ Bug Reporting Process
If you found regressions or failures when comparing a release candidate with the
previous release, follow the rules below:
-* Critical bugs on compilation should be fixed as soon as possible, possibly before
- releasing the binary blobs.
+* Critical bugs on compilation should be fixed as soon as possible, possibly
+ before releasing the binary blobs.
-* Check-all tests should be fixed before the next release candidate, but can wait
- until the test-suite run is finished.
+* Check-all tests should be fixed before the next release candidate, but can
+ wait until the test-suite run is finished.
* Bugs in the test suite or unimportant check-all tests can be fixed in between
release candidates.
-* New features or recent big changes, when close to the release, should have done
- in a way that it's easy to disable. If they misbehave, prefer disabling them than
- releasing an unstable (but untested) binary package.
+* New features or recent big changes, when close to the release, should have
+  been done in a way that makes them easy to disable. If they misbehave,
+  prefer disabling them to releasing an unstable (but untested) binary package.
diff --git a/docs/ScudoHardenedAllocator.rst b/docs/ScudoHardenedAllocator.rst
index 562a39144829..fcd5cefdac6d 100644
--- a/docs/ScudoHardenedAllocator.rst
+++ b/docs/ScudoHardenedAllocator.rst
@@ -18,7 +18,8 @@ Currently, the allocator supports (was tested on) the following architectures:
- i386 (& i686) (32-bit);
- x86_64 (64-bit);
- armhf (32-bit);
-- AArch64 (64-bit).
+- AArch64 (64-bit);
+- MIPS (32-bit & 64-bit).
The name "Scudo" has been retained from the initial implementation (Escudo
meaning Shield in Spanish and Portuguese).
@@ -26,32 +27,45 @@ meaning Shield in Spanish and Portuguese).
Design
======
+Allocator
+---------
+Scudo can be considered a Frontend to the Sanitizers' common allocator (later
+referenced as the Backend). It is split between a Primary allocator, fast and
+efficient, that services smaller allocation sizes, and a Secondary allocator
+that services larger allocation sizes and is backed by the operating system
+memory mapping primitives.
+
+Scudo was designed with security in mind, but aims at striking a good balance
+between security and performance. It is highly tunable and configurable.
+
Chunk Header
------------
Every chunk of heap memory will be preceded by a chunk header. This has two
purposes, the first one being to store various information about the chunk,
the second one being to detect potential heap overflows. In order to achieve
-this, the header will be checksumed, involving the pointer to the chunk itself
+this, the header will be checksummed, involving the pointer to the chunk itself
and a global secret. Any corruption of the header will be detected when said
header is accessed, and the process terminated.
The following information is stored in the header:
- the 16-bit checksum;
-- the unused bytes amount for that chunk, which is necessary for computing the
- size of the chunk;
+- the class ID for that chunk, which is the "bucket" where the chunk resides
+ for Primary backed allocations, or 0 for Secondary backed allocations;
+- the size (Primary) or unused bytes amount (Secondary) for that chunk, which is
+ necessary for computing the size of the chunk;
- the state of the chunk (available, allocated or quarantined);
- the allocation type (malloc, new, new[] or memalign), to detect potential
mismatches in the allocation APIs used;
- the offset of the chunk, which is the distance in bytes from the beginning of
- the returned chunk to the beginning of the backend allocation;
-- a 8-bit salt.
+ the returned chunk to the beginning of the Backend allocation;
This header fits within 8 bytes, on all platforms supported.
The checksum is computed as a CRC32 (made faster with hardware support)
of the global secret, the chunk pointer itself, and the 8 bytes of header with
-the checksum field zeroed out.
+the checksum field zeroed out. It is not intended to be cryptographically
+strong.
The header is atomically loaded and stored to prevent races. This is important
as two consecutive chunks could belong to different threads. We also want to
@@ -60,9 +74,9 @@ local copies of the header for this purpose.
Delayed Freelist
-----------------
-A delayed freelist allows us to not return a chunk directly to the backend, but
+A delayed freelist allows us to not return a chunk directly to the Backend, but
to keep it aside for a while. Once a criterion is met, the delayed freelist is
-emptied, and the quarantined chunks are returned to the backend. This helps
+emptied, and the quarantined chunks are returned to the Backend. This helps
mitigate use-after-free vulnerabilities by reducing the determinism of the
allocation and deallocation patterns.
@@ -74,7 +88,7 @@ Randomness
----------
It is important for the allocator to not make use of fixed addresses. We use
the dynamic base option for the SizeClassAllocator, allowing us to benefit
-from the randomness of mmap.
+from the randomness of the system memory mapping functions.
Usage
=====
@@ -98,26 +112,39 @@ You may also build Scudo like this:
cd $LLVM/projects/compiler-rt/lib
clang++ -fPIC -std=c++11 -msse4.2 -O2 -I. scudo/*.cpp \
- $(\ls sanitizer_common/*.{cc,S} | grep -v "sanitizer_termination\|sanitizer_common_nolibc") \
- -shared -o scudo-allocator.so -pthread
+ $(\ls sanitizer_common/*.{cc,S} | grep -v "sanitizer_termination\|sanitizer_common_nolibc\|sancov_\|sanitizer_unwind\|sanitizer_symbol") \
+ -shared -o libscudo.so -pthread
and then use it with existing binaries as follows:
.. code::
- LD_PRELOAD=`pwd`/scudo-allocator.so ./a.out
+ LD_PRELOAD=`pwd`/libscudo.so ./a.out
+
+Clang
+-----
+With a recent version of Clang (post rL317337), the allocator can be linked with
+a binary at compilation using the ``-fsanitize=scudo`` command-line argument, if
+the target platform is supported. Currently, the only other Sanitizer Scudo is
+compatible with is UBSan (e.g. ``-fsanitize=scudo,undefined``). Compiling with
+Scudo will also enforce PIE for the output binary.
Options
-------
-Several aspects of the allocator can be configured through the following ways:
+Several aspects of the allocator can be configured on a per process basis
+through the following ways:
+
+- at compile time, by defining ``SCUDO_DEFAULT_OPTIONS`` to the options string
+ you want set by default;
- by defining a ``__scudo_default_options`` function in one's program that
returns the options string to be parsed. Said function must have the following
- prototype: ``extern "C" const char* __scudo_default_options()``.
+ prototype: ``extern "C" const char* __scudo_default_options(void)``, with a
+ default visibility. This will override the compile time define;
- through the environment variable SCUDO_OPTIONS, containing the options string
to be parsed. Options defined this way will override any definition made
- through ``__scudo_default_options``;
+ through ``__scudo_default_options``.
The options string follows a syntax similar to ASan, where distinct options
can be assigned in the same string, separated by colons.
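+
+For example, a minimal sketch of the function-based mechanism (the option
+values are illustrative only):
+
+.. code-block:: c++
+
+  // Compiled into the instrumented program; Scudo parses the returned
+  // string at startup.
+  extern "C" const char *__scudo_default_options(void) {
+    return "QuarantineSizeKb=256:ThreadLocalQuarantineSizeKb=64:"
+           "DeallocationTypeMismatch=false";
+  }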
@@ -146,7 +173,9 @@ The following options are available:
| | | | the actual deallocation of chunks. Lower value |
| | | | may reduce memory usage but decrease the |
| | | | effectiveness of the mitigation; a negative |
-| | | | value will fallback to the defaults. |
+| | | | value will fallback to the defaults. Setting |
+| | | | *both* this and ThreadLocalQuarantineSizeKb to |
+| | | | zero will disable the quarantine entirely. |
+-----------------------------+----------------+----------------+------------------------------------------------+
| QuarantineChunksUpToSize | 2048 | 512 | Size (in bytes) up to which chunks can be |
| | | | quarantined. |
@@ -154,7 +183,9 @@ The following options are available:
| ThreadLocalQuarantineSizeKb | 1024 | 256 | The size (in Kb) of per-thread cache use to |
| | | | offload the global quarantine. Lower value may |
| | | | reduce memory usage but might increase |
-| | | | contention on the global quarantine. |
+| | | | contention on the global quarantine. Setting |
+| | | | *both* this and QuarantineSizeKb to zero will |
+| | | | disable the quarantine entirely. |
+-----------------------------+----------------+----------------+------------------------------------------------+
| DeallocationTypeMismatch | true | true | Whether or not we report errors on |
| | | | malloc/delete, new/free, new/delete[], etc. |
@@ -167,7 +198,6 @@ The following options are available:
+-----------------------------+----------------+----------------+------------------------------------------------+
Allocator related common Sanitizer options can also be passed through Scudo
-options, such as ``allocator_may_return_null``. A detailed list including those
-can be found here:
+options, such as ``allocator_may_return_null`` or ``abort_on_error``. A detailed
+list including those can be found here:
https://github.com/google/sanitizers/wiki/SanitizerCommonFlags.
-
diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst
index 103c6e0365ba..3fa738c7e442 100644
--- a/docs/SourceLevelDebugging.rst
+++ b/docs/SourceLevelDebugging.rst
@@ -77,8 +77,8 @@ source from generated code.
.. _intro_debugopt:
-Debugging optimized code
-------------------------
+Debug information and optimizations
+-----------------------------------
An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis. In particular, the LLVM debug
@@ -1464,3 +1464,180 @@ Improving LLVM's CodeView support is a process of finding interesting type
records, constructing a C++ test case that makes MSVC emit those records,
dumping the records, understanding them, and then generating equivalent records
in LLVM's backend.
+
+Testing Debug Info Preservation in Optimizations
+================================================
+
+The following paragraphs are an introduction to the debugify utility
+and examples of how to use it in regression tests to check debug info
+preservation after optimizations.
+
+The ``debugify`` utility
+------------------------
+
+The ``debugify`` synthetic debug info testing utility consists of two main
+parts: the ``debugify`` pass and the ``check-debugify`` pass. They are meant to
+be used with ``opt`` for development purposes.
+
+The first applies synthetic debug information to every instruction of the
+module, while the second checks that this DI is still available after an
+optimization has occurred, reporting any errors/warnings while doing so.
+
+The instructions are assigned sequentially increasing line locations,
+and are immediately used by debug value intrinsics when possible.
+
+For example, here is a module before:
+
+.. code-block:: llvm
+
+ define dso_local void @f(i32* %x) {
+ entry:
+ %x.addr = alloca i32*, align 8
+ store i32* %x, i32** %x.addr, align 8
+ %0 = load i32*, i32** %x.addr, align 8
+ store i32 10, i32* %0, align 4
+ ret void
+ }
+
+and after running ``opt -debugify`` on it we get:
+
+.. code-block:: llvm
+
+ define dso_local void @f(i32* %x) !dbg !6 {
+ entry:
+ %x.addr = alloca i32*, align 8, !dbg !12
+ call void @llvm.dbg.value(metadata i32** %x.addr, metadata !9, metadata !DIExpression()), !dbg !12
+ store i32* %x, i32** %x.addr, align 8, !dbg !13
+ %0 = load i32*, i32** %x.addr, align 8, !dbg !14
+ call void @llvm.dbg.value(metadata i32* %0, metadata !11, metadata !DIExpression()), !dbg !14
+ store i32 10, i32* %0, align 4, !dbg !15
+ ret void, !dbg !16
+ }
+
+ !llvm.dbg.cu = !{!0}
+ !llvm.debugify = !{!3, !4}
+ !llvm.module.flags = !{!5}
+
+ !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
+ !1 = !DIFile(filename: "debugify-sample.ll", directory: "/")
+ !2 = !{}
+ !3 = !{i32 5}
+ !4 = !{i32 2}
+ !5 = !{i32 2, !"Debug Info Version", i32 3}
+ !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8)
+ !7 = !DISubroutineType(types: !2)
+ !8 = !{!9, !11}
+ !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10)
+ !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned)
+ !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10)
+ !12 = !DILocation(line: 1, column: 1, scope: !6)
+ !13 = !DILocation(line: 2, column: 1, scope: !6)
+ !14 = !DILocation(line: 3, column: 1, scope: !6)
+ !15 = !DILocation(line: 4, column: 1, scope: !6)
+ !16 = !DILocation(line: 5, column: 1, scope: !6)
+
+The following is an example of the -check-debugify output:
+
+.. code-block:: none
+
+ $ opt -enable-debugify -loop-vectorize llvm/test/Transforms/LoopVectorize/i8-induction.ll -disable-output
+ ERROR: Instruction with empty DebugLoc in function f -- %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
+
+Errors/warnings can range from instructions with empty debug location to an
+instruction having a type that's incompatible with the source variable it describes,
+all the way to missing lines and missing debug value intrinsics.
+
+Fixing errors
+^^^^^^^^^^^^^
+
+Each of the errors above has a relevant API available to fix it.
+
+* In the case of a missing debug location, use ``Instruction::setDebugLoc`` or
+  possibly ``IRBuilder::SetCurrentDebugLocation`` when using a Builder and the
+  new location should be reused.
+
+* When a debug value has an incompatible type, ``llvm::replaceAllDbgUsesWith``
+  can be used. After a RAUW call, an incompatible type error can occur because
+  RAUW does not handle widening and narrowing of variables while
+  ``llvm::replaceAllDbgUsesWith`` does. It is also capable of changing the
+  DWARF expression used by the debugger to describe the variable. It also
+  prevents use-before-def by salvaging or deleting invalid debug values.
+
+* When a debug value is missing ``llvm::salvageDebugInfo`` can be used when no replacement
+ exists, or ``llvm::replaceAllDbgUsesWith`` when a replacement exists.
+
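+As a minimal sketch of how these APIs fit together, assume a hypothetical
+pass that replaces an instruction ``OldI`` with ``NewI`` and has a
+``DominatorTree`` ``DT`` at hand (all names here are illustrative only):
+
+.. code-block:: c++
+
+  #include "llvm/Transforms/Utils/Local.h"
+
+  // Reuse the old location so the replacement carries a DebugLoc.
+  NewI->setDebugLoc(OldI->getDebugLoc());
+
+  // Migrate debug users from OldI to NewI, rewriting the DWARF expression
+  // where the value was widened or narrowed, unlike a plain RAUW.
+  llvm::replaceAllDbgUsesWith(*OldI, *NewI, *NewI, DT);
+
+  // If OldI is simply erased with no replacement, try to describe its
+  // value in terms of its operands instead.
+  llvm::salvageDebugInfo(*OldI);
+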
+Using ``debugify``
+------------------
+
+In order for ``check-debugify`` to work, the DI must come from ``debugify``.
+Thus, modules with existing DI will be skipped.
+
+The most straightforward way to use ``debugify`` is as follows::
+
+ $ opt -debugify -pass-to-test -check-debugify sample.ll
+
+This will inject synthetic DI into ``sample.ll``, run ``pass-to-test``, and
+then check for missing DI.
+
+Some other ways to run debugify are available:
+
+.. code-block:: bash
+
+ # Same as the above example.
+ $ opt -enable-debugify -pass-to-test sample.ll
+
+ # Suppresses verbose debugify output.
+ $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll
+
+ # Prepend -debugify before and append -check-debugify -strip after
+ # each pass on the pipeline (similar to -verify-each).
+ $ opt -debugify-each -O2 sample.ll
+
+``debugify`` can also be used to test a backend, e.g.:
+
+.. code-block:: bash
+
+ $ opt -debugify < sample.ll | llc -o -
+
+``debugify`` in regression tests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``-debugify`` pass is especially helpful when it comes to testing that
+a given pass preserves DI while transforming the module. For this to work,
+the ``-debugify`` output must be stable enough to use in regression tests.
+Changes to this pass are not allowed to break existing tests.
+
+It allows us to test for DI loss in the same tests we check that the
+transformation is actually doing what it should.
+
+Here is an example from ``test/Transforms/InstCombine/cast-mul-select.ll``:
+
+.. code-block:: llvm
+
+ ; RUN: opt < %s -debugify -instcombine -S | FileCheck %s --check-prefix=DBGINFO
+
+ define i32 @mul(i32 %x, i32 %y) {
+ ; DBGINFO-LABEL: @mul(
+ ; DBGINFO-NEXT: [[C:%.*]] = mul i32 {{.*}}
+ ; DBGINFO-NEXT: call void @llvm.dbg.value(metadata i32 [[C]]
+ ; DBGINFO-NEXT: [[D:%.*]] = and i32 {{.*}}
+ ; DBGINFO-NEXT: call void @llvm.dbg.value(metadata i32 [[D]]
+
+ %A = trunc i32 %x to i8
+ %B = trunc i32 %y to i8
+ %C = mul i8 %A, %B
+ %D = zext i8 %C to i32
+ ret i32 %D
+ }
+
+Here we test that the two ``dbg.value`` intrinsics are preserved and
+correctly point to the ``[[C]]`` and ``[[D]]`` variables.
+
+.. note::
+
+   When writing this kind of regression test, it is important to make it as
+   robust as possible. That's why we should avoid hardcoding line/variable
+   numbers in check lines. If, for example, you test that a ``DILocation``
+   has a specific line number, the test will fail as soon as someone adds an
+   instruction before the one being checked. Where this can't be avoided
+   (say, if the test wouldn't be precise enough otherwise), moving the test
+   to its own file is preferred.
diff --git a/docs/SpeculativeLoadHardening.md b/docs/SpeculativeLoadHardening.md
new file mode 100644
index 000000000000..bf5c7d354fef
--- /dev/null
+++ b/docs/SpeculativeLoadHardening.md
@@ -0,0 +1,1099 @@
+# Speculative Load Hardening
+
+### A Spectre Variant #1 Mitigation Technique
+
+Author: Chandler Carruth - [chandlerc@google.com](mailto:chandlerc@google.com)
+
+## Problem Statement
+
+Recently, Google Project Zero and other researchers have found information leak
+vulnerabilities by exploiting speculative execution in modern CPUs. These
+exploits are currently broken down into three variants:
+* GPZ Variant #1 (a.k.a. Spectre Variant #1): Bounds check (or predicate) bypass
+* GPZ Variant #2 (a.k.a. Spectre Variant #2): Branch target injection
+* GPZ Variant #3 (a.k.a. Meltdown): Rogue data cache load
+
+For more details, see the Google Project Zero blog post and the Spectre research
+paper:
+* https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
+* https://spectreattack.com/spectre.pdf
+
+The core problem of GPZ Variant #1 is that speculative execution uses branch
+prediction to select the path of instructions speculatively executed. This path
+is speculatively executed with the available data, and may load from memory and
+leak the loaded values through various side channels that survive even when the
+speculative execution is unwound due to being incorrect. Mispredicted paths can
+cause code to be executed with data inputs that never occur in correct
+executions, making checks against malicious inputs ineffective and allowing
+attackers to use malicious data inputs to leak secret data. Here is an example,
+extracted and simplified from the Project Zero paper:
+```
+struct array {
+ unsigned long length;
+ unsigned char data[];
+};
+struct array *arr1 = ...; // small array
+struct array *arr2 = ...; // array of size 0x400
+unsigned long untrusted_offset_from_caller = ...;
+if (untrusted_offset_from_caller < arr1->length) {
+ unsigned char value = arr1->data[untrusted_offset_from_caller];
+ unsigned long index2 = ((value&1)*0x100)+0x200;
+ unsigned char value2 = arr2->data[index2];
+}
+```
+
+The key to the attack is to call this with an `untrusted_offset_from_caller`
+that is far out of bounds while the branch predictor predicts that it will be
+in bounds. In that case, the body of the `if` will be executed
+speculatively, and may read secret data into `value` and leak it via a
+cache-timing side channel when a dependent access is made to populate `value2`.
+
+## High Level Mitigation Approach
+
+While several approaches are being actively pursued to mitigate specific
+branches and/or loads inside especially risky software (most notably various OS
+kernels), these approaches require manual and/or static analysis aided auditing
+of code and explicit source changes to apply the mitigation. They are unlikely
+to scale well to large applications. We are proposing a comprehensive
+mitigation approach that would apply automatically across an entire program
+rather than through manual changes to the code. While this is likely to have a
+high performance cost, some applications may be in a good position to take this
+performance / security tradeoff.
+
+The specific technique we propose is to cause loads to be checked using
+branchless code to ensure that they are executing along a valid control flow
+path. Consider the following C-pseudo-code representing the core idea of a
+predicate guarding potentially invalid loads:
+```
+void leak(int data);
+void example(int* pointer1, int* pointer2) {
+ if (condition) {
+ // ... lots of code ...
+ leak(*pointer1);
+ } else {
+ // ... more code ...
+ leak(*pointer2);
+ }
+}
+```
+
+This would get transformed into something resembling the following:
+```
+uintptr_t all_ones_mask = std::numeric_limits<uintptr_t>::max();
+uintptr_t all_zeros_mask = 0;
+void leak(int data);
+void example(int* pointer1, int* pointer2) {
+ uintptr_t predicate_state = all_ones_mask;
+ if (condition) {
+ // Assuming ?: is implemented using branchless logic...
+ predicate_state = !condition ? all_zeros_mask : predicate_state;
+ // ... lots of code ...
+ //
+ // Harden the pointer so it can't be loaded
+ pointer1 &= predicate_state;
+ leak(*pointer1);
+ } else {
+ predicate_state = condition ? all_zeros_mask : predicate_state;
+ // ... more code ...
+ //
+ // Alternative: Harden the loaded value
+ int value2 = *pointer2 & predicate_state;
+ leak(value2);
+ }
+}
+```
+
+The result should be that if the `if (condition) {` branch is mis-predicted,
+there is a *data* dependency on the condition used to zero out any pointers
+prior to loading through them or to zero out all of the loaded bits. Even
+though this code pattern may still execute speculatively, *invalid* speculative
+executions are prevented from leaking secret data from memory (but note that
+this data might still be loaded in safe ways, and some regions of memory are
+required to not hold secrets, see below for detailed limitations). This
+approach only requires the underlying hardware have a way to implement a
+branchless and unpredicted conditional update of a register's value. All modern
+architectures have support for this, and in fact such support is necessary to
+correctly implement constant time cryptographic primitives.
+
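+For illustration, such a branchless conditional update can be expressed in
+plain C++ using only integer arithmetic (a sketch of the idea; a real
+implementation must also guarantee that the compiler lowers it without
+reintroducing a branch, e.g. via conditional moves):
+```
+#include <cstdint>
+
+// Keep `predicate_state` unchanged when `condition` holds; collapse it to
+// all-zeros when it does not. There is no branch to mispredict -- only a
+// data dependency on `condition`.
+uintptr_t update_state(bool condition, uintptr_t predicate_state) {
+  uintptr_t keep_mask = -static_cast<uintptr_t>(condition); // all-ones iff true
+  return predicate_state & keep_mask;
+}
+```
+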
+Crucial properties of this approach:
+* It is not preventing any particular side-channel from working. This is
+ important as there are an unknown number of potential side channels and we
+ expect to continue discovering more. Instead, it prevents the observation of
+ secret data in the first place.
+* It accumulates the predicate state, protecting even in the face of nested
+ *correctly* predicted control flows.
+* It passes this predicate state across function boundaries to provide
+ [interprocedural protection](#interprocedural-checking).
+* When hardening the address of a load, it uses a *destructive* or
+ *non-reversible* modification of the address to prevent an attacker from
+ reversing the check using attacker-controlled inputs.
+* It does not completely block speculative execution, and merely prevents
+ *mis*-speculated paths from leaking secrets from memory (and stalls
+ speculation until this can be determined).
+* It is completely general and makes no fundamental assumptions about the
+ underlying architecture other than the ability to do branchless conditional
+ data updates and a lack of value prediction.
+* It does not require programmers to identify all possible secret data using
+ static source code annotations or code vulnerable to a variant #1 style
+ attack.
+
+Limitations of this approach:
+* It requires re-compiling source code to insert hardening instruction
+ sequences. Only software compiled in this mode is protected.
+* The performance is heavily dependent on a particular architecture's
+ implementation strategy. We outline a potential x86 implementation below and
+ characterize its performance.
+* It does not defend against secret data already loaded from memory and
+ residing in registers or leaked through other side-channels in
+ non-speculative execution. Code dealing with this, e.g. cryptographic
+ routines, already uses constant-time algorithms and code to prevent
+ side-channels. Such code should also scrub registers of secret data following
+ [these
+ guidelines](https://github.com/HACS-workshop/spectre-mitigations/blob/master/crypto_guidelines.md).
+* To achieve reasonable performance, many loads may not be checked, such as
+ those with compile-time fixed addresses. This primarily consists of accesses
+ at compile-time constant offsets of global and local variables. Code which
+ needs this protection and intentionally stores secret data must ensure the
+ memory regions used for secret data are necessarily dynamic mappings or heap
+ allocations. This is an area which can be tuned to provide more comprehensive
+ protection at the cost of performance.
+* [Hardened loads](#hardening-the-address-of-the-load) may still load data from
+ _valid_ addresses if not _attacker-controlled_ addresses. To prevent these
+ from reading secret data, the low 2gb of the address space and 2gb above and
+ below any executable pages should be protected.
+
+Credit:
+* The core idea of tracing misspeculation through data and marking pointers to
+ block misspeculated loads was developed as part of a HACS 2018 discussion
+ between Chandler Carruth, Paul Kocher, Thomas Pornin, and several other
+ individuals.
+* Core idea of masking out loaded bits was part of the original mitigation
+ suggested by Jann Horn when these attacks were reported.
+
+
+### Indirect Branches, Calls, and Returns
+
+It is possible to attack control flow other than conditional branches with
+variant #1 style mispredictions.
+* A prediction towards a hot call target of a virtual method can lead to it
+ being speculatively executed when an expected type is used (often called
+ "type confusion").
+* A hot case may be speculatively executed due to prediction instead of the
+ correct case for a switch statement implemented as a jump table.
+* A hot common return address may be predicted incorrectly when returning from
+ a function.
+
+These code patterns are also vulnerable to Spectre variant #2, and as such are
+best mitigated with a
+[retpoline](https://support.google.com/faqs/answer/7625886) on x86 platforms.
+When a mitigation technique like retpoline is used, speculation simply cannot
+proceed through an indirect control flow edge (or it cannot be mispredicted in
+the case of a filled RSB) and so it is also protected from variant #1 style
+attacks. However, some architectures, micro-architectures, or vendors do not
+employ the retpoline mitigation, and on future x86 hardware (both Intel and
+AMD) it is expected to become unnecessary due to hardware-based mitigation.
+
+When not using a retpoline, these edges will need independent protection from
+variant #1 style attacks. The analogous approach to that used for conditional
+control flow should work:
+```
+uintptr_t all_ones_mask = std::numeric_limits<uintptr_t>::max();
+uintptr_t all_zeros_mask = 0;
+void leak(int data);
+void example(int* pointer1, int* pointer2) {
+ uintptr_t predicate_state = all_ones_mask;
+ switch (condition) {
+ case 0:
+ // Assuming ?: is implemented using branchless logic...
+ predicate_state = (condition != 0) ? all_zeros_mask : predicate_state;
+ // ... lots of code ...
+ //
+ // Harden the pointer so it can't be loaded
+ pointer1 &= predicate_state;
+ leak(*pointer1);
+ break;
+
+ case 1:
+ predicate_state = (condition != 1) ? all_zeros_mask : predicate_state;
+ // ... more code ...
+ //
+ // Alternative: Harden the loaded value
+ int value2 = *pointer2 & predicate_state;
+ leak(value2);
+ break;
+
+ // ...
+ }
+}
+```
+
+The core idea remains the same: validate the control flow using data-flow and
+use that validation to check that loads cannot leak information along
+misspeculated paths. Typically this involves passing the desired target of such
+control flow across the edge and checking that it is correct afterwards. Note
+that while it is tempting to think that this mitigates variant #2 attacks, it
+does not. Those attacks go to arbitrary gadgets that don't include the checks.
+
+
+### Variant #1.1 and #1.2 attacks: "Bounds Check Bypass Store"
+
+Beyond the core variant #1 attack, there are techniques to extend this attack.
+The primary technique is known as "Bounds Check Bypass Store" and is discussed
+in this research paper: https://people.csail.mit.edu/vlk/spectre11.pdf
+
+We will analyze these two variants independently. First, variant #1.1 works by
+speculatively storing over the return address after a bounds check bypass. This
+speculative store then ends up being used by the CPU during speculative
+execution of the return, potentially directing speculative execution to
+arbitrary gadgets in the binary. Let's look at an example.
+```
+unsigned char local_buffer[4];
+unsigned char *untrusted_data_from_caller = ...;
+unsigned long untrusted_size_from_caller = ...;
+if (untrusted_size_from_caller < sizeof(local_buffer)) {
+ // Speculative execution enters here with a too-large size.
+ memcpy(local_buffer, untrusted_data_from_caller,
+ untrusted_size_from_caller);
+ // The stack has now been smashed, writing an attacker-controlled
+ // address over the return address.
+ minor_processing(local_buffer);
+ return;
+ // Control will speculate to the attacker-written address.
+}
+```
+
+However, this can be mitigated by hardening the load of the return address just
+like any other load. This is sometimes complicated because x86 for example
+*implicitly* loads the return address off the stack. However, the
+implementation technique below is specifically designed to mitigate this
+implicit load by using the stack pointer to communicate misspeculation between
+functions. This additionally causes a misspeculation to have an invalid stack
+pointer and never be able to read the speculatively stored return address. See
+the detailed discussion below.
+
+For variant #1.2, the attacker speculatively stores into the vtable or jump
+table used to implement an indirect call or indirect jump. Because this is
+speculative, this will often be possible even when these are stored in
+read-only pages. For example:
+```
+class FancyObject : public BaseObject {
+public:
+ void DoSomething() override;
+};
+void f(unsigned long attacker_offset, unsigned long attacker_data) {
+ FancyObject object = getMyObject();
+ unsigned long *arr[4] = getFourDataPointers();
+ if (attacker_offset < 4) {
+ // We have bypassed the bounds check speculatively.
+ unsigned long *data = arr[attacker_offset];
+ // Now we have computed a pointer inside of `object`, the vptr.
+ *data = attacker_data;
+ // The vptr points to the virtual table and we speculatively clobber that.
+ g(object); // Hand the object to some other routine.
+ }
+}
+// In another file, we call a method on the object.
+void g(BaseObject &object) {
+ object.DoSomething();
+ // This speculatively calls the address stored over the vtable.
+}
+```
+
+Mitigating this requires hardening loads from these locations, or mitigating
+the indirect call or indirect jump. Any of these are sufficient to block the
+call or jump from using a speculatively stored value that has been read back.
+
+For both of these, using retpolines would be equally sufficient. One possible
+hybrid approach is to use retpolines for indirect calls and jumps, while
+relying on speculative load hardening (SLH) to mitigate returns.
+
+Another approach that is sufficient for both of these is to harden all of the
+speculative stores. However, as most stores aren't interesting and don't
+inherently leak data, this is expected to be prohibitively expensive given the
+attack it is defending against.
+
+
+## Implementation Details
+
+There are a number of complex details impacting the implementation of this
+technique, both on a particular architecture and within a particular compiler.
+We discuss proposed implementation techniques for the x86 architecture and the
+LLVM compiler. These are primarily to serve as an example, as other
+implementation techniques are very possible.
+
+
+### x86 Implementation Details
+
+On the x86 platform we break down the implementation into three core
+components: accumulating the predicate state through the control flow graph,
+checking the loads, and checking control transfers between procedures.
+
+
+#### Accumulating Predicate State
+
+Consider baseline x86 instructions like the following, which test three
+conditions and if all pass, loads data from memory and potentially leaks it
+through some side channel:
+```
+# %bb.0: # %entry
+ pushq %rax
+ testl %edi, %edi
+ jne .LBB0_4
+# %bb.1: # %then1
+ testl %esi, %esi
+ jne .LBB0_4
+# %bb.2: # %then2
+ testl %edx, %edx
+ je .LBB0_3
+.LBB0_4: # %exit
+ popq %rax
+ retq
+.LBB0_3: # %danger
+ movl (%rcx), %edi
+ callq leak
+ popq %rax
+ retq
+```
+
+When we go to speculatively execute the load, we want to know whether any of
+the dynamically executed predicates have been misspeculated. To track that,
+along each conditional edge, we need to track the data which would allow that
+edge to be taken. On x86, this data is stored in the flags register used by the
+conditional jump instruction. Along both edges after this fork in control flow,
+the flags register remains alive and contains data that we can use to build up
+our accumulated predicate state. We accumulate it using the x86 conditional
+move instruction which also reads the flag registers where the state resides.
+These conditional move instructions are known to not be predicted on any x86
+processors, making them immune to misprediction that could reintroduce the
+vulnerability. When we insert the conditional moves, the code ends up looking
+like the following:
+```
+# %bb.0: # %entry
+ pushq %rax
+ xorl %eax, %eax # Zero out initial predicate state.
+ movq $-1, %r8 # Put all-ones mask into a register.
+ testl %edi, %edi
+ jne .LBB0_1
+# %bb.2: # %then1
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ testl %esi, %esi
+ jne .LBB0_1
+# %bb.3: # %then2
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ testl %edx, %edx
+ je .LBB0_4
+.LBB0_1:
+ cmoveq %r8, %rax # Conditionally update predicate state.
+ popq %rax
+ retq
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+```
+
+Here we create the "empty" or "correct execution" predicate state by zeroing
+`%rax`, and we create a constant "incorrect execution" predicate value by
+putting `-1` into `%r8`. Then, along each edge coming out of a conditional
+branch we do a conditional move that in a correct execution will be a no-op,
+but if misspeculated, will replace the `%rax` with the value of `%r8`.
+Misspeculating any one of the three predicates will cause `%rax` to hold the
+"incorrect execution" value from `%r8`, as we preserve incoming values when
+execution is correct rather than overwriting them.
+
+We now have a value in `%rax` in each basic block that indicates if at some
+point previously a predicate was mispredicted. And we have arranged for that
+value to be particularly effective when used below to harden loads.
+
+
+##### Indirect Call, Branch, and Return Predicates
+
+(Not yet implemented.)
+
+There is no analogous flag to use when tracing indirect calls, branches, and
+returns. The predicate state must be accumulated through some other means.
+Fundamentally, this is the reverse of the problem posed in CFI: we need to
+check where we came from rather than where we are going. For function-local
+jump tables, this is easily arranged by testing the input to the jump table
+within each destination:
+```
+ pushq %rax
+ xorl %eax, %eax # Zero out initial predicate state.
+ movq $-1, %r8 # Put all-ones mask into a register.
+ jmpq *.LJTI0_0(,%rdi,8) # Indirect jump through table.
+.LBB0_2: # %sw.bb
+ cmpq $0, %rdi # Validate index used for jump table.
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+ jmp _Z4leaki # TAILCALL
+
+.LBB0_3: # %sw.bb1
+ cmpq $1, %rdi # Validate index used for jump table.
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+ jmp _Z4leaki # TAILCALL
+
+.LBB0_5: # %sw.bb10
+ cmpq $2, %rdi # Validate index used for jump table.
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+ jmp _Z4leaki # TAILCALL
+ ...
+
+ .section .rodata,"a",@progbits
+ .p2align 3
+.LJTI0_0:
+ .quad .LBB0_2
+ .quad .LBB0_3
+ .quad .LBB0_5
+ ...
+```
+
+Returns have a simple mitigation technique on x86-64 (or other ABIs which have
+what is called a "red zone" region beyond the end of the stack). This region is
+guaranteed to be preserved across interrupts and context switches, making the
+return address used in returning to the current code remain on the stack and
+valid to read. We can emit code in the caller to verify that a return edge was
+not mispredicted:
+```
+ callq other_function
+return_addr:
+ cmpq $return_addr, -8(%rsp) # Validate return address.
+ cmovneq %r8, %rax # Update predicate state.
+```
+
+For an ABI without a "red zone" (and thus unable to read the return address
+from the stack), mitigating returns faces problems similar to those of calls,
+discussed below.
+
+Indirect calls (and returns in the absence of a red zone ABI) pose the most
+significant challenge to propagate. The simplest technique would be to define a
+new ABI such that the intended call target is passed into the called function
+and checked in the entry. Unfortunately, new ABIs are quite expensive to deploy
+in C and C++. While the target function could be passed in TLS, we would still
+require complex logic to handle a mixture of functions compiled with and
+without this extra logic (essentially, making the ABI backwards compatible).
+Currently, we suggest using retpolines here and will continue to investigate
+ways of mitigating this.
+
+
+##### Optimizations, Alternatives, and Tradeoffs
+
+Merely accumulating predicate state involves significant cost. There are
+several key optimizations we employ to minimize this and various alternatives
+that present different tradeoffs in the generated code.
+
+First, we work to reduce the number of instructions used to track the state:
+* Rather than inserting a `cmovCC` instruction along every conditional edge in
+ the original program, we track each set of condition flags we need to capture
+ prior to entering each basic block and reuse a common `cmovCC` sequence for
+ those.
+ * We could further reuse suffixes when there are multiple `cmovCC`
+ instructions required to capture the set of flags. Currently this is
+ believed to not be worth the cost as paired flags are relatively rare and
+ suffixes of them are exceedingly rare.
+* A common pattern in x86 is to have multiple conditional jump instructions
+ that use the same flags but handle different conditions. Naively, we could
+ consider each fallthrough between them an "edge" but this causes a much more
+ complex control flow graph. Instead, we accumulate the set of conditions
+ necessary for fallthrough and use a sequence of `cmovCC` instructions in a
+ single fallthrough edge to track it.
+
+Second, we trade register pressure for simpler `cmovCC` instructions by
+allocating a register for the "bad" state. We could read that value from memory
+as part of the conditional move instruction, however, this creates more
+micro-ops and requires the load-store unit to be involved. Currently, we place
+the value into a virtual register and allow the register allocator to decide
+when the register pressure is sufficient to make it worth spilling to memory
+and reloading.
+
+
+#### Hardening Loads
+
+Once we have the predicate accumulated into a special value for correct vs.
+misspeculated, we need to apply this to loads in a way that ensures they do not
+leak secret data. There are two primary techniques for this: we can either
+harden the loaded value to prevent observation, or we can harden the address
+itself to prevent the load from occurring. These have significantly different
+performance tradeoffs.
+
+
+##### Hardening loaded values
+
+The most appealing way to harden loads is to mask out all of the bits loaded.
+The key requirement is that for each bit loaded, along the misspeculated path
+that bit is always fixed at either 0 or 1 regardless of the value of the bit
+loaded. The most obvious implementation uses either an `and` instruction with
+an all-zero mask along misspeculated paths and an all-one mask along correct
+paths, or an `or` instruction with an all-one mask along misspeculated paths
+and an all-zero mask along correct paths. Other options become less appealing
+such as multiplying by zero, or multiple shift instructions. For reasons we
+elaborate on below, we end up suggesting you use `or` with an all-ones mask,
+making the x86 instruction sequence look like the following:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ movl (%rsi), %edi # Load potentially secret data from %rsi.
+ orl %eax, %edi
+```
+
+Other useful patterns may be to fold the load into the `or` instruction itself
+at the cost of a register-to-register copy.
+
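+For example, the folded form might look like this (a sketch of the pattern,
+not actual compiler output):
+```
+.LBB0_4: # %danger
+ movl %eax, %edi # Register-to-register copy of the predicate state.
+ orl (%rsi), %edi # Load folded directly into the hardening `or`.
+```
+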
+There are some challenges with deploying this approach:
+1. Many loads on x86 are folded into other instructions. Separating them would
+ add very significant and costly register pressure with prohibitive
+ performance cost.
+1. Loads may not target a general purpose register requiring extra instructions
+ to map the state value into the correct register class, and potentially more
+ expensive instructions to mask the value in some way.
+1. The flags registers on x86 are very likely to be live, and challenging to
+ preserve cheaply.
+1. There are many more values loaded than pointers & indices used for loads. As
+ a consequence, hardening the result of a load requires substantially more
+ instructions than hardening the address of the load (see below).
+
+Despite these challenges, hardening the result of the load critically allows
+the load to proceed and thus has dramatically less impact on the total
+speculative / out-of-order potential of the execution. There are also several
+interesting techniques to try and mitigate these challenges and make hardening
+the results of loads viable in at least some cases. However, when hardening
+the loaded value is unprofitable, we generally expect to fall back to the
+next approach: hardening the address itself.
+
+
+###### Loads folded into data-invariant operations can be hardened after the operation
+
+The first key to making this feasible is to recognize that many operations on
+x86 are "data-invariant". That is, they have no (known) observable behavior
+differences due to the particular input data. These instructions are often used
+when implementing cryptographic primitives dealing with private key data
+because they are not believed to provide any side-channels. Similarly, we can
+defer hardening until after them, as they will not in and of themselves
+introduce a speculative execution side-channel. This results in code sequences
+that look like:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ addl (%rsi), %edi # Load and accumulate without leaking.
+ orl %eax, %edi
+```
+
+While an addition happens to the loaded (potentially secret) value, that
+doesn't leak any data and we then immediately harden it.
+
+
+###### Hardening of loaded values deferred down the data-invariant expression graph
+
+We can generalize the previous idea and sink the hardening down the expression
+graph across as many data-invariant operations as desirable. This can use very
+conservative rules for whether something is data-invariant. The primary goal
+should be to handle multiple loads with a single hardening instruction:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ addl (%rsi), %edi # Load and accumulate without leaking.
+ addl 4(%rsi), %edi # Continue without leaking.
+ addl 8(%rsi), %edi
+ orl %eax, %edi # Mask out bits from all three loads.
+```
+
+
+###### Preserving the flags while hardening loaded values on Haswell, Zen, and newer processors
+
+Sadly, there are no useful instructions on x86 that apply a mask to all 64 bits
+without touching the flag registers. However, we can harden loaded values that
+are narrower than a word (fewer than 32 bits on 32-bit systems and fewer than
+64 bits on 64-bit systems) by zero-extending the value to the full word size
+and then shifting right by at least the number of original bits using the BMI2
+`shrx` instruction:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ addl (%rsi), %edi # Load and accumulate 32 bits of data.
+ shrxq %rax, %rdi, %rdi # Shift out all 32 bits loaded.
+```
+
+Because on x86 the zero-extend is free, this can efficiently harden the loaded
+value.
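+
+In C++ terms the effect is roughly the following (a sketch; `predicate_state`
+is zero on correct paths and all-ones when misspeculating, as set up above):
+```
+#include <cstdint>
+
+uint32_t load_hardened(const uint32_t *p, uint64_t predicate_state) {
+  uint64_t wide = *p; // Zero-extension is free on x86-64.
+  // `shrx` masks the shift count to 6 bits: 0 on correct paths, 63 when
+  // misspeculating, which shifts out all 32 loaded bits.
+  return static_cast<uint32_t>(wide >> (predicate_state & 63));
+}
+```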
+
+
+##### Hardening the address of the load
+
+When hardening the loaded value is inapplicable, most often because the
+instruction directly leaks information (like `cmp` or `jmpq`), we switch to
+hardening the _address_ of the load instead of the loaded value. This avoids
+increasing register pressure by unfolding the load or paying some other high
+cost.
+
+To understand how this works in practice, we need to examine the exact
+semantics of the x86 addressing modes which, in their fully general form, look
+like `offset(%base,%index,scale)`. Here `%base` and `%index` are 64-bit
+registers that can potentially be any value, and may be attacker controlled,
+and `scale` and `offset` are fixed immediate values. `scale` must be `1`, `2`,
+`4`, or `8`, and `offset` can be any 32-bit sign extended value. The exact
+computation performed to find the address is then: `%base + (scale * %index) +
+offset` under 64-bit 2's complement modular arithmetic.
+
+One issue with this approach is that, after hardening, the `%base + (scale *
+%index)` subexpression will compute a value near zero (`-1 + (scale * -1)`) and
+then a large, positive `offset` will index into memory within the first two
+gigabytes of address space. While these offsets are not attacker controlled,
+the attacker could choose to attack a load which happens to have the desired
+offset and then successfully read memory in that region. This significantly
+raises the burden on the attacker and limits the scope of attack but does not
+eliminate it. To fully close the attack we must work with the operating system
+to preclude mapping memory in the low two gigabytes of address space.
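+
+Concretely, with both registers poisoned to `-1` and the maximum `scale` of
+`8`, the computed address becomes:
+```
+%base + (scale * %index) + offset
+ = -1 + (8 * -1) + offset
+ = offset - 9
+```
+Since `offset` is a sign-extended 32-bit immediate, every such access lands
+within roughly two gigabytes of address zero, which is why that region must
+be kept unmapped.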
+
+
+###### 64-bit load checking instructions
+
+We can use the following instruction sequences to check loads. We set up `%r8`
+in these examples to hold the special value of `-1` which will be `cmov`ed over
+`%rax` in misspeculated paths.
+
+Single register addressing mode:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ orq %rax, %rsi # Mask the pointer if misspeculating.
+ movl (%rsi), %edi
+```
+
+Two register addressing mode:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ orq %rax, %rsi # Mask the pointer if misspeculating.
+ orq %rax, %rcx # Mask the index if misspeculating.
+ movl (%rsi,%rcx), %edi
+```
+
+This will result in a negative address near zero or in `offset` wrapping the
+address space back to a small positive address. Small, negative addresses will
+fault in user-mode for most operating systems, but targets which need the high
+address space to be user accessible may need to adjust the exact sequence used
+above. Additionally, the low addresses will need to be marked unreadable by the
+OS to fully harden the load.
+
+
+###### RIP-relative addressing is even easier to break
+
+There is a common addressing mode idiom that is substantially harder to check:
+addressing relative to the instruction pointer. We cannot change the value of
+the instruction pointer register and so we have the harder problem of forcing
+`%base + scale * %index + offset` to be an invalid address, by *only* changing
+`%index`. The only advantage we have is that the attacker also cannot modify
+`%base`. If we use the fast instruction sequence above, but only apply it to
+the index, we will always access `%rip + (scale * -1) + offset`. If the
+attacker can find a load whose resulting address happens to point to secret
+data, then they can reach it. However, the loader and base libraries can also
+simply refuse to map the heap, data segments, or stack within 2gb of any of
+the text in the program, much like they can reserve the low 2gb of address
+space.
+
+
+###### The flag registers again make everything hard
+
+Unfortunately, the technique of using `orq` instructions has a serious flaw on
+x86. The very thing that makes it easy to accumulate state, the flag registers
+containing predicates, causes serious problems here because they may be alive
+and used by the loading instruction or subsequent instructions. On x86, the
+`orq` instruction **sets** the flags and will override anything already there.
+This makes inserting them into the instruction stream very hazardous.
+Unfortunately, unlike when hardening the loaded value, we have no fallback here
+and so we must have a fully general approach available.
+
+The first thing we must do when generating these sequences is try to analyze
+the surrounding code to prove that the flags are not in fact alive or being
+used. Typically, it has been set by some other instruction which just happens
+to set the flags register (much like ours!) with no actual dependency. In those
+cases, it is safe to directly insert these instructions. Alternatively we may
+be able to move them earlier to avoid clobbering the used value.
+
+However, this may ultimately be impossible. In that case, we need to preserve
+the flags around these instructions:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ pushfq
+ orq %rax, %rcx # Mask the pointer if misspeculating.
+ orq %rax, %rdx # Mask the index if misspeculating.
+ popfq
+ movl (%rcx,%rdx), %edi
+```
+
+Using the `pushf` and `popf` instructions saves the flags register around our
+inserted code, but comes at a high cost. First, we must store the flags to the
+stack and reload them. Second, this causes the stack pointer to be adjusted
+dynamically, requiring a frame pointer be used for referring to temporaries
+spilled to the stack, etc.
+
+On newer x86 processors we can use the `lahf` and `sahf` instructions to save
+all of the flags besides the overflow flag in a register rather than on the
+stack. We can then use `seto` and `add` to save and restore the overflow flag
+in a register. Combined, this will save and restore flags in the same manner as
+above but using two registers rather than the stack. That is still very
+expensive, though slightly less expensive than `pushf` and `popf` in most
+cases.
+
+
+###### A flag-less alternative on Haswell, Zen and newer processors
+
+Starting with the BMI2 x86 instruction set extensions available on Haswell and
+Zen processors, there is an instruction for shifting that does not set any
+flags: `shrx`. We can use this and the `lea` instruction to implement analogous
+code sequences to the above ones. However, these are still very marginally
+slower, as there are fewer ports able to dispatch shift instructions in most
+modern x86 processors than there are for `or` instructions.
+
+Fast, single register addressing mode:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ shrxq %rax, %rsi, %rsi # Shift away bits if misspeculating.
+ movl (%rsi), %edi
+```
+
+This collapses the register to zero or one when misspeculating, forcing
+everything but the offset in the addressing mode to be less than or equal to
+9. This means the full address can only be guaranteed to be less than
+`(1 << 31) + 9`. The OS may wish to protect an extra page of the low address
+space to account for this.
+
+
+##### Optimizations
+
+A very large portion of the cost for this approach comes from checking loads in
+this way, so it is important to work to optimize this. However, beyond making
+the instruction sequences to *apply* the checks efficient (for example by
+avoiding `pushfq` and `popfq` sequences), the only significant optimization is
+to check fewer loads without introducing a vulnerability. We apply several
+techniques to accomplish that.
+
+
+###### Don't check loads from compile-time constant stack offsets
+
+We implement this optimization on x86 by skipping the checking of loads which
+use a fixed frame pointer offset.
+
+The result of this optimization is that patterns like reloading a spilled
+register or accessing a global field don't get checked. This is a very
+significant performance win.
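+
+A sketch of the intended effect (not actual compiler output): the reload from
+a fixed frame offset is left untouched, while the pointer it produces is
+still masked before the dependent load:
+```
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ movq -8(%rbp), %rcx # Spill reload at a fixed offset: not checked.
+ orq %rax, %rcx # The reloaded pointer is still masked...
+ movl (%rcx), %edi # ...before this hardened load.
+```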
+
+
+###### Don't check dependent loads
+
+A core part of why this mitigation strategy works is that it establishes a
+data-flow check on the loaded address. However, this means that if the address
+itself was already loaded using a checked load, there is no need to check a
+dependent load provided it is within the same basic block as the checked load,
+and therefore has no additional predicates guarding it. Consider code like the
+following:
+```
+ ...
+
+.LBB0_4: # %danger
+ movq (%rcx), %rdi
+ movl (%rdi), %edx
+```
+
+This will get transformed into:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ orq %rax, %rcx # Mask the pointer if misspeculating.
+ movq (%rcx), %rdi # Hardened load.
+ movl (%rdi), %edx # Unhardened load due to dependent addr.
+```
+
+This doesn't check the load through `%rdi` as that pointer is dependent on a
+checked load already.
+
+
+###### Protect large, load-heavy blocks with a single lfence
+
+It may be worth using a single `lfence` instruction at the start of a block
+which begins with a (very) large number of loads that require independent
+protection *and* which require hardening the address of the load. However, this
+is unlikely to be profitable in practice. The latency hit of the hardening
+would need to exceed that of an `lfence` when *correctly* speculatively
+executed. But in that case, the `lfence` cost is a complete loss of speculative
+execution (at a minimum). So far, the evidence we have of the performance cost
+of using `lfence` indicates few if any hot code patterns where this trade off
+would make sense.
+
+
+###### Tempting optimizations that break the security model
+
+Several optimizations were considered which didn't pan out due to failure to
+uphold the security model. One in particular is worth discussing as many others
+will reduce to it.
+
+We wondered whether only the *first* load in a basic block could be checked. If
+the check works as intended, it forms an invalid pointer that doesn't even
+virtual-address translate in the hardware. It should fault very early on in its
+processing. Maybe that would stop things in time for the misspeculated path to
+fail to leak any secrets. This doesn't end up working because the processor is
+fundamentally out-of-order, even in its speculative domain. As a consequence,
+the attacker could cause the initial address computation itself to stall and
+allow an arbitrary number of unrelated loads (including attacked loads of
+secret data) to pass through.
+
+
+#### Interprocedural Checking
+
+Modern x86 processors may speculate into called functions and out of functions
+to their return address. As a consequence, we need a way to check loads that
+occur after a misspeculated predicate but where the load and the misspeculated
+predicate are in different functions. In essence, we need some interprocedural
+generalization of the predicate state tracking. A primary challenge to passing
+the predicate state between functions is that we would like to not require a
+change to the ABI or calling convention in order to make this mitigation more
+deployable, and further would like code mitigated in this way to be easily
+mixed with code not mitigated in this way and without completely losing the
+value of the mitigation.
+
+
+##### Embed the predicate state into the high bit(s) of the stack pointer
+
+We can use the same technique that allows hardening pointers to pass the
+predicate state into and out of functions. The stack pointer is trivially
+passed between functions and we can test for it having the high bits set to
+detect when it has been marked due to misspeculation. The callsite instruction
+sequence looks like (assuming a misspeculated state value of `-1`):
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ shlq $47, %rax
+ orq %rax, %rsp
+ callq other_function
+ movq %rsp, %rax
+ sarq $63, %rax # Sign extend the high bit to all bits.
+```
+
+This first puts the predicate state into the high bits of `%rsp` before calling
+the function and then reads it back out of high bits of `%rsp` afterward. When
+correctly executing (speculatively or not), these are all no-ops. When
+misspeculating, the stack pointer will end up negative. We arrange for it to
+remain a canonical address, but otherwise leave the low bits alone to allow
+stack adjustments to proceed normally without disrupting this. Within the
+called function, we can extract this predicate state and then reset it on
+return:
+```
+other_function:
+ # prolog
+ movq %rsp, %rax
+ sarq $63, %rax # Sign extend the high bit to all bits.
+ # ...
+
+.LBB0_N:
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ shlq $47, %rax
+ orq %rax, %rsp
+ retq
+```
+
+This approach is effective when all code is mitigated in this fashion, and can
+even survive very limited reaches into unmitigated code (the state will
+round-trip in and back out of an unmitigated function, it just won't be
+updated). But it does have some limitations. There is a cost to merging the
+state into `%rsp` and it doesn't insulate mitigated code from misspeculation in
+an unmitigated caller.
+
+There is also an advantage to using this form of interprocedural mitigation: by
+forming these invalid stack pointer addresses we can prevent speculative
+returns from successfully reading speculatively written values to the actual
+stack. This works first by forming a data-dependency between computing the
+address of the return address on the stack and our predicate state. And even
+when satisfied, if a misprediction causes the state to be poisoned the
+resulting stack pointer will be invalid.
+
+
+##### Rewrite API of internal functions to directly propagate predicate state
+
+(Not yet implemented.)
+
+We have the option with internal functions to directly adjust their API to
+accept the predicate as an argument and return it. This is likely to be
+marginally cheaper than embedding into `%rsp` for entering functions.
+
+
+##### Use `lfence` to guard function transitions
+
+An `lfence` instruction can be used to prevent subsequent loads from
+speculatively executing until all prior mispredicted predicates have resolved.
+We can use this broader barrier to speculative loads executing between
+functions. We emit it in the entry block to handle calls, and prior to each
+return. This approach also has the advantage of providing the strongest degree
+of mitigation when mixed with unmitigated code by halting all misspeculation
+entering a function which is mitigated, regardless of what occurred in the
+caller. However, such a mixture is inherently more risky. Whether this kind of
+mixture is a sufficient mitigation requires careful analysis.
+
+Unfortunately, experimental results indicate that the performance overhead of
+this approach is very high for certain patterns of code. A classic example is
+any form of recursive evaluation engine. The hot, rapid call and return
+sequences exhibit dramatic performance loss when mitigated with `lfence`. This
+component alone can regress performance by 2x or more, making it an unpleasant
+tradeoff even when only used in a mixture of code.
+
+
+##### Use an internal TLS location to pass predicate state
+
+We can define a special thread-local value to hold the predicate state between
+functions. This avoids direct ABI implications by using a side channel between
+callers and callees to communicate the predicate state. It also allows implicit
+zero-initialization of the state, which allows non-checked code to be the first
+code executed.
+
+However, this requires a load from TLS in the entry block, a store to TLS
+before every call and every ret, and a load from TLS after every call. As a
+consequence it is expected to be substantially more expensive even than using
+`%rsp` and potentially `lfence` within the function entry block.
+
+
+##### Define a new ABI and/or calling convention
+
+We could define a new ABI and/or calling convention to explicitly pass the
+predicate state in and out of functions. This may be interesting if none of the
+alternatives have adequate performance, but it makes deployment and adoption
+dramatically more complex, and potentially infeasible.
+
+
+## High-Level Alternative Mitigation Strategies
+
+There are completely different alternative approaches to mitigating variant 1
+attacks. [Most](https://lwn.net/Articles/743265/)
+[discussion](https://lwn.net/Articles/744287/) so far focuses on mitigating
+specific known attackable components in the Linux kernel (or other kernels) by
+manually rewriting the code to contain an instruction sequence that is not
+vulnerable. For x86 systems this is done by either injecting an `lfence`
+instruction along the code path which would leak data if executed speculatively
+or by rewriting memory accesses to have branch-less masking to a known safe
+region. On Intel systems, `lfence` [will prevent the speculative load of secret
+data](https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf).
+On AMD systems `lfence` is currently a no-op, but can be made
+dispatch-serializing by setting an MSR, and thus preclude misspeculation of the
+code path ([mitigation G-2 +
+V1-1](https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf)).
+
+However, this relies on finding and enumerating all possible points in code
+which could be attacked to leak information. While in some cases static
+analysis is effective at doing this at scale, in many cases it still relies on
+human judgement to evaluate whether code might be vulnerable. Especially for
+software systems which receive less detailed scrutiny but remain sensitive to
+these attacks, this seems like an impractical security model. We need an
+automatic and systematic mitigation strategy.
+
+
+### Automatic `lfence` on Conditional Edges
+
+A natural way to scale up the existing hand-coded mitigations is simply to
+inject an `lfence` instruction into both the target and fallthrough
+destinations of every conditional branch. This ensures that no predicate or
+bounds check can be bypassed speculatively. However, the performance overhead
+of this approach is, simply put, catastrophic. Yet it remains the only truly
+"secure by default" approach known prior to this effort and serves as the
+baseline for performance.
+
+One attempt to address the performance overhead of this and make it more
+realistic to deploy is [MSVC's /Qspectre
+switch](https://blogs.msdn.microsoft.com/vcblog/2018/01/15/spectre-mitigations-in-msvc/).
+Their technique is to use static analysis within the compiler to only insert
+`lfence` instructions into conditional edges at risk of attack. However,
+[initial](https://arstechnica.com/gadgets/2018/02/microsofts-compiler-level-spectre-fix-shows-how-hard-this-problem-will-be-to-solve/)
+[analysis](https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html)
+has shown that this approach is incomplete and only catches a small and limited
+subset of attackable patterns which happen to resemble very closely the initial
+proofs of concept. As such, while its performance is acceptable, it does not
+appear to be an adequate systematic mitigation.
+
+
+## Performance Overhead
+
+The performance overhead of this style of comprehensive mitigation is very
+high. However, it compares very favorably with previously recommended
+approaches such as the `lfence` instruction. Just as users can restrict the
+scope of `lfence` to control its performance impact, this mitigation technique
+could be restricted in scope as well.
+
+However, it is important to understand what it would cost to get a fully
+mitigated baseline. Here we assume targeting a Haswell (or newer) processor and
+using all of the tricks to improve performance (so the low 2gb of address
+space, and +/- 2gb surrounding any PC in the program, are left unprotected).
+We ran both
+Google's microbenchmark suite and a large highly-tuned server built using
+ThinLTO and PGO. All were built with `-march=haswell` to give access to BMI2
+instructions, and benchmarks were run on large Haswell servers. We collected
+data both with an `lfence`-based mitigation and load hardening as presented
+here. The summary is that mitigating with load hardening is 1.77x faster than
+mitigating with `lfence`, and that the overhead of load hardening compared to
+a normal program is likely between 10% and 50%, with most large applications
+seeing an overhead of 30% or less.
+
+| Benchmark | `lfence` | Load Hardening | Mitigated Speedup |
+| -------------------------------------- | -------: | -------------: | ----------------: |
+| Google microbenchmark suite | -74.8% | -36.4% | **2.5x** |
+| Large server QPS (using ThinLTO & PGO) | -62% | -29% | **1.8x** |
+
+Below is a visualization of the microbenchmark suite results which helps show
+the distribution of results that is somewhat lost in the summary. The y-axis is
+a log-scale speedup ratio of load hardening relative to `lfence` (up -> faster
+-> better). Each box-and-whiskers represents one microbenchmark which may have
+many different metrics measured. The red line marks the median, the box marks
+the first and third quartiles, and the whiskers mark the min and max.
+
+![Microbenchmark result visualization](speculative_load_hardening_microbenchmarks.png)
+
+We don't yet have benchmark data on SPEC or the LLVM test suite, but we can
+work on getting that. Still, the above should give a pretty clear
+characterization of the performance, and specific benchmarks are unlikely to
+reveal especially interesting properties.
+
+
+### Future Work: Fine Grained Control and API-Integration
+
+The performance overhead of this technique is likely to be very significant and
+something users wish to control or reduce. There are interesting options here
+that impact the implementation strategy used.
+
+One particularly appealing option is to allow both opt-in and opt-out of this
+mitigation at reasonably fine granularity such as on a per-function basis,
+including intelligent handling of inlining decisions -- protected code can be
+prevented from inlining into unprotected code, and unprotected code will become
+protected when inlined into protected code. For systems where only a limited
+set of code is reachable by externally controlled inputs, it may be possible to
+limit the scope of mitigation through such mechanisms without compromising the
+application's overall security. The performance impact may also be focused in a
+few key functions that can be hand-mitigated in ways that have lower
+performance overhead while the remainder of the application receives automatic
+protection.
+
+For both limiting the scope of mitigation or manually mitigating hot functions,
+there needs to be some support for mixing mitigated and unmitigated code
+without completely defeating the mitigation. For the first use case, it would
+be particularly desirable that mitigated code remains safe when being called
+during misspeculation from unmitigated code.
+
+For the second use case, it may be important to connect the automatic
+mitigation technique to explicit mitigation APIs such as what is described in
+http://wg21.link/p0928 (or any other eventual API) so that there is a clean way
+to switch from automatic to manual mitigation without immediately exposing a
+hole. However, the design for how to do this is hard to come up with until the
+APIs are better established. We will revisit this as those APIs mature.
diff --git a/docs/SystemLibrary.rst b/docs/SystemLibrary.rst
index 0d0f4fa99482..dba446b476da 100644
--- a/docs/SystemLibrary.rst
+++ b/docs/SystemLibrary.rst
@@ -209,10 +209,9 @@ Implementations of the System Library interface are separated by their general
class of operating system. Currently only Unix and Win32 classes are defined
but more could be added for other operating system classifications. To
distinguish which implementation to compile, the code in ``lib/System`` uses
-the ``LLVM_ON_UNIX`` and ``LLVM_ON_WIN32`` ``#defines`` provided via configure
-through the ``llvm/Config/config.h`` file. Each source file in ``lib/System``,
-after implementing the generic (operating system independent) functionality
-needs to include the correct implementation using a set of
+the ``LLVM_ON_UNIX`` and ``_WIN32`` ``#defines``. Each source file in
+``lib/System``, after implementing the generic (operating system independent)
+functionality needs to include the correct implementation using a set of
``#if defined(LLVM_ON_XYZ)`` directives. For example, if we had
``lib/System/File.cpp``, we'd expect to see in that file:
@@ -221,7 +220,7 @@ needs to include the correct implementation using a set of
#if defined(LLVM_ON_UNIX)
#include "Unix/File.cpp"
#endif
- #if defined(LLVM_ON_WIN32)
+ #if defined(_WIN32)
#include "Win32/File.cpp"
#endif
diff --git a/docs/TableGen/BackEnds.rst b/docs/TableGen/BackEnds.rst
index 993134386f76..8b3133835668 100644
--- a/docs/TableGen/BackEnds.rst
+++ b/docs/TableGen/BackEnds.rst
@@ -221,6 +221,22 @@ OptParserDefs
**Purpose**: Print enum values for a class.
+SearchableTables
+----------------
+
+**Purpose**: Generate custom searchable tables.
+
+**Output**: Enums, global tables and lookup helper functions.
+
+**Usage**: This backend allows generating free-form, target-specific tables
+from TableGen records. The ARM and AArch64 targets use this backend to generate
+tables of system registers; the AMDGPU target uses it to generate metadata
+about complex image and memory buffer instructions.
+
+More documentation is available in ``include/llvm/TableGen/SearchableTable.td``,
+which also contains the definitions of TableGen classes which must be
+instantiated in order to define the enums and tables emitted by this backend.
+
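+As a rough sketch of the usage pattern (the class and field names below follow
+``SearchableTable.td``; consult that file for the authoritative interface):
+
+.. code-block:: text
+
+  // Hypothetical target-specific records.
+  class SysReg<string name, bits<16> enc> {
+    string Name = name;
+    bits<16> Encoding = enc;
+  }
+  def FOO : SysReg<"foo", 0x0001>;
+
+  // Emit a table of all SysReg records plus a lookup-by-encoding helper.
+  def SysRegsTable : GenericTable {
+    let FilterClass = "SysReg";
+    let Fields = ["Name", "Encoding"];
+    let PrimaryKey = ["Encoding"];
+    let PrimaryKeyName = "lookupSysRegByEncoding";
+  }
+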
CTags
-----
@@ -419,6 +435,127 @@ AttrDocs
**Purpose**: Creates ``AttributeReference.rst`` from ``AttrDocs.td``, and is
used for documenting user-facing attributes.
+General BackEnds
+================
+
+JSON
+----
+
+**Purpose**: Output all the values in every ``def``, as a JSON data
+structure that can be easily parsed by a variety of languages. Useful
+for writing custom backends without having to modify TableGen itself,
+or for performing auxiliary analysis on the same TableGen data passed
+to a built-in backend.
+
+**Output**:
+
+The root of the output file is a JSON object (i.e. dictionary),
+containing the following fixed keys:
+
+* ``!tablegen_json_version``: a numeric version field that will
+ increase if an incompatible change is ever made to the structure of
+ this data. The format described here corresponds to version 1.
+
+* ``!instanceof``: a dictionary whose keys are the class names defined
+ in the TableGen input. For each key, the corresponding value is an
+ array of strings giving the names of ``def`` records that derive
+ from that class. So ``root["!instanceof"]["Instruction"]``, for
+ example, would list the names of all the records deriving from the
+ class ``Instruction``.
+
+For each ``def`` record, the root object also has a key for the record
+name. The corresponding value is a subsidiary object containing the
+following fixed keys:
+
+* ``!superclasses``: an array of strings giving the names of all the
+ classes that this record derives from.
+
+* ``!fields``: an array of strings giving the names of all the variables
+ in this record that were defined with the ``field`` keyword.
+
+* ``!name``: a string giving the name of the record. This is always
+ identical to the key in the JSON root object corresponding to this
+ record's dictionary. (If the record is anonymous, the name is
+ arbitrary.)
+
+* ``!anonymous``: a boolean indicating whether the record's name was
+ specified by the TableGen input (if it is ``false``), or invented by
+ TableGen itself (if ``true``).
+
+For each variable defined in a record, the ``def`` object for that
+record also has a key for the variable name. The corresponding value
+is a translation into JSON of the variable's value, using the
+conventions described below.
+
+Some TableGen data types are translated directly into the
+corresponding JSON type:
+
+* A completely undefined value (e.g. for a variable declared without
+ initializer in some superclass of this record, and never initialized
+ by the record itself or any other superclass) is emitted as the JSON
+ ``null`` value.
+
+* ``int`` and ``bit`` values are emitted as numbers. Note that
+ TableGen ``int`` values are capable of holding integers too large to
+ be exactly representable in IEEE double precision. The integer
+ literal in the JSON output will show the full exact integer value.
+ So if you need to retrieve large integers with full precision, you
+ should use a JSON reader capable of translating such literals back
+ into 64-bit integers without losing precision, such as Python's
+ standard ``json`` module.
+
+* ``string`` and ``code`` values are emitted as JSON strings.
+
+* ``list<T>`` values, for any element type ``T``, are emitted as JSON
+ arrays. Each element of the array is represented in turn using these
+ same conventions.
+
+* ``bits`` values are also emitted as arrays. A ``bits`` array is
+ ordered from least-significant bit to most-significant. So the
+ element with index ``i`` corresponds to the bit described as
+ ``x{i}`` in TableGen source. However, note that this means that
+ scripting languages are likely to *display* the array in the
+ opposite order from the way it appears in the TableGen source or in
+ the diagnostic ``-print-records`` output.
+
+All other TableGen value types are emitted as a JSON object,
+containing two standard fields: ``kind`` is a discriminator describing
+which kind of value the object represents, and ``printable`` is a
+string giving the same representation of the value that would appear
+in ``-print-records``.
+
+* A reference to a ``def`` object has ``kind=="def"``, and has an
+ extra field ``def`` giving the name of the object referred to.
+
+* A reference to another variable in the same record has
+ ``kind=="var"``, and has an extra field ``var`` giving the name of
+ the variable referred to.
+
+* A reference to a specific bit of a ``bits``-typed variable in the
+ same record has ``kind=="varbit"``, and has two extra fields:
+ ``var`` gives the name of the variable referred to, and ``index``
+ gives the index of the bit.
+
+* A value of type ``dag`` has ``kind=="dag"``, and has two extra
+ fields. ``operator`` gives the initial value after the opening
+ parenthesis of the dag initializer; ``args`` is an array giving the
+ following arguments. The elements of ``args`` are arrays of length
+ 2, giving the value of each argument followed by its colon-suffixed
+ name (if any). For example, in the JSON representation of the dag
+ value ``(Op 22, "hello":$foo)`` (assuming that ``Op`` is the name of
+ a record defined elsewhere with a ``def`` statement):
+
+ * ``operator`` will be an object in which ``kind=="def"`` and
+ ``def=="Op"``
+
+ * ``args`` will be the array ``[[22, null], ["hello", "foo"]]``.
+
+* If any other kind of value or complicated expression appears in the
+ output, it will have ``kind=="complex"``, and no additional fields.
+ These values are not expected to be needed by backends. The standard
+ ``printable`` field can be used to extract a representation of them
+ in TableGen source syntax if necessary.
+
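+As an illustration, here is a minimal Python sketch that consumes the JSON
+output using only the keys documented above; the input file name and the
+``Instruction`` class name are assumptions for the example:
+
+.. code-block:: python
+
+  import json
+
+  # Load the output of `llvm-tblgen -dump-json` (the file name is an
+  # assumption for this example).
+  with open('records.json') as f:
+      root = json.load(f)
+
+  # The format documented here corresponds to version 1.
+  assert root['!tablegen_json_version'] == 1
+
+  # Walk every def deriving from a class named 'Instruction' (if any)
+  # and print the superclasses recorded for it.
+  for name in root['!instanceof'].get('Instruction', []):
+      record = root[name]
+      print(name, record['!superclasses'])
+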
How to write a back-end
=======================
diff --git a/docs/TableGen/LangIntro.rst b/docs/TableGen/LangIntro.rst
index 460ff9067f20..ea46550ffc03 100644
--- a/docs/TableGen/LangIntro.rst
+++ b/docs/TableGen/LangIntro.rst
@@ -152,8 +152,8 @@ supported include:
``foreach <var> = [ <list> ] in <def>``
Replicate <body> or <def>, replacing instances of <var> with each value
in <list>. <var> is scoped at the level of the ``foreach`` loop and must
- not conflict with any other object introduced in <body> or <def>. Currently
- only ``def``\s are expanded within <body>.
+ not conflict with any other object introduced in <body> or <def>. Only
+ ``def``\s and ``defm``\s are expanded within <body>.
``foreach <var> = 0-15 in ...``
@@ -165,6 +165,24 @@ supported include:
remaining elements in the list may be arbitrary other values, including
nested ```dag``' values.
+``!con(a, b, ...)``
+  Concatenate two or more DAG nodes. Their operators must match.
+
+ Example: !con((op a1:$name1, a2:$name2), (op b1:$name3)) results in
+ the DAG node (op a1:$name1, a2:$name2, b1:$name3).
+
+``!dag(op, children, names)``
+ Generate a DAG node programmatically. 'children' and 'names' must be lists
+ of equal length or unset ('?'). 'names' must be a 'list<string>'.
+
+ Due to limitations of the type system, 'children' must be a list of items
+ of a common type. In practice, this means that they should either have the
+ same type or be records with a common superclass. Mixing dag and non-dag
+ items is not possible. However, '?' can be used.
+
+ Example: !dag(op, [a1, a2, ?], ["name1", "name2", "name3"]) results in
+ (op a1:$name1, a2:$name2, ?:$name3).
+
``!listconcat(a, b, ...)``
A list value that is the result of concatenating the 'a' and 'b' lists.
The lists must have the same element type.
@@ -182,19 +200,48 @@ supported include:
the operand of the paste.
``!cast<type>(a)``
- A symbol of type *type* obtained by looking up the string 'a' in the symbol
- table. If the type of 'a' does not match *type*, TableGen aborts with an
- error. !cast<string> is a special case in that the argument must be an
- object defined by a 'def' construct.
+ If 'a' is a string, a record of type *type* obtained by looking up the
+ string 'a' in the list of all records defined by the time that all template
+ arguments in 'a' are fully resolved.
+
+ For example, if !cast<type>(a) appears in a multiclass definition, or in a
+ class instantiated inside of a multiclass definition, and 'a' does not
+ reference any template arguments of the multiclass, then a record of name
+ 'a' must be instantiated earlier in the source file. If 'a' does reference
+ a template argument, then the lookup is delayed until defm statements
+ instantiating the multiclass (or later, if the defm occurs in another
+ multiclass and template arguments of the inner multiclass that are
+ referenced by 'a' are substituted by values that themselves contain
+ references to template arguments of the outer multiclass).
+
+ If the type of 'a' does not match *type*, TableGen aborts with an error.
+
+ Otherwise, perform a normal type cast e.g. between an int and a bit, or
+ between record types. This allows casting a record to a subclass, though if
+ the types do not match, constant folding will be inhibited. !cast<string>
+ is a special case in that the argument can be an int or a record. In the
+ latter case, the record's name is returned.
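+
+  Example: assuming a record ``ADD32rr`` was defined elsewhere by a ``def``
+  and derives from ``Instruction``, !cast<Instruction>("ADD32rr") returns
+  that record, and !cast<string>(ADD32rr) returns the string "ADD32rr".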
+
+``!isa<type>(a)``
+ Returns an integer: 1 if 'a' is dynamically of the given type, 0 otherwise.
``!subst(a, b, c)``
If 'a' and 'b' are of string type or are symbol references, substitute 'b'
for 'a' in 'c.' This operation is analogous to $(subst) in GNU make.
``!foreach(a, b, c)``
- For each member of dag or list 'b' apply operator 'c.' 'a' is a dummy
- variable that should be declared as a member variable of an instantiated
- class. This operation is analogous to $(foreach) in GNU make.
+ For each member of dag or list 'b' apply operator 'c'. 'a' is the name
+ of a variable that will be substituted by members of 'b' in 'c'.
+ This operation is analogous to $(foreach) in GNU make.
+
+``!foldl(start, lst, a, b, expr)``
+ Perform a left-fold over 'lst' with the given starting value. 'a' and 'b'
+ are variable names which will be substituted in 'expr'. If you think of
+  expr as a function f(a,b), the fold will compute
+  'f(f(...f(f(start, lst[0]), lst[1]), ...), lst[n-1])' for a list of length n.
+ As usual, 'a' will be of the type of 'start', and 'b' will be of the type
+ of elements of 'lst'. These types need not be the same, but 'expr' must be
+ of the same type as 'start'.
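+
+  Example: !foldl(0, [1, 2, 3], a, b, !add(a, b)) evaluates to 6.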
``!head(a)``
The first element of list 'a.'
@@ -205,6 +252,9 @@ supported include:
``!empty(a)``
An integer {0,1} indicating whether list 'a' is empty.
+``!size(a)``
+ An integer indicating the number of elements in list 'a'.
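+
+  Example: !size([1, 2, 3]) evaluates to 3.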
+
``!if(a,b,c)``
'b' if the result of 'int' or 'bit' operator 'a' is nonzero, 'c' otherwise.
@@ -213,8 +263,19 @@ supported include:
on string, int and bit objects. Use !cast<string> to compare other types of
objects.
-``!shl(a,b)`` ``!srl(a,b)`` ``!sra(a,b)`` ``!add(a,b)`` ``!and(a,b)``
- The usual binary and arithmetic operators.
+``!ne(a,b)``
+ The negation of ``!eq(a,b)``.
+
+``!le(a,b), !lt(a,b), !ge(a,b), !gt(a,b)``
+ (Signed) comparison of integer values that returns bit 1 or 0 depending on
+ the result of the comparison.
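+
+  Example: !lt(1, 2) evaluates to 1, and !lt(2, 1) evaluates to 0.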
+
+``!shl(a,b)`` ``!srl(a,b)`` ``!sra(a,b)``
+ The usual shift operators. Operations are on 64-bit integers, the result
+ is undefined for shift counts outside [0, 63].
+
+``!add(a,b,...)`` ``!and(a,b,...)`` ``!or(a,b,...)``
+ The usual arithmetic and binary operators.
Note that all of the values have rules specifying how they convert to values
for different types. These rules allow you to assign a value like "``7``"
@@ -287,6 +348,23 @@ In this case, the ``Z`` definition will have a zero value for its ``V`` value,
despite the fact that it derives (indirectly) from the ``C`` class, because the
``D`` class overrode its value.
+References between variables in a record are substituted late, which gives
+``let`` expressions unusual power. Consider this admittedly silly example:
+
+.. code-block:: text
+
+ class A<int x> {
+ int Y = x;
+ int Yplus1 = !add(Y, 1);
+ int xplus1 = !add(x, 1);
+ }
+ def Z : A<5> {
+ let Y = 10;
+ }
+
+The value of ``Z.xplus1`` will be 6, but the value of ``Z.Yplus1`` is 11. Use
+this power wisely.
+
.. _template arguments:
Class template arguments
diff --git a/docs/TableGen/LangRef.rst b/docs/TableGen/LangRef.rst
index 285572fa481c..439d646034ad 100644
--- a/docs/TableGen/LangRef.rst
+++ b/docs/TableGen/LangRef.rst
@@ -98,7 +98,9 @@ wide variety of meanings:
:!eq !if !head !tail !con
:!add !shl !sra !srl !and
:!or !empty !subst !foreach !strconcat
- :!cast !listconcat
+ :!cast !listconcat !size !foldl
+ :!isa !dag !le !lt !ge
+ :!gt !ne
Syntax
@@ -115,13 +117,15 @@ TableGen's top-level production consists of "objects".
.. productionlist::
TableGenFile: `Object`*
- Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`
+ Object: `Class` | `Def` | `Defm` | `Defset` | `Let` | `MultiClass` |
+ `Foreach`
``class``\es
------------
.. productionlist::
Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
+ TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"
A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
@@ -142,8 +146,9 @@ forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.
-.. productionlist::
- TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"
+Every class has an implicit template argument called ``NAME``, which is set
+to the name of the instantiating ``def`` or ``defm``. The result is undefined
+if the class is instantiated by an anonymous record.
Declarations
------------
@@ -224,15 +229,17 @@ of:
int Baz = Bar;
}
+ Values defined in superclasses can be accessed the same way.
+
* a template arg of a ``class``, such as the use of ``Bar`` in::
class Foo<int Bar> {
int Baz = Bar;
}
-* value local to a ``multiclass``, such as the use of ``Bar`` in::
+* a value local to a ``class``, such as the use of ``Bar`` in::
- multiclass Foo {
+ class Foo {
int Bar = 5;
int Baz = Bar;
}
@@ -240,9 +247,18 @@ of:
* a template arg to a ``multiclass``, such as the use of ``Bar`` in::
multiclass Foo<int Bar> {
- int Baz = Bar;
+ def : SomeClass<Bar>;
}
+* the iteration variable of a ``foreach``, such as the use of ``i`` in::
+
+ foreach i = 0-5 in
+ def Foo#i;
+
+* a variable defined by ``defset``
+
+* the implicit template argument ``NAME`` in a ``class`` or ``multiclass``
+
.. productionlist::
SimpleValue: `TokInteger`
@@ -291,7 +307,7 @@ given values.
leave it out.
.. productionlist::
- SimpleValue: "(" `DagArg` `DagArgList` ")"
+ SimpleValue: "(" `DagArg` [`DagArgList`] ")"
DagArgList: `DagArg` ("," `DagArg`)*
DagArg: `Value` [":" `TokVarName`] | `TokVarName`
@@ -322,50 +338,94 @@ It is after parsing the base class list that the "let stack" is applied.
Body: ";" | "{" BodyList "}"
BodyList: BodyItem*
BodyItem: `Declaration` ";"
- :| "let" `TokIdentifier` [`RangeList`] "=" `Value` ";"
+ :| "let" `TokIdentifier` [ "{" `RangeList` "}" ] "=" `Value` ";"
The ``let`` form allows overriding the value of an inherited field.
``def``
-------
-.. TODO::
- There can be pastes in the names here, like ``#NAME#``. Look into that
- and document it (it boils down to ParseIDValue with IDParseMode ==
- ParseNameMode). ParseObjectName calls into the general ParseValue, with
- the only different from "arbitrary expression parsing" being IDParseMode
- == Mode.
-
.. productionlist::
- Def: "def" `TokIdentifier` `ObjectBody`
+ Def: "def" [`Value`] `ObjectBody`
+
+Defines a record whose name is given by the optional :token:`Value`. The value
+is parsed in a special mode where global identifiers (records and variables
+defined by ``defset``) are not recognized, and all unrecognized identifiers
+are interpreted as strings.
-Defines a record whose name is given by the :token:`TokIdentifier`. The
-fields of the record are inherited from the base classes and defined in the
-body.
+If no name is given, the record is anonymous. The final name of anonymous
+records is undefined, but globally unique.
Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.
+When a non-anonymous record is defined in a multiclass and the given name
+does not contain a reference to the implicit template argument ``NAME``, such
+a reference will automatically be prepended. That is, the following are
+equivalent inside a multiclass::
+
+ def Foo;
+ def NAME#Foo;
+
``defm``
--------
.. productionlist::
- Defm: "defm" `TokIdentifier` ":" `BaseClassListNE` ";"
+ Defm: "defm" [`Value`] ":" `BaseClassListNE` ";"
+
+The :token:`BaseClassList` is a list of at least one ``multiclass`` and any
+number of ``class``'s. The ``multiclass``'s must occur before any ``class``'s.
+
+Instantiates all records defined in all given ``multiclass``'s and adds the
+given ``class``'s as superclasses.
+
+The name is parsed in the same special mode used by ``def``. If the name is
+missing, a globally unique string is used instead (but instantiated records
+are not considered to be anonymous, unless they were originally defined by an
+anonymous ``def``). That is, the following have different semantics::
+
+ defm : SomeMultiClass<...>; // some globally unique name
+ defm "" : SomeMultiClass<...>; // empty name string
+
+When it occurs inside a multiclass, the second variant is equivalent to
+``defm NAME : ...``. More generally, when ``defm`` occurs in a multiclass and
+its name does not contain a reference to the implicit template argument
+``NAME``, such a reference will automatically be prepended. That is, the
+following are equivalent inside a multiclass::
-Note that in the :token:`BaseClassList`, all of the ``multiclass``'s must
-precede any ``class``'s that appear.
+ defm Foo : SomeMultiClass<...>;
+ defm NAME#Foo : SomeMultiClass<...>;
+
+``defset``
+----------
+.. productionlist::
+ Defset: "defset" `Type` `TokIdentifier` "=" "{" `Object`* "}"
+
+All records defined inside the braces via ``def`` and ``defm`` are collected
+in a globally accessible list of the given name (in addition to being added
+to the global collection of records as usual). Anonymous records created inside
+initializer expressions using the ``Class<args...>`` syntax are never collected
+in a defset.
+
+The given type must be ``list<A>``, where ``A`` is some class. It is an error
+to define a record (via ``def`` or ``defm``) inside the braces which doesn't
+derive from ``A``.
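+
+For example, given a class ``Inst`` (a hypothetical name), every record
+created by ``def`` or ``defm`` inside ``defset list<Inst> MyInsts = { ... }``
+must derive from ``Inst`` and is appended to the global list ``MyInsts``,
+which can later be referenced wherever a ``list<Inst>`` value is expected
+(for example, as the value iterated over by a ``foreach``).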
``foreach``
-----------
.. productionlist::
- Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
- :| "foreach" `Declaration` "in" `Object`
+ Foreach: "foreach" `ForeachDeclaration` "in" "{" `Object`* "}"
+ :| "foreach" `ForeachDeclaration` "in" `Object`
+ ForeachDeclaration: ID "=" ( "{" `RangeList` "}" | `RangePiece` | `Value` )
The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.
+Note that the productions involving RangeList and RangePiece have precedence
+over the more generic value parsing based on the first token.
+
Top-Level ``let``
-----------------
diff --git a/docs/TableGen/index.rst b/docs/TableGen/index.rst
index 5ba555ac2d23..0697bd0298e8 100644
--- a/docs/TableGen/index.rst
+++ b/docs/TableGen/index.rst
@@ -76,11 +76,14 @@ example, to get a list of all of the definitions that subclass a particular type
ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...
-The default backend prints out all of the records.
+The default backend prints out all of the records. There is also a general
+backend which outputs all the records as a JSON data structure, enabled using
+the ``-dump-json`` option.
If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
-appropriate way.
+appropriate way. You can do this by extending TableGen itself in C++, or by
+writing a script in any language that can consume the JSON output.
Example
-------
@@ -171,13 +174,6 @@ factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.
-Each ``def`` record has a special entry called "NAME". This is the name of the
-record ("``ADD32rr``" above). In the general case ``def`` names can be formed
-from various kinds of string processing expressions and ``NAME`` resolves to the
-final value obtained after resolving all of those expressions. The user may
-refer to ``NAME`` anywhere she desires to use the ultimate name of the ``def``.
-``NAME`` should not be defined anywhere else in user code to avoid conflicts.
-
Syntax
======
@@ -224,7 +220,7 @@ definitions of a particular class, such as "Instruction".
class ProcNoItin<string Name, list<SubtargetFeature> Features>
: Processor<Name, NoItineraries, Features>;
-
+
Here, the class ProcNoItin, receiving parameters `Name` of type `string` and
a list of target features is specializing the class Processor by passing the
arguments down as well as hard-coding NoItineraries.
diff --git a/docs/TestingGuide.rst b/docs/TestingGuide.rst
index a27da0de4d0e..3cc8cf4589bb 100644
--- a/docs/TestingGuide.rst
+++ b/docs/TestingGuide.rst
@@ -460,7 +460,10 @@ RUN lines:
Example: ``/home/user/llvm.build/test/MC/ELF/Output/foo_test.s.tmp``
``%T``
- Directory of ``%t``.
+   Directory of ``%t``. Deprecated. Do not use it, because it is easily
+   misused and can cause race conditions between tests.
+
+ Use ``rm -rf %t && mkdir %t`` instead if a temporary directory is necessary.
Example: ``/home/user/llvm.build/test/MC/ELF/Output``
diff --git a/docs/Vectorizers.rst b/docs/Vectorizers.rst
index 92d6200e169f..42e8d02f337e 100644
--- a/docs/Vectorizers.rst
+++ b/docs/Vectorizers.rst
@@ -428,12 +428,3 @@ through clang using the command line flag:
.. code-block:: console
$ clang -fno-slp-vectorize file.c
-
-LLVM has a second basic block vectorization phase
-which is more compile-time intensive (The BB vectorizer). This optimization
-can be enabled through clang using the command line flag:
-
-.. code-block:: console
-
- $ clang -fslp-vectorize-aggressive file.c
-
diff --git a/docs/XRay.rst b/docs/XRay.rst
index ebf025678305..8616088b1062 100644
--- a/docs/XRay.rst
+++ b/docs/XRay.rst
@@ -28,9 +28,10 @@ XRay consists of three main parts:
- A runtime library for enabling/disabling tracing at runtime.
- A suite of tools for analysing the traces.
- **NOTE:** As of February 27, 2017 , XRay is only available for the following
+  **NOTE:** As of July 25, 2018, XRay is only available for the following
architectures running Linux: x86_64, arm7 (no thumb), aarch64, powerpc64le,
- mips, mipsel, mips64, mips64el.
+  mips, mipsel, mips64, mips64el; NetBSD: x86_64; FreeBSD: x86_64; and
+  OpenBSD: x86_64.
The compiler-inserted instrumentation points come in the form of nop-sleds in
the final generated binary, and an ELF section named ``xray_instr_map`` which
@@ -59,7 +60,7 @@ For example:
::
- clang -fxray-instrument ..
+ clang -fxray-instrument ...
By default, functions that have at least 200 instructions will get XRay
instrumentation points. You can tweak that number through the
@@ -67,7 +68,7 @@ instrumentation points. You can tweak that number through the
::
- clang -fxray-instrument -fxray-instruction-threshold=1 ..
+ clang -fxray-instrument -fxray-instruction-threshold=1 ...
You can also specifically instrument functions in your binary to either always
or never be instrumented using source-level attributes. You can do it using the
@@ -117,6 +118,27 @@ it gets instrumented.
; ...
}
+Special Case File
+-----------------
+
+Attributes can be imbued through the use of special case files instead of
+adding them to the original source files. You can use such a file to specify
+functions and classes that should never be instrumented, always be
+instrumented, or be instrumented with first-argument logging. The file's
+format is described below:
+
+.. code-block:: bash
+
+ # Comments are supported
+ [always]
+ fun:always_instrument
+ fun:log_arg1=arg1 # Log the first argument for the function
+
+ [never]
+ fun:never_instrument
+
+These files can be provided through the ``-fxray-attr-list=`` flag to clang.
+You may have multiple files loaded through multiple instances of the flag.
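+
+For example, assuming the special case file above were saved as
+``xray-attr-list.txt``, it could be provided as:
+
+::
+
+  clang -fxray-instrument -fxray-attr-list=xray-attr-list.txt ...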
+
XRay Runtime Library
--------------------
@@ -150,20 +172,6 @@ variable, where we list down the options and their defaults below.
| xray_logfile_base | ``const char*`` | ``xray-log.`` | Filename base for the |
| | | | XRay logfile. |
+-------------------+-----------------+---------------+------------------------+
-| xray_naive_log | ``bool`` | ``false`` | **DEPRECATED:** Use |
-| | | | xray_mode=xray-basic |
-| | | | instead. Whether to |
-| | | | install the basic log |
-| | | | the naive log |
-| | | | implementation. |
-+-------------------+-----------------+---------------+------------------------+
-| xray_fdr_log | ``bool`` | ``false`` | **DEPRECATED:** Use |
-| | | | xray_mode=xray-fdr |
-| | | | instead. Whether to |
-| | | | install the Flight |
-| | | | Data Recorder |
-| | | | (FDR) mode. |
-+-------------------+-----------------+---------------+------------------------+
| verbosity | ``int`` | ``0`` | Runtime verbosity |
| | | | level. |
+-------------------+-----------------+---------------+------------------------+
@@ -172,30 +180,45 @@ variable, where we list down the options and their defaults below.
If you choose to not use the default logging implementation that comes with the
XRay runtime and/or control when/how the XRay instrumentation runs, you may use
the XRay APIs directly for doing so. To do this, you'll need to include the
-``xray_interface.h`` from the compiler-rt ``xray`` directory. The important API
+``xray_log_interface.h`` from the compiler-rt ``xray`` directory. The important API
functions we list below:
-- ``__xray_set_handler(void (*entry)(int32_t, XRayEntryType))``: Install your
- own logging handler for when an event is encountered. See
- ``xray/xray_interface.h`` for more details.
-- ``__xray_remove_handler()``: Removes whatever the installed handler is.
-- ``__xray_patch()``: Patch all the instrumentation points defined in the
- binary.
-- ``__xray_unpatch()``: Unpatch the instrumentation points defined in the
- binary.
-
-There are some requirements on the logging handler to be installed for the
-thread-safety of operations to be performed by the XRay runtime library:
-
-- The function should be thread-safe, as multiple threads may be invoking the
- function at the same time. If the logging function needs to do
- synchronisation, it must do so internally as XRay does not provide any
- synchronisation guarantees outside from the atomicity of updates to the
- pointer.
-- The pointer provided to ``__xray_set_handler(...)`` must be live even after
- calls to ``__xray_remove_handler()`` and ``__xray_unpatch()`` have succeeded.
- XRay cannot guarantee that all threads that have ever gotten a copy of the
- pointer will not invoke the function.
+- ``__xray_log_register_mode(...)``: Register a logging implementation against
+ a string Mode identifier. The implementation is an instance of
+ ``XRayLogImpl`` defined in ``xray/xray_log_interface.h``.
+- ``__xray_log_select_mode(...)``: Select the mode to install, associated with
+ a string Mode identifier. Only implementations registered with
+ ``__xray_log_register_mode(...)`` can be chosen with this function.
+- ``__xray_log_init_mode(...)``: This function allows for initializing and
+ re-initializing an installed logging implementation. See
+  ``xray/xray_log_interface.h``, part of the XRay compiler-rt installation,
+  for details.
+
+Once a logging implementation has been initialized, it can be "stopped" by
+finalizing the implementation through the ``__xray_log_finalize()`` function.
+The finalization routine is the opposite of the initialization. When finalized,
+an implementation's data can be cleared out through the
+``__xray_log_flushLog()`` function. Implementations that support in-memory
+processing should register an iterator function through
+``__xray_log_set_buffer_iterator(...)``, which allows code calling the
+``__xray_log_process_buffers(...)`` function to deal with the data in
+memory.
+
+All of this is better explained in the ``xray/xray_log_interface.h`` header.
+
+Basic Mode
+----------
+
+XRay supports a basic logging mode which will trace the application's
+execution, and periodically append to a single log. This mode can be
+installed/enabled by setting ``xray_mode=xray-basic`` in the ``XRAY_OPTIONS``
+environment variable. Combined with ``patch_premain=true`` this can allow for
+tracing applications from start to end.
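+
+For example, a sketch of running a program with basic mode enabled from the
+start (the binary name is a placeholder):
+
+::
+
+  XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic" ./a.out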
+
+Like all the other modes installed through ``__xray_log_select_mode(...)``, the
+implementation can be configured through the ``__xray_log_init_mode(...)``
+function, providing the mode string and the flag options. Basic-mode specific
+defaults can be provided in the ``XRAY_BASIC_OPTIONS`` environment variable.
Flight Data Recorder Mode
-------------------------
@@ -205,9 +228,12 @@ fixed amount of memory's worth of events. Flight Data Recorder (FDR) mode works
very much like a plane's "black box" which keeps recording data to memory in a
fixed-size circular queue of buffers, and have the data available
programmatically until the buffers are finalized and flushed. To use FDR mode
-on your application, you may set the ``xray_fdr_log`` option to ``true`` in the
-``XRAY_OPTIONS`` environment variable (while also optionally setting the
-``xray_naive_log`` to ``false``).
+on your application, you may set the ``xray_mode`` variable to ``xray-fdr`` in
+the ``XRAY_OPTIONS`` environment variable. Additional options to the FDR mode
+implementation can be provided in the ``XRAY_FDR_OPTIONS`` environment
+variable. Programmatic configuration can be done by calling
+``__xray_log_init_mode("xray-fdr", <configuration string>)`` once it has been
+selected/installed.
When the buffers are flushed to disk, the result is a binary trace format
described by `XRay FDR format <XRayFDRFormat.html>`_
@@ -239,34 +265,15 @@ provided below:
}
The default settings for the FDR mode implementation will create logs named
-similarly to the naive log implementation, but will have a different log
+similarly to the basic log implementation, but will have a different log
format. All the trace analysis tools (and the trace reading library) will
support all versions of the FDR mode format as we add more functionality and
record types in the future.
- **NOTE:** We do not however promise perpetual support for when we update the
- log versions we support going forward. Deprecation of the formats will be
+  **NOTE:** We do not promise perpetual support for older log versions as we
+  update the formats going forward. Deprecation of the formats will be
announced and discussed on the developers mailing list.
-XRay allows for replacing the default FDR mode logging implementation using the
-following API:
-
-- ``__xray_set_log_impl(...)``: This function takes a struct of type
- ``XRayLogImpl``, which is defined in ``xray/xray_log_interface.h``, part of
- the XRay compiler-rt installation.
-- ``__xray_log_register_mode(...)``: Register a logging implementation against
- a string Mode. The implementation is an instance of ``XRayLogImpl`` defined
- in ``xray/xray_log_interface.h``.
-- ``__xray_log_select_mode(...)``: Select the mode to install, associated with
- a string Mode. Only implementations registered with
- ``__xray_log_register_mode(...)`` can be chosen with this function. When
- successful, has the same effects as calling ``__xray_set_log_impl(...)`` with
- the registered logging implementation.
-- ``__xray_log_init(...)``: This function allows for initializing and
- re-initializing an installed logging implementation. See
- ``xray/xray_log_interface.h`` for details, part of the XRay compiler-rt
- installation.
-
Trace Analysis Tools
--------------------
@@ -280,7 +287,7 @@ supports the following subcommands:
options for sorting, and output formats (supports CSV, YAML, and
console-friendly TEXT).
- ``convert``: Converts an XRay log file from one format to another. We can
- convert from binary XRay traces (both naive and FDR mode) to YAML,
+ convert from binary XRay traces (both basic and FDR mode) to YAML,
`flame-graph <https://github.com/brendangregg/FlameGraph>`_ friendly text
formats, as well as `Chrome Trace Viewer (catapult)
<https://github.com/catapult-project/catapult>` formats.
diff --git a/docs/XRayExample.rst b/docs/XRayExample.rst
index f8e7d943fedd..e1b8c9b69d5f 100644
--- a/docs/XRayExample.rst
+++ b/docs/XRayExample.rst
@@ -48,11 +48,11 @@ Getting Traces
--------------
By default, XRay does not write out the trace files or patch the application
-before main starts. If we just run ``llc`` it should just work like a normally
-built binary. However, if we want to get a full trace of the application's
-operations (of the functions we do end up instrumenting with XRay) then we need
-to enable XRay at application start. To do this, XRay checks the
-``XRAY_OPTIONS`` environment variable.
+before main starts. If we run ``llc`` it should work like a normally built
+binary. If we want to get a full trace of the application's operations (of the
+functions we do end up instrumenting with XRay) then we need to enable XRay
+at application start. To do this, XRay checks the ``XRAY_OPTIONS`` environment
+variable.
::
@@ -73,9 +73,8 @@ instrumented, and how much time we're spending in parts of the code. To make
sense of this data, we use the ``llvm-xray`` tool which has a few subcommands
to help us understand our trace.
-One of the simplest things we can do is to get an accounting of the functions
-that have been instrumented. We can see an example accounting with ``llvm-xray
-account``:
+One of the things we can do is to get an accounting of the functions that have
+been instrumented. We can see an example accounting with ``llvm-xray account``:
::
@@ -178,22 +177,22 @@ add the attribute to the source.
To use this feature, you can define one file for the functions to always
instrument, and another for functions to never instrument. The format of these
files are exactly the same as the SanitizerLists files that control similar
-things for the sanitizer implementations. For example, we can have two
-different files like below:
+things for the sanitizer implementations. For example:
::
- # always-instrument.txt
+ # xray-attr-list.txt
# always instrument functions that match the following filters:
+ [always]
fun:main
- # never-instrument.txt
# never instrument functions that match the following filters:
+ [never]
fun:__cxx_*
-Given the above two files we can re-build by providing those two files as
-arguments to clang as ``-fxray-always-instrument=always-instrument.txt`` or
-``-fxray-never-instrument=never-instrument.txt``.
+Given the file above, we can re-build by providing it to the
+``-fxray-attr-list=`` flag to clang. You can have multiple files, each
+defining different attribute sets, to be combined into a single list by clang.
The XRay stack tool
-------------------
@@ -202,8 +201,7 @@ Given a trace, and optionally an instrumentation map, the ``llvm-xray stack``
command can be used to analyze a call stack graph constructed from the function
call timeline.
-The simplest way to use the command is simply to output the top stacks by call
-count and time spent.
+A basic use of the command is to output the top stacks by call count and
+time spent.
::
@@ -245,7 +243,7 @@ FlameGraph tool, currently available on `github
To generate output for a flamegraph, a few more options are necessary.
-- ``-all-stacks`` - Emits all of the stacks instead of just the top stacks.
+- ``-all-stacks`` - Emits all of the stacks instead of only the top stacks.
- ``-stack-format`` - Choose the flamegraph output format 'flame'.
- ``-aggregation-type`` - Choose the metric to graph.
diff --git a/docs/XRayFDRFormat.rst b/docs/XRayFDRFormat.rst
index f7942bc212df..46f72c54228b 100644
--- a/docs/XRayFDRFormat.rst
+++ b/docs/XRayFDRFormat.rst
@@ -15,7 +15,7 @@ When gathering XRay traces in Flight Data Recorder mode, each thread of an
application will claim buffers to fill with trace data, which at some point
is finalized and flushed.
-A goal of the profiler is to minimize overhead, so the flushed data directly
+A goal of the profiler is to minimize overhead; the flushed data directly
corresponds to the buffer.
This document describes the format of a trace file.
@@ -106,11 +106,11 @@ There are a few categories of data in the sequence.
- ``Function Arguments``: The arguments to some functions are included in the
trace. These are either pointer addresses or primitives that are read and
logged independently of their types in a high level language. To the tracer,
- they are all simply numbers. Function Records that have attached arguments
- will indicate their presence on the function entry record. We only support
- logging contiguous function argument sequences starting with argument zero,
- which will be the "this" pointer for member function invocations. For example,
- we don't support logging the first and third argument.
+ they are all numbers. Function Records that have attached arguments will
+ indicate their presence on the function entry record. We only support logging
+ contiguous function argument sequences starting with argument zero, which will
+ be the "this" pointer for member function invocations. For example, we don't
+ support logging the first and third argument.
A reader of the memory format must maintain a state machine. The format makes no
attempt to pad for alignment, and it is not seekable.
diff --git a/docs/YamlIO.rst b/docs/YamlIO.rst
index 4c07820b6f99..ac4f8d183220 100644
--- a/docs/YamlIO.rst
+++ b/docs/YamlIO.rst
@@ -1020,7 +1020,7 @@ object. For example:
// Reading multiple documents in one file
using llvm::yaml::Input;
- LLVM_YAML_IS_DOCUMENT_LIST_VECTOR(std::vector<MyDocType>)
+ LLVM_YAML_IS_DOCUMENT_LIST_VECTOR(MyDocType)
Input yin(mb.getBuffer());
diff --git a/docs/conf.py b/docs/conf.py
index 92eb9813ecf9..ce7df14ac3af 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -48,9 +48,9 @@ copyright = u'2003-%d, LLVM Project' % date.today().year
# built documents.
#
# The short version.
-version = '6'
+version = '7'
# The full version, including alpha/beta/rc tags.
-release = '6'
+release = '7'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
diff --git a/docs/doxygen.cfg.in b/docs/doxygen.cfg.in
index e3c7f479ac4e..fc11f6863ac8 100644
--- a/docs/doxygen.cfg.in
+++ b/docs/doxygen.cfg.in
@@ -285,7 +285,7 @@ MARKDOWN_SUPPORT = YES
# When enabled doxygen tries to link words that correspond to documented
# classes, or namespaces to their corresponding documentation. Such a link can
-# be prevented in individual cases by by putting a % sign in front of the word
+# be prevented in individual cases by putting a % sign in front of the word
# or globally by setting AUTOLINK_SUPPORT to NO.
# The default value is: YES.
diff --git a/docs/index.rst b/docs/index.rst
index 47c2f0473931..2173f94459dd 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -242,6 +242,8 @@ For developers of applications which use LLVM as a library.
:doc:`OptBisect`
A command line option for debugging optimization-induced failures.
+.. _index-subsystem-docs:
+
Subsystem Documentation
=======================
@@ -431,6 +433,7 @@ Information about LLVM's development process.
.. toctree::
:hidden:
+ Contributing
DeveloperPolicy
Projects
LLVMBuild
@@ -439,6 +442,9 @@ Information about LLVM's development process.
ReleaseProcess
Phabricator
+:doc:`Contributing`
+   An overview of how to contribute to LLVM.
+
:doc:`DeveloperPolicy`
The LLVM project's policy towards developers and their contributions.
diff --git a/docs/speculative_load_hardening_microbenchmarks.png b/docs/speculative_load_hardening_microbenchmarks.png
new file mode 100644
index 000000000000..b6f7d05bf5b5
--- /dev/null
+++ b/docs/speculative_load_hardening_microbenchmarks.png
Binary files differ
diff --git a/docs/tutorial/BuildingAJIT1.rst b/docs/tutorial/BuildingAJIT1.rst
index 9d7f50477836..2b83df42fc24 100644
--- a/docs/tutorial/BuildingAJIT1.rst
+++ b/docs/tutorial/BuildingAJIT1.rst
@@ -8,6 +8,11 @@ Building a JIT: Starting out with KaleidoscopeJIT
Chapter 1 Introduction
======================
+**Warning: This text is currently out of date due to ORC API updates.**
+
+**The example code has been updated and can be used. The text will be updated
+once the API churn dies down.**
+
Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
tutorial runs through the implementation of a JIT compiler using LLVM's
On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
diff --git a/docs/tutorial/BuildingAJIT2.rst b/docs/tutorial/BuildingAJIT2.rst
index f1861033cc79..15c9c3586bc3 100644
--- a/docs/tutorial/BuildingAJIT2.rst
+++ b/docs/tutorial/BuildingAJIT2.rst
@@ -12,6 +12,11 @@ we welcome any feedback.
Chapter 2 Introduction
======================
+**Warning: This text is currently out of date due to ORC API updates.**
+
+**The example code has been updated and can be used. The text will be updated
+once the API churn dies down.**
+
Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
@@ -219,7 +224,7 @@ layer interface. The interface consists of one typedef and five methods:
| | emitAndFinalize. |
+------------------+-----------------------------------------------------------+
| | Takes a given set of Modules and makes them "available |
-| | for execution. This means that symbols in those modules |
+| | for execution". This means that symbols in those modules |
| | should be searchable via findSymbol and findSymbolIn, and |
| | the address of the symbols should be read/writable (for |
| | data symbols), or executable (for function symbols) after |
diff --git a/docs/tutorial/BuildingAJIT3.rst b/docs/tutorial/BuildingAJIT3.rst
index 9c4e59fe1176..36ec2e707a73 100644
--- a/docs/tutorial/BuildingAJIT3.rst
+++ b/docs/tutorial/BuildingAJIT3.rst
@@ -12,6 +12,11 @@ we welcome any feedback.
Chapter 3 Introduction
======================
+**Warning: This text is currently out of date due to ORC API updates.**
+
+**The example code has been updated and can be used. The text will be updated
+once the API churn dies down.**
+
Welcome to Chapter 3 of the "Building an ORC-based JIT in LLVM" tutorial. This
chapter discusses lazy JITing and shows you how to enable it by adding an ORC
CompileOnDemand layer the JIT from `Chapter 2 <BuildingAJIT2.html>`_.
diff --git a/docs/tutorial/LangImpl02.rst b/docs/tutorial/LangImpl02.rst
index d72c8dc9add4..6982e969c8af 100644
--- a/docs/tutorial/LangImpl02.rst
+++ b/docs/tutorial/LangImpl02.rst
@@ -20,7 +20,7 @@ Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
`Operator-Precedence
Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
parse the Kaleidoscope language (the latter for binary expressions and
-the former for everything else). Before we get to parsing though, lets
+the former for everything else). Before we get to parsing though, let's
talk about the output of the parser: the Abstract Syntax Tree.
The Abstract Syntax Tree (AST)
@@ -716,15 +716,15 @@ Intermediate Representation (IR) from the AST.
Full Code Listing
=================
-Here is the complete code listing for this and the previous chapter.
-Note that it is fully self-contained: you don't need LLVM or any
-external libraries at all for this. (Besides the C and C++ standard
-libraries, of course.) To build this, just compile with:
+Here is the complete code listing for our running example. Because this
+uses the LLVM libraries, we need to link them in. To do this, we use the
+`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
+our makefile/command line about which options to use:
.. code-block:: bash
# Compile
- clang++ -g -O3 toy.cpp
+ clang++ -g -O3 toy.cpp `llvm-config --cxxflags`
# Run
./a.out
diff --git a/docs/tutorial/LangImpl03.rst b/docs/tutorial/LangImpl03.rst
index fab2ddaf8829..1f2d20f40c76 100644
--- a/docs/tutorial/LangImpl03.rst
+++ b/docs/tutorial/LangImpl03.rst
@@ -261,7 +261,7 @@ Function Code Generation
Code generation for prototypes and functions must handle a number of
details, which make their code less beautiful than expression code
generation, but allows us to illustrate some important points. First,
-lets talk about code generation for prototypes: they are used both for
+let's talk about code generation for prototypes: they are used both for
function bodies and external function declarations. The code starts
with:
diff --git a/docs/tutorial/LangImpl04.rst b/docs/tutorial/LangImpl04.rst
index 921c4dcc21ad..8927a912cc20 100644
--- a/docs/tutorial/LangImpl04.rst
+++ b/docs/tutorial/LangImpl04.rst
@@ -203,7 +203,7 @@ Another good source of ideas can come from looking at the passes that
experiment with passes from the command line, so you can see if they do
anything.
-Now that we have reasonable code coming out of our front-end, lets talk
+Now that we have reasonable code coming out of our front-end, let's talk
about executing it!
Adding a JIT Compiler
@@ -335,7 +335,7 @@ Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to be
added.
-With just these two changes, lets see how Kaleidoscope works now!
+With just these two changes, let's see how Kaleidoscope works now!
::
@@ -380,7 +380,7 @@ demonstrates very basic functionality, but can we do more?
Function definitions and calls also work, but something went very wrong on that
last line. The call looks valid, so what happened? As you may have guessed from
-the the API a Module is a unit of allocation for the JIT, and testfunc was part
+the API, a Module is a unit of allocation for the JIT, and testfunc was part
of the same module that contained anonymous expression. When we removed that
module from the JIT to free the memory for the anonymous expression, we deleted
the definition of ``testfunc`` along with it. Then, when we tried to call
@@ -514,7 +514,7 @@ In HandleDefinition, we add two lines to transfer the newly defined function to
the JIT and open a new module. In HandleExtern, we just need to add one line to
add the prototype to FunctionProtos.
-With these changes made, lets try our REPL again (I removed the dump of the
+With these changes made, let's try our REPL again (I removed the dump of the
anonymous functions this time, you should get the idea by now :) :
::
@@ -597,7 +597,7 @@ if we add:
.. code-block:: c++
- #ifdef LLVM_ON_WIN32
+ #ifdef _WIN32
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT
diff --git a/docs/tutorial/LangImpl05.rst b/docs/tutorial/LangImpl05.rst
index 8650892e8f8b..dad24890e123 100644
--- a/docs/tutorial/LangImpl05.rst
+++ b/docs/tutorial/LangImpl05.rst
@@ -27,7 +27,7 @@ lexer, parser, AST, and LLVM code emitter. This example is nice, because
it shows how easy it is to "grow" a language over time, incrementally
extending it as new ideas are discovered.
-Before we get going on "how" we add this extension, lets talk about
+Before we get going on "how" we add this extension, let's talk about
"what" we want. The basic idea is that we want to be able to write this
sort of thing:
@@ -54,7 +54,7 @@ false, the second subexpression is evaluated and returned. Since
Kaleidoscope allows side-effects, this behavior is important to nail
down.
-Now that we know what we "want", lets break this down into its
+Now that we know what we "want", let's break this down into its
constituent pieces.
Lexer Extensions for If/Then/Else
@@ -176,7 +176,7 @@ of the if/then/else example, because this is where it starts to
introduce new concepts. All of the code above has been thoroughly
described in previous chapters.
-To motivate the code we want to produce, lets take a look at a simple
+To motivate the code we want to produce, let's take a look at a simple
example. Consider:
::
@@ -276,7 +276,7 @@ of using the techniques that we will describe for #1, or you can insert
Phi nodes directly, if convenient. In this case, it is really
easy to generate the Phi node, so we choose to do it directly.
-Okay, enough of the motivation and overview, lets generate code!
+Okay, enough of the motivation and overview, let's generate code!
Code Generation for If/Then/Else
--------------------------------
@@ -429,7 +429,7 @@ languages...
=====================
Now that we know how to add basic control flow constructs to the
-language, we have the tools to add more powerful things. Lets add
+language, we have the tools to add more powerful things. Let's add
something more aggressive, a 'for' expression:
::
@@ -450,7 +450,7 @@ it executes its body expression. Because we don't have anything better
to return, we'll just define the loop as always returning 0.0. In the
future when we have mutable variables, it will get more useful.
-As before, lets talk about the changes that we need to Kaleidoscope to
+As before, let's talk about the changes that we need to Kaleidoscope to
support this.
Lexer Extensions for the 'for' Loop
@@ -619,7 +619,7 @@ this dump is generated with optimizations disabled for clarity):
}
This loop contains all the same constructs we saw before: a phi node,
-several expressions, and some basic blocks. Lets see how this fits
+several expressions, and some basic blocks. Let's see how this fits
together.
Code Generation for the 'for' Loop
diff --git a/docs/tutorial/LangImpl06.rst b/docs/tutorial/LangImpl06.rst
index cb8ec766bb26..2a9f4c6b609c 100644
--- a/docs/tutorial/LangImpl06.rst
+++ b/docs/tutorial/LangImpl06.rst
@@ -303,7 +303,7 @@ we need to do to "extend the grammar".
Now we have useful user-defined binary operators. This builds a lot on
the previous framework we built for other operators. Adding unary
operators is a bit more challenging, because we don't have any framework
-for it yet - lets see what it takes.
+for it yet - let's see what it takes.
User-defined Unary Operators
============================
diff --git a/docs/tutorial/LangImpl08.rst b/docs/tutorial/LangImpl08.rst
index 96eccaebd329..da4e60f84b8d 100644
--- a/docs/tutorial/LangImpl08.rst
+++ b/docs/tutorial/LangImpl08.rst
@@ -44,7 +44,7 @@ returns the target triple of the current machine.
auto TargetTriple = sys::getDefaultTargetTriple();
-LLVM doesn't require us to to link in all the target
+LLVM doesn't require us to link in all the target
functionality. For example, if we're just using the JIT, we don't need
the assembly printers. Similarly, if we're only targeting certain
architectures, we can only link in the functionality for those
diff --git a/docs/tutorial/OCamlLangImpl1.rst b/docs/tutorial/OCamlLangImpl1.rst
index 9de92305a1c3..3fed61d2d4e1 100644
--- a/docs/tutorial/OCamlLangImpl1.rst
+++ b/docs/tutorial/OCamlLangImpl1.rst
@@ -193,7 +193,7 @@ as:
``Lexer.lex`` works by recursing over a ``char Stream.t`` to read
characters one at a time from the standard input. It eats them as it
-recognizes them and stores them in in a ``Token.token`` variant. The
+recognizes them and stores them in a ``Token.token`` variant. The
first thing that it has to do is ignore whitespace between tokens. This
is accomplished with the recursive call above.