HOWTO - using the library with perf   {#howto_perf}
===================================

@brief Using command line perf and OpenCSD to collect and decode trace.

This HOWTO explains how to use the perf command line tools and the openCSD
library to collect and extract program flow traces generated by the
CoreSight IP blocks on a Linux system.  The examples have been generated using
an aarch64 Juno-r0 platform.  All information is considered accurate and tested
using the latest version of the library and the `master` branch on the
[perf-opencsd github repository][1].


On Target Trace Acquisition - Perf Record
-----------------------------------------
Not all of the enhancements to the Perf tools that support the new `cs_etm` PMU
have been upstreamed yet.  To get the required functionality, branch
`perf-opencsd-master` needs to be downloaded to the target system where
traces are to be collected.  This branch is a vanilla upstream kernel
supplemented with modifications to the CoreSight framework and drivers to be
usable by the Perf core.  The remaining out of tree patches are being
upstreamed incrementally.

From there compiling the perf tools with `make -C tools/perf` will yield a
`perf` executable that will support CoreSight trace collection.  Note that if
traces are to be decompressed *off* target, there is no need to download and
compile the openCSD library (on the target).

Before launching a trace run, a sink that will collect trace data needs to be
identified.  All CoreSight blocks identified by the framework are registered in
sysfs:


    linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/
    20010000.etf   20040000.main_funnel  22040000.etm 22140000.etm  
    230c0000.A53_funnel  23240000.etm  replicator@20020000 20030000.tpiu
    20070000.etr 220c0000.A57_funnel  23040000.etm  23140000.etm 23340000.etm

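As a rough illustration, the sinks in a listing like the one above could be
picked out with a small shell helper.  This is only a sketch: `list_sinks` is a
hypothetical function, not part of perf or openCSD, and it assumes that sinks
can be recognised by the `.etr`, `.etf` and `.tpiu` name suffixes seen on this
Juno platform.  Other kernels and platforms may name devices differently.

```shell
# Hypothetical helper: print the CoreSight devices that look like sinks.
# Assumption: sink names end in .etr, .etf or .tpiu, as in the listing above.
list_sinks() {
    dir=${1:-/sys/bus/coresight/devices}
    for dev in "$dir"/*; do
        name=${dev##*/}          # strip the directory part
        case "$name" in
            *.etr|*.etf|*.tpiu) printf '%s\n' "$name" ;;
        esac
    done
}
```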

CoreSight blocks are listed in the device tree for a specific system and
discovered at boot time.  Since tracers can be linked to more than one sink,
the sink that will receive trace data needs to be identified and given as an
option on the perf command line.  Once a sink has been identified, trace collection
can start.  An easy and yet interesting example is the `uname` command:

    linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@20070000.etr/ --per-thread uname

This will generate a `perf.data` file where execution has been traced for both
user and kernel space.  To narrow the field to either user or kernel space the
`u` and `k` options can be specified.  For example the following will limit
traces to user space:


    linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@20070000.etr/u --per-thread uname
    Problems setting modules path maps, continuing anyway...
    -----------------------------------------------------------
    perf_event_attr:
      type                             8
      size                             112
      { sample_period, sample_freq }   1
      sample_type                      IP|TID|IDENTIFIER
      read_format                      ID
      disabled                         1
      exclude_kernel                   1
      exclude_hv                       1
      enable_on_exec                   1
      sample_id_all                    1
    ------------------------------------------------------------
    sys_perf_event_open: pid 11375  cpu -1  group_fd -1  flags 0x8
    ------------------------------------------------------------
    perf_event_attr:
      type                             1
      size                             112
      config                           0x9
      { sample_period, sample_freq }   1
      sample_type                      IP|TID|IDENTIFIER
      read_format                      ID
      disabled                         1
      exclude_kernel                   1
      exclude_hv                       1
      mmap                             1
      comm                             1
      enable_on_exec                   1
      task                             1
      sample_id_all                    1
      mmap2                            1
      comm_exec                        1
    ------------------------------------------------------------
    sys_perf_event_open: pid 11375  cpu -1  group_fd -1  flags 0x8
    mmap size 266240B
    AUX area mmap length 131072
    perf event ring buffer mmapped per thread
    Synthesizing auxtrace information
    Linux
    auxtrace idx 0 old 0 head 0x11ea0 diff 0x11ea0
    [ perf record: Woken up 1 times to write data ]
    overlapping maps:
     7f99daf000-7f99db0000 0 [vdso]
     7f99d84000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
     7f99d84000-7f99daf000 0 /lib/aarch64-linux-gnu/ld-2.21.so
     7f99db0000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
    failed to write feature 8
    failed to write feature 9
    failed to write feature 14
    [ perf record: Captured and wrote 0.072 MB perf.data ]

    linaro@linaro-nano:~/kernel$ ls -l ~/.debug/ perf.data
    -rw------- 1 linaro linaro 77888 Mar  2 20:41 perf.data

    /home/linaro/.debug/:
    total 16
    drwxr-xr-x 2 linaro linaro 4096 Mar  2 20:40 [kernel.kallsyms]
    drwxr-xr-x 2 linaro linaro 4096 Mar  2 20:40 [vdso]
    drwxr-xr-x 3 linaro linaro 4096 Mar  2 20:40 bin
    drwxr-xr-x 3 linaro linaro 4096 Mar  2 20:40 lib

Trace data filtering
--------------------
The amount of trace generated by CoreSight tracers is staggering, even for
the simplest trace scenario.  Reducing trace generation to specific areas
of interest is desirable to save trace buffer space and avoid getting lost in
trace data that isn't relevant.  Supplementing the 'k' and 'u' options
described above is the notion of address filters.

On CoreSight two types of address filter have been implemented: address range
and start/stop filters.

**Address range filters:**
With address range filters, traces are generated if the instruction pointer
falls within the specified range.  Any work done by the CPU outside of that
range will not be traced.  Address range filters can be specified for both
user and kernel space sessions:

    perf record -e cs_etm/@20070000.etr/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname

    perf record -e cs_etm/@20070000.etr/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main

When dealing with kernel space traces, addresses are typically taken from the
'System.map' file.  In user space, addresses are relocatable and can be
extracted from an objdump output:

    $ aarch64-linux-gnu-objdump  -d libcstest.so.1.0
    ...
    ...
    000000000000072c <coresight_test1>:		<------------ Beginning of traces
     72c:	d10083ff 	sub	sp, sp, #0x20
     730:	b9000fe0 	str	w0, [sp,#12]
     734:	b9001fff 	str	wzr, [sp,#28]
     738:	14000007 	b	754 <coresight_test1+0x28>
     73c:	b9400fe0 	ldr	w0, [sp,#12]
     740:	11000800 	add	w0, w0, #0x2
     744:	b9000fe0 	str	w0, [sp,#12]
     748:	b9401fe0 	ldr	w0, [sp,#28]
     74c:	11000400 	add	w0, w0, #0x1
     750:	b9001fe0 	str	w0, [sp,#28]
     754:	b9401fe0 	ldr	w0, [sp,#28]
     758:	7100101f 	cmp	w0, #0x4
     75c:	54ffff0d 	b.le	73c <coresight_test1+0x10>
     760:	b9400fe0 	ldr	w0, [sp,#12]
     764:	910083ff 	add	sp, sp, #0x20
     768:	d65f03c0 	ret
    ...
    ...

Following the address, the number of bytes is specified and, if tracing in user
space, the full path to the binary (or library) being traced.

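The size in a range filter is simply the distance between the first traced
address and the first address past the last traced instruction.  For
`coresight_test1` above that is `0x76c - 0x72c = 0x40` (the `ret` at `0x768`
is 4 bytes long).  The sketch below does this arithmetic in the shell;
`range_filter` is a hypothetical helper name, not a perf feature, and it
assumes a shell whose arithmetic accepts hexadecimal constants (bash does).

```shell
# Hypothetical helper: build a --filter argument for an address range.
#   $1 = start address, $2 = first address past the range,
#   $3 = optional path to the traced binary or library (user space only)
range_filter() {
    size=$(( $2 - $1 ))
    if [ -n "$3" ]; then
        printf 'filter %s/0x%x@%s\n' "$1" "$size" "$3"
    else
        printf 'filter %s/0x%x\n' "$1" "$size"
    fi
}

# Possible use:
#   perf record -e cs_etm/@20070000.etr/u \
#       --filter "$(range_filter 0x72c 0x76c /opt/lib/libcstest.so.1.0)" \
#       --per-thread ./main
```
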
**Start/Stop filters:**
With start/stop filters, traces are generated when the instruction pointer is
equal to the start address.  Conversely, traces stop being generated when the
instruction pointer is equal to the stop address.  Anything that happens between
these two events is traced:

    perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname

    perf record -vvv -e cs_etm/@20070000.etr/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0,    \
                                                         stop 0x40082c@/home/linaro/main'          \
                                                         --per-thread ./main

**Limitation on address filters:**
The only limitations on address filters are the number of address comparators
found on an implementation and the mutual exclusion between range and
start/stop filters.  As such the following example would _not_ work:

    perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \  // start/stop
                                                    filter 0x72c/0x40@/opt/lib/libcstest.so.1.0'      \  // address range
                                                    --per-thread uname

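The mutual exclusion rule can be checked before a trace run is launched.  The
function below is a minimal sketch under stated assumptions: `filter_spec_ok`
is a hypothetical name, and the check is a plain keyword match on the filter
string, so it does not count address comparators or validate addresses.

```shell
# Hypothetical helper: succeed unless a filter string mixes an address
# range ("filter ...") with start/stop filters ("start ..." / "stop ...").
filter_spec_ok() {
    spec=$1
    has_range=0
    has_startstop=0
    case "$spec" in *filter\ *) has_range=1 ;; esac
    case "$spec" in *start\ *|*stop\ *) has_startstop=1 ;; esac
    [ "$has_range" -eq 0 ] || [ "$has_startstop" -eq 0 ]
}
```
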
Additional Trace Options
------------------------
Additional options can be used during trace collection that add information to the captured trace.

- Timestamps: These packets are added to the trace streams to allow correlation of different sources where tools support this.
- Cycle Counts: These packets are added to get a count of cycles for blocks of executed instructions. Adding cycle counts will considerably increase the amount of generated trace.
The relationship between cycle counts and executed instructions differs according to the trace protocol.
For example, the ETMv4 protocol will emit counts for groups of instructions according to a minimum count threshold.
Presently this threshold is fixed at 256 cycles for `perf record`.

Command line options in `perf record` to use these features are part of the options for the `cs_etm` event:

    perf record -e cs_etm/timestamp,cycacc,@20070000.etr/ --per-thread uname

In the current version, `perf record` and `perf script` do not use this additional information.

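The event string used in these examples has three parts: optional
comma-separated options, the sink prefixed with `@`, and an optional `u` or
`k` modifier after the closing slash.  A small sketch of how it could be
assembled follows; `cs_etm_event` is a hypothetical helper, and only the
`timestamp` and `cycacc` options shown above are assumed to exist.

```shell
# Hypothetical helper: assemble a cs_etm event string.
#   $1 = sink name, $2 = modifier ("u", "k" or ""), remaining args = options
cs_etm_event() {
    sink=$1
    modifier=$2
    shift 2
    opts=""
    for o in "$@"; do
        opts="$opts$o,"          # each option is followed by a comma
    done
    printf 'cs_etm/%s@%s/%s\n' "$opts" "$sink" "$modifier"
}

# Possible use:
#   perf record -e "$(cs_etm_event 20070000.etr "" timestamp cycacc)" --per-thread uname
```
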
On Target Trace Collection
--------------------------
The entire program flow will have been recorded in the `perf.data` file.
Information about libraries and executables is stored under `$HOME/.debug`:

    linaro@linaro-nano:~/kernel$ tree ~/.debug
    .debug
    ├── [kernel.kallsyms]
    │   └── 0542921808098d591a7acba5a1163e8991897669
    │       └── kallsyms
    ├── [vdso]
    │   └── 551fbbe29579eb63be3178a04c16830b8d449769
    │       └── vdso
    ├── bin
    │   └── uname
    │       └── ed95e81f97c4471fb2ccc21e356b780eb0c92676
    │           └── elf
    └── lib
        └── aarch64-linux-gnu
            ├── ld-2.21.so
            │   └── 94912dc5a1dc8c7ef2c4e4649d4b1639b6ebc8b7
            │       └── elf
            └── libc-2.21.so
                └── 169a143e9c40cfd9d09695333e45fd67743cd2d6
                    └── elf

    13 directories, 5 files
    linaro@linaro-nano:~/kernel$


All this information needs to be collected in order to successfully decode
traces off target:

    linaro@linaro-nano:~/kernel$ tar czf uname.trace.tgz perf.data ~/.debug


Note that file `vmlinux` should also be added to the bundle if kernel traces
have also been collected.

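Forgetting one of these pieces only shows up later, when decoding fails off
target.  As a sketch, the bundling step could be wrapped so that it refuses to
run when a listed file is missing.  `bundle_trace` is a hypothetical helper
name; it only calls `tar czf` as in the command above.

```shell
# Hypothetical helper: create a trace bundle, failing early if any input
# (perf.data, the .debug directory, optionally vmlinux) does not exist.
#   $1 = output archive, remaining args = files and directories to include
bundle_trace() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "bundle_trace: missing $f" >&2
            return 1
        fi
    done
    tar czf "$out" "$@"
}

# Possible use, for a capture that includes kernel traces:
#   bundle_trace uname.trace.tgz perf.data ~/.debug vmlinux
```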

Off Target OpenCSD Compilation
------------------------------
The openCSD library is not part of the perf tools.  It is available on
[github][1] and needs to be compiled before the perf tools. Check out the
required branch/tag version into a local directory.

    linaro@t430:~/linaro/coresight$ git clone -b v0.8 https://github.com/Linaro/OpenCSD.git my-opencsd
    Cloning into 'my-opencsd'...
    remote: Counting objects: 2063, done.
    remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063
    Receiving objects: 100% (2063/2063), 2.51 MiB | 1.24 MiB/s, done.
    Resolving deltas: 100% (1399/1399), done.
    Checking connectivity... done.
    linaro@t430:~/linaro/coresight$ ls my-opencsd
    decoder LICENSE  README.md HOWTO.md TODO

Once the source code has been acquired compilation of the openCSD library can
take place.  For Linux two options are available, LINUX and LINUX64, based on
the host's (which has nothing to do with the target) architecture:

    linaro@t430:~/linaro/coresight/$ cd my-opencsd/decoder/build/linux/
    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls
    makefile  rctdl_c_api_lib  ref_trace_decode_lib

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ make LINUX64=1 DEBUG=1
    ...
    ...

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls ../../lib/linux64/dbg/
    libopencsd.a  libopencsd_c_api.a  libopencsd_c_api.so  libopencsd.so

From there the header files and libraries need to be installed on the system,
something that requires root privileges.  The default installation path is
/usr/include/opencsd for the header files and /usr/lib/ for the libraries:

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ sudo make install
    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/include/opencsd
    total 60
    drwxr-xr-x 2 root root  4096 Dec 12 10:19 c_api
    drwxr-xr-x 2 root root  4096 Dec 12 10:19 etmv3
    drwxr-xr-x 2 root root  4096 Dec 12 10:19 etmv4
    -rw-r--r-- 1 root root 28049 Dec 12 10:19 ocsd_if_types.h
    drwxr-xr-x 2 root root  4096 Dec 12 10:19 ptm
    drwxr-xr-x 2 root root  4096 Dec 12 10:19 stm
    -rw-r--r-- 1 root root  7264 Dec 12 10:19 trc_gen_elem_types.h
    -rw-r--r-- 1 root root  3972 Dec 12 10:19 trc_pkt_types.h

    linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/lib/libopencsd*
    -rw-r--r-- 1 root root  598720 Dec 12 10:19 /usr/lib/libopencsd_c_api.so
    -rw-r--r-- 1 root root 4692200 Dec 12 10:19 /usr/lib/libopencsd.so

A "clean_install" target is also available so that openCSD installed files can
be removed from a system.  Going forward the goal is to have the openCSD library
packaged as a Debian or RPM archive so that it can be installed from a
distribution without having to be compiled.

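Before moving on to the perf tools it can be worth confirming that the
installation landed where expected.  The check below is a sketch:
`opencsd_installed` is a hypothetical helper, and its defaults are the
installation paths quoted above.

```shell
# Hypothetical helper: succeed if the openCSD headers and shared libraries
# are present.  Defaults match the default installation paths.
#   $1 = include directory, $2 = library directory
opencsd_installed() {
    incdir=${1:-/usr/include/opencsd}
    libdir=${2:-/usr/lib}
    [ -d "$incdir" ] &&
    [ -e "$libdir/libopencsd.so" ] &&
    [ -e "$libdir/libopencsd_c_api.so" ]
}

# Possible use:
#   opencsd_installed || echo "openCSD does not appear to be installed"
```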

Off Target Perf Tools Compilation
---------------------------------
As mentioned above the openCSD library is not part of the perf tools' code base
and needs to be installed on a system prior to compilation.  Information about
the status of the openCSD library on a system is given at compile time by the
perf tools build script:

    linaro@t430:~/linaro/linux-kernel$ make VF=1 -C tools/perf
    Auto-detecting system features:
    ...                         dwarf: [ on  ]
    ...            dwarf_getlocations: [ on  ]
    ...                         glibc: [ on  ]
    ...                          gtk2: [ on  ]
    ...                      libaudit: [ on  ]
    ...                        libbfd: [ OFF ]
    ...                        libelf: [ on  ]
    ...                       libnuma: [ OFF ]
    ...        numa_num_possible_cpus: [ OFF ]
    ...                       libperl: [ on  ]
    ...                     libpython: [ on  ]
    ...                      libslang: [ on  ]
    ...                     libcrypto: [ on  ]
    ...                     libunwind: [ OFF ]
    ...            libdw-dwarf-unwind: [ on  ]
    ...                          zlib: [ on  ]
    ...                          lzma: [ OFF ]
    ...                     get_cpuid: [ on  ]
    ...                           bpf: [ on  ]
    ...                    libopencsd: [ on  ]  <-------


At the end of the compilation a new perf binary is available in `tools/perf/`:

    linaro@t430:~/linaro/linux-kernel$ ldd tools/perf/perf
	linux-vdso.so.1 =>  (0x00007fff135db000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f15f9176000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f15f8f6e000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f15f8c64000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f15f8a60000)
	libopencsd_c_api.so => /usr/lib/libopencsd_c_api.so (0x00007f15f884e000)   <-------
	libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f15f8635000)
	libdw.so.1 => /usr/lib/x86_64-linux-gnu/libdw.so.1 (0x00007f15f83ec000)
	libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007f15f81c5000)
	libslang.so.2 => /lib/x86_64-linux-gnu/libslang.so.2 (0x00007f15f7e38000)
	libperl.so.5.22 => /usr/lib/x86_64-linux-gnu/libperl.so.5.22 (0x00007f15f7a5d000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f15f7693000)
	libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007f15f7104000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f15f6eea000)
	/lib64/ld-linux-x86-64.so.2 (0x0000559b88038000)
	libopencsd.so => /usr/lib/libopencsd.so (0x00007f15f6c62000)    <-------
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f15f68df000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f15f66c9000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f15f64a6000)
	libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f15f6296000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f15f605e000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f15f5e5a000)

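The two marked lines are what matter in this output.  A sketch of an automated
check is given below; `linked_with_opencsd` is a hypothetical helper that reads
`ldd` output on standard input and looks for both library names.

```shell
# Hypothetical helper: succeed if the ldd output on stdin mentions both
# libopencsd_c_api.so and libopencsd.so.
linked_with_opencsd() {
    deps=$(cat)
    printf '%s\n' "$deps" | grep -q 'libopencsd_c_api\.so' &&
    printf '%s\n' "$deps" | grep -q 'libopencsd\.so'
}

# Possible use:
#   ldd tools/perf/perf | linked_with_opencsd && echo "perf uses openCSD"
```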

Additional debug output from the decoder can be compiled in by setting the
`CSTRACE_RAW` environment variable. Setting this to `packed` gets trace frame
output as follows:-

    Frame Data; Index    576;    RAW_PACKED; d6 d6 d6 d6 d6 d6 d6 d6 fc fb d6 d6 d6 d6 e0 7f 
    Frame Data; Index    576;   ID_DATA[0x14]; d7 d6 d7 d6 d7 d6 d7 d6 fd fb d7 d6 d7 d6 e0
    
Setting it to any other value removes the RAW_PACKED lines.

Working with a debug version of the openCSD library
---------------------------------------------------
When compiling the perf tools it is possible to reference another version of
the openCSD library than the one installed on the system.  This is useful when
working with multiple development trees or having the desire to keep system
libraries intact.  Two environment variables are available to tell the perf tools
build script where to get the header files and libraries, namely CSINCLUDES and
CSLIBS:

    linaro@t430:~/linaro/linux-kernel$ export CSINCLUDES=~/linaro/coresight/my-opencsd/decoder/include/
    linaro@t430:~/linaro/linux-kernel$ export CSLIBS=~/linaro/coresight/my-opencsd/decoder/lib/linux64-rel/
    linaro@t430:~/linaro/linux-kernel$ make VF=1 -C tools/perf

This will have the effect of compiling and linking against the provided library.
Since the system's openCSD library is in the loader's search path, the
LD_LIBRARY_PATH environment variable needs to be set so that the loader picks
up the provided library instead:

    linaro@t430:~/linaro/linux-kernel$ export LD_LIBRARY_PATH=$CSLIBS

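The three exports can be grouped so that they always stay consistent.  This is
a sketch only: `use_local_opencsd` is a hypothetical function, and the default
library sub-directory is taken from the `CSLIBS` example above and may differ
between openCSD versions and build types.  Unlike the plain export, it keeps
any existing `LD_LIBRARY_PATH` entries after the local one.

```shell
# Hypothetical helper: point the perf build and the loader at a local
# openCSD tree.
#   $1 = root of the openCSD checkout
#   $2 = library sub-directory under decoder/ (assumed default below)
use_local_opencsd() {
    root=$1
    libdir=${2:-lib/linux64-rel}
    export CSINCLUDES="$root/decoder/include/"
    export CSLIBS="$root/decoder/$libdir/"
    # Put the local library first, keeping any existing entries.
    export LD_LIBRARY_PATH="$CSLIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Possible use:
#   use_local_opencsd ~/linaro/coresight/my-opencsd
#   make VF=1 -C tools/perf
```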

Trace Decoding with Perf Report
-------------------------------
Before working with custom traces it is suggested to use a trace bundle that
is known to be working properly.  A sample bundle has been made available
here [2].  Trace bundles can be extracted anywhere and have no dependencies on
where the perf tools and openCSD library have been compiled. 

    linaro@t430:~/linaro/coresight$ mkdir sept20
    linaro@t430:~/linaro/coresight$ cd sept20
    linaro@t430:~/linaro/coresight/sept20$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
    linaro@t430:~/linaro/coresight/sept20$ md5sum uname.v4.user.sept20.tgz
    f53f11d687ce72bdbe9de2e67e960ec6  uname.v4.user.sept20.tgz
    linaro@t430:~/linaro/coresight/sept20$ tar xf uname.v4.user.sept20.tgz
    linaro@t430:~/linaro/coresight/sept20$ ls -la
    total 1312
    drwxrwxr-x 3 linaro linaro    4096 Mar  3 10:26 .
    drwxrwxr-x 5 linaro linaro    4096 Mar  3 10:13 ..
    drwxr-xr-x 7 linaro linaro    4096 Feb 24 12:21 .debug
    -rw------- 1 linaro linaro   78016 Feb 24 12:21 perf.data
    -rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.sept20.tgz

Perf is expecting files related to the trace capture (`perf.data`) to be located
under `~/.debug` [3].  This example will remove the current `~/.debug` directory
to be sure everything is clean.  

    linaro@t430:~/linaro/coresight/sept20$ rm -rf ~/.debug
    linaro@t430:~/linaro/coresight/sept20$ cp -dpR .debug ~/
    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio

    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 0  of event 'cs_etm//u'
    # Event count (approx.): 0
    #
    # Children      Self  Command  Shared Object  Symbol
    # ........  ........  .......  .............  ......
    #


    # Samples: 0  of event 'dummy:u'
    # Event count (approx.): 0
    #
    # Children      Self  Command  Shared Object  Symbol
    # ........  ........  .......  .............  ......
    #


    # Samples: 115K of event 'instructions:u'
    # Event count (approx.): 522009
    #
    # Children      Self  Command  Shared Object     Symbol                
    # ........  ........  .......  ................  ......................
    #
         4.13%     4.13%  uname    libc-2.21.so      [.] 0x0000000000078758
         3.81%     3.81%  uname    libc-2.21.so      [.] 0x0000000000078e50
         2.06%     2.06%  uname    libc-2.21.so      [.] 0x00000000000fcaf4
         1.65%     1.65%  uname    libc-2.21.so      [.] 0x00000000000fcae4
         1.59%     1.59%  uname    ld-2.21.so        [.] 0x000000000000a7f4
         1.50%     1.50%  uname    libc-2.21.so      [.] 0x0000000000078e40
         1.43%     1.43%  uname    libc-2.21.so      [.] 0x00000000000fcac4
         1.31%     1.31%  uname    libc-2.21.so      [.] 0x000000000002f0c0
         1.26%     1.26%  uname    ld-2.21.so        [.] 0x0000000000016888
         1.24%     1.24%  uname    libc-2.21.so      [.] 0x0000000000078e7c 
         1.24%     1.24%  uname    libc-2.21.so      [.] 0x00000000000fcab8
    ...

Additional data, including a dump of the trace packets received, can be obtained using the command:

    mjl@ubuntu-vbox:./perf-opencsd-master/coresight/tools/perf/perf report --stdio --dump

This produces a large amount of data, with the trace looking like:

    0x618 [0x30]: PERF_RECORD_AUXTRACE size: 0x11ef0  offset: 0  ref: 0x4d881c1f13216016  idx: 0  tid: 15244  cpu: -1

    . ... CoreSight ETM Trace data: size 73456 bytes

      0: I_ASYNC : Alignment Synchronisation.
      12: I_TRACE_INFO : Trace Info.
      17: I_TRACE_ON : Trace On.
      18: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F24D80; Ctxt: AArch64,EL0, NS; 
      28: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
      29: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
      30: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
      32: I_ATOM_F6 : Atom format 6.; EEEEN
      33: I_ATOM_F1 : Atom format 1.; E
      34: I_EXCEPT : Exception.;  Data Fault; Ret Addr Follows;
      36: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; 
      45: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS; 
      56: I_TRACE_ON : Trace On.
      57: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; Ctxt: AArch64,EL0, NS; 
      68: I_ATOM_F3 : Atom format 3.; NEE
      69: I_ATOM_F3 : Atom format 3.; NEN
      70: I_ATOM_F3 : Atom format 3.; NNE
      71: I_ATOM_F5 : Atom format 5.; ENENE
      72: I_ATOM_F5 : Atom format 5.; NENEN
      73: I_ATOM_F5 : Atom format 5.; ENENE
      74: I_ATOM_F5 : Atom format 5.; NENEN
      75: I_ATOM_F5 : Atom format 5.; ENENE
      76: I_ATOM_F3 : Atom format 3.; NNE
      77: I_ATOM_F3 : Atom format 3.; NNE
      78: I_ATOM_F3 : Atom format 3.; NNE
      80: I_ATOM_F3 : Atom format 3.; NNE
      81: I_ATOM_F3 : Atom format 3.; ENN
      82: I_EXCEPT : Exception.;  Data Fault; Ret Addr Follows;
      84: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; 
      93: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS; 
      104: I_TRACE_ON : Trace On.
      105: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; Ctxt: AArch64,EL0, NS; 
      116: I_ATOM_F5 : Atom format 5.; NNNNN
      117: I_ATOM_F5 : Atom format 5.; NNNNN

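Given the volume of a dump, a per-packet-type summary can be a useful first
look.  The filter below is a sketch that assumes the line layout shown above,
where the second whitespace-separated field is the packet name and starts with
`I_`.  `count_packets` is a hypothetical helper name.

```shell
# Hypothetical helper: count trace packets by type in a dump read from
# stdin.  Assumes lines of the form "  <index>: I_<NAME> : ...".
count_packets() {
    awk '$2 ~ /^I_/ { n[$2]++ } END { for (p in n) print n[p], p }' |
        sort -k1,1nr -k2,2       # most frequent packet type first
}

# Possible use:
#   perf report --stdio --dump | count_packets
```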

Trace Decoding with Perf Script 
-------------------------------
Working with perf scripts needs more command line options but yields
interesting results.

    linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
    linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
    linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump

              7f89f24d80:   910003e0        mov     x0, sp
              7f89f24d84:   94000d53        bl      7f89f282d0 <free@plt+0x3790>
              7f89f282d0:   d11203ff        sub     sp, sp, #0x480
              7f89f282d4:   a9ba7bfd        stp     x29, x30, [sp,#-96]!
              7f89f282d8:   910003fd        mov     x29, sp
              7f89f282dc:   a90363f7        stp     x23, x24, [sp,#48]
              7f89f282e0:   9101e3b7        add     x23, x29, #0x78
              7f89f282e4:   a90573fb        stp     x27, x28, [sp,#80]
              7f89f282e8:   a90153f3        stp     x19, x20, [sp,#16]
              7f89f282ec:   aa0003fb        mov     x27, x0
              7f89f282f0:   910a82e1        add     x1, x23, #0x2a0
              7f89f282f4:   a9025bf5        stp     x21, x22, [sp,#32]
              7f89f282f8:   a9046bf9        stp     x25, x26, [sp,#64]
              7f89f282fc:   910102e0        add     x0, x23, #0x40
              7f89f28300:   f800841f        str     xzr, [x0],#8
              7f89f28304:   eb01001f        cmp     x0, x1
              7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>
              7f89f28300:   f800841f        str     xzr, [x0],#8
              7f89f28304:   eb01001f        cmp     x0, x1
              7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>
              7f89f28300:   f800841f        str     xzr, [x0],#8
              7f89f28304:   eb01001f        cmp     x0, x1
              7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>

Kernel Trace Decoding
---------------------

When dealing with kernel space traces the vmlinux file has to be communicated
explicitly to perf using the "--vmlinux" command line option:

    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio --vmlinux=./vmlinux
    ...
    ...
    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf script --vmlinux=./vmlinux

When using scripts things get a little more convoluted.  Using the same example
as above but for kernel traces, the command line becomes:

    linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
    linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
    linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
    linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script	\
							--vmlinux=./vmlinux					\
							--script=python:${SCRIPT_PATH}/cs-trace-disasm.py --	\
							-d ${XTOOL_PATH}/aarch64-linux-gnu-objdump		\
							-k ./vmlinux
    ...
    ...

The option "--vmlinux=./vmlinux" is interpreted by the "perf script" command
the same way it is for "perf report".  The option "-k ./vmlinux" is dependent
on the script being executed and is not related to "--vmlinux", though it
is highly advised to keep them synchronized.

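One way to keep the two options synchronized is to pass the vmlinux path once
and let a wrapper supply it to both.  The function below is a sketch:
`kernel_disasm` and the `PERF` override variable are hypothetical, while
`EXEC_PATH`, `SCRIPT_PATH` and `XTOOL_PATH` are the variables exported in the
example above.

```shell
# Hypothetical wrapper: run the disassembly script on kernel traces, using
# the same vmlinux for both --vmlinux and the script's own -k option.
#   $1 = path to vmlinux
# PERF may be set to the perf binary to use (assumed default: perf in PATH).
kernel_disasm() {
    vmlinux=$1
    "${PERF:-perf}" --exec-path="${EXEC_PATH}" script \
        --vmlinux="$vmlinux" \
        --script=python:"${SCRIPT_PATH}/cs-trace-disasm.py" -- \
        -d "${XTOOL_PATH}/aarch64-linux-gnu-objdump" \
        -k "$vmlinux"
}

# Possible use:
#   PERF=../perf-opencsd-master/tools/perf/perf
#   kernel_disasm ./vmlinux
```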

Perf Test Environment Scripts
-----------------------------

The decoder library comes with a number of `bash` scripts that ease the setting up of the
offline build and test environment for perf, and executing tests. 

These scripts can be found in

    decoder/tests/perf-test-scripts

There are three scripts provided:

- `perf-setup-env.bash`    : this sets up all the environment variables mentioned above.
- `perf-test-report.bash`  : this runs `perf report` - using the environment setup by `perf-setup-env.bash`
- `perf-test-script.bash`  : this runs `perf script` - using the environment setup by `perf-setup-env.bash`

Use as follows:-

1. Prior to building perf, edit `perf-setup-env.bash` to conform to your environment. There are four lines at the top of the file that will require editing.

2. Execute the script using the command

        source perf-setup-env.bash

   This will set up all the environment variables mentioned in the sections on building and running
   perf above, and these are used by the `perf-test...` scripts to run the tests.

3. Build perf as described above.
4. Follow the instructions for downloading the test capture, or create a capture from your target.
5. Copy the `perf-test...` scripts into the capture data directory -> the one that contains `perf.data`.

6. The scripts can now be run. No options are required for the default operation, but any command line options will be added to the perf report / perf script command line.

e.g.

        ./perf-test-report.bash --dump 

will add the --dump option to the end of the command line and run

        ${PERF_EXEC_PATH}/perf report --stdio --dump


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

Below is an example of using ARM ETM for AutoFDO. The updates to the perf
support for this are experimental and available on the 'autoFDO' branch of
the [perf-opencsd github repository][1].

It also requires autofdo (https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

        $ gcc-5 -O3 sort.c -o sort_optimized
        $ taskset -c 2 ./sort_optimized
        Bubble sorting array of 30000 elements
        5910 ms

        $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
        Bubble sorting array of 30000 elements
        12543 ms
        [ perf record: Woken up 35 times to write data ]
        [ perf record: Captured and wrote 69.640 MB perf.data ]

        $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
        $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
        $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
        $ taskset -c 2 ./sort_autofdo
        Bubble sorting array of 30000 elements
        5806 ms


The Linaro CoreSight Team
-------------------------
- Mike Leach
- Tor Jeremiassen
- Chunyan Zang
- Mathieu Poirier


One Last Thing
--------------
We welcome help on this project.  If you would like to add features or help
improve the way things work, we want to hear from you.

Best regards,
*The Linaro CoreSight Team*

--------------------------------------
[1]: https://github.com/Linaro/perf-opencsd "perf-opencsd Github"

[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
 
[3]: Get in touch with us if you know a way to change this.