Based on kernel version 5.16. Page generated on 2022-01-10 18:31 EST.

======================================
Coresight - HW Assisted Tracing on ARM
======================================

   :Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
   :Date:     September 11th, 2014

Introduction
------------

Coresight is an umbrella of technologies allowing for the debugging of
ARM-based SoCs.  It includes solutions for JTAG and HW assisted tracing.  This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines.  ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users.  From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks.  Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.

A typical coresight system would look like this::

  *****************************************************************
 **************************** AMBA AXI  ****************************===||
  *****************************************************************    ||
        ^                    ^                            |            ||
        |                    |                            *            **
     0000000    :::::     0000000    :::::    :::::    @@@@@@@    ||||||||||||
     0 CPU 0<-->: C :     0 CPU 0<-->: C :    : C :    @ STM @    || System ||
  |->0000000    : T :  |->0000000    : T :    : T :<--->@@@@@     || Memory ||
  |  #######<-->: I :  |  #######<-->: I :    : I :      @@@<-|   ||||||||||||
  |  # ETM #    :::::  |  # PTM #    :::::    :::::       @   |
  |   #####      ^ ^   |   #####      ^ !      ^ !        .   |   |||||||||
  | |->###       | !   | |->###       | !      | !        .   |   || DAP ||
  | |   #        | !   | |   #        | !      | !        .   |   |||||||||
  | |   .        | !   | |   .        | !      | !        .   |      |  |
  | |   .        | !   | |   .        | !      | !        .   |      |  *
  | |   .        | !   | |   .        | !      | !        .   |      | SWD/
  | |   .        | !   | |   .        | !      | !        .   |      | JTAG
  *****************************************************************<-|
 *************************** AMBA Debug APB ************************
  *****************************************************************
   |    .          !         .          !        !        .    |
   |    .          *         .          *        *        .    |
  *****************************************************************
 ******************** Cross Trigger Matrix (CTM) *******************
  *****************************************************************
   |    .     ^              .                            .    |
   |    *     !              *                            *    |
  *****************************************************************
 ****************** AMBA Advanced Trace Bus (ATB) ******************
  *****************************************************************
   |          !                        ===============         |
   |          *                         ===== F =====<---------|
   |   :::::::::                         ==== U ====
   |-->:: CTI ::<!!                       === N ===
   |   :::::::::  !                        == N ==
   |    ^         *                        == E ==
   |    !  &&&&&&&&&       IIIIIII         == L ==
   |------>&& ETB &&<......II     I        =======
   |    !  &&&&&&&&&       II     I           .
   |    !                    I     I          .
   |    !                    I REP I<..........
   |    !                    I     I
   |    !!>&&&&&&&&&       II     I           *Source: ARM ltd.
   |------>& TPIU  &<......II    I            DAP = Debug Access Port
           &&&&&&&&&       IIIIIII            ETM = Embedded Trace Macrocell
               ;                              PTM = Program Trace Macrocell
               ;                              CTI = Cross Trigger Interface
               *                              ETB = Embedded Trace Buffer
          To trace port                       TPIU= Trace Port Interface Unit
                                              SWD = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus.  The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform.  This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB.  Future work will enable more
intricate IP blocks such as STM and CTI.


Acronyms and Classification
---------------------------

Acronyms:

PTM:
    Program Trace Macrocell
ETM:
    Embedded Trace Macrocell
STM:
    System Trace Macrocell
ETB:
    Embedded Trace Buffer
ITM:
    Instrumentation Trace Macrocell
TPIU:
     Trace Port Interface Unit
TMC-ETR:
        Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF:
        Trace Memory Controller, configured as Embedded Trace FIFO
CTI:
    Cross Trigger Interface

Classification:

Source:
   ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
   Funnel, replicator (intelligent or not), TMC-ETF
Sinks:
   ETBv1.0, ETBv1.1, TPIU, TMC-ETF, TMC-ETR
Misc:
   CTI


Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.


Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform.  Any coresight compliant device can
register with the framework as long as it uses the right APIs:

.. c:function:: struct coresight_device *coresight_register(struct coresight_desc *desc);
.. c:function:: void coresight_unregister(struct coresight_device *csdev);

The registration function takes a ``struct coresight_desc *desc`` and
registers the device with the core framework. The unregister function takes
a reference to a ``struct coresight_device *csdev`` obtained at registration time.

If everything goes well during the registration process the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform::

    root:~# ls /sys/bus/coresight/devices/
    replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb         20040000.funnel  2201d000.ptm  2203d000.etm
    root:~#

The registration function takes a ``struct coresight_desc``, which looks like this::

    struct coresight_desc {
            enum coresight_dev_type type;
            struct coresight_dev_subtype subtype;
            const struct coresight_ops *ops;
            struct coresight_platform_data *pdata;
            struct device *dev;
            const struct attribute_group **groups;
    };


The "coresight_dev_type" identifies what the device is, i.e. source, link or
sink, while the "coresight_dev_subtype" characterises that type further.

The ``struct coresight_ops`` is mandatory and tells the framework how to
perform base operations related to the components, each component having
a different set of requirements. For that purpose ``struct coresight_ops_sink``,
``struct coresight_ops_link`` and ``struct coresight_ops_source`` have been
provided.

The next field ``struct coresight_platform_data *pdata`` is acquired by calling
``of_get_coresight_platform_data()``, as part of the driver's _probe routine and
``struct device *dev`` gets the device reference embedded in the ``amba_device``::

    static int etm_probe(struct amba_device *adev, const struct amba_id *id)
    {
     ...
     ...
     drvdata->dev = &adev->dev;
     ...
    }

Each specific class of device (source, link or sink) has generic operations
that can be performed on it (see ``struct coresight_ops``). The ``**groups``
field is a list of sysfs entries pertaining to operations
specific to that component only.  "Implementation defined" customisations are
expected to be accessed and controlled using those entries.

Device Naming scheme
--------------------

The devices that appear on the "coresight" bus were named the same as their
parent devices, i.e. the real devices that appear on the AMBA bus or the platform
bus. Thus the names were based on the Linux Open Firmware layer naming
convention, which uses the base physical address of the device followed by the
device type, e.g.::

    root:~# ls /sys/bus/coresight/devices/
     20010000.etf  20040000.funnel      20100000.stm     22040000.etm
     22140000.etm  230c0000.funnel      23240000.etm     20030000.tpiu
     20070000.etr  20120000.replicator  220c0000.funnel
     23040000.etm  23140000.etm         23340000.etm

However, with the introduction of ACPI support, the names of the real
devices are a bit cryptic and non-obvious. Thus, a new naming scheme was
introduced to use more generic names based on the type of the device. The
following rules apply::

  1) Devices that are bound to CPUs, are named based on the CPU logical
     number.

     e.g, ETM bound to CPU0 is named "etm0"

  2) All other devices follow a pattern, "<device_type_prefix>N", where :

	<device_type_prefix> 	- A prefix specific to the type of the device
	N			- a sequential number assigned based on the order
				  of probing.

	e.g, tmc_etf0, tmc_etr0, funnel0, funnel1

Thus, with the new scheme the devices could appear as ::

    root:~# ls /sys/bus/coresight/devices/
     etm0     etm1     etm2         etm3  etm4      etm5      funnel0
     funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0

Some of the examples below refer to the old naming scheme and some
to the newer scheme, to confirm that what you see on your
system is not unexpected. One must use the names as they appear on
the system under the specified locations.
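
The two rules of the newer scheme can be modelled in a few lines of Python.
This is only an illustrative sketch, not the kernel's implementation: the
function ``assign_names`` and its ``(prefix, cpu)`` input format are
assumptions made for this example.

```python
from collections import defaultdict


def assign_names(probed):
    """Name devices in probe order following the two rules above.

    `probed` is a list of (prefix, cpu) tuples (a hypothetical format);
    cpu is the logical CPU number for CPU-bound devices, otherwise None.
    """
    counters = defaultdict(int)  # next sequential index for each prefix
    names = []
    for prefix, cpu in probed:
        if cpu is not None:
            # Rule 1: CPU-bound devices are named after the logical CPU.
            names.append(f"{prefix}{cpu}")
        else:
            # Rule 2: all other devices are numbered in probe order.
            names.append(f"{prefix}{counters[prefix]}")
            counters[prefix] += 1
    return names


print(assign_names([("funnel", None), ("etm", 1), ("tmc_etf", None),
                    ("funnel", None), ("etm", 0)]))
# ['funnel0', 'etm1', 'tmc_etf0', 'funnel1', 'etm0']
```

In this model the number given to a non-CPU device depends only on when it
was probed, which is one reason to read the names from the running system
rather than hard-coding them.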

Topology Representation
-----------------------

Each CoreSight component has a ``connections`` directory which will contain
links to other CoreSight components. This allows the user to explore the trace
topology and for larger systems, determine the most appropriate sink for a
given source. The connection information can also be used to establish
which CTI devices are connected to a given component. This directory contains a
``nr_links`` attribute detailing the number of links in the directory.

For an ETM source, in this case ``etm0`` on a Juno platform, a typical
arrangement will be::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/etm0/connections
  <file details>  cti_cpu0 -> ../../../23020000.cti/cti_cpu0
  <file details>  nr_links
  <file details>  out:0 -> ../../../230c0000.funnel/funnel2

Following the out port to ``funnel2``::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/funnel2/connections
  <file details> in:0 -> ../../../23040000.etm/etm0
  <file details> in:1 -> ../../../23140000.etm/etm3
  <file details> in:2 -> ../../../23240000.etm/etm4
  <file details> in:3 -> ../../../23340000.etm/etm5
  <file details> nr_links
  <file details> out:0 -> ../../../20040000.funnel/funnel0

And again to ``funnel0``::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/funnel0/connections
  <file details> in:0 -> ../../../220c0000.funnel/funnel1
  <file details> in:1 -> ../../../230c0000.funnel/funnel2
  <file details> nr_links
  <file details> out:0 -> ../../../20010000.etf/tmc_etf0

Finding the first sink ``tmc_etf0``. This can be used to collect data
as a sink, or as a link to propagate further along the chain::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/tmc_etf0/connections
  <file details> cti_sys0 -> ../../../20020000.cti/cti_sys0
  <file details> in:0 -> ../../../20040000.funnel/funnel0
  <file details> nr_links
  <file details> out:0 -> ../../../20150000.funnel/funnel4

via ``funnel4``::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/funnel4/connections
  <file details> in:0 -> ../../../20010000.etf/tmc_etf0
  <file details> in:1 -> ../../../20140000.etf/tmc_etf1
  <file details> nr_links
  <file details> out:0 -> ../../../20120000.replicator/replicator0

and a ``replicator0``::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/replicator0/connections
  <file details> in:0 -> ../../../20150000.funnel/funnel4
  <file details> nr_links
  <file details> out:0 -> ../../../20030000.tpiu/tpiu0
  <file details> out:1 -> ../../../20070000.etr/tmc_etr0

Arriving at the final sink in the chain, ``tmc_etr0``::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/tmc_etr0/connections
  <file details> cti_sys0 -> ../../../20020000.cti/cti_sys0
  <file details> in:0 -> ../../../20120000.replicator/replicator0
  <file details> nr_links

As described below, when using sysfs it is sufficient to enable a sink and
a source for successful trace. The framework will correctly enable all
intermediate links as required.
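
The manual walk through the ``connections`` directories can be automated.
The Python sketch below follows the ``out:`` links from a source and collects
every device it reaches that exposes an ``enable_sink`` attribute. It assumes
the sysfs layout shown in this section; ``reachable_sinks`` is a hypothetical
helper, not part of any kernel or perf tooling.

```python
import os


def reachable_sinks(devices_dir, source):
    """Follow out:* links breadth-first from `source`, returning sink names."""
    sinks, seen, todo = [], set(), [source]
    while todo:
        name = todo.pop(0)
        if name in seen:
            continue
        seen.add(name)
        dev = os.path.join(devices_dir, name)
        # Assumption: a device with an enable_sink attribute can act as a sink.
        if os.path.exists(os.path.join(dev, "enable_sink")):
            sinks.append(name)
        conn = os.path.join(dev, "connections")
        if not os.path.isdir(conn):
            continue
        for entry in sorted(os.listdir(conn)):
            if entry.startswith("out:"):
                # The last component of the link target is the device name.
                target = os.readlink(os.path.join(conn, entry))
                todo.append(os.path.basename(target))
    return sinks


if __name__ == "__main__":
    base = "/sys/bus/coresight/devices"
    if os.path.isdir(os.path.join(base, "etm0")):
        print(reachable_sinks(base, "etm0"))
```

On the Juno topology listed above this would be expected to report
``tmc_etf0``, ``tpiu0`` and ``tmc_etr0`` for ``etm0``.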

Note: ``cti_sys0`` appears in two of the connections lists above.
CTIs can connect to multiple devices and are arranged in a star topology
via the CTM. See (Documentation/trace/coresight/coresight-ect.rst)
[#fourth]_ for further details.
Looking at this device we see 4 connections::

  linaro-developer:~# ls -l /sys/bus/coresight/devices/cti_sys0/connections
  <file details> nr_links
  <file details> stm0 -> ../../../20100000.stm/stm0
  <file details> tmc_etf0 -> ../../../20010000.etf/tmc_etf0
  <file details> tmc_etr0 -> ../../../20070000.etr/tmc_etr0
  <file details> tpiu0 -> ../../../20030000.tpiu/tpiu0


How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework:

1. using the perf cmd line tools.
2. interacting directly with the Coresight devices using the sysFS interface.

Preference is given to the former as using the sysFS interface
requires a deep understanding of the Coresight HW.  The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment.  As a generic operation, all devices pertaining to the sink
class will have an "enable_sink" entry in sysfs::

    root:/sys/bus/coresight/devices# ls
    replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb         20040000.funnel  2201d000.ptm  2203d000.etm
    root:/sys/bus/coresight/devices# ls 20010000.etb
    enable_sink  status  trigger_cntr
    root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
    root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
    1
    root:/sys/bus/coresight/devices#

At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range.  As such "enabling" a source will immediately
trigger a trace capture::

    root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
    1
    root:/sys/bus/coresight/devices# cat 20010000.etb/status
    Depth:          0x2000
    Status:         0x1
    RAM read ptr:   0x0
    RAM wrt ptr:    0x19d3   <----- The write pointer is moving
    Trigger cnt:    0x0
    Control:        0x1
    Flush status:   0x0
    Flush ctrl:     0x2001
    root:/sys/bus/coresight/devices#
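
The ``status`` output is easy to post-process. The Python sketch below parses
the text shown above into a dictionary and derives the buffer size from it.
It assumes the ``Name: 0x...`` layout of this example and, as a further
assumption, that ``Depth`` counts 32-bit words; ``parse_status`` is a
hypothetical helper.

```python
SAMPLE = """\
Depth:          0x2000
Status:         0x1
RAM read ptr:   0x0
RAM wrt ptr:    0x19d3   <----- The write pointer is moving
Trigger cnt:    0x0
Control:        0x1
Flush status:   0x0
Flush ctrl:     0x2001
"""


def parse_status(text):
    """Turn 'Name: 0x...' lines into a {name: int} dictionary."""
    regs = {}
    for line in text.splitlines():
        line = line.split("<")[0]  # drop annotations such as '<----- ...'
        name, sep, value = line.partition(":")
        if sep and value.strip():
            regs[name.strip()] = int(value.strip(), 16)
    return regs


regs = parse_status(SAMPLE)
# Assumption: "Depth" is in 32-bit words, so the buffer holds depth * 4 bytes.
buffer_bytes = regs["Depth"] * 4
print(hex(regs["RAM wrt ptr"]), buffer_bytes)
# 0x19d3 32768
```

Reading the file twice and comparing ``RAM wrt ptr`` is a simple way to
confirm that trace data is being written.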

Trace collection is stopped the same way::

    root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices#
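
The enable and disable sequence above can be wrapped so that the source is
always switched off again, even if the traced workload fails. This is a
minimal Python sketch assuming the ``enable_sink`` and ``enable_source``
attributes shown above; ``sysfs_trace`` is a hypothetical helper and, like
the shell session, it leaves the sink enabled so that the buffer can still be
read afterwards.

```python
import contextlib
import os


def _write(path, value):
    """Write a short string to a sysfs-style attribute file."""
    with open(path, "w") as f:
        f.write(value)


@contextlib.contextmanager
def sysfs_trace(devices_dir, sink, source):
    """Enable `sink`, then `source`; disable the source again on exit."""
    _write(os.path.join(devices_dir, sink, "enable_sink"), "1")
    source_attr = os.path.join(devices_dir, source, "enable_source")
    _write(source_attr, "1")
    try:
        yield
    finally:
        # Stop trace generation; the sink is intentionally left enabled.
        _write(source_attr, "0")


if __name__ == "__main__":
    base = "/sys/bus/coresight/devices"
    if os.path.isdir(os.path.join(base, "20010000.etb")):
        with sysfs_trace(base, "20010000.etb", "2201c000.ptm"):
            pass  # run the workload of interest here
```

Writing to these attributes normally requires root privileges.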

The content of the ETB buffer can be harvested directly from /dev::

    root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
    of=~/cstrace.bin
    64+0 records in
    64+0 records out
    32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
    root:/sys/bus/coresight/devices#

The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.

Following is a DS-5 output of an experimental loop that increments a variable up
to a certain value.  The example is simple and yet provides a glimpse of the
wealth of possibilities that coresight provides.
::

    Info                                    Tracing enabled
    Instruction     106378866       0x8026B53C      E52DE004        false   PUSH     {lr}
    Instruction     0       0x8026B540      E24DD00C        false   SUB      sp,sp,#0xc
    Instruction     0       0x8026B544      E3A03000        false   MOV      r3,#0
    Instruction     0       0x8026B548      E58D3004        false   STR      r3,[sp,#4]
    Instruction     0       0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
    Instruction     0       0x8026B550      E3530004        false   CMP      r3,#4
    Instruction     0       0x8026B554      E2833001        false   ADD      r3,r3,#1
    Instruction     0       0x8026B558      E58D3004        false   STR      r3,[sp,#4]
    Instruction     0       0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
    Timestamp                                       Timestamp: 17106715833
    Instruction     319     0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
    Instruction     0       0x8026B550      E3530004        false   CMP      r3,#4
    Instruction     0       0x8026B554      E2833001        false   ADD      r3,r3,#1
    Instruction     0       0x8026B558      E58D3004        false   STR      r3,[sp,#4]
    Instruction     0       0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
    Instruction     9       0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
    Instruction     0       0x8026B550      E3530004        false   CMP      r3,#4
    Instruction     0       0x8026B554      E2833001        false   ADD      r3,r3,#1
    Instruction     0       0x8026B558      E58D3004        false   STR      r3,[sp,#4]
    Instruction     0       0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
    Instruction     7       0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
    Instruction     0       0x8026B550      E3530004        false   CMP      r3,#4
    Instruction     0       0x8026B554      E2833001        false   ADD      r3,r3,#1
    Instruction     0       0x8026B558      E58D3004        false   STR      r3,[sp,#4]
    Instruction     0       0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
    Instruction     7       0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
    Instruction     0       0x8026B550      E3530004        false   CMP      r3,#4
    Instruction     0       0x8026B554      E2833001        false   ADD      r3,r3,#1
    Instruction     0       0x8026B558      E58D3004        false   STR      r3,[sp,#4]
    Instruction     0       0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
    Instruction     10      0x8026B54C      E59D3004        false   LDR      r3,[sp,#4]
    Instruction     0       0x8026B550      E3530004        false   CMP      r3,#4
    Instruction     0       0x8026B554      E2833001        false   ADD      r3,r3,#1
    Instruction     0       0x8026B558      E58D3004        false   STR      r3,[sp,#4]
    Instruction     0       0x8026B55C      DAFFFFFA        true    BLE      {pc}-0x10 ; 0x8026b54c
    Instruction     6       0x8026B560      EE1D3F30        false   MRC      p15,#0x0,r3,c13,c0,#1
    Instruction     0       0x8026B564      E1A0100D        false   MOV      r1,sp
    Instruction     0       0x8026B568      E3C12D7F        false   BIC      r2,r1,#0x1fc0
    Instruction     0       0x8026B56C      E3C2203F        false   BIC      r2,r2,#0x3f
    Instruction     0       0x8026B570      E59D1004        false   LDR      r1,[sp,#4]
    Instruction     0       0x8026B574      E59F0010        false   LDR      r0,[pc,#16] ; [0x8026B58C] = 0x80550368
    Instruction     0       0x8026B578      E592200C        false   LDR      r2,[r2,#0xc]
    Instruction     0       0x8026B57C      E59221D0        false   LDR      r2,[r2,#0x1d0]
    Instruction     0       0x8026B580      EB07A4CF        true    BL       {pc}+0x1e9344 ; 0x804548c4
    Info                                    Tracing enabled
    Instruction     13570831        0x8026B584      E28DD00C        false   ADD      sp,sp,#0xc
    Instruction     0       0x8026B588      E8BD8000        true    LDM      sp!,{pc}
    Timestamp                                       Timestamp: 17107041535

2) Using perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction.  As such the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled.  When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool::

	linaro@linaro-nano:~$ ./perf list pmu

		List of pre-defined events (to be used in -e):

		cs_etm//                                    [Kernel PMU event]

	linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU is
listed along with configuration options within forward slashes '/'.  Since a
Coresight system will typically have more than one sink, the name of the sink to
work with needs to be specified as an event option.
On newer kernels the available sinks are listed in sysFS under
($SYSFS)/bus/event_source/devices/cs_etm/sinks/::

	root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
	tmc_etf0  tmc_etr0  tpiu0

On older kernels, this may need to be found from the list of coresight devices,
available under ($SYSFS)/bus/coresight/devices/::

	root:~# ls /sys/bus/coresight/devices/
	 etm0     etm1     etm2         etm3  etm4      etm5      funnel0
	 funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0
	root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program

As mentioned above in section "Device Naming scheme", the names of the devices could
look different from what is used in the example above. One must use the device names
as they appear under sysFS.

The syntax within the forward slashes '/' is important.  The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.
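
The event syntax can be assembled programmatically. The Python sketch below
lists the available sinks and builds the ``perf record`` command line used
above. It only constructs strings and does not run perf; the helper names are
hypothetical, and the sinks directory is the one described above for newer
kernels.

```python
import os
import shlex

SINKS_DIR = "/sys/bus/event_source/devices/cs_etm/sinks"


def list_sinks(sinks_dir=SINKS_DIR):
    """Return the sink names exposed by the cs_etm PMU, or [] if absent."""
    return sorted(os.listdir(sinks_dir)) if os.path.isdir(sinks_dir) else []


def cs_etm_event(sink, modifiers="u"):
    """Build an event string; '@' marks the following name as the sink."""
    return "cs_etm/@{}/{}".format(sink, modifiers)


def record_command(sink, program):
    """Assemble the argument list for a per-thread trace of `program`."""
    return ["perf", "record", "-e", cs_etm_event(sink), "--per-thread", program]


print(list_sinks())
print(shlex.join(record_command("tmc_etr0", "program")))
```

The returned list can be passed to ``subprocess.run()`` on a system where
perf was built with Coresight support.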

More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
repository [#third]_.

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze trace of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g::

    perf record -e cs_etm/@tmc_etr0/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

Note that only 64-bit programs are currently supported - further work is
required to support instruction decode of 32-bit Arm programs.

2.2) Tracing PID

The kernel can be built to write the PID value into the PE ContextID registers.
For a kernel running at EL1, the PID is stored in CONTEXTIDR_EL1.  A PE may
implement Arm Virtualization Host Extensions (VHE), which the kernel can
run at EL2 as a virtualisation host; in this case, the PID value is stored in
CONTEXTIDR_EL2.

perf provides PMU formats that program the ETM to insert these values into the
trace data; the PMU formats are defined as below::

  "contextid1": Available on both EL1 kernel and EL2 kernel.  When the
                kernel is running at EL1, "contextid1" enables the PID
                tracing; when the kernel is running at EL2, this enables
                tracing the PID of guest applications.

  "contextid2": Only usable when the kernel is running at EL2.  When
                selected, enables PID tracing on EL2 kernel.

  "contextid":  Will be an alias for the option that enables PID
                tracing.  I.e,
                contextid == contextid1, on EL1 kernel.
                contextid == contextid2, on EL2 kernel.

perf will always enable PID tracing at the relevant EL; this is accomplished by
automatically enabling the "contextid" config. For EL2 it is possible to make
specific adjustments using the configs "contextid1" and "contextid2". E.g. if a
user wants to trace PIDs for both host and guest, the two configs "contextid1"
and "contextid2" can be set at the same time::

  perf record -e cs_etm/contextid1,contextid2/u -- vm
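
The relationship between the three formats can be written down as a small
lookup. The Python sketch below only mirrors the rules stated in this section
and builds the resulting event string; the function names are hypothetical,
and rejecting guest tracing on an EL1 kernel is a choice made for this
example because the text describes it only for EL2.

```python
def contextid_alias(kernel_el):
    """Return the format that "contextid" stands for at the given EL."""
    if kernel_el not in (1, 2):
        raise ValueError("kernel is expected to run at EL1 or EL2")
    return "contextid%d" % kernel_el


def pid_formats(kernel_el, trace_guest=False):
    """List the formats needed for host (and optionally guest) PID tracing."""
    formats = [contextid_alias(kernel_el)]
    if trace_guest:
        if kernel_el != 2:
            raise ValueError("guest PID tracing is described for EL2 kernels only")
        # On an EL2 kernel, contextid1 traces the PID of guest applications.
        formats.insert(0, "contextid1")
    return formats


def cs_etm_event(formats, modifiers="u"):
    """Join the formats into a cs_etm event string."""
    return "cs_etm/" + ",".join(formats) + "/" + modifiers


print(cs_etm_event(pid_formats(2, trace_guest=True)))
# cs_etm/contextid1,contextid2/u
```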


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events. e.g.
::

	perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for autoFDO.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
::

	$ gcc-5 -O3 sort.c -o sort
	$ taskset -c 2 ./sort
	Bubble sorting array of 30000 elements
	5910 ms

	$ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
	Bubble sorting array of 30000 elements
	12543 ms
	[ perf record: Woken up 35 times to write data ]
	[ perf record: Captured and wrote 69.640 MB perf.data ]

	$ perf inject -i perf.data -o inj.data --itrace=il64 --strip
	$ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ taskset -c 2 ./sort_autofdo
	Bubble sorting array of 30000 elements
	5806 ms
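
The timings printed in the example can be put into perspective with a little
arithmetic. The Python sketch below uses only the three run times shown
above; each is a single measurement, so the results are indicative only.

```python
baseline_ms = 5910   # plain -O3 build
traced_ms = 12543    # same build, run under "perf record" with cs_etm
autofdo_ms = 5806    # build using -fauto-profile=sort.gcov

# Slowdown of the traced run relative to the untraced run.
overhead = traced_ms / baseline_ms
# Run-time reduction of the AutoFDO build relative to the baseline.
gain_percent = (baseline_ms - autofdo_ms) / baseline_ms * 100

print("tracing overhead: %.2fx" % overhead)
print("AutoFDO gain: %.2f%%" % gain_percent)
# tracing overhead: 2.12x
# AutoFDO gain: 1.76%
```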


How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as the tracers - the only
difference is that clients are driving the trace capture rather
than the program flow through the code.

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs with more information on each entry being found in [#first]_::

    root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
    enable_source   hwevent_select  port_enable     subsystem       uevent
    hwevent_enable  mgmt            port_select     traceid
    root@genericarmv8:~#

Like any other source a sink needs to be identified and the STM enabled before
being used::

    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source

From there user space applications can request and use channels using the devfs
interface provided for that purpose by the generic STM API::

    root@genericarmv8:~# ls -l /dev/stm0
    crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/stm0
    root@genericarmv8:~#

Details on how to use the generic STM API can be found in
Documentation/trace/stm.rst [#second]_.

The CTI & CTM Modules
---------------------

The CTI (Cross Trigger Interface) provides a set of trigger signals between
individual CTIs and components, and can propagate these between all CTIs via
channels on the CTM (Cross Trigger Matrix).

A separate documentation file is provided to explain the use of these devices.
(Documentation/trace/coresight/coresight-ect.rst) [#fourth]_.

CoreSight System Configuration
------------------------------

CoreSight components can be complex devices with many programming options.
Furthermore, components can be programmed to interact with each other across the
complete system.

A CoreSight System Configuration manager is provided to allow these complex programming
configurations to be selected and used easily from perf and sysfs.

See the separate document for further information.
(Documentation/trace/coresight/coresight-config.rst) [#fifth]_.


.. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm

.. [#second] Documentation/trace/stm.rst

.. [#third] https://github.com/Linaro/perf-opencsd

.. [#fourth] Documentation/trace/coresight/coresight-ect.rst

.. [#fifth] Documentation/trace/coresight/coresight-config.rst