Sunday, August 7, 2011

perf-backed disassembly

Since 2.6.31 or thereabouts, the Linux kernel has shipped with a built-in performance counter subsystem, driven from userspace by a tool known as perf.

The most familiar use of perf is gathering performance statistics on a program as it runs:


bash$ perf stat -cv ./a.out 


cache-misses: 11313 2020574449 2020574449
cache-references: 62031796 2020574449 2020574449
branch-misses: 17909 2020574449 2020574449
branches: 606684832 2020574449 2020574449
instructions: 6324531571 2020574449 2020574449
cycles: 6408533747 2020574449 2020574449
page-faults: 304 2019963367 2019963367
CPU-migrations: 7 2019963367 2019963367
context-switches: 205 2019963367 2019963367
task-clock-msecs: 2019963367 2019963367 2019963367


 Performance counter stats for './a.out':

             11313 cache-misses             #      0.006 M/sec
          62031796 cache-references         #     30.709 M/sec
             17909 branch-misses            #      0.003 %    
         606684832 branches                 #    300.344 M/sec
        6324531571 instructions             #      0.987 IPC  
        6408533747 cycles                   #   3172.599 M/sec
               304 page-faults              #      0.000 M/sec
                 7 CPU-migrations           #      0.000 M/sec
               205 context-switches         #      0.000 M/sec
       2019.963367 task-clock-msecs         #      0.996 CPUs 


        2.027948307  seconds time elapsed
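
The derived figures in the right-hand column are plain ratios of the raw counts, so they are easy to check by hand. Here is a minimal sketch, assuming awk is installed; the helper name ratio is made up for illustration and is not part of perf:

```shell
# ratio NUM DEN [SCALE]: print NUM / DEN * SCALE to three decimal places.
# Hypothetical helper, not part of the perf toolchain.
ratio() {
    LC_ALL=C awk -v n="$1" -v d="$2" -v s="${3:-1}" \
        'BEGIN { printf "%.3f\n", n / d * s }'
}

# Instructions per cycle, using the counts from the run above
ratio 6324531571 6408533747        # prints 0.987

# Branch misses as a percentage of all branches
ratio 17909 606684832 100          # prints 0.003
```

Both values agree with the IPC and branch-miss columns that perf stat printed.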

The events to be recorded can be narrowed down with the -e option:

bash$ perf stat -e cpu-clock -e instructions ./a.out

 Performance counter stats for './a.out':

       2026.748812 cpu-clock-msecs         
        6324293589 instructions             #      0.000 IPC  

        2.032519896  seconds time elapsed
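
Repeating -e for every event gets tedious; the option also accepts a comma-separated list. A small sketch of a wrapper that builds such a command line follows. The function name perf_stat_cmd is hypothetical, and it only prints the command rather than running it:

```shell
# Hypothetical helper: print a perf stat command that counts the given events.
# Usage: perf_stat_cmd PROGRAM EVENT...
perf_stat_cmd() {
    prog=$1
    shift
    # "$*" joins the remaining arguments with the first character of IFS
    events=$(IFS=,; echo "$*")
    echo "perf stat -e $events $prog"
}

perf_stat_cmd ./a.out cpu-clock instructions
# prints: perf stat -e cpu-clock,instructions ./a.out
```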

A list of available events can be obtained via perf list:

bash$ perf list | head
List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                       [Hardware event]
  instructions                               [Hardware event]
  cache-references                           [Hardware event]
  cache-misses                               [Hardware event]
  branch-instructions OR branches            [Hardware event]
  branch-misses                              [Hardware event]
  bus-cycles                                 [Hardware event]


The perf toolchain also includes the utility perf top, which can monitor either a single process or the whole system, kernel included:

bash$ sudo perf top 2>/dev/null
-------------------------------------------------------------------------------
   PerfTop:       0 irqs/sec  kernel:-nan%  exact: -nan% [1000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------

             samples  pcnt function               DSO
             _______ _____ ______________________ __________________

               77.00 39.3% intel_idle             [kernel.kallsyms] 
               13.00  6.6% __pthread_mutex_unlock libpthread-2.13.so
               13.00  6.6% pthread_mutex_lock     libpthread-2.13.so
               12.00  6.1% __ticket_spin_lock     [kernel.kallsyms] 
                7.00  3.6% schedule               [kernel.kallsyms] 
                6.00  3.1% menu_select            [kernel.kallsyms] 
                6.00  3.1% fget_light             [kernel.kallsyms] 
                6.00  3.1% clear_page_c           [kernel.kallsyms] 


Things start to get interesting, however, with perf record. It is generally paired with perf report: the first records a process's performance counter samples to a file, and the second reviews them later.

This can be used, for example, to generate a call graph:

bash$ perf record -g -o /tmp/a.out.perf ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.148 MB /tmp/a.out.perf (~6461 samples) ]
bash$ perf report -g -i /tmp/a.out.perf
# Events: 1K cycles
#
# Overhead        Command  Shared Object  Symbol
# ........  .............  .............  ......
#
    99.90%          a.out  a.out          [.] main
            |
            --- main
                __libc_start_main

     0.10%          a.out  [l2cap]        [k] 0xffffffff8103804a
            |
            --- 0xffffffff8105f438
                0xffffffff8105f675
...
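
When only the per-symbol overhead matters, the report text is easy to boil down. The following is a rough sketch, assuming the report is read as plain text in the layout shown above; report_summary is a made-up name, and the here-document just replays a few lines of that output so the snippet runs without perf:

```shell
# Hypothetical helper: reduce perf report text to "overhead symbol" pairs.
# Keeps only lines whose first field looks like a percentage.
report_summary() {
    awk '$1 ~ /^[0-9.]+%$/ { print $1, $NF }'
}

report_summary <<'EOF'
# Overhead        Command  Shared Object  Symbol
    99.90%          a.out  a.out          [.] main
            --- main
     0.10%          a.out  [l2cap]        [k] 0xffffffff8103804a
EOF
# prints:
# 99.90% main
# 0.10% 0xffffffff8103804a
```

On a live system the same filter would sit at the end of a pipe from perf report; the call-graph lines and the # header lines are skipped because their first field is not a percentage.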


Once perf data has been recorded, the perf annotate utility can be used to display a disassembly of the sampled code, with each instruction annotated by the percentage of samples that landed on it:

bash$ perf annotate -i /tmp/a.out.perf | more

------------------------------------------------
 Percent |      Source code & Disassembly of a.out
------------------------------------------------
         :
         :
         :
         :      Disassembly of section .text:
         :
         :      0000000000400554:


    0.00 :  400554:       55                      push   %rbp
    0.00 :  400555:       48 89 e5                mov    %rsp,%rbp
    0.00 :  400558:       48 81 ec 30 00 0c 00    sub    $0xc0030,%rsp
    0.00 :  40055f:       48 8d 85 d0 ff fb ff    lea    -0x40030(%rbp),%rax
    0.00 :  400566:       ba 00 00 04 00          mov    $0x40000,%edx
    0.00 :  40056b:       be 00 00 00 00          mov    $0x0,%esi
    0.00 :  400570:       48 89 c7                mov    %rax,%rdi
    0.00 :  400573:       e8 b0 fe ff ff          callq  400428 <memset@plt>
    0.00 :  400578:       c7 45 fc 00 00 00 04    movl   $0x4000000,-0x4(%rbp)
    ...

    4.21 :  4006a5:       8b 45 d0                mov    -0x30(%rbp),%eax
   15.54 :  4006a8:       83 c0 01                add    $0x1,%eax
    4.97 :  4006ab:       89 45 d0                mov    %eax,-0x30(%rbp)
    4.87 :  4006ae:       8b 45 d0                mov    -0x30(%rbp),%eax
   17.79 :  4006b1:       83 c0 01                add    $0x1,%eax
    4.36 :  4006b4:       89 45 d0                mov    %eax,-0x30(%rbp)
    4.72 :  4006b7:       48 83 45 f0 01          addq   $0x1,-0x10(%rbp)
    0.00 :  4006bc:       48 8b 45 f0             mov    -0x10(%rbp),%rax
    ...
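
For a long function it helps to pull out just the hottest lines. Here is a minimal sketch, assuming the annotate text keeps the "percent : address:" layout shown above; the helper name hottest is invented for illustration, and the here-document replays the loop from the listing so the snippet runs without perf:

```shell
# Hypothetical helper: list the N hottest instructions (default 3)
# from perf annotate text as "percent address" pairs.
hottest() {
    awk '$2 == ":" && $1 + 0 > 0 { sub(/:$/, "", $3); print $1, $3 }' |
        LC_ALL=C sort -rn |
        head -n "${1:-3}"
}

hottest 2 <<'EOF'
    4.21 :  4006a5:       8b 45 d0                mov    -0x30(%rbp),%eax
   15.54 :  4006a8:       83 c0 01                add    $0x1,%eax
    4.97 :  4006ab:       89 45 d0                mov    %eax,-0x30(%rbp)
    4.87 :  4006ae:       8b 45 d0                mov    -0x30(%rbp),%eax
   17.79 :  4006b1:       83 c0 01                add    $0x1,%eax
    4.36 :  4006b4:       89 45 d0                mov    %eax,-0x30(%rbp)
    4.72 :  4006b7:       48 83 45 f0 01          addq   $0x1,-0x10(%rbp)
    0.00 :  4006bc:       48 8b 45 f0             mov    -0x10(%rbp),%rax
EOF
# prints:
# 17.79 4006b1
# 15.54 4006a8
```

Both hot spots are the add instructions that follow a load from -0x30(%rbp). Samples often land an instruction or so after the one that actually stalled, so the loads themselves may be the real cost here; treat the attribution as approximate.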

As is to be expected from Torvalds and company, the utilities include a number of options for generating parser-friendly output, limiting reporting to specified events and symbols, and so forth. Check the man pages for details.
