The most common form of perf, perf stat, is well known as a way to gather performance statistics for a run of a program:
bash$ perf stat -cv ./a.out
cache-misses: 11313 2020574449 2020574449
cache-references: 62031796 2020574449 2020574449
branch-misses: 17909 2020574449 2020574449
branches: 606684832 2020574449 2020574449
instructions: 6324531571 2020574449 2020574449
cycles: 6408533747 2020574449 2020574449
page-faults: 304 2019963367 2019963367
CPU-migrations: 7 2019963367 2019963367
context-switches: 205 2019963367 2019963367
task-clock-msecs: 2019963367 2019963367 2019963367
Performance counter stats for './a.out':
11313 cache-misses # 0.006 M/sec
62031796 cache-references # 30.709 M/sec
17909 branch-misses # 0.003 %
606684832 branches # 300.344 M/sec
6324531571 instructions # 0.987 IPC
6408533747 cycles # 3172.599 M/sec
304 page-faults # 0.000 M/sec
7 CPU-migrations # 0.000 M/sec
205 context-switches # 0.000 M/sec
2019.963367 task-clock-msecs # 0.996 CPUs
2.027948307 seconds time elapsed
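The figures after each '#' in that summary can be recomputed from the raw counters. The following is a minimal sketch, not part of perf itself: the helper name ratio is made up for illustration, and the numbers are copied from the run above.

```shell
#!/bin/sh
# Recompute a few of the derived figures perf prints after '#',
# using the raw counter values from the run above.

# ratio NUM DEN SCALE: print NUM / DEN * SCALE with three decimals.
# (Hypothetical helper, not a perf feature.)
ratio() {
    awk -v n="$1" -v d="$2" -v s="$3" 'BEGIN { printf "%.3f\n", n / d * s }'
}

ipc=$(ratio 6324531571 6408533747 1)           # instructions / cycles
branch_miss_pct=$(ratio 17909 606684832 100)   # branch-misses as % of branches
cpus=$(ratio 2019.963367 2027.948307 1)        # task-clock ms / elapsed ms

echo "IPC: $ipc"
echo "branch-miss rate: $branch_miss_pct %"
echo "CPUs utilized: $cpus"
```

The three results line up with the 0.987 IPC, 0.003 % and 0.996 CPUs annotations in the perf output.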
The events to be recorded can be specified with the -e option in order to refine the output:
bash$ perf stat -e cpu-clock -e instructions ./a.out
Performance counter stats for './a.out':
2026.748812 cpu-clock-msecs
6324293589 instructions # 0.000 IPC
2.032519896 seconds time elapsed
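Note the 0.000 IPC in that output, presumably because cycles was not among the counted events. perf also accepts several events in a single -e as a comma-separated list, so a small wrapper can build the argument. This is only a sketch: join_events is a hypothetical helper, and the script prints the command rather than running it.

```shell
#!/bin/sh
# Build one comma-separated -e argument from a list of event names.

# join_events EVENT...: join the arguments with commas.
# (Hypothetical helper; "$*" joins on the first character of IFS.)
join_events() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Adding cycles next to instructions gives perf both halves of the IPC ratio.
events=$(join_events cpu-clock cycles instructions)

# Dry run: show the command instead of executing it.
echo "perf stat -e $events ./a.out"
```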
A list of available events can be obtained via perf list:
bash$ perf list | head
List of pre-defined events (to be used in -e):
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
cache-references [Hardware event]
cache-misses [Hardware event]
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
The perf toolchain also includes the utility perf top, which can be used to monitor either a single process or the kernel:
bash$ sudo perf top 2>/dev/null
-------------------------------------------------------------------------------
PerfTop: 0 irqs/sec kernel:-nan% exact: -nan% [1000Hz cycles], (all, 4 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________ __________________
77.00 39.3% intel_idle [kernel.kallsyms]
13.00 6.6% __pthread_mutex_unlock libpthread-2.13.so
13.00 6.6% pthread_mutex_lock libpthread-2.13.so
12.00 6.1% __ticket_spin_lock [kernel.kallsyms]
7.00 3.6% schedule [kernel.kallsyms]
6.00 3.1% menu_select [kernel.kallsyms]
6.00 3.1% fget_light [kernel.kallsyms]
6.00 3.1% clear_page_c [kernel.kallsyms]
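To see how that snapshot splits between kernel and user space, the percentages can be summed per DSO. The sketch below works on the sample rows copied from the output above rather than on live perf top output; by_dso is a made-up helper name, and the column positions are an assumption based on this particular layout.

```shell
#!/bin/sh
# Sum the pcnt column of a perf top snapshot per DSO.

# Rows copied from the perf top output above.
top_sample='77.00 39.3% intel_idle [kernel.kallsyms]
13.00 6.6% __pthread_mutex_unlock libpthread-2.13.so
13.00 6.6% pthread_mutex_lock libpthread-2.13.so
12.00 6.1% __ticket_spin_lock [kernel.kallsyms]
7.00 3.6% schedule [kernel.kallsyms]
6.00 3.1% menu_select [kernel.kallsyms]
6.00 3.1% fget_light [kernel.kallsyms]
6.00 3.1% clear_page_c [kernel.kallsyms]'

# by_dso: read "samples pcnt function DSO" rows on stdin and print the
# total percentage per DSO, largest first. (Hypothetical helper.)
by_dso() {
    awk '{ sub(/%/, "", $2); pct[$4] += $2 }
         END { for (d in pct) printf "%.1f%% %s\n", pct[d], d }' | sort -rn
}

printf '%s\n' "$top_sample" | by_dso
```

For the rows shown, the kernel symbols account for 58.3% and the libpthread mutex calls for 13.2%.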
Where things start to get interesting, however, is with perf record. This utility is generally used together with perf report: perf record saves the performance counter samples of a process to a file, and perf report displays them later.
This can be used, for example, to generate a call graph:
bash$ perf record -g -o /tmp/a.out.perf ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.148 MB /tmp/a.out.perf (~6461 samples) ]
bash$ perf report -g -i /tmp/a.out.perf
# Events: 1K cycles
#
# Overhead Command Shared Object Symbol
# ........ ............. ............. ......
#
99.90% a.out a.out [.] main
|
--- main
__libc_start_main
0.10% a.out [l2cap] [k] 0xffffffff8103804a
|
--- 0xffffffff8105f438
0xffffffff8105f675
...
Once perf data has been recorded, the perf annotate utility can be used to display a disassembly of the instructions that were executed:
bash$ perf annotate -i /tmp/a.out.perf |more
------------------------------------------------
Percent | Source code & Disassembly of a.out
------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 0000000000400554 :
0.00 : 400554: 55 push %rbp
0.00 : 400555: 48 89 e5 mov %rsp,%rbp
0.00 : 400558: 48 81 ec 30 00 0c 00 sub $0xc0030,%rsp
0.00 : 40055f: 48 8d 85 d0 ff fb ff lea -0x40030(%rbp),%rax
0.00 : 400566: ba 00 00 04 00 mov $0x40000,%edx
0.00 : 40056b: be 00 00 00 00 mov $0x0,%esi
0.00 : 400570: 48 89 c7 mov %rax,%rdi
0.00 : 400573: e8 b0 fe ff ff callq 400428 <memset@plt>
0.00 : 400578: c7 45 fc 00 00 00 04 movl $0x4000000,-0x4(%rbp)
...
4.21 : 4006a5: 8b 45 d0 mov -0x30(%rbp),%eax
15.54 : 4006a8: 83 c0 01 add $0x1,%eax
4.97 : 4006ab: 89 45 d0 mov %eax,-0x30(%rbp)
4.87 : 4006ae: 8b 45 d0 mov -0x30(%rbp),%eax
17.79 : 4006b1: 83 c0 01 add $0x1,%eax
4.36 : 4006b4: 89 45 d0 mov %eax,-0x30(%rbp)
4.72 : 4006b7: 48 83 45 f0 01 addq $0x1,-0x10(%rbp)
0.00 : 4006bc: 48 8b 45 f0 mov -0x10(%rbp),%rax
...
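With a long disassembly it can help to pull out only the hottest instructions. The sketch below filters annotate-style lines and sorts them by the Percent column. It runs on the loop lines copied from the output above; hottest is a hypothetical helper, and the "percent : address:" line layout is an assumption taken from this listing.

```shell
#!/bin/sh
# Show the instructions with the highest sample percentage.

# Loop lines copied from the perf annotate output above.
annotate_sample='4.21 : 4006a5: 8b 45 d0 mov -0x30(%rbp),%eax
15.54 : 4006a8: 83 c0 01 add $0x1,%eax
4.97 : 4006ab: 89 45 d0 mov %eax,-0x30(%rbp)
4.87 : 4006ae: 8b 45 d0 mov -0x30(%rbp),%eax
17.79 : 4006b1: 83 c0 01 add $0x1,%eax
4.36 : 4006b4: 89 45 d0 mov %eax,-0x30(%rbp)
4.72 : 4006b7: 48 83 45 f0 01 addq $0x1,-0x10(%rbp)
0.00 : 4006bc: 48 8b 45 f0 mov -0x10(%rbp),%rax'

# hottest N: keep lines of the form "<percent> : <address>: ..." from stdin
# and print the N with the largest percentage. (Hypothetical helper.)
hottest() {
    awk '$2 == ":" && $1 ~ /^[0-9]+\.[0-9]+$/' | sort -rn | head -n "$1"
}

printf '%s\n' "$annotate_sample" | hottest 3
```

Here the two add $0x1,%eax instructions come out on top, at 17.79% and 15.54%.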
As is to be expected from Torvalds and company, the utilities include a number of options for generating parser-friendly output, limiting reporting to specified events and symbols, and so forth. Check the man pages for details.