Performance Monitoring (PerfMon)
API
LIKWID.PerfMon.add_event_set
LIKWID.PerfMon.get_event_results
LIKWID.PerfMon.get_id_of_active_group
LIKWID.PerfMon.get_id_of_event
LIKWID.PerfMon.get_id_of_group
LIKWID.PerfMon.get_id_of_metric
LIKWID.PerfMon.get_last_metric
LIKWID.PerfMon.get_last_result
LIKWID.PerfMon.get_longinfo_of_group
LIKWID.PerfMon.get_metric
LIKWID.PerfMon.get_metric_results
LIKWID.PerfMon.get_metric_results
LIKWID.PerfMon.get_name_of_counter
LIKWID.PerfMon.get_name_of_event
LIKWID.PerfMon.get_name_of_group
LIKWID.PerfMon.get_name_of_metric
LIKWID.PerfMon.get_number_of_events
LIKWID.PerfMon.get_number_of_groups
LIKWID.PerfMon.get_number_of_metrics
LIKWID.PerfMon.get_number_of_threads
LIKWID.PerfMon.get_result
LIKWID.PerfMon.get_shortinfo_of_group
LIKWID.PerfMon.get_time_of_group
LIKWID.PerfMon.init
LIKWID.PerfMon.isgroupsupported
LIKWID.PerfMon.list_events
LIKWID.PerfMon.list_metrics
LIKWID.PerfMon.perfmon
LIKWID.PerfMon.read_counters
LIKWID.PerfMon.setup_counters
LIKWID.PerfMon.start_counters
LIKWID.PerfMon.stop_counters
LIKWID.PerfMon.supported_groups
LIKWID.PerfMon.switch_group
LIKWID.PerfMon.@perfmon
LIKWID.GroupInfoCompact
Functions
LIKWID.PerfMon.add_event_set
— Method
add_event_set(estr) -> groupid
Add a performance group or a custom event set to the perfmon module. Returns a groupid (starting at 1), which is needed to refer to this event set in later calls.
LIKWID.PerfMon.get_event_results
— Method
get_event_results([groupid_or_groupname, eventid_or_eventname, threadid::Integer])
Retrieve the results of monitored events. Same as get_metric_results but for raw events.
LIKWID.PerfMon.get_id_of_active_group
— Method
Return the groupid of the currently active group.
LIKWID.PerfMon.get_id_of_event
— Method
Get the id of the event with the given name.
LIKWID.PerfMon.get_id_of_group
— Method
Get the id of the group with the given name.
LIKWID.PerfMon.get_id_of_metric
— Method
Get the id of the metric with the given name.
LIKWID.PerfMon.get_last_metric
— Method
Return the derived metric result of the last measurement cycle identified by group groupid and the indices for metric metricidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_last_result
— Method
Return the raw counter register result of the last measurement cycle identified by group groupid and the indices for event eventidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_longinfo_of_group
— Method
Return the (long) description of a performance group with id groupid (starts at 1).
LIKWID.PerfMon.get_metric
— Method
Return the derived metric result of all measurements identified by group groupid and the indices for metric metricidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_metric_results
— Method
get_metric_results([groupid_or_groupname, metricid_or_metricname, threadid::Integer])
Retrieve the results of monitored metrics.
Optionally, a group, metric, and threadid can be provided to select a subset of metrics or a single metric. If given as integers, note that groupid, metricid, and threadid all start at 1 and that threadid enumerates the monitored cpu threads.
If no arguments are provided, a nested data structure is returned in which different levels correspond to performance groups, cpu threads, and metrics (in this order).
Examples
julia> PerfMon.get_metric_results("FLOPS_DP")
4-element Vector{OrderedDict{String, Float64}}:
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 0.0016642799007831007, "Clock [MHz]" => 2911.9285695819794, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 1.4755564705029072, "Clock [MHz]" => 3523.1114993407705, "CPI" => 0.3950777002592585, "DP [MFLOP/s]" => 17608.069202657578)
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 7.80437228993214e-5, "Clock [MHz]" => 2638.6244625814124, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 7.050705084934875e-5, "Clock [MHz]" => 2807.7525945849698, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
julia> PerfMon.get_metric_results("FLOPS_DP", 2) # results of second monitored cpu thread
OrderedDict{String, Float64} with 5 entries:
"Runtime (RDTSC) [s]" => 1.13812
"Runtime unhalted [s]" => 1.47556
"Clock [MHz]" => 3523.11
"CPI" => 0.395078
"DP [MFLOP/s]" => 17608.1
julia> PerfMon.get_metric_results("FLOPS_DP", "DP [MFLOP/s]", 2)
17608.069202657578
LIKWID.PerfMon.get_metric_results
— Method
get_metric_results()
Get the metric results for all performance groups and all monitored (PerfMon.init) cpu threads.
Returns an OrderedDict whose keys correspond to the performance groups and whose values hold the results for all monitored cpu threads.
Examples
julia> results = PerfMon.get_metric_results()
OrderedDict{String, Vector{OrderedDict{String, Float64}}} with 1 entry:
"FLOPS_DP" => [OrderedDict("Runtime (RDTSC) [s]"=>1.13812, "Runtime unhalted [s]"=>0.00166428, "Clock [MHz]"=>291…
julia> PerfMon.get_metric_results()["FLOPS_DP"][2]["DP [MFLOP/s]"]
17608.069202657578
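The nested result can be post-processed with ordinary collection operations. The following is a minimal sketch that collects one metric across all monitored cpu threads. It uses a plain Dict as a stand-in for the OrderedDict returned by get_metric_results(), with values copied from the examples above (only two of the five metrics are reproduced); the variable names are illustrative.

```julia
# Stand-in for the value returned by PerfMon.get_metric_results():
# group name => one dictionary of metrics per monitored cpu thread.
# Plain Dicts are used here; the real result is built from OrderedDicts.
results = Dict(
    "FLOPS_DP" => [
        Dict("CPI" => NaN, "DP [MFLOP/s]" => 0.0),
        Dict("CPI" => 0.3950777002592585, "DP [MFLOP/s]" => 17608.069202657578),
        Dict("CPI" => NaN, "DP [MFLOP/s]" => 0.0),
        Dict("CPI" => NaN, "DP [MFLOP/s]" => 0.0),
    ],
)

# One value per monitored cpu thread for a single metric.
flops_per_thread = [thread["DP [MFLOP/s]"] for thread in results["FLOPS_DP"]]

# Sum over all threads and the (1-based) index of the busiest thread.
total_flops = sum(flops_per_thread)
busiest = argmax(flops_per_thread)
```

Here only the second monitored thread did floating-point work, so busiest is 2 and the total equals that thread's value.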
LIKWID.PerfMon.get_name_of_counter
— Method
Return the name of the counter register identified by groupid and eventidx (both starting at 1).
LIKWID.PerfMon.get_name_of_event
— Method
Return the name of the event identified by groupid and eventidx (both starting at 1).
LIKWID.PerfMon.get_name_of_group
— Method
Return the name of the group identified by groupid (starts at 1). If it is a custom event set, the name is set to Custom.
LIKWID.PerfMon.get_name_of_metric
— Method
Return the name of a derived metric identified by groupid and metricidx (both starting at 1).
LIKWID.PerfMon.get_number_of_events
— Method
Return the number of events in the group with id groupid (starts at 1).
LIKWID.PerfMon.get_number_of_groups
— Method
Return the number of groups currently registered in the perfmon module.
LIKWID.PerfMon.get_number_of_metrics
— Method
Return the number of metrics in the group with id groupid (starts at 1). Always zero for custom event sets.
LIKWID.PerfMon.get_number_of_threads
— Method
Return the number of threads initialized in the perfmon module.
LIKWID.PerfMon.get_result
— Method
Return the raw counter register result of all measurements identified by group groupid and the indices for event eventidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_shortinfo_of_group
— Method
Return the short description of the performance group with id groupid (starts at 1).
LIKWID.PerfMon.get_time_of_group
— Method
Return the measurement time for the group identified by groupid (starts at 1).
LIKWID.PerfMon.init
— Function
init(cpuid_or_cpuids)
Initialize LIKWID's PerfMon module for the cpu threads with the given ids (starting at 0!).
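Because init takes cpu thread ids that start at 0, while the threadid and threadidx arguments elsewhere on this page start at 1 and enumerate the monitored cpu threads, it can help to translate between the two explicitly. The sketch below assumes that thread indices follow the order in which the ids were passed to init; this ordering is an assumption, and the helper threadidx_of and the example ids are hypothetical.

```julia
# Hypothetical selection of cpu threads, using the 0-based ids expected by init.
cpuids = [0, 2, 5]

# Assumption: the 1-based thread index is the position of the id in `cpuids`.
# Returns `nothing` if the cpu thread is not among the monitored ones.
threadidx_of(cpuid, cpuids) = findfirst(==(cpuid), cpuids)

idx = threadidx_of(2, cpuids)          # cpu thread 2 is the second monitored one
unmonitored = threadidx_of(3, cpuids)  # cpu thread 3 is not in the list
```

With these ids, idx is 2 and unmonitored is nothing.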
LIKWID.PerfMon.isgroupsupported
— Method
Check whether the given performance group is available on the current system.
LIKWID.PerfMon.list_events
— Method
List all the events of a given group (groupid starts at 1).
LIKWID.PerfMon.list_metrics
— Method
List all the metrics of a given group (groupid starts at 1).
LIKWID.PerfMon.perfmon
— Method
perfmon(f, group_or_groups[; cpuids, autopin=true]) -> metrics, events
Monitor performance groups while executing the given function f on one or multiple Julia threads. Note that PerfMon.init and PerfMon.finalize are called automatically, and that the measurement of multiple performance groups is sequential and requires multiple executions of f!
The returned data structures metrics and events are nested; the levels correspond to performance groups, threads, and measured metrics or events (in this order).
Keyword arguments:
cpuids (default: currently used CPU threads): specify the CPU threads (~ cores) to be monitored
autopin (default: true): automatically pin Julia threads to the CPU threads (~ cores) they are currently running on (to avoid migration and wrong results)
print (default: true): toggle printing of result tables
finalize (default: true): call PerfMon.finalize at the end
Example
julia> using LIKWID
julia> x = rand(1000); y = rand(1000);
julia> metrics, events = perfmon("FLOPS_DP") do
           x .+ y;
       end;
Group: FLOPS_DP
┌───────────────────────────┬───────────┐
│                     Event │  Thread 1 │
├───────────────────────────┼───────────┤
│          ACTUAL_CPU_CLOCK │ 2.32582e8 │
│             MAX_CPU_CLOCK │ 1.61685e8 │
│      RETIRED_INSTRUCTIONS │ 3.12775e8 │
│       CPU_CLOCKS_UNHALTED │ 2.29064e8 │
│ RETIRED_SSE_AVX_FLOPS_ALL │    4964.0 │
│                     MERGE │       0.0 │
└───────────────────────────┴───────────┘
┌──────────────────────┬───────────┐
│               Metric │  Thread 1 │
├──────────────────────┼───────────┤
│  Runtime (RDTSC) [s] │ 0.0659737 │
│ Runtime unhalted [s] │ 0.0949394 │
│          Clock [MHz] │   3524.02 │
│                  CPI │  0.732361 │
│         DP [MFLOP/s] │ 0.0752421 │
└──────────────────────┴───────────┘
julia> first(metrics["FLOPS_DP"]) # all metrics of the first Julia thread
OrderedDict{String, Float64} with 5 entries:
"Runtime (RDTSC) [s]" => 0.0659737
"Runtime unhalted [s]" => 0.0949394
"Clock [MHz]" => 3524.02
"CPI" => 0.732361
"DP [MFLOP/s]" => 0.0752421
julia> first(events["FLOPS_DP"]) # all raw events of the first Julia thread
OrderedDict{String, Float64} with 6 entries:
"ACTUAL_CPU_CLOCK" => 2.32582e8
"MAX_CPU_CLOCK" => 1.61685e8
"RETIRED_INSTRUCTIONS" => 3.12775e8
"CPU_CLOCKS_UNHALTED" => 2.29064e8
"RETIRED_SSE_AVX_FLOPS_ALL" => 4964.0
"MERGE" => 0.0
julia> metrics, events = perfmon(("FLOPS_DP", "MEM1")) do
           x .+ y;
       end;
Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│                     Event │ Thread 1 │
├───────────────────────────┼──────────┤
│          ACTUAL_CPU_CLOCK │  85773.0 │
│             MAX_CPU_CLOCK │  60074.0 │
│      RETIRED_INSTRUCTIONS │   6605.0 │
│       CPU_CLOCKS_UNHALTED │  32291.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │   1000.0 │
│                     MERGE │      0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│               Metric │   Thread 1 │
├──────────────────────┼────────────┤
│  Runtime (RDTSC) [s] │ 9.99103e-6 │
│ Runtime unhalted [s] │ 3.50123e-5 │
│          Clock [MHz] │    3497.79 │
│                  CPI │    4.88887 │
│         DP [MFLOP/s] │     100.09 │
└──────────────────────┴────────────┘
Group: MEM1
┌──────────────────────┬──────────┐
│                Event │ Thread 1 │
├──────────────────────┼──────────┤
│     ACTUAL_CPU_CLOCK │ 185118.0 │
│        MAX_CPU_CLOCK │ 129042.0 │
│ RETIRED_INSTRUCTIONS │   6213.0 │
│  CPU_CLOCKS_UNHALTED │  15122.0 │
│       DRAM_CHANNEL_0 │    148.0 │
│       DRAM_CHANNEL_1 │    110.0 │
│       DRAM_CHANNEL_2 │    319.0 │
│       DRAM_CHANNEL_3 │    326.0 │
└──────────────────────┴──────────┘
┌────────────────────────────────────────────┬────────────┐
│                                     Metric │   Thread 1 │
├────────────────────────────────────────────┼────────────┤
│                        Runtime (RDTSC) [s] │ 6.53034e-6 │
│                       Runtime unhalted [s] │ 7.55646e-5 │
│                                Clock [MHz] │    3514.37 │
│                                        CPI │    2.43393 │
│ Memory bandwidth (channels 0-3) [MBytes/s] │    8849.77 │
│ Memory data volume (channels 0-3) [GBytes] │  5.7792e-5 │
└────────────────────────────────────────────┴────────────┘
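To see how the metrics table relates to the events table, the derived values can be recomputed by hand. The sketch below uses the numbers printed in the first FLOPS_DP example above. The two formulas are an assumption based on how CPI and MFLOP/s are usually defined; this page does not state how the group computes them.

```julia
# Raw event counts and runtime copied from the first FLOPS_DP example.
events = Dict(
    "RETIRED_INSTRUCTIONS"      => 3.12775e8,
    "CPU_CLOCKS_UNHALTED"       => 2.29064e8,
    "RETIRED_SSE_AVX_FLOPS_ALL" => 4964.0,
)
runtime = 0.0659737  # "Runtime (RDTSC) [s]" from the metrics table

# Assumed definitions (not taken from this page):
# cycles per retired instruction, and floating-point operations per microsecond.
cpi = events["CPU_CLOCKS_UNHALTED"] / events["RETIRED_INSTRUCTIONS"]
dp_mflops = 1e-6 * events["RETIRED_SSE_AVX_FLOPS_ALL"] / runtime

println("CPI          = ", cpi)
println("DP [MFLOP/s] = ", dp_mflops)
```

Under these assumptions the results agree with the printed CPI (0.732361) and DP [MFLOP/s] (0.0752421) up to the rounding of the displayed inputs.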
LIKWID.PerfMon.read_counters
— Method
Read the counter registers. To be executed after start_counters and before stop_counters. Returns true on success.
LIKWID.PerfMon.setup_counters
— Method
Program the counter registers to measure all events in group groupid (starts at 1). Returns true on success.
LIKWID.PerfMon.start_counters
— Method
Start the counter registers. Returns true on success.
LIKWID.PerfMon.stop_counters
— Method
Stop the counter registers. Returns true on success.
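The low-level functions on this page are meant to be combined into a measurement cycle. The following is a minimal sketch of one plausible order of calls (init, add_event_set, setup_counters, start_counters, stop_counters, get_metric_results, finalize), not a definitive recipe. The wrapper measure_group is hypothetical, and so is the choice to pass the module as the argument pm, which is done here only so that the call sequence can be tried without access to the hardware counters.

```julia
# Hypothetical helper: run `f` while measuring one performance group on a
# single cpu thread. `pm` is expected to provide the functions documented on
# this page, e.g. `LIKWID.PerfMon`.
function measure_group(f, pm, group::AbstractString, cpuid::Integer)
    pm.init(cpuid)                            # cpu ids start at 0
    try
        groupid = pm.add_event_set(group)     # group ids start at 1
        pm.setup_counters(groupid) || error("setup_counters failed")
        pm.start_counters() || error("start_counters failed")
        f()                                   # the code to be measured
        pm.stop_counters() || error("stop_counters failed")
        # Metrics of the first (and only) monitored cpu thread.
        return pm.get_metric_results(group, 1)
    finally
        pm.finalize()                         # always release the module
    end
end

# Usage (requires LIKWID.jl and access to the hardware counters):
#   using LIKWID
#   metrics = measure_group(LIKWID.PerfMon, "FLOPS_DP", 0) do
#       sum(rand(1000))
#   end
```

For most use cases, perfmon or @perfmon performs these steps automatically.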
LIKWID.PerfMon.supported_groups
— Method
Return a dictionary of all available perfmon groups.
Examples
julia> PerfMon.supported_groups()
Dict{String, LIKWID.GroupInfoCompact} with 18 entries:
"L2CACHE" => L2CACHE => L2 cache miss rate/ratio (experimental)
"MEM2" => MEM2 => Main memory bandwidth in MBytes/s (channels 4-7)
"NUMA" => NUMA => L2 cache bandwidth in MBytes/s (experimental)
"BRANCH" => BRANCH => Branch prediction miss rate/ratio
"FLOPS_SP" => FLOPS_SP => Single Precision MFLOP/s
"DIVIDE" => DIVIDE => Divide unit information
"CPI" => CPI => Cycles per instruction
"L2" => L2 => L2 cache bandwidth in MBytes/s (experimental)
"L3" => L3 => L3 cache bandwidth in MBytes/s
"L3CACHE" => L3CACHE => L3 cache miss rate/ratio (experimental)
"CACHE" => CACHE => Data cache miss rate/ratio
"ICACHE" => ICACHE => Instruction cache miss rate/ratio
"TLB" => TLB => TLB miss rate/ratio
"CLOCK" => CLOCK => Cycles per instruction
"FLOPS_DP" => FLOPS_DP => Double Precision MFLOP/s
"ENERGY" => ENERGY => Power and Energy consumption
"MEM1" => MEM1 => Main memory bandwidth in MBytes/s (channels 0-3)
"DATA" => DATA => Load to store ratio
LIKWID.PerfMon.switch_group
— Method
Switch the currently active group to groupid (starts at 1). Returns true on success.
LIKWID.PerfMon.@perfmon
— Macro
@perfmon group_or_groups codeblock
See also: perfmon
Example
julia> using LIKWID
julia> x = rand(1000); y = rand(1000);
julia> metrics, events = @perfmon "FLOPS_DP" x .+ y;
Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│                     Event │ Thread 1 │
├───────────────────────────┼──────────┤
│          ACTUAL_CPU_CLOCK │  88187.0 │
│             MAX_CPU_CLOCK │  61789.0 │
│      RETIRED_INSTRUCTIONS │   6705.0 │
│       CPU_CLOCKS_UNHALTED │  34181.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │   1000.0 │
│                     MERGE │      0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│               Metric │   Thread 1 │
├──────────────────────┼────────────┤
│  Runtime (RDTSC) [s] │ 1.08307e-5 │
│ Runtime unhalted [s] │ 3.59977e-5 │
│          Clock [MHz] │    3496.42 │
│                  CPI │    5.09784 │
│         DP [MFLOP/s] │    92.3302 │
└──────────────────────┴────────────┘
julia> first(metrics["FLOPS_DP"]) # all metrics of the first Julia thread
OrderedDict{String, Float64} with 5 entries:
"Runtime (RDTSC) [s]" => 8.56091e-6
"Runtime unhalted [s]" => 3.22377e-5
"Clock [MHz]" => 3506.47
"CPI" => 4.78484
"DP [MFLOP/s]" => 116.81
julia> first(events["FLOPS_DP"]) # all events of the first Julia thread
OrderedDict{String, Float64} with 6 entries:
"ACTUAL_CPU_CLOCK" => 78974.0
"MAX_CPU_CLOCK" => 55174.0
"RETIRED_INSTRUCTIONS" => 5977.0
"CPU_CLOCKS_UNHALTED" => 28599.0
"RETIRED_SSE_AVX_FLOPS_ALL" => 1000.0
"MERGE" => 0.0
Types
LIKWID.GroupInfoCompact
— Type
Essential information about a performance group.