Performance Monitoring (PerfMon)
API
LIKWID.PerfMon.add_event_set
LIKWID.PerfMon.get_event_results
LIKWID.PerfMon.get_id_of_active_group
LIKWID.PerfMon.get_id_of_event
LIKWID.PerfMon.get_id_of_group
LIKWID.PerfMon.get_id_of_metric
LIKWID.PerfMon.get_last_metric
LIKWID.PerfMon.get_last_result
LIKWID.PerfMon.get_longinfo_of_group
LIKWID.PerfMon.get_metric
LIKWID.PerfMon.get_metric_results
LIKWID.PerfMon.get_name_of_counter
LIKWID.PerfMon.get_name_of_event
LIKWID.PerfMon.get_name_of_group
LIKWID.PerfMon.get_name_of_metric
LIKWID.PerfMon.get_number_of_events
LIKWID.PerfMon.get_number_of_groups
LIKWID.PerfMon.get_number_of_metrics
LIKWID.PerfMon.get_number_of_threads
LIKWID.PerfMon.get_result
LIKWID.PerfMon.get_shortinfo_of_group
LIKWID.PerfMon.get_time_of_group
LIKWID.PerfMon.init
LIKWID.PerfMon.isgroupsupported
LIKWID.PerfMon.list_events
LIKWID.PerfMon.list_metrics
LIKWID.PerfMon.perfmon
LIKWID.PerfMon.read_counters
LIKWID.PerfMon.setup_counters
LIKWID.PerfMon.start_counters
LIKWID.PerfMon.stop_counters
LIKWID.PerfMon.supported_groups
LIKWID.PerfMon.switch_group
LIKWID.PerfMon.@perfmon
LIKWID.GroupInfoCompact
Functions
LIKWID.PerfMon.add_event_set — Method
add_event_set(estr) -> groupid
Add a performance group or a custom event set to the perfmon module. Returns a groupid (starting at 1), which is required to refer to the event set later.
LIKWID.PerfMon.get_event_results — Method
get_event_results([groupid_or_groupname, eventid_or_eventname, threadid::Integer])
Retrieve the results of monitored events. Same as get_metric_results but for raw events.
LIKWID.PerfMon.get_id_of_active_group — Method
Return the groupid of the currently active group.
LIKWID.PerfMon.get_id_of_event — Method
Get the id of the event with the given name.
LIKWID.PerfMon.get_id_of_group — Method
Get the id of the group with the given name.
LIKWID.PerfMon.get_id_of_metric — Method
Get the id of the metric with the given name.
LIKWID.PerfMon.get_last_metric — Method
Return the derived metric result of the last measurement cycle identified by group groupid and the indices for metric metricidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_last_result — Method
Return the raw counter register result of the last measurement cycle identified by group groupid and the indices for event eventidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_longinfo_of_group — Method
Return the (long) description of a performance group with id groupid (starts at 1).
LIKWID.PerfMon.get_metric — Method
Return the derived metric result of all measurements identified by group groupid and the indices for metric metricidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_metric_results — Method
get_metric_results([groupid_or_groupname, metricid_or_metricname, threadid::Integer])
Retrieve the results of monitored metrics.
Optionally, a group, metric, and threadid can be provided to select a subset of metrics or a single metric. If given as integers, note that groupid, metricid, and threadid all start at 1 and the latter enumerates the monitored cpu threads.
If no arguments are provided, a nested data structure is returned in which different levels correspond to performance groups, cpu threads, and metrics (in this order).
Examples
julia> PerfMon.get_metric_results("FLOPS_DP")
4-element Vector{OrderedDict{String, Float64}}:
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 0.0016642799007831007, "Clock [MHz]" => 2911.9285695819794, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 1.4755564705029072, "Clock [MHz]" => 3523.1114993407705, "CPI" => 0.3950777002592585, "DP [MFLOP/s]" => 17608.069202657578)
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 7.80437228993214e-5, "Clock [MHz]" => 2638.6244625814124, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 7.050705084934875e-5, "Clock [MHz]" => 2807.7525945849698, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
julia> PerfMon.get_metric_results("FLOPS_DP", 2) # results of second monitored cpu thread
OrderedDict{String, Float64} with 5 entries:
"Runtime (RDTSC) [s]" => 1.13812
"Runtime unhalted [s]" => 1.47556
"Clock [MHz]" => 3523.11
"CPI" => 0.395078
"DP [MFLOP/s]" => 17608.1
julia> PerfMon.get_metric_results("FLOPS_DP", "DP [MFLOP/s]", 2)
17608.069202657578
LIKWID.PerfMon.get_metric_results — Method
get_metric_results()
Get the metric results for all performance groups and all monitored (PerfMon.init) cpu threads.
Returns an OrderedDict whose keys correspond to the performance groups and whose values hold the results for all monitored cpu threads.
Examples
julia> results = PerfMon.get_metric_results()
OrderedDict{String, Vector{OrderedDict{String, Float64}}} with 1 entry:
"FLOPS_DP" => [OrderedDict("Runtime (RDTSC) [s]"=>1.13812, "Runtime unhalted [s]"=>0.00166428, "Clock [MHz]"=>291…
julia> PerfMon.get_metric_results()["FLOPS_DP"][2]["DP [MFLOP/s]"]
17608.069202657578
LIKWID.PerfMon.get_name_of_counter — Method
Return the name of the counter register identified by groupid and eventidx (both starting at 1).
LIKWID.PerfMon.get_name_of_event — Method
Return the name of the event identified by groupid and eventidx (both starting at 1).
LIKWID.PerfMon.get_name_of_group — Method
Return the name of the group identified by groupid (starts at 1). If it is a custom event set, the name is set to Custom.
LIKWID.PerfMon.get_name_of_metric — Method
Return the name of a derived metric identified by groupid and metricidx (both starting at 1).
LIKWID.PerfMon.get_number_of_events — Method
Return the number of events in the given group with id groupid (starts at 1).
LIKWID.PerfMon.get_number_of_groups — Method
Return the number of groups currently registered in the perfmon module.
LIKWID.PerfMon.get_number_of_metrics — Method
Return the number of metrics in the given group with id groupid (starts at 1). Always zero for custom event sets.
LIKWID.PerfMon.get_number_of_threads — Method
Return the number of threads initialized in the perfmon module.
LIKWID.PerfMon.get_result — Method
Return the raw counter register result of all measurements identified by group groupid and the indices for event eventidx and thread threadidx (all starting at 1).
LIKWID.PerfMon.get_shortinfo_of_group — Method
Return the short information about a performance group with id groupid (starts at 1).
LIKWID.PerfMon.get_time_of_group — Method
Return the measurement time for the group identified by groupid (starts at 1).
LIKWID.PerfMon.init — Function
init(cpuid_or_cpuids)
Initialize LIKWID's PerfMon module for the cpu threads with the given ids (starting at 0!).
LIKWID.PerfMon.isgroupsupported — Method
Check whether the given performance group is available on the current system.
LIKWID.PerfMon.list_events — Method
List all the events of a given group (groupid starts at 1).
LIKWID.PerfMon.list_metrics — Method
List all the metrics of a given group (groupid starts at 1).
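The introspection functions above can be combined to print what a group measures before running anything. The following is a minimal sketch, not an official recipe: it assumes that the FLOPS_DP group exists on the machine, that the two-argument functions take their arguments in the order (groupid, index) as the descriptions suggest, and that PerfMon.finalize takes no arguments.

```julia
using LIKWID

PerfMon.init(0)                              # monitor CPU thread 0 (cpu ids start at 0)
groupid = PerfMon.add_event_set("FLOPS_DP")  # assumed to be available; group ids start at 1

println("Group: ", PerfMon.get_name_of_group(groupid))
println(PerfMon.get_shortinfo_of_group(groupid))

# Raw events and the counter registers they are programmed on.
# Argument order (groupid, eventidx) is an assumption based on the docstrings.
for eventidx in 1:PerfMon.get_number_of_events(groupid)
    event = PerfMon.get_name_of_event(groupid, eventidx)
    counter = PerfMon.get_name_of_counter(groupid, eventidx)
    println("  event ", eventidx, ": ", event, " on ", counter)
end

# Derived metrics (the count is zero for custom event sets, so the loop is skipped).
for metricidx in 1:PerfMon.get_number_of_metrics(groupid)
    println("  metric ", metricidx, ": ", PerfMon.get_name_of_metric(groupid, metricidx))
end

PerfMon.finalize()
```

The printed names depend on the CPU architecture, so no expected output is shown here.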
LIKWID.PerfMon.perfmon — Method
perfmon(f, group_or_groups[; cpuids, autopin=true]) -> metrics, events
Monitor performance groups while executing the given function f on one or multiple Julia threads. Note that
- PerfMon.init and PerfMon.finalize are called automatically
- the measurement of multiple performance groups is sequential and therefore requires multiple executions of f
The returned data structures metrics and events are nested and different levels correspond to performance groups, threads, and measured metrics (in this order).
Keyword arguments:
- cpuids (default: currently used CPU threads): specify the CPU threads (~ cores) to be monitored
- autopin (default: true): automatically pin Julia threads to the CPU threads (~ cores) they are currently running on (to avoid migration and wrong results)
- print (default: true): toggle printing of result tables
- finalize (default: true): call PerfMon.finalize in the end
Example
julia> using LIKWID
julia> x = rand(1000); y = rand(1000);
julia> metrics, events = perfmon("FLOPS_DP") do
           x .+ y;
end;
Group: FLOPS_DP
┌───────────────────────────┬───────────┐
│ Event │ Thread 1 │
├───────────────────────────┼───────────┤
│ ACTUAL_CPU_CLOCK │ 2.32582e8 │
│ MAX_CPU_CLOCK │ 1.61685e8 │
│ RETIRED_INSTRUCTIONS │ 3.12775e8 │
│ CPU_CLOCKS_UNHALTED │ 2.29064e8 │
│ RETIRED_SSE_AVX_FLOPS_ALL │ 4964.0 │
│ MERGE │ 0.0 │
└───────────────────────────┴───────────┘
┌──────────────────────┬───────────┐
│ Metric │ Thread 1 │
├──────────────────────┼───────────┤
│ Runtime (RDTSC) [s] │ 0.0659737 │
│ Runtime unhalted [s] │ 0.0949394 │
│ Clock [MHz] │ 3524.02 │
│ CPI │ 0.732361 │
│ DP [MFLOP/s] │ 0.0752421 │
└──────────────────────┴───────────┘
julia> first(metrics["FLOPS_DP"]) # all metrics of the first Julia thread
OrderedDict{String, Float64} with 5 entries:
"Runtime (RDTSC) [s]" => 0.0659737
"Runtime unhalted [s]" => 0.0949394
"Clock [MHz]" => 3524.02
"CPI" => 0.732361
"DP [MFLOP/s]" => 0.0752421
julia> first(events["FLOPS_DP"]) # all raw events of the first Julia thread
OrderedDict{String, Float64} with 6 entries:
"ACTUAL_CPU_CLOCK" => 2.32582e8
"MAX_CPU_CLOCK" => 1.61685e8
"RETIRED_INSTRUCTIONS" => 3.12775e8
"CPU_CLOCKS_UNHALTED" => 2.29064e8
"RETIRED_SSE_AVX_FLOPS_ALL" => 4964.0
"MERGE" => 0.0
julia> metrics, events = perfmon(("FLOPS_DP", "MEM1")) do
           x .+ y;
end;
Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│ Event │ Thread 1 │
├───────────────────────────┼──────────┤
│ ACTUAL_CPU_CLOCK │ 85773.0 │
│ MAX_CPU_CLOCK │ 60074.0 │
│ RETIRED_INSTRUCTIONS │ 6605.0 │
│ CPU_CLOCKS_UNHALTED │ 32291.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │ 1000.0 │
│ MERGE │ 0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│ Metric │ Thread 1 │
├──────────────────────┼────────────┤
│ Runtime (RDTSC) [s] │ 9.99103e-6 │
│ Runtime unhalted [s] │ 3.50123e-5 │
│ Clock [MHz] │ 3497.79 │
│ CPI │ 4.88887 │
│ DP [MFLOP/s] │ 100.09 │
└──────────────────────┴────────────┘
Group: MEM1
┌──────────────────────┬──────────┐
│ Event │ Thread 1 │
├──────────────────────┼──────────┤
│ ACTUAL_CPU_CLOCK │ 185118.0 │
│ MAX_CPU_CLOCK │ 129042.0 │
│ RETIRED_INSTRUCTIONS │ 6213.0 │
│ CPU_CLOCKS_UNHALTED │ 15122.0 │
│ DRAM_CHANNEL_0 │ 148.0 │
│ DRAM_CHANNEL_1 │ 110.0 │
│ DRAM_CHANNEL_2 │ 319.0 │
│ DRAM_CHANNEL_3 │ 326.0 │
└──────────────────────┴──────────┘
┌────────────────────────────────────────────┬────────────┐
│ Metric │ Thread 1 │
├────────────────────────────────────────────┼────────────┤
│ Runtime (RDTSC) [s] │ 6.53034e-6 │
│ Runtime unhalted [s] │ 7.55646e-5 │
│ Clock [MHz] │ 3514.37 │
│ CPI │ 2.43393 │
│ Memory bandwidth (channels 0-3) [MBytes/s] │ 8849.77 │
│ Memory data volume (channels 0-3) [GBytes] │ 5.7792e-5 │
└────────────────────────────────────────────┴────────────┘
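Because the metrics and events returned by perfmon are nested as group, then thread, then metric, pulling one quantity out for every thread takes a single comprehension. The sketch below uses a hypothetical helper, metric_per_thread, that is not part of LIKWID.jl, and a plain Dict as a stand-in for the returned OrderedDict so that it runs without LIKWID installed. The numbers are copied from the single-group example above.

```julia
# Hypothetical helper (not part of LIKWID.jl): collect one metric across all
# monitored threads of one performance group.
# `results` is assumed to be laid out as group => [per-thread Dict(metric => value)].
function metric_per_thread(results, group::AbstractString, metric::AbstractString)
    return [thread_results[metric] for thread_results in results[group]]
end

# Stand-in for the `metrics` value returned by `perfmon`; values taken from
# the FLOPS_DP example output above (one monitored thread).
metrics = Dict(
    "FLOPS_DP" => [
        Dict(
            "Runtime (RDTSC) [s]" => 0.0659737,
            "CPI" => 0.732361,
            "DP [MFLOP/s]" => 0.0752421,
        ),
    ],
)

flops = metric_per_thread(metrics, "FLOPS_DP", "DP [MFLOP/s]")
println(flops)  # one entry per monitored thread
```

The same indexing works on the real return value, since an OrderedDict is indexed by key just like a Dict. Passing print=false to perfmon (see the keyword arguments) suppresses the tables when only the returned data is needed.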
LIKWID.PerfMon.read_counters — Method
Read the counter registers. To be executed after start_counters and before stop_counters. Returns true on success.
LIKWID.PerfMon.setup_counters — Method
Program the counter registers to measure all events in group groupid (starts at 1). Returns true on success.
LIKWID.PerfMon.start_counters — Method
Start the counter registers. Returns true on success.
LIKWID.PerfMon.stop_counters — Method
Stop the counter registers. Returns true on success.
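The low-level functions are meant to be called in a fixed order: init, add_event_set, setup_counters, start_counters, the code of interest, stop_counters, and finally reading the results. A minimal sketch of that sequence is given below. It assumes the FLOPS_DP group is available, that PerfMon.finalize takes no arguments, and that the workload runs on CPU thread 0. It does not pin the Julia thread, so on a real system the counters may be read on a different core than the one doing the work (perfmon handles this through autopin).

```julia
using LIKWID

PerfMon.init(0)                              # cpu ids start at 0
groupid = PerfMon.add_event_set("FLOPS_DP")  # assumed to be available; group ids start at 1

# The counter functions return true on success.
PerfMon.setup_counters(groupid) || error("setup_counters failed")
PerfMon.start_counters() || error("start_counters failed")

# Region of interest. Caveat: the Julia thread is not pinned in this sketch.
x = rand(1000); y = rand(1000)
z = x .+ y

PerfMon.stop_counters() || error("stop_counters failed")

# Derived metrics of the first monitored cpu thread (thread indices start at 1).
results = PerfMon.get_metric_results("FLOPS_DP", 1)
for (name, value) in results
    println(name, " => ", value)
end

PerfMon.finalize()
```

For most use cases the perfmon function or the @perfmon macro wraps these steps, including initialization and finalization.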
LIKWID.PerfMon.supported_groups — Method
Return a dictionary of all available perfmon groups.
Examples
julia> PerfMon.supported_groups()
Dict{String, LIKWID.GroupInfoCompact} with 18 entries:
"L2CACHE" => L2CACHE => L2 cache miss rate/ratio (experimental)
"MEM2" => MEM2 => Main memory bandwidth in MBytes/s (channels 4-7)
"NUMA" => NUMA => L2 cache bandwidth in MBytes/s (experimental)
"BRANCH" => BRANCH => Branch prediction miss rate/ratio
"FLOPS_SP" => FLOPS_SP => Single Precision MFLOP/s
"DIVIDE" => DIVIDE => Divide unit information
"CPI" => CPI => Cycles per instruction
"L2" => L2 => L2 cache bandwidth in MBytes/s (experimental)
"L3" => L3 => L3 cache bandwidth in MBytes/s
"L3CACHE" => L3CACHE => L3 cache miss rate/ratio (experimental)
"CACHE" => CACHE => Data cache miss rate/ratio
"ICACHE" => ICACHE => Instruction cache miss rate/ratio
"TLB" => TLB => TLB miss rate/ratio
"CLOCK" => CLOCK => Cycles per instruction
"FLOPS_DP" => FLOPS_DP => Double Precision MFLOP/s
"ENERGY" => ENERGY => Power and Energy consumption
"MEM1" => MEM1 => Main memory bandwidth in MBytes/s (channels 0-3)
"DATA" => DATA => Load to store ratio
LIKWID.PerfMon.switch_group — Method
Switch the currently active group to groupid (starts at 1). Returns true on success.
LIKWID.PerfMon.@perfmon — Macro
@perfmon group_or_groups codeblock
See also: perfmon
Example
julia> using LIKWID
julia> x = rand(1000); y = rand(1000);
julia> metrics, events = @perfmon "FLOPS_DP" x .+ y;
Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│ Event │ Thread 1 │
├───────────────────────────┼──────────┤
│ ACTUAL_CPU_CLOCK │ 88187.0 │
│ MAX_CPU_CLOCK │ 61789.0 │
│ RETIRED_INSTRUCTIONS │ 6705.0 │
│ CPU_CLOCKS_UNHALTED │ 34181.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │ 1000.0 │
│ MERGE │ 0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│ Metric │ Thread 1 │
├──────────────────────┼────────────┤
│ Runtime (RDTSC) [s] │ 1.08307e-5 │
│ Runtime unhalted [s] │ 3.59977e-5 │
│ Clock [MHz] │ 3496.42 │
│ CPI │ 5.09784 │
│ DP [MFLOP/s] │ 92.3302 │
└──────────────────────┴────────────┘
julia> first(metrics["FLOPS_DP"]) # all metrics of the first Julia thread
OrderedDict{String, Float64} with 5 entries:
"Runtime (RDTSC) [s]" => 8.56091e-6
"Runtime unhalted [s]" => 3.22377e-5
"Clock [MHz]" => 3506.47
"CPI" => 4.78484
"DP [MFLOP/s]" => 116.81
julia> first(events["FLOPS_DP"]) # all events of the first Julia thread
OrderedDict{String, Float64} with 6 entries:
"ACTUAL_CPU_CLOCK" => 78974.0
"MAX_CPU_CLOCK" => 55174.0
"RETIRED_INSTRUCTIONS" => 5977.0
"CPU_CLOCKS_UNHALTED" => 28599.0
"RETIRED_SSE_AVX_FLOPS_ALL" => 1000.0
"MERGE" => 0.0
Types
LIKWID.GroupInfoCompact — Type
Essential information about a performance group.