Performance Monitoring (PerfMon)

API

Functions

LIKWID.PerfMon.add_event_set - Method
add_event_set(estr) -> groupid

Add a performance group or a custom event set to the perfmon module. Returns a groupid (starting at 1) that identifies the event set in later calls.

LIKWID.PerfMon.get_last_metric - Method

Return the derived metric result of the last measurement cycle identified by group groupid and the indices for metric metricidx and thread threadidx (all starting at 1).

LIKWID.PerfMon.get_last_result - Method

Return the raw counter register result of the last measurement cycle identified by group groupid and the indices for event eventidx and thread threadidx (all starting at 1).

LIKWID.PerfMon.get_metric - Method

Return the derived metric result of all measurements identified by group groupid and the indices for metric metricidx and thread threadidx (all starting at 1).

LIKWID.PerfMon.get_metric_results - Method

get_metric_results([groupid_or_groupname, metricid_or_metricname, threadid::Integer])

Retrieve the results of monitored metrics.

Optionally, a group, metric, and threadid can be provided to select a subset of metrics or a single metric. If given as integers, note that groupid, metricid, and threadid all start at 1 and the latter enumerates the monitored cpu threads.

If no arguments are provided, a nested data structure is returned in which different levels correspond to performance groups, cpu threads, and metrics (in this order).

Examples

julia> PerfMon.get_metric_results("FLOPS_DP")
4-element Vector{OrderedDict{String, Float64}}:
 OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 0.0016642799007831007, "Clock [MHz]" => 2911.9285695819794, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
 OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 1.4755564705029072, "Clock [MHz]" => 3523.1114993407705, "CPI" => 0.3950777002592585, "DP [MFLOP/s]" => 17608.069202657578)
 OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 7.80437228993214e-5, "Clock [MHz]" => 2638.6244625814124, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)
 OrderedDict("Runtime (RDTSC) [s]" => 1.1381168037989857, "Runtime unhalted [s]" => 7.050705084934875e-5, "Clock [MHz]" => 2807.7525945849698, "CPI" => NaN, "DP [MFLOP/s]" => 0.0)

julia> PerfMon.get_metric_results("FLOPS_DP", 2) # results of second monitored cpu thread
OrderedDict{String, Float64} with 5 entries:
  "Runtime (RDTSC) [s]"  => 1.13812
  "Runtime unhalted [s]" => 1.47556
  "Clock [MHz]"          => 3523.11
  "CPI"                  => 0.395078
  "DP [MFLOP/s]"         => 17608.1

julia> PerfMon.get_metric_results("FLOPS_DP", "DP [MFLOP/s]", 2)
17608.069202657578
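
Because idle CPU threads report NaN or 0.0 for some metrics, it can help to aggregate a metric across threads explicitly. The following is a minimal sketch that does not call LIKWID: a plain Dict stands in for the OrderedDict returned above, the values are copied from the example output, and the helpers metric_per_thread and metric_total are hypothetical names, not part of LIKWID.jl.

```julia
# Per-thread results for one group, shaped like the output of
# PerfMon.get_metric_results("FLOPS_DP"). A plain Dict stands in for
# OrderedDict so that the sketch needs no packages.
per_thread = [
    Dict("CPI" => NaN, "DP [MFLOP/s]" => 0.0),
    Dict("CPI" => 0.3950777002592585, "DP [MFLOP/s]" => 17608.069202657578),
    Dict("CPI" => NaN, "DP [MFLOP/s]" => 0.0),
    Dict("CPI" => NaN, "DP [MFLOP/s]" => 0.0),
]

# Hypothetical helper: collect one metric across all monitored CPU threads.
metric_per_thread(results, name) = [thread[name] for thread in results]

# Hypothetical helper: sum a metric over threads, ignoring NaN entries.
function metric_total(results, name)
    vals = filter(!isnan, metric_per_thread(results, name))
    return isempty(vals) ? 0.0 : sum(vals)
end

flops = metric_total(per_thread, "DP [MFLOP/s]")
# Number of threads that reported a valid CPI, i.e. that did measurable work.
active = count(!isnan, metric_per_thread(per_thread, "CPI"))
```

With real measurements, per_thread would be replaced by the vector returned from PerfMon.get_metric_results for a group name.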

LIKWID.PerfMon.get_metric_results - Method

get_metric_results()

Get the metric results for all performance groups and all monitored (PerfMon.init) cpu threads.

Returns an OrderedDict whose keys correspond to the performance groups and whose values hold the results for all monitored CPU threads.

Examples

julia> results = PerfMon.get_metric_results()
OrderedDict{String, Vector{OrderedDict{String, Float64}}} with 1 entry:
  "FLOPS_DP" => [OrderedDict("Runtime (RDTSC) [s]"=>1.13812, "Runtime unhalted [s]"=>0.00166428, "Clock [MHz]"=>291…

julia> PerfMon.get_metric_results()["FLOPS_DP"][2]["DP [MFLOP/s]"]
17608.069202657578
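
The nesting order (group, CPU thread, metric) can be walked with three loops. The sketch below flattens such a structure into rows. It assumes only the shape described above; a plain Dict with two threads replaces the real return value, and flatten_results is a hypothetical helper name.

```julia
# Stand-in for the return value of PerfMon.get_metric_results():
# group name => per-thread vector => metric name => value.
results = Dict(
    "FLOPS_DP" => [
        Dict("CPI" => NaN, "DP [MFLOP/s]" => 0.0),
        Dict("CPI" => 0.3950777002592585, "DP [MFLOP/s]" => 17608.069202657578),
    ],
)

# Hypothetical helper: walk the levels in the documented order
# (group, CPU thread, metric) and emit one row per value.
function flatten_results(results)
    Row = NamedTuple{(:group, :thread, :metric, :value),
                     Tuple{String, Int, String, Float64}}
    rows = Row[]
    for (group, threads) in results
        for (threadidx, metrics) in enumerate(threads)   # thread index starts at 1
            for (metric, value) in metrics
                push!(rows, (group = group, thread = threadidx,
                             metric = metric, value = value))
            end
        end
    end
    return rows
end

rows = flatten_results(results)
# Largest DP rate reported by any monitored thread.
peak = maximum(r.value for r in rows if r.metric == "DP [MFLOP/s]")
```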

LIKWID.PerfMon.get_result - Method

Return the raw counter register result of all measurements identified by group groupid and the indices for event eventidx and thread threadidx (all starting at 1).

LIKWID.PerfMon.init - Function
init(cpuid_or_cpuids)

Initialize LIKWID's PerfMon module for the cpu threads with the given ids (starting at 0!).
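
The mix of 0-based CPU thread ids in init and 1-based indices elsewhere is easy to get wrong, so a manual measurement cycle may be sketched as follows. This is a sketch under assumptions rather than a reference workflow: setup_counters, start_counters, and stop_counters are assumed to exist in LIKWID.PerfMon (they are not documented in this section), the argument order of get_last_metric is inferred from its description, and the code needs a machine on which LIKWID can access the hardware counters.

```julia
using LIKWID

# Assumption: setup_counters, start_counters and stop_counters are part of
# LIKWID.PerfMon; they are not documented in this section.
PerfMon.init(0)                               # CPU thread ids start at 0
groupid = PerfMon.add_event_set("FLOPS_DP")   # group ids start at 1
PerfMon.setup_counters(groupid)

x = rand(1000); y = rand(1000)
PerfMon.start_counters()
z = x .+ y                                    # code to be measured
PerfMon.stop_counters()

# Metric and thread indices start at 1: first metric of the first monitored thread.
# Assumption: the argument order is (groupid, metricidx, threadidx).
first_metric = PerfMon.get_last_metric(groupid, 1, 1)

# All metrics of the first monitored thread, looked up by group name.
metrics = PerfMon.get_metric_results("FLOPS_DP", 1)

PerfMon.finalize()
```

The measured code has to run on the monitored CPU thread for the counters to reflect it, which is the reason perfmon pins Julia threads by default.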

LIKWID.PerfMon.perfmon - Method
perfmon(f, group_or_groups[; cpuids, autopin=true, print=true, finalize=true]) -> metrics, events

Monitor performance groups while executing the given function f on one or multiple Julia threads. Note that

  • PerfMon.init and PerfMon.finalize are called automatically
  • the measurement of multiple performance groups is sequential and requires multiple executions of f!

The returned data structures metrics and events are nested; their levels correspond to performance groups, threads, and measured metrics or events, respectively (in this order).

Keyword arguments:

  • cpuids (default: currently used CPU threads): specify the CPU threads (~ cores) to be monitored
  • autopin (default: true): automatically pin Julia threads to the CPU threads (~ cores) they are currently running on (to avoid migration and wrong results).
  • print (default: true): toggle printing of result tables
  • finalize (default: true): call PerfMon.finalize in the end

Example

julia> using LIKWID

julia> x = rand(1000); y = rand(1000);

julia> metrics, events = perfmon("FLOPS_DP") do
           x .+ y;
       end;

Group: FLOPS_DP
┌───────────────────────────┬───────────┐
│                     Event │  Thread 1 │
├───────────────────────────┼───────────┤
│          ACTUAL_CPU_CLOCK │ 2.32582e8 │
│             MAX_CPU_CLOCK │ 1.61685e8 │
│      RETIRED_INSTRUCTIONS │ 3.12775e8 │
│       CPU_CLOCKS_UNHALTED │ 2.29064e8 │
│ RETIRED_SSE_AVX_FLOPS_ALL │    4964.0 │
│                     MERGE │       0.0 │
└───────────────────────────┴───────────┘
┌──────────────────────┬───────────┐
│               Metric │  Thread 1 │
├──────────────────────┼───────────┤
│  Runtime (RDTSC) [s] │ 0.0659737 │
│ Runtime unhalted [s] │ 0.0949394 │
│          Clock [MHz] │   3524.02 │
│                  CPI │  0.732361 │
│         DP [MFLOP/s] │ 0.0752421 │
└──────────────────────┴───────────┘

julia> first(metrics["FLOPS_DP"]) # all metrics of the first Julia thread
OrderedDict{String, Float64} with 5 entries:
  "Runtime (RDTSC) [s]"  => 0.0659737
  "Runtime unhalted [s]" => 0.0949394
  "Clock [MHz]"          => 3524.02
  "CPI"                  => 0.732361
  "DP [MFLOP/s]"         => 0.0752421

julia> first(events["FLOPS_DP"]) # all raw events of the first Julia thread
OrderedDict{String, Float64} with 6 entries:
  "ACTUAL_CPU_CLOCK"          => 2.32582e8
  "MAX_CPU_CLOCK"             => 1.61685e8
  "RETIRED_INSTRUCTIONS"      => 3.12775e8
  "CPU_CLOCKS_UNHALTED"       => 2.29064e8
  "RETIRED_SSE_AVX_FLOPS_ALL" => 4964.0
  "MERGE"                     => 0.0

julia> metrics, events = perfmon(("FLOPS_DP", "MEM1")) do
           x .+ y;
       end;

Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│                     Event │ Thread 1 │
├───────────────────────────┼──────────┤
│          ACTUAL_CPU_CLOCK │  85773.0 │
│             MAX_CPU_CLOCK │  60074.0 │
│      RETIRED_INSTRUCTIONS │   6605.0 │
│       CPU_CLOCKS_UNHALTED │  32291.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │   1000.0 │
│                     MERGE │      0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│               Metric │   Thread 1 │
├──────────────────────┼────────────┤
│  Runtime (RDTSC) [s] │ 9.99103e-6 │
│ Runtime unhalted [s] │ 3.50123e-5 │
│          Clock [MHz] │    3497.79 │
│                  CPI │    4.88887 │
│         DP [MFLOP/s] │     100.09 │
└──────────────────────┴────────────┘

Group: MEM1
┌──────────────────────┬──────────┐
│                Event │ Thread 1 │
├──────────────────────┼──────────┤
│     ACTUAL_CPU_CLOCK │ 185118.0 │
│        MAX_CPU_CLOCK │ 129042.0 │
│ RETIRED_INSTRUCTIONS │   6213.0 │
│  CPU_CLOCKS_UNHALTED │  15122.0 │
│       DRAM_CHANNEL_0 │    148.0 │
│       DRAM_CHANNEL_1 │    110.0 │
│       DRAM_CHANNEL_2 │    319.0 │
│       DRAM_CHANNEL_3 │    326.0 │
└──────────────────────┴──────────┘
┌────────────────────────────────────────────┬────────────┐
│                                     Metric │   Thread 1 │
├────────────────────────────────────────────┼────────────┤
│                        Runtime (RDTSC) [s] │ 6.53034e-6 │
│                       Runtime unhalted [s] │ 7.55646e-5 │
│                                Clock [MHz] │    3514.37 │
│                                        CPI │    2.43393 │
│ Memory bandwidth (channels 0-3) [MBytes/s] │    8849.77 │
│ Memory data volume (channels 0-3) [GBytes] │  5.7792e-5 │
└────────────────────────────────────────────┴────────────┘
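
The keyword arguments listed above can be passed in the do-block form as well. A short sketch, assuming a machine with working LIKWID access, that suppresses the result tables and works only with the returned data:

```julia
using LIKWID

x = rand(1000); y = rand(1000)

# print=false suppresses the result tables; the data is still returned.
metrics, events = perfmon("FLOPS_DP"; print=false) do
    x .+ y
end

# Results of the first monitored Julia thread, indexed as in the examples above.
dp_rate = first(metrics["FLOPS_DP"])["DP [MFLOP/s]"]
flop_count = first(events["FLOPS_DP"])["RETIRED_SSE_AVX_FLOPS_ALL"]
```

The event name RETIRED_SSE_AVX_FLOPS_ALL is taken from the tables above and depends on the CPU architecture; other machines expose different events for the same group.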

LIKWID.PerfMon.supported_groups - Method

Return a dictionary of all available perfmon groups.

Examples

julia> PerfMon.supported_groups()
Dict{String, LIKWID.GroupInfoCompact} with 18 entries:
  "L2CACHE"  => L2CACHE => L2 cache miss rate/ratio (experimental)
  "MEM2"     => MEM2 => Main memory bandwidth in MBytes/s (channels 4-7)
  "NUMA"     => NUMA => L2 cache bandwidth in MBytes/s (experimental)
  "BRANCH"   => BRANCH => Branch prediction miss rate/ratio
  "FLOPS_SP" => FLOPS_SP => Single Precision MFLOP/s
  "DIVIDE"   => DIVIDE => Divide unit information
  "CPI"      => CPI => Cycles per instruction
  "L2"       => L2 => L2 cache bandwidth in MBytes/s (experimental)
  "L3"       => L3 => L3 cache bandwidth in MBytes/s
  "L3CACHE"  => L3CACHE => L3 cache miss rate/ratio (experimental)
  "CACHE"    => CACHE => Data cache miss rate/ratio
  "ICACHE"   => ICACHE => Instruction cache miss rate/ratio
  "TLB"      => TLB => TLB miss rate/ratio
  "CLOCK"    => CLOCK => Cycles per instruction
  "FLOPS_DP" => FLOPS_DP => Double Precision MFLOP/s
  "ENERGY"   => ENERGY => Power and Energy consumption
  "MEM1"     => MEM1 => Main memory bandwidth in MBytes/s (channels 0-3)
  "DATA"     => DATA => Load to store ratio
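
Since the set of groups differs between CPU architectures, the returned dictionary can be used to check availability before measuring. A minimal sketch, assuming LIKWID is usable on the machine; the names wanted and available are illustrative only:

```julia
using LIKWID

groups = PerfMon.supported_groups()

# Keep only the groups that this CPU actually provides.
wanted = ("FLOPS_DP", "MEM1")
available = filter(g -> haskey(groups, g), wanted)

x = rand(1000); y = rand(1000)
if !isempty(available)
    # Each group is measured in a separate execution of the function.
    metrics, events = perfmon(available) do
        x .+ y
    end
end
```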

LIKWID.PerfMon.@perfmon - Macro
@perfmon group_or_groups codeblock

See also: perfmon

Example

julia> using LIKWID

julia> x = rand(1000); y = rand(1000);

julia> metrics, events = @perfmon "FLOPS_DP" x .+ y;

Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│                     Event │ Thread 1 │
├───────────────────────────┼──────────┤
│          ACTUAL_CPU_CLOCK │  88187.0 │
│             MAX_CPU_CLOCK │  61789.0 │
│      RETIRED_INSTRUCTIONS │   6705.0 │
│       CPU_CLOCKS_UNHALTED │  34181.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │   1000.0 │
│                     MERGE │      0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│               Metric │   Thread 1 │
├──────────────────────┼────────────┤
│  Runtime (RDTSC) [s] │ 1.08307e-5 │
│ Runtime unhalted [s] │ 3.59977e-5 │
│          Clock [MHz] │    3496.42 │
│                  CPI │    5.09784 │
│         DP [MFLOP/s] │    92.3302 │
└──────────────────────┴────────────┘

julia> first(metrics["FLOPS_DP"]) # all metrics of the first Julia thread
OrderedDict{String, Float64} with 5 entries:
  "Runtime (RDTSC) [s]"  => 1.08307e-5
  "Runtime unhalted [s]" => 3.59977e-5
  "Clock [MHz]"          => 3496.42
  "CPI"                  => 5.09784
  "DP [MFLOP/s]"         => 92.3302

julia> first(events["FLOPS_DP"]) # all events of the first Julia thread
OrderedDict{String, Float64} with 6 entries:
  "ACTUAL_CPU_CLOCK"          => 88187.0
  "MAX_CPU_CLOCK"             => 61789.0
  "RETIRED_INSTRUCTIONS"      => 6705.0
  "CPU_CLOCKS_UNHALTED"       => 34181.0
  "RETIRED_SSE_AVX_FLOPS_ALL" => 1000.0
  "MERGE"                     => 0.0

Types