Marker API (CPU)

Example

(See https://github.com/JuliaPerf/LIKWID.jl/tree/main/examples/perfctr.)

# perfctr.jl
using LIKWID
using LinearAlgebra

Marker.init()

A = rand(128, 64)
B = rand(64, 128)
C = zeros(128, 128)

@marker for _ in 1:100
    mul!(C, A, B)
end

Marker.close()
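A variant of the same script, sketched with an explicit region tag so the region is easy to find in the likwid-perfctr report. It assumes the `@marker "tag" expr` form shown in the perfmon_marker example below. The file name, the `FLOPS_DP` group and core 0 in the launch command are illustrative; pick a group that exists on your CPU.

```julia
# perfctr_named.jl (hypothetical file name)
# Launch with the Marker API enabled, e.g.:
#   likwid-perfctr -C 0 -g FLOPS_DP -m julia perfctr_named.jl
using LIKWID
using LinearAlgebra

Marker.init()

A = rand(128, 64)
B = rand(64, 128)
C = zeros(128, 128)

# Name the region "matmul" instead of using the default tag
@marker "matmul" for _ in 1:100
    mul!(C, A, B)
end

Marker.close()
```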

Manual

# perfctr.jl
using LIKWID
using LinearAlgebra

Marker.init()

A = rand(128, 64)
B = rand(64, 128)
C = zeros(128, 128)

Marker.registerregion("matmul") # optional
Marker.startregion("matmul")
for _ in 1:100
    mul!(C, A, B)
end
Marker.stopregion("matmul")

Marker.close()


API

LIKWID.Marker.close (Method)

Close the connection to the LIKWID Marker API and write the measurement data to a file, which is then evaluated by likwid-perfctr.

LIKWID.Marker.getregion (Method)
getregion(regiontag::AbstractString, [num_events]) -> nevents, events, time, count

Get the intermediate results of the region identified by regiontag. On success, it returns

  • nevents: the number of events in the current group,
  • events: a list with all the aggregated event results,
  • time: the measurement time for the region, and
  • count: the number of calls.
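A minimal sketch of reading intermediate results while the Marker API is still open. It assumes the script runs under likwid-perfctr with the Marker API enabled (`-m`). The file name and the `FLOPS_DP` group are illustrative. The returned values are bound to `rtime` and `ncalls` so that they do not shadow `Base.time` and `Base.count`.

```julia
# Launch with, e.g.: likwid-perfctr -C 0 -g FLOPS_DP -m julia getregion.jl
using LIKWID
using LinearAlgebra

Marker.init()

A = rand(128, 64)
B = rand(64, 128)
C = zeros(128, 128)

for _ in 1:10
    Marker.startregion("matmul")
    mul!(C, A, B)
    Marker.stopregion("matmul")
end

# Query the region before Marker.close() writes the final results to file
nevents, events, rtime, ncalls = Marker.getregion("matmul")
println("events in group: ", nevents)
println("aggregated event results: ", events)
println("measured time [s]: ", rtime, ", calls: ", ncalls)

Marker.close()
```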

LIKWID.Marker.init (Method)

Initialize the Marker API, assuming that Julia is running under likwid-perfctr. It must be called before all other Marker API functions.

LIKWID.Marker.init_dynamic (Method)
init_dynamic(group_or_groups; kwargs...)

Initialize the full Marker API from within the current Julia session, i.e. without running under likwid-perfctr. One or more performance groups, e.g. "FLOPS_DP", must be provided as the first argument.
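A hedged sketch of using the Marker API without the likwid-perfctr wrapper. The order of calls is inferred from the perfmon_marker notes below, which list init_dynamic, init, close and PerfMon.finalize as the steps it performs automatically. It is an assumption that the same sequence works when called by hand, and that getregion behaves here as it does under likwid-perfctr. Unlike perfmon_marker, this sketch does not pin threads.

```julia
using LIKWID
using LinearAlgebra

# Configure the Marker API for the FLOPS_DP group from within Julia
Marker.init_dynamic("FLOPS_DP")
# Assumption: init_dynamic only configures the API, so Marker.init() must follow
Marker.init()

A = rand(128, 64)
B = rand(64, 128)
C = zeros(128, 128)

@marker "matmul" for _ in 1:100
    mul!(C, A, B)
end

nevents, events, rtime, ncalls = Marker.getregion("matmul")
println("matmul: ", ncalls, " call(s), ", rtime, " s, ", nevents, " events")

Marker.close()
# Assumption: mirrors the cleanup that perfmon_marker performs automatically
LIKWID.PerfMon.finalize()
```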

LIKWID.Marker.init_nothreads (Method)

Initialize the Marker API only on the main thread (assuming that Julia is running under likwid-perfctr). LIKWID.Marker.threadinit() must then be called manually on each thread that executes marked regions.
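A minimal sketch of the manual per-thread initialization, assuming Julia was started with several threads (`-t 4` here is illustrative) under likwid-perfctr. `:static` scheduling (Julia 1.5 or later) is used so that, with a range of `1:nthreads()`, each Julia thread runs exactly one iteration. Pinning the Julia threads to the monitored cores is not handled here. The region name "sum" and the workload are hypothetical.

```julia
# Launch with, e.g.: likwid-perfctr -C 0-3 -g FLOPS_DP -m julia -t 4 nothreads.jl
using LIKWID
using Base.Threads

Marker.init_nothreads()

# Register every Julia thread with the Marker API by hand
@threads :static for _ in 1:nthreads()
    Marker.threadinit()
end

# Each thread measures its own share of the work in the region "sum"
partial = zeros(nthreads())
@threads :static for i in 1:nthreads()
    Marker.startregion("sum")
    partial[i] = sum(sqrt, 1:10^6)
    Marker.stopregion("sum")
end

Marker.close()
```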

LIKWID.Marker.isactive (Method)

Check whether the Marker API is active, i.e. whether the LIKWID_MODE environment variable has been set.
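One way to use this check, sketched under the assumption that the same script should also run outside likwid-perfctr. Measurement happens only when the Marker API is active; otherwise the kernel runs unmeasured.

```julia
using LIKWID
using LinearAlgebra

A = rand(128, 64)
B = rand(64, 128)
C = zeros(128, 128)

if Marker.isactive()
    # Running under likwid-perfctr -m: measure the kernel
    Marker.init()
    @marker "matmul" mul!(C, A, B)
    Marker.close()
else
    # LIKWID_MODE is not set: run the kernel without measurements
    @info "Marker API not active, running without measurements"
    mul!(C, A, B)
end
```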

LIKWID.Marker.marker (Method)
marker(f, regiontag::AbstractString)

Add a LIKWID marker region around the execution of the given function f, using Marker.startregion and Marker.stopregion under the hood. Note that LIKWID.Marker.init() must be called before and LIKWID.Marker.close() after.

Examples

julia> using LIKWID

julia> Marker.init()

julia> marker("sleeping...") do
           sleep(1)
       end
true

julia> marker(()->rand(100), "create rand vec")
true

julia> Marker.close()
LIKWID.Marker.nextgroup (Method)

Switch to the next event set in a round-robin fashion. If you have set only one event set on the command line, this function performs no operation.
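A sketch of measuring one region with two event sets by switching groups after each pass. It assumes likwid-perfctr was given two `-g` options together with `-m`. The group names are illustrative, and `ngroups` is a hypothetical constant that has to match the number of groups on the command line.

```julia
# Launch with two event sets, e.g.:
#   likwid-perfctr -C 0 -g FLOPS_DP -g FLOPS_SP -m julia nextgroup.jl
using LIKWID
using LinearAlgebra

Marker.init()

A = rand(128, 64)
B = rand(64, 128)
C = zeros(128, 128)

ngroups = 2  # hypothetical: must match the number of -g options
for _ in 1:ngroups
    Marker.startregion("matmul")
    mul!(C, A, B)
    Marker.stopregion("matmul")
    Marker.nextgroup()  # switch to the next event set (round-robin)
end

Marker.close()
```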

LIKWID.Marker.perfmon_marker (Method)
perfmon_marker(f, group_or_groups[; kwargs...])

Monitor performance groups in marked areas (see @marker) while executing the given function f on one or multiple Julia threads.

This is an experimental feature!

Note that

  • Marker.init_dynamic, Marker.init, Marker.close, and PerfMon.finalize are called automatically
  • the measurement of multiple performance groups is sequential and requires multiple executions of f!

Keyword arguments:

  • cpuids (default: currently used CPU threads): specify the CPU threads (~ cores) to be monitored
  • autopin (default: true): automatically pin Julia threads to the CPU threads (~ cores) they are currently running on (to avoid migration and wrong results).
  • keep (default: false): keep the temporarily created marker file

Example

julia> using LIKWID

julia> perfmon_marker("FLOPS_DP") do
           # only the marked regions are monitored!
           NUM_FLOPS = 100_000_000
           a = 1.8
           b = 3.2
           c = 1.3
           @marker "calc_flops" for _ in 1:NUM_FLOPS
               c = a * b + c
           end
           z = a*b+c
           @marker "exponential" exp(z)
           sin(c)
       end

Region: calc_flops, Group: FLOPS_DP
┌───────────────────────────┬───────────┐
│                     Event │  Thread 1 │
├───────────────────────────┼───────────┤
│          ACTUAL_CPU_CLOCK │ 3.00577e8 │
│             MAX_CPU_CLOCK │ 2.08917e8 │
│      RETIRED_INSTRUCTIONS │ 3.00005e8 │
│       CPU_CLOCKS_UNHALTED │ 3.00067e8 │
│ RETIRED_SSE_AVX_FLOPS_ALL │     1.0e8 │
│                     MERGE │       0.0 │
└───────────────────────────┴───────────┘
┌──────────────────────┬───────────┐
│               Metric │  Thread 1 │
├──────────────────────┼───────────┤
│  Runtime (RDTSC) [s] │ 0.0852431 │
│ Runtime unhalted [s] │  0.122687 │
│          Clock [MHz] │   3524.84 │
│                  CPI │   1.00021 │
│         DP [MFLOP/s] │   1173.12 │
└──────────────────────┴───────────┘

Region: exponential, Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│                     Event │ Thread 1 │
├───────────────────────────┼──────────┤
│          ACTUAL_CPU_CLOCK │  85696.0 │
│             MAX_CPU_CLOCK │  59192.0 │
│      RETIRED_INSTRUCTIONS │   5072.0 │
│       CPU_CLOCKS_UNHALTED │   6013.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │     27.0 │
│                     MERGE │      0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│               Metric │   Thread 1 │
├──────────────────────┼────────────┤
│  Runtime (RDTSC) [s] │ 2.60005e-7 │
│ Runtime unhalted [s] │ 3.49786e-5 │
│          Clock [MHz] │    3546.95 │
│                  CPI │    1.18553 │
│         DP [MFLOP/s] │    103.844 │
└──────────────────────┴────────────┘
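A further sketch, this time with more than one performance group. It assumes that a vector of group names is accepted as group_or_groups and that both groups exist on the CPU. The counter `nruns` is a hypothetical addition that makes the sequential, one-execution-per-group behaviour visible.

```julia
using LIKWID

nruns = Ref(0)  # counts how often f is executed

# Measure two performance groups; f runs once per group
perfmon_marker(["FLOPS_DP", "FLOPS_SP"]; autopin=true, keep=false) do
    nruns[] += 1
    x = rand(10^6)
    @marker "dot" x' * x
    @marker "exponential" exp.(x)
end

println("f was executed ", nruns[], " times")
```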
LIKWID.Marker.registerregion (Method)

Register the region named regiontag with the Marker API. Returns true on success.

Calling this function is optional: it reduces the overhead of Marker.startregion by registering the region in advance. If registerregion is not called, the region is registered by Marker.startregion instead.
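A minimal sketch of registering several regions up front, assuming the script runs under likwid-perfctr with `-m`. The region names and workloads are hypothetical.

```julia
using LIKWID

Marker.init()

# Register all regions before the first Marker.startregion call
for tag in ("setup", "compute")
    Marker.registerregion(tag) || @warn "Could not register region $tag"
end

Marker.startregion("setup")
x = rand(10^6)
Marker.stopregion("setup")

Marker.startregion("compute")
s = sum(abs2, x)
Marker.stopregion("compute")

Marker.close()
```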

LIKWID.Marker.@perfmon_marker (Macro)
@perfmon_marker group_or_groups codeblock

This is an experimental feature!

See also: perfmon_marker

Example

julia> using LIKWID

julia> @perfmon_marker "FLOPS_DP" begin
           @marker "exponential" exp(3.141)
       end

Region: exponential, Group: FLOPS_DP
┌───────────────────────────┬──────────┐
│                     Event │ Thread 1 │
├───────────────────────────┼──────────┤
│          ACTUAL_CPU_CLOCK │ 115146.0 │
│             MAX_CPU_CLOCK │  78547.0 │
│      RETIRED_INSTRUCTIONS │   4208.0 │
│       CPU_CLOCKS_UNHALTED │   7112.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │     10.0 │
│                     MERGE │      0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│               Metric │   Thread 1 │
├──────────────────────┼────────────┤
│  Runtime (RDTSC) [s] │ 3.02056e-8 │
│ Runtime unhalted [s] │ 4.70008e-5 │
│          Clock [MHz] │     3591.4 │
│                  CPI │    1.69011 │
│         DP [MFLOP/s] │    331.064 │
└──────────────────────┴────────────┘