About¶
instrumentation-infra is an infrastructure for program instrumentation. It builds benchmark programs with custom instrumentation flags (e.g., LLVM passes) and runs them. The design is modular and intended to be extended by users.
Overview¶
The infrastructure uses three high-level concepts to specify benchmarks and build flags:
- A target is a benchmark program (or a collection of programs) that is to be instrumented. An example is SPEC-CPU2006.
- An instance specifies how to build a target. An example is infra.instances.Clang, which builds targets using the Clang compiler. For SPEC2006, one of the resulting binaries would be called 400.perlbench-clang.
- Targets and instances can specify dependencies in the form of packages, which are built automatically before the target is built.
The infrastructure provides a number of common targets and their dependencies as packages. It also defines baseline instances for LLVM, along with packages for its build dependencies. There are some utility passes and a source patch for LLVM that lets you develop instrumentation passes in a shared object, without having to link them into the compiler after every rebuild.
A typical use case is a programmer who has implemented some security feature in an LLVM pass and wants to apply this pass to real-world benchmarks to measure its performance impact. They would create an instance that adds the relevant arguments to CFLAGS, create a setup script that registers this instance in the infrastructure, and run the setup script with the build and run commands to quickly see if things work on the built-in targets (e.g., SPEC).
Getting started¶
The easiest way to get started with the framework is to clone and adapt our skeleton repository, which creates an example target and instrumentation instance. Consult the API docs for extensive documentation on the functions used. Otherwise, read the usage guide to find out how to set up your own project, and for examples of how to invoke the build and run commands.
Usage¶
instrumentation-infra is meant to be used as a submodule in a git repository. To use it, you must create a setup script. The setup script specifies which targets and instances are used by the current project, including any custom targets and instances. An example can be found in our skeleton repository here. The setup script (which we will call setup.py from now on) is an executable Python script that calls Setup.main(). The script has a number of subcommands, whose basic usage is discussed below. Each subcommand has an extensive --help option that shows all of its options.
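For reference, a minimal setup script has the following shape (this mirrors the snippet in the Setup API docs below; MyAwesomeInstance and MyBeautifulTarget, and the mydefs module they come from, are placeholders for your own definitions):
#!/usr/bin/env python3
import infra
from mydefs import MyAwesomeInstance, MyBeautifulTarget  # hypothetical module with your definitions

setup = infra.Setup(__file__)
setup.add_instance(MyAwesomeInstance())
setup.add_target(MyBeautifulTarget())
setup.main()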
Installing dependencies¶
The infrastructure’s only hard dependency is Python 3.5. If you intend to use LLVM, however, there are some build dependencies. This is what you need for LLVM on a fresh Ubuntu 16.04 installation:
sudo apt-get install bison build-essential gettext git pkg-config python ssh
For nicer command-line usage, install the following Python packages (optional):
# in user space:
pip3 install --user coloredlogs argcomplete
# OR, system-wide:
sudo pip3 install coloredlogs argcomplete
argcomplete enables command-line argument completion, but it needs to be activated first (optional):
# in user space (add to ~/.bashrc, works for files called "setup.py"):
eval "$(register-python-argcomplete --complete-arguments -o nospace -o default -- setup.py)"
# OR, use global activation (only needed once, works for any file/user):
sudo activate-global-python-argcomplete --complete-arguments -o nospace -o default
Note: if you're using zsh, you first need to load and run bashcompinit as shown here.
Cloning the framework in your project¶
First add the infrastructure as a git submodule. This creates a .gitmodules file that you should commit:
git submodule add -b master git@github.com:vusec/instrumentation-infra.git infra
git add infra .gitmodules
git commit -m "Clone instrumentation infrastructure"
Next, create a setup script (recommended name setup.py) in your project root that invokes the infrastructure's main function. Consult the skeleton example and API docs for this step.
Finally, write any target, instance and package definitions needed for your project so that you can use them in the commands below.
The build and pkg-build commands¶
./setup.py build TARGET INSTANCE ... [-j JOBS] [--iterations=N] [<target-options>]
./setup.py pkg-build PACKAGE [-j JOBS]
build builds one or more instances of a target program. Only registered targets/instances are valid. The API docs explain how to register them. Each target and instance specifies which packages it depends on. For example, an instance that runs LLVM passes depends on LLVM, which in turn depends on some libraries, depending on the version used. Before building a target program, build lists its dependencies, downloads and builds them, and adds their installation directories to the PATH. All generated build files are put in the build/ directory in the root of your project.
Each package specifies a simple test for the setup script to see if it has already been built (e.g., it checks if install/bin/<binary> exists). If so, the build is skipped. This avoids having to run make all the time for each dependency, but sometimes you do want to force-run make, for example while debugging a custom package, or when you hack-fixed the source code of a package. In this case, you can use --force-rebuild-deps to skip the checks and rebuild everything, and optionally --clean to first remove all generated files for the target (this behaves as if you just cloned the project, so use it with care).
The -j option is forwarded to make commands, allowing parallel builds of object files. It defaults to the number of cores available on the machine, with a maximum of 16 (but you can manually set it to larger values if you think enough RAM is available).
pkg-build builds a single package and its dependencies. It is useful for debugging new packages or force-building a patched dependency.
The clean command¶
./setup.py clean [--targets TARGET ...] [--packages PACKAGE ...]
clean removes all generated files for a target program or package. This is the opposite of build. You can override the behavior for your own targets and packages (see the API docs), but by default it removes the entire build/{targets,packages}/<name> directory.
clean is particularly useful for cleaning build files of a custom package, such as a runtime library with source code embedded in your project, before running build on a target that depends on the runtime library.
The run command¶
./setup.py run TARGET INSTANCE ... [--build] [--iterations=N] [<target-options>]
run runs one or more instances of a single target program. When --build is passed, it first runs the build command for that target. Valid values for <target-options> differ per target; the API docs explain how to add options for your own targets.
The example below builds and runs the test workload of 401.bzip2 from the SPEC2006 suite, compiled with Clang with link-time optimizations disabled and enabled, respectively:
./setup.py run --build spec2006 clang clang-lto --test --benchmarks 401.bzip2
The --iterations option specifies the number of times to run the target, so that a median and standard deviation can be computed for the runtime.
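For example, a (hypothetical) invocation that runs the test workload of each default benchmark three times:
./setup.py run spec2006 clang --test --iterations 3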
Parallel builds and runs¶
build and run both have the --parallel option that divides the workload over multiple cores or machines. The amount of parallelism is controlled with --parallelmax=N. There are two types:
- --parallel=proc spawns jobs as processes on the current machine. N is the number of parallel processes running at any given time, and defaults to the number of cores. This is particularly useful for local development of link-time passes where single-threaded linking is the bottleneck. Do use this in conjunction with -j to limit the number of forked processes per job.
- --parallel=prun schedules jobs as prun jobs on different machines on the DAS-5 cluster. Here N indicates the maximum number of node reservations of simultaneously scheduled jobs (both running and pending), defaulting to 64 (tailored to the VU cluster). Additional options such as job time can be passed directly to prun using --prun-opts.
The example below builds and runs the C/C++ subset of SPEC2006 with the test workload, in order to test if the myinst instance breaks anything. The machine has 8 cores, so we limit the number of parallel program builds to 8 (which is also the default) and limit the number of build processes per program using -j 2 to avoid excessive context switching:
./setup.py run --build --parallel proc --parallelmax 8 -j 2 \
spec2006 myinst --test --benchmarks all_c all_cpp
The report command¶
./setup.py report TARGET RUNDIRS -i INSTANCE ... [--field FIELD:AGGREGATION ...] [--overhead BASELINE]
./setup.py report TARGET RUNDIRS -i INSTANCE --raw
./setup.py report TARGET RUNDIRS --help-fields
report displays a table with benchmark results for the specified target, gathered from a given list of run directories that have been populated by a (parallel) run invocation. Each target defines a number of reportable fields that are measured during benchmarks, which are listed by --help-fields.
The report aggregates results by default, grouping them on the default field set by infra.Target.aggregation_field. This can be overridden using the --groupby option. The user must specify an aggregation function for each reported field in the -f|--field option. For instance, suppose we ran the clang and myinst instances of the spec2006 target and want to report the results. First we report the mean runtime and standard deviation to see if the results are reliable (“count” shows the number of results):
./setup.py report spec2006 results/run.* -f runtime:count:mean:stdev_percent
Let's assume the standard deviations are low and the runtimes look believable, so we want to compute the runtime and memory overheads of the instrumentation added in the myinst instance, compared to the clang instance:
./setup.py report spec2006 results/run.* -i myinst -f runtime:median maxrss:median --overhead clang
Alternatively, the --raw option makes the command output all results without aggregation. This can be useful when creating scatter plots, for example:
./setup.py report spec2006 results/run.* -i myinst -f benchmark runtime maxrss --raw
The config command¶
./setup.py config --targets
./setup.py config --instances
./setup.py config --packages
config prints information about the setup configuration, such as the registered targets, instances and packages (the union of all registered dependencies).
The pkg-config command¶
./setup.py pkg-config PACKAGE <package-options>
pkg-config prints information about a single package, such as its installation prefix or, in the case of a library package, the CFLAGS needed to compile a program that uses the library. Each package can define its own options here (see API docs), but there are two defaults:
- --root returns build/packages/<package>.
- --prefix returns build/packages/<package>/install.
pkg-config is intended to be used by build systems of targets that need to call into the setup script from a different process than the ./setup.py build ... invocation. For example, our skeleton repository uses this to make the Makefile for its LLVM passes stand-alone, allowing developers to run make directly in the llvm-passes/ directory rather than ../setup.py build --packages llvm-passes-skeleton.
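As a minimal sketch (assuming the skeleton's llvm-passes-skeleton package and its llvm-passes/ directory layout), such a stand-alone Makefile could query the setup script like this:
# hypothetical Makefile fragment in llvm-passes/
PREFIX := $(shell ../setup.py pkg-config llvm-passes-skeleton --prefix)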
API documentation¶
Setup¶
class infra.Setup(setup_path)[source]¶
Defines framework commands.
The setup takes care of complicated things like command-line parsing, logging, parallelism, environment setup and generating build paths. You should only need to use the methods documented here. To use the setup, you must first populate it with targets and instances using add_target() and add_instance(), and then call main() to run the command issued in the command-line arguments:
setup = infra.Setup(__file__)
setup.add_instance(MyAwesomeInstance())
setup.add_target(MyBeautifulTarget())
setup.main()
main() creates a context that it passes to methods of targets/instances/packages. You can see it being used as ctx by many API methods below. The context contains setup configuration data, such as absolute build paths, and environment variables for build/run commands, such as which compiler and CFLAGS to use to build the current target. Your own targets and instances should read/write to the context.
The job of an instance is to manipulate the context such that a target is built in the desired way. This manipulation happens in predefined API methods which you must override (see below). Hence, these methods receive the context as a parameter.
Parameters: setup_path (str) – Path to the script running Setup.main(). Needed to allow build scripts to call back into the setup script for build hooks.
add_command(self, command)[source]¶
Register a setup command.
Parameters: command (Command) – The command to register.
Return type: None
add_instance(self, instance)[source]¶
Register an instance. Only registered instances can be referenced in commands, so built-in instances must also be registered.
Parameters: instance (Instance) – The instance to register.
Return type: None
add_target(self, target)[source]¶
Register a target. Only registered targets can be referenced in commands, so built-in targets must also be registered.
Parameters: target (Target) – The target to register.
Return type: None
Context¶
class infra.context.Context(paths, log, loglevel=0, args=<factory>, hooks=<factory>, runenv=<factory>, starttime=<factory>, target_run_wrapper='', runlog_file=None, runtee=None, jobs=8, arch='unknown', cc='cc', cxx='cxx', fc='fc', ar='ar', nm='nm', ranlib='ranlib', cflags=<factory>, cxxflags=<factory>, fcflags=<factory>, ldflags=<factory>, lib_ldflags=<factory>)[source]¶
The global configuration context, used by all targets, instances, etc. For example, an instance can configure its compiler flags in this context, which are then used by targets.
Return type: None
paths = None¶
Absolute paths to be used (readonly) throughout the framework.
Type: context.ContextPaths
log = None¶
The logging object used for status updates.
Type: logging.Logger
loglevel = 0¶
The logging level as requested by the user. Note that it differs from the logging object's log level, since all debug output is written to a file regardless of the requested loglevel.
Type: int
args = None¶
Populated with processed command-line arguments. Targets and instances can add additional command-line arguments, which can be accessed through this object.
Type: argparse.Namespace
hooks = None¶
An object with hooks for various points in the building/running process.
Type: context.ContextHooks
runenv = None¶
Environment variables that are used when running a target.
Type: typing.Dict[str, typing.Union[str, typing.List[str]]]
starttime = None¶
When the current run of the infra was started.
Type: datetime.datetime
target_run_wrapper = ''¶
Command(s) to prepend in front of the target's run command (executed directly on the command line). This can be set to a custom shell script, or for example perf or valgrind.
Type: str
runlog_file = None¶
File object used for writing all executed commands, if enabled.
Type: _io.TextIOWrapper or None
runtee = None¶
Object used to redirect the output of executed commands to a file and stdout.
Type: io.IOBase or None
jobs = 8¶
The amount of parallel jobs to use. Contains the value of the -j command-line option, defaulting to the number of CPU cores returned by multiprocessing.cpu_count().
Type: int
arch = 'unknown'¶
Architecture to build targets for. Initialized to platform.machine(). Valid values include x86_64 and arm64/aarch64; for more, refer to uname -m and platform.machine().
Type: str
cflags = None¶
C compilation flags to use when building targets.
Type: typing.List[str]
cxxflags = None¶
C++ compilation flags to use when building targets.
Type: typing.List[str]
fcflags = None¶
Fortran compilation flags to use when building targets.
Type: typing.List[str]
ldflags = None¶
Linker flags to use when building targets.
Type: typing.List[str]
lib_ldflags = None¶
A special set of linker flags set by some packages, passed when linking target libraries that will later be (statically) linked into the binary. In practice it is either empty or ['-flto'] when compiling with LLVM.
Type: typing.List[str]
class infra.context.ContextPaths(infra, setup, workdir)[source]¶
Absolute, read-only paths used throughout the infra. Normally instances, targets, and packages do not need to consult these paths directly, but instead use their respective path method.
Return type: None
root¶
Root directory, which contains the user's script invoking the infra.
buildroot¶
Build directory.
log¶
Directory containing all logs.
debuglog¶
Path to the debug log.
runlog¶
Path to the log of all executed commands.
packages¶
Build directory for packages.
targets¶
Build directory for targets.
pool_results¶
Directory containing all results of running targets.
class infra.context.ContextHooks(pre_build=<factory>, post_build=<factory>)[source]¶
Hooks (i.e., functions) that are executed at various stages during the building and running of targets.
Return type: None
pre_build = None¶
Hooks to execute before building a target.
Type: typing.List[typing.Callable]
post_build = None¶
Hooks to execute after a target is built. This can be used to do additional post-processing on the generated binaries.
Type: typing.List[typing.Callable]
Targets¶
class infra.Target(*args, **kwargs)[source]¶
Abstract base class for target definitions. Built-in derived classes are listed here.
Each target must define a name attribute that is used to reference the target on the command line. The name must be unique among all registered targets. Each target must also implement a number of methods that are called by Setup when running commands.
The build command follows these steps for each target:
- It calls add_build_args() to include any custom command-line arguments for this target, and then parses the command-line arguments.
- It calls is_fetched() to see if the source code for this target has been downloaded yet.
- If is_fetched() == False, it calls fetch().
- It calls Instance.configure() on the instance that will be passed to build().
- All packages listed by dependencies() are built and installed into the environment (i.e., PATH and such are set).
- It calls build() to build the target binaries.
- If any post-build hooks are installed by the current instance, it calls binary_paths() to get paths to all built binaries. These are then passed directly to the build hooks.
For the run command:
- It calls add_run_args() to include any custom command-line arguments for this target.
- If --build was specified, it performs all build steps above.
- It calls Instance.prepare_run() on the instance that will be passed to run().
- It calls run() to run the target binaries.
For the clean command:
- It calls is_clean() to see if any build files exist for this target.
- If is_clean() == False, it calls clean().
For the report command:
- It calls parse_outfile() for every log file before creating the report.
Naturally, when defining your own target, all the methods listed above must have working implementations. Some implementations are optional and some have a default implementation that works for almost all cases (see docs below), but the following are mandatory to implement for each new target: is_fetched(), fetch(), build() and run().
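To make the mandatory pieces concrete, here is a minimal sketch of a hypothetical target; the source layout, compile command and binary naming are illustrative assumptions, not part of the framework:
import os
import infra
from infra.util import run

class MyBeautifulTarget(infra.Target):
    name = 'hello'

    def is_fetched(self, ctx):
        # source already present in build/targets/hello/src?
        return os.path.exists(self.path(ctx, 'src'))

    def fetch(self, ctx):
        # hypothetical: download or copy the sources here
        os.makedirs(self.path(ctx, 'src'))

    def build(self, ctx, instance):
        os.chdir(self.path(ctx, 'src'))
        # respect the compiler and flags configured by the instance
        run(ctx, [ctx.cc, *ctx.cflags, 'hello.c',
                  '-o', 'hello-' + instance.name, *ctx.ldflags])

    def run(self, ctx, instance):
        os.chdir(self.path(ctx, 'src'))
        run(ctx, ['./hello-' + instance.name], teeout=True)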
reportable_fields(self)[source]¶
Run-time statistics reported by this target. Examples include the runtime of the benchmark, its memory or CPU utilization, or benchmark-specific measurements such as throughput and latency of requests.
The format is a dictionary mapping the name of the statistic to a (human-readable) description. For each entry, the name is looked up in the logs and saved per run.
Return type: typing.Mapping[str, str]
add_build_args(self, parser)[source]¶
Extend the command-line arguments for the build command with custom arguments for this target. These arguments end up in the global context, so it is a good idea to prefix them with the target name to avoid collisions with other targets and instances.
For example, SPEC2006 defines --spec2006-benchmarks (rather than --benchmarks).
Parameters: parser (argparse.ArgumentParser) – the argument parser to extend
Return type: None
add_run_args(self, parser)[source]¶
Extend the command-line arguments for the run command with custom arguments for this target. Since only a single target can be run at a time, prefixing to avoid naming conflicts with other targets is not necessary here.
For example, SPEC2006 defines --benchmarks and --test.
Parameters: parser (argparse.ArgumentParser) – the argument parser to extend
Return type: None
dependencies(self)[source]¶
Specify dependencies that should be built and installed in the run environment before building this target.
Return type: typing.Iterator[infra.package.Package]
path(self, ctx, *args)[source]¶
Get the absolute path to the build directory of this target, optionally suffixed with a subpath.
Parameters:
- ctx (context.Context) – the configuration context
- args (str) – additional subpath to pass to os.path.join()
Returns: build/targets/<name>[/<subpath>]
Return type: str
is_fetched(self, ctx)[source]¶
Returns True if fetch() should be called before building.
Parameters: ctx (context.Context) – the configuration context
Return type: bool
fetch(self, ctx)[source]¶
Fetches the source code for this target. This step is separated from build() because the build command first fetches all packages and targets before starting the build process.
Parameters: ctx (context.Context) – the configuration context
Return type: None
build(self, ctx, instance, pool=None)[source]¶
Build the target object files. Called some time after fetch() (see above). ctx.runenv will have been populated with the exported environments of all packages returned by dependencies() (i.e., Package.install_env() has been called for each dependency). This means that when you call util.run() here, the programs and libraries from the dependencies are available in PATH and LD_LIBRARY_PATH, so you don't need to reference them with absolute paths.
The build function should respect variables set in the configuration context such as ctx.cc and ctx.cflags, passing them to the underlying build system as required. Setup.ctx shows default variables in the context that should at least be respected, but complex instances may optionally overwrite them to be used by custom targets.
Any custom command-line arguments set by add_build_args() are available here in ctx.args.
If pool is defined (i.e., when --parallel is passed), the target is expected to use pool.run() instead of util.run() to invoke build commands.
Parameters:
- ctx (context.Context) – the configuration context
- instance (Instance) – instance to build
- pool (parallel.Pool or None) – parallel process pool if --parallel is specified
Return type: None
run(self, ctx, instance, pool=None)[source]¶
Run the target binaries. This should be done using util.run() so that ctx.runenv is used (which can be set by an instance or dependencies). It is recommended to pass teeout=True to make the output of the process stream to stdout.
Any custom command-line arguments set by add_run_args() are available here in ctx.args.
If pool is defined (i.e., when --parallel is passed), the target is expected to use pool.run() instead of util.run() to launch runs.
Implementations of this method should respect the --iterations option of the run command.
Parameters:
- ctx (context.Context) – the configuration context
- instance (Instance) – instance to run
- pool (parallel.Pool or None) – parallel process pool if --parallel is specified
Return type: None
parse_outfile(self, ctx, outfile)[source]¶
Callback method for commands.report.parse_logs(). Used by the report command to get reportable results.
Parameters:
- ctx (context.Context) – the configuration context
- outfile (str) – path to outfile to parse
Return type: typing.Iterator[typing.MutableMapping[str, typing.Union[bool, int, float, str]]]
Raises: NotImplementedError – unless implemented
is_clean(self, ctx)[source]¶
Returns True if clean() should be called before cleaning.
Parameters: ctx (context.Context) – the configuration context
Return type: bool
clean(self, ctx)[source]¶
Clean generated files for this target, called by the clean command. By default, this removes build/targets/<name>.
Parameters: ctx (context.Context) – the configuration context
Return type: None
binary_paths(self, ctx, instance)[source]¶
If implemented, this should return a list of absolute paths to binaries created by build() for the given instance. This is only used if the instance specifies post-build hooks. Each hook is called for each of the returned paths.
Parameters:
- ctx (context.Context) – the configuration context
- instance (Instance) – instance to get paths for
Returns: paths to binaries
Return type: typing.Iterable[str]
Raises: NotImplementedError – unless implemented
Instances¶
class infra.Instance(*args, **kwargs)[source]¶
Abstract base class for instance definitions. Built-in derived classes are listed here.
Each instance must define a name attribute that is used to reference the instance on the command line. The name must be unique among all registered instances.
An instance changes variables in the configuration context that are used to apply instrumentation while building a target by Target.build() and Target.link(). This is done by configure().
Additionally, instances that need runtime support, such as a shared library, can implement prepare_run(), which is called by the run command just before running the target with Target.run().
name¶
The instance's name, must be unique.
add_build_args(self, parser)[source]¶
Extend the command-line arguments for the build command with custom arguments for this instance. These arguments end up in the global context, so it is a good idea to prefix them with the instance name to avoid collisions with other instances and targets.
Use this to enable build flags for your instance on the command line, rather than having to create separate instances for every option when experimenting.
Parameters: parser (argparse.ArgumentParser) – the argument parser to extend
Return type: None
dependencies(self)[source]¶
Specify dependencies that should be built and installed in the run environment before building a target with this instance. Called before configure() and prepare_run().
Return type: typing.Iterator[infra.package.Package]
configure(self, ctx)[source]¶
Modify context variables to change how a target is built.
Typically, this would set one or more of ctx.{cc,cxx,cflags,cxxflags,ldflags,hooks.post_build}. It is recommended to use += rather than = when assigning to lists in the context, to avoid undoing changes made by dependencies.
Any custom command-line arguments set by add_build_args() are available here in ctx.args.
Parameters: ctx (context.Context) – the configuration context
Return type: None
prepare_run(self, ctx)[source]¶
Modify context variables to change how a target is run.
Typically, this would change ctx.runenv, e.g., by setting ctx.runenv.LD_LIBRARY_PATH. Target.run() is expected to call util.run(), which will inherit the modified environment.
Parameters: ctx (context.Context) – the configuration context
Return type: None
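Putting this together, a minimal sketch of a hypothetical instance that merely adds a compiler flag could look like this (the flag itself is illustrative):
import infra

class MyAwesomeInstance(infra.Instance):
    name = 'myawesome'

    def configure(self, ctx):
        # hypothetical flags; use += to preserve flags set by dependencies
        ctx.cflags += ['-fstack-protector-all']
        ctx.cxxflags += ['-fstack-protector-all']

    def prepare_run(self, ctx):
        # nothing to do at run time for a compile-only instance
        pass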
Packages¶
class infra.Package(*args, **kwargs)[source]¶
Abstract base class for package definitions. Built-in derived classes are listed here.
Each package must define an ident() method that returns a unique ID for the package instance. This is similar to Target.name, except that each instantiation of a package can return a different ID depending on its parameters. For example, a Bash package might be initialized with a version number and be identified as bash-4.1 and bash-4.3, which are different packages with different build directories.
A dependency is built in three steps:
- fetch() downloads the source code, typically to build/packages/<ident>/src.
- build() builds the code, typically in build/packages/<ident>/obj.
- install() installs the built binaries/libraries, typically into build/packages/<ident>/install.
The functions above are only called if is_fetched(), is_built() and is_installed() return False, respectively. Additionally, if is_installed() returns True, fetching and building is skipped altogether. All these methods are abstract and thus require an implementation in a package definition.
clean() removes all generated package files when the clean command is run. By default, this removes build/packages/<ident>.
The package needs to be able to install itself into ctx.runenv so that it can be used by targets/instances/packages that depend on it. This is done by install_env(), which by default adds build/packages/<ident>/install/bin to the PATH and build/packages/<ident>/install/lib to the LD_LIBRARY_PATH.
Finally, the setup script has a pkg-config command that prints package information, such as the installation prefix or the compilation flags required to build software that uses the package. These options are configured by pkg_config_options().
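As a sketch of how these steps typically fit together (the download URL and autotools-style build system are assumptions for illustration):
import os
import infra
from infra.util import download, run, untar

class MyLib(infra.Package):
    """Hypothetical autotools-style library package."""

    def __init__(self, version):
        self.version = version

    def ident(self):
        return 'mylib-' + self.version

    def is_fetched(self, ctx):
        return os.path.exists(self.path(ctx, 'src'))

    def fetch(self, ctx):
        # hypothetical URL
        download(ctx, 'https://example.org/mylib-%s.tar.gz' % self.version)
        untar(ctx, 'mylib-%s.tar.gz' % self.version, 'src')

    def is_built(self, ctx):
        return os.path.exists(self.path(ctx, 'obj', 'libmylib.a'))

    def build(self, ctx):
        os.makedirs(self.path(ctx, 'obj'), exist_ok=True)
        os.chdir(self.path(ctx, 'obj'))
        run(ctx, ['../src/configure', '--prefix=' + self.path(ctx, 'install')])
        run(ctx, 'make -j%d' % ctx.jobs)

    def is_installed(self, ctx):
        return os.path.exists(self.path(ctx, 'install', 'lib', 'libmylib.a'))

    def install(self, ctx):
        os.chdir(self.path(ctx, 'obj'))
        run(ctx, 'make install')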
ident(self)[source]¶
Returns a unique identifier for this package instantiation.
Two packages are considered identical if their identifiers are equal. This means that if multiple targets/instances/packages return different instantiations of a package as dependency that share the same identifier, they are assumed to be equal and only the first will be built. This way, different implementations of dependencies() can instantiate the same class in order to share a dependency.
Return type: str
dependencies(self)[source]¶
Specify dependencies that should be built and installed in the run environment before building this package.
Return type: typing.Iterator[ForwardRef('Package')]
is_fetched(self, ctx)[source]¶
Returns True if fetch() should be called before building.
Parameters: ctx (context.Context) – the configuration context
Return type: bool
is_built(self, ctx)[source]¶
Returns True if build() should be called before installing.
Parameters: ctx (context.Context) – the configuration context
Return type: bool
is_installed(self, ctx)[source]¶
Returns True if the package has already been installed; if it returns False, the package needs to be fetched, built and installed.
Parameters: ctx (context.Context) – the configuration context
Return type: bool
fetch(self, ctx)[source]¶
Fetches the source code for this package. This step is separated from build() because the build command first fetches all packages and targets before starting the build process.
Parameters: ctx (context.Context) – the configuration context
Return type: None
build(self, ctx)[source]¶
Build the package. Usually amounts to running make -j<ctx.jobs> using util.run().
Parameters: ctx (context.Context) – the configuration context
Return type: None
install(self, ctx)[source]¶
Install the package. Usually amounts to running make install using util.run(). It is recommended to install to self.path(ctx, 'install'), which results in build/packages/<ident>/install. Assuming that bin and/or lib directories are generated in the install directory, the default behaviour of install_env() will automatically add those to [LD_LIBRARY_]PATH.
Parameters: ctx (context.Context) – the configuration context
Return type: None
is_clean(self, ctx)[source]¶
Returns True if clean() should be called before cleaning.
Parameters: ctx (context.Context) – the configuration context
Return type: bool
clean(self, ctx)[source]¶
Clean generated files for this package, called by the clean command. By default, this removes build/packages/<ident>.
Parameters: ctx (context.Context) – the configuration context
Return type: None
path(self, ctx, *args)[source]¶
Get the absolute path to the build directory of this package, optionally suffixed with a subpath.
Parameters:
- ctx (context.Context) – the configuration context
- args (str) – additional subpath to pass to os.path.join()
Returns: build/packages/<ident>[/<subpath>]
Return type: str
install_env(self, ctx)[source]¶
Install the package into ctx.runenv so that it can be used in subsequent calls to util.run(). By default, it adds build/packages/<ident>/install/bin to the PATH and build/packages/<ident>/install/lib to the LD_LIBRARY_PATH (but only if the directories exist).
Parameters: ctx (context.Context) – the configuration context
Return type: None
pkg_config_options(self, ctx)[source]¶
Yield options for the pkg-config command. Each option is an (option, description, value) triple. The defaults are --root, which returns the root directory build/packages/<ident>, and --prefix, which returns the install directory populated by install(): build/packages/<ident>/install.
When reimplementing this method in a derived package class, it is recommended to end the implementation with yield from super().pkg_config_options(ctx) to add the two default options.
Parameters: ctx (context.Context) – the configuration context
Return type: typing.Iterator[typing.Tuple[str, str, typing.Union[str, typing.Iterable[str]]]]
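For instance, a derived package could add a hypothetical --objdir option while keeping the defaults:
def pkg_config_options(self, ctx):
    # hypothetical extra option pointing at the object file directory
    yield ('--objdir', 'object file build directory', self.path(ctx, 'obj'))
    yield from super().pkg_config_options(ctx)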
Utility functions¶
infra.util.add_c_cxxflag(ctx, flag)[source]¶
Add a flag to both ctx.cflags and ctx.cxxflags, if it is not already present.
Return type: None
infra.util.add_lib_ldflag(ctx, flag, also_ldflag=False)[source]¶
Add a flag to ctx.lib_ldflags, if it is not already present.
Return type: None
class infra.util.Index(thing_name)[source]¶
keys(self)[source]¶
Return type: typing.KeysView[str]
values(self)[source]¶
Return type: typing.ValuesView[~T]
items(self)[source]¶
Return type: typing.ItemsView[str, ~T]
exception infra.util.FatalError[source]¶
Raised for errors that should stop the execution immediately, but do not need a backtrace. Results in only the exception message being logged. This typically means there is an error in the user input, rather than in the code that raises the error.
infra.util.apply_patch(ctx, path, strip_count)[source]¶
Applies a patch in the current directory by calling patch -p<strip_count> < <path>.
Afterwards, a stamp file called .patched-<basename> is created to indicate that the patch has been applied. If the stamp file is already present, the patch is not applied at all. <basename> is generated from the patch file name: path/to/my-patch.patch becomes my-patch.
Parameters:
- ctx (context.Context) – the configuration context
- path (str) – path to the patch file
- strip_count (int) – number of leading elements to strip from patch paths
Returns: True if the patch was applied, False if it was already applied before
Return type: bool
infra.util.run(ctx, cmd, allow_error=False, silent=False, teeout=False, defer=False, env={}, **kwargs)[source]¶
Wrapper for subprocess.run() that does environment/output logging and provides a few useful options. The log file is build/log/commands.txt. Where possible, use this wrapper in favor of subprocess.run() to facilitate easier debugging.
It is useful to permanently have a terminal window open running tail -f build/log/commands.txt. This way, command output is available in case of errors but does not clobber the setup's progress log.
The run environment is based on os.environ, first adding ctx.runenv (populated by packages/instances, see also Setup) and then the env parameter. The combination of ctx.runenv and env is logged to the log file. Any lists of strings in environment values are joined with a ':' separator.
If the command exits with a non-zero status code, the corresponding output is logged to the command line and the process is killed with sys.exit(-1).
Parameters:
- ctx (context.Context) – the configuration context
- cmd (str or typing.Iterable[typing.Any]) – command to run, can be a string or a list of strings like in subprocess.run()
- allow_error (bool) – avoids calling sys.exit(-1) if the command returns an error
- silent (bool) – disables output logging (only logs the invocation and environment)
- teeout (bool) – streams command output to sys.stdout as well as to the log file
- defer (bool) – do not wait for the command to finish. Similar to ./program & in Bash. Returns a subprocess.Popen instance.
- env (typing.Mapping[str, typing.Union[str, typing.List[str]]]) – variables to add to the environment
- kwargs (Any) – passed directly to subprocess.run() (or subprocess.Popen if defer==True)
Returns: a handle to the completed or running process
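A small sketch of typical usage inside a build() or run() implementation (the make target and the deferred server command are assumptions for illustration):
from infra.util import run

# `ctx` is the configuration context passed into build()/run()
run(ctx, ['make', '-j%d' % ctx.jobs], teeout=True)  # stream output to stdout
server = run(ctx, ['./server'], defer=True)         # hypothetical long-running command
server.terminate()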
infra.util.qjoin(args)[source]¶
Join command-line arguments into a single string that is safe to paste into a shell. Basically, this adds quotes to each element containing spaces (using shlex.quote()). Arguments are stringified by str before joining.
Parameters: args (typing.Iterable[typing.Any]) – arguments to join
Return type: str
infra.util.download(ctx, url, outfile=None)[source]¶
Download a file (logs to the debug log).
Parameters:
- ctx (context.Context) – the configuration context
- url (str) – URL of the file to download
- outfile (str or None) – optional path/filename to download to
infra.util.require_program(ctx, name, error=None)[source]¶
Require a program to be available in PATH or ctx.runenv.PATH.
Parameters:
- ctx (context.Context) – the configuration context
- name (str) – name of required program
- error (str or None) – optional error message
Return type: None
Raises: FatalError – if program is not found
infra.util.untar(ctx, tarname, dest=None, *, remove=True, basename=None)[source]¶
TODO: docs
Return type: None
class infra.parallel.Pool(logger, parallelmax)[source]¶
A pool is used to run processes in parallel as jobs when --parallel is specified on the command line. The pool is created automatically by Setup and passed to Target.build() and Target.run(). However, the pool is only passed if the method implementation defines a parameter for the pool, i.e.:
class MyTarget(Target):
    def build(self, ctx, instance, pool):  # receives Pool instance
        ...
    def run(self, ctx, instance):          # does not receive it
        ...
The maximum number of parallel jobs is controlled by --parallelmax. For --parallel=proc this is simply the number of parallel processes on the current machine. For --parallel=prun it is the maximum number of simultaneous jobs in the job queue (pending or running).
Parameters:
- logger (logging.Logger) – logging object for status updates (set to ctx.log)
- parallelmax (int) – value of --parallelmax
wait_all(self)[source]¶
Block (busy-wait) until all jobs in the queue have been completed. Called automatically by Setup after the build and run commands.
Return type: None
run(self, ctx, cmd, jobid, outfile, nnodes, onsuccess=None, onerror=None, **kwargs)[source]¶
A non-blocking wrapper for util.run(), to be used when --parallel is specified.
Parameters:
- ctx (context.Context) – the configuration context
- cmd (str or typing.Iterable[str]) – the command to run
- jobid (str) – a human-readable ID for status reporting
- outfile (str) – full path to target file for command output
- nnodes (int) – number of cores or machines to run the command on
- onsuccess (typing.Callable[[infra.parallel.Job], NoneType] or None) – callback when the job finishes successfully
- onerror (typing.Callable[[infra.parallel.Job], NoneType] or None) – callback when the job exits with (typically I/O) error
- kwargs (Any) – passed directly to util.run()
Returns: handles to created job processes
Return type: typing.Iterable[infra.parallel.Job]
Built-in targets¶
SPEC¶
class infra.targets.SPEC2006(source_type, source, patches=[], toolsets=[], nothp=True, force_cpu=0, default_benchmarks=['all_c', 'all_cpp'], reporters=[<class 'infra.packages.tools.RusageCounters'>])[source]¶
The SPEC-CPU2006 benchmarking suite.
Since SPEC may not be redistributed, you need to provide your own copy in source. We support the following types for source_type:
- isofile: ISO file to mount (requires fuseiso to be installed)
- mounted: mounted/extracted ISO directory
- installed: pre-installed SPEC directory in another project
- tarfile: compressed tarfile with ISO contents
- git: git repo containing extracted ISO
The --spec2006-benchmarks command-line argument is added for the build and run commands. It supports full individual benchmark names such as '400.perlbench', and the following benchmark sets defined by SPEC:
- all_c: C benchmarks
- all_cpp: C++ benchmarks
- all_fortran: Fortran benchmarks
- all_mixed: C/Fortran benchmarks
- int: integer benchmarks
- fp: floating-point benchmarks
Multiple sets and individual benchmarks can be specified; duplicates are removed and the list is sorted automatically. When unspecified, the benchmarks default to all_c all_cpp.
The following options are added only for the run command:
- --benchmarks: alias for --spec2006-benchmarks
- --test: run the test workload
- --measuremem: use an alternative runscript that bypasses runspec to measure memory usage
- --runspec-args: passed directly to runspec
Parallel builds and runs using the --parallel option are supported. Command output will end up in the results/ directory in that case. Note that even though the parallel job may finish successfully, you still need to check the output for errors manually using the report command.
The --iterations option of the run command is translated into the number of nodes per job when --parallel is specified, and to --runspec-args -n <iterations> otherwise.
The report command analyzes logs in the results directory and reports the aggregated data in a table. It receives a list of run directories (results/run.X) as positional arguments to traverse for log files. By default, the columns list runtimes, memory usages, overheads, standard deviations and iterations. The computed values are appended to each log file with the prefix [setup-report], and read from there by subsequent report commands if available (see also RusageCounters). This makes log files portable to different machines without copying over the entire SPEC directory. The script depends on a couple of Python libraries for its output:
pip3 install [--user] terminaltables termcolor
Some useful command-line options change what is displayed by report:
TODO: move some of these from below to general report command docs
- --fields changes which data fields are printed. A column is added for each instance for each field. The options are autocompleted and default to status, overheads, runtime, memory usage, stddevs and iterations. Custom counter fields from runtime libraries can also be specified (but are not autocompleted).
- --baseline changes the baseline for overhead computation. By default, the script looks for baseline, clang-lto or clang.
- --csv/--tsv change the output from human-readable to comma/tab-separated for script processing. E.g., use in conjunction with cut to obtain a column of values.
- --nodes adds a (possibly very large) table of runtimes of individual nodes. This is useful for identifying bad nodes on the DAS-5 when some standard deviations are high while using --parallel prun.
- --ascii disables UTF-8 output so that output can be saved to a log file or piped to less.
Finally, you may specify a list of patches to apply before building. These may be paths to .patch files that will be applied with patch -p1, or choices from the following built-in patches:
- dealII-stddef fixes an error in dealII compilation on recent compilers when ptrdiff_t is used without including stddef.h (you basically always want this)
- asan applies the AddressSanitizer patch, needed to make -fsanitize=address work on LLVM.
- gcc-init-ptr zero-initializes a pointer on the stack so that type analysis at LTO time does not get confused.
- omnetpp-invalid-ptrcheck fixes a code copy-paste bug in an edge case of a switch statement, where a pointer from a union is used while it is initialized as an int.
Name: spec2006
Parameters:
- source_type (str) – see above
- source (str) – where to install spec from
- patches (typing.List[str]) – patches to apply after installing
- toolsets (typing.List[str]) – approved toolsets to add additionally
- nothp (bool) – run without transparent huge pages (they tend to introduce noise in performance measurements), implies Nothp dependency if True
- force_cpu (int) – bind runspec to this cpu core (-1 to disable)
- default_benchmarks (typing.List[str]) – specify benchmarks run by default
custom_allocs_flags = ['-allocs-custom-funcs=Perl_safesysmalloc:malloc:0.Perl_safesyscalloc:calloc:1:0.Perl_safesysrealloc:realloc:1.Perl_safesysfree:free:-1.ggc_alloc:malloc:0.alloc_anon:malloc:1.xmalloc:malloc:0.xcalloc:calloc:1:0.xrealloc:realloc:1']¶
Command-line arguments for the built-in -allocs pass; registers custom allocation function wrappers in SPEC benchmarks.
Type: list
class infra.targets.SPEC2017(source_type, source, patches=[], nothp=True, force_cpu=0, default_benchmarks=['intspeed_pure_c', 'intspeed_pure_cpp', 'fpspeed_pure_c'], reporters=[<class 'infra.packages.tools.RusageCounters'>])[source]¶
The SPEC-CPU2017 benchmarking suite.
Since SPEC may not be redistributed, you need to provide your own copy in source. We support the following types for source_type:
- isofile: ISO file to mount (requires fuseiso to be installed)
- mounted: mounted/extracted ISO directory
- installed: pre-installed SPEC directory in another project
- tarfile: compressed tarfile with ISO contents
- git: git repo containing extracted ISO
The following options are added only for the run command:
- --benchmarks: alias for --spec2017-benchmarks
- --test: run the test workload
- --measuremem: use an alternative runscript that bypasses runspec to measure memory usage
- --runspec-args: passed directly to runspec
Parallel builds and runs using the --parallel option are supported. Command output will end up in the results/ directory in that case. Note that even though the parallel job may finish successfully, you still need to check the output for errors manually using the report command.
The --iterations option of the run command is translated into the number of nodes per job when --parallel is specified, and to --runspec-args -n <iterations> otherwise.
The report command analyzes logs in the results directory and reports the aggregated data in a table. It receives a list of run directories (results/run.X) as positional arguments to traverse for log files. By default, the columns list runtimes, memory usages, overheads, standard deviations and iterations. The computed values are appended to each log file with the prefix [setup-report], and read from there by subsequent report commands if available (see also RusageCounters). This makes log files portable to different machines without copying over the entire SPEC directory. The script depends on a couple of Python libraries for its output:
pip3 install [--user] terminaltables termcolor
Some useful command-line options change what is displayed by report:
TODO: move some of these from below to general report command docs
- --fields changes which data fields are printed. A column is added for each instance for each field. The options are autocompleted and default to status, overheads, runtime, memory usage, stddevs and iterations. Custom counter fields from runtime libraries can also be specified (but are not autocompleted).
- --baseline changes the baseline for overhead computation. By default, the script looks for baseline, clang-lto or clang.
- --csv/--tsv change the output from human-readable to comma/tab-separated for script processing. E.g., use in conjunction with cut to obtain a column of values.
- --nodes adds a (possibly very large) table of runtimes of individual nodes. This is useful for identifying bad nodes on the DAS-5 when some standard deviations are high while using --parallel prun.
- --ascii disables UTF-8 output so that output can be saved to a log file or piped to less.
Name: spec2017
Parameters:
- source_type (str) – see above
- source (str) – where to install spec from
- patches (typing.List[str]) – patches to apply after installing
- nothp (bool) – run without transparent huge pages (they tend to introduce noise in performance measurements), implies Nothp dependency if True
- force_cpu (int) – bind runspec to this cpu core (-1 to disable)
- default_benchmarks (typing.List[str]) – specify benchmarks run by default
Web servers¶
class infra.targets.Nginx(version, build_flags=[])[source]¶
The Nginx web server.
Name: nginx
Parameters: version (str) – which (open source) version to download
Juliet¶
class infra.targets.Juliet(mitigation_return_code=None)[source]¶
The Juliet Test Suite for C/C++.
This test suite contains a large number of programs, categorized by vulnerability type (CWE). Most programs include both a “good” and “bad” version, where the good version should succeed (no bug) whereas the bad version should be detected by the applied mitigation. In other words, the good version tests for false positives, and the bad version for false negatives.
The --cwe command-line argument specifies which CWEs to build and/or run, and can be a CWE-ID (416 or CWE416) or an alias (e.g., uaf). A mix of CWE-IDs and aliases is allowed.
The Juliet suite contains multiple flow variants per test case. These are different control flows in the program that in the end all arrive at the same bug. This is only relevant for static analysis tools; for run-time mitigations these are unsuitable. In particular, some flow variants (e.g., 12) do not (always) trigger or reach the bug at runtime. Therefore, by default only flow variant 01 is used, but others can be specified with the --variants command-line argument.
By default, a good test is counted as successful (true negative) if its returncode is 0, and a bad test is counted as successful (true positive) if its returncode is non-zero. The latter behavior can be fine-tuned via the mitigation_return_code argument to this class, which can be set to match the returncode of the mitigation.
Each test receives a fixed string on stdin. Tests that are based on sockets are currently not supported, as this requires running two tests at the same time (a client and a server).
Tests can be built in parallel (using --parallel=proc), since this process might take a while when multiple CWEs or variants are selected. Running tests in parallel is not supported (yet).
Name: juliet
Parameters: mitigation_return_code (int or None) – Return code the mitigation exits with, to distinguish true positives for the bad version of testcases. If None, any non-zero value is considered a success.
Built-in instances¶
Clang¶
class infra.instances.Clang(llvm, *, optlevel=2, lto=False, alloc='system')[source]¶
Sets clang as the compiler. The version of clang used is determined by the LLVM package passed to the constructor.
By default, -O2 optimization is set in CFLAGS and CXXFLAGS. This can be customized by setting optlevel to 0/1/2/3/s.
alloc can be system (the default) or tcmalloc. For custom tcmalloc hackery, overwrite the gperftools property of this package with a custom Gperftools object.
Name: clang[-O<optlevel>][-lto][-tcmalloc]
Parameters:
- llvm (packages.LLVM) – an LLVM package containing the relevant clang version
- optlevel (int or str) – optimization level for -O (default: 2)
- lto (bool) – whether to apply link-time optimizations
- alloc (str) – which allocator to use (default: system)
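For example, a setup script could register an LTO-enabled Clang instance like this (the LLVM version and patch list are illustrative):
llvm = infra.packages.LLVM(version='7.0.0', compiler_rt=False, patches=['gold-plugins'])
setup.add_instance(infra.instances.Clang(llvm, lto=True))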
AddressSanitizer¶
class infra.instances.ASan(llvm, temporal=True, stack=True, glob=True, check_writes=True, check_reads=True, redzone=None, optlevel=2, lto=False)[source]¶
AddressSanitizer instance. Adds -fsanitize=address plus any configuration options at compile time and link time, and sets ASAN_OPTIONS at runtime.
Runtime options are currently hard-coded to the following:
- alloc_dealloc_mismatch=0
- detect_odr_violation=0
- detect_leaks=0
Name: asan[-heap|-nostack|-noglob][-wo][-lto]
Parameters:
- llvm (packages.LLVM) – an LLVM package with compiler-rt included
- stack (bool) – toggle stack instrumentation
- temporal (bool) – toggle temporal safety (False sets quarantine size to 0)
- glob (bool) – toggle globals instrumentation
- check_writes (bool) – toggle checks on stores
- check_reads (bool) – toggle checks on loads
- lto (bool) – perform link-time optimizations
- redzone (int or None) – minimum heap redzone size (default 16, always 32 for stack)
Return type: None
Built-in packages¶
LLVM¶
class infra.packages.LLVM(version, compiler_rt, commit=None, lld=False, patches=[], build_flags=[])[source]¶
LLVM dependency package. Includes the Clang compiler and optionally compiler-rt (which contains runtime support for ASan).
Supports a number of patches to be passed as arguments, which are applied (with patch -p1) before building. A patch in the list can either be a full path to a patch file, or the name of a built-in patch. Available built-in patches are:
- gold-plugins (for 3.8.0/3.9.1/4.0.0/5.0.0/7.0.0): adds a -load option to load passes from a shared object file during link-time optimizations, best used in combination with LLVMPasses
- statsfilter (for 3.8.0/3.9.1/5.0.0/7.0.0): adds a -stats-only option, which relates to -stats like -debug-only relates to -debug
- lto-nodiscard-value-names (for 7.0.0): preserves value names when producing bitcode for LTO, which is very useful when debugging passes
- safestack (for 3.8.0): adds -fsanitize=safestack for old LLVM
- compiler-rt-typefix (for 4.0.0): fixes a compiler-rt-4.0.0 bug to make it compile for recent glibc, is applied automatically if compiler_rt is set
Identifier: llvm-<version>
Parameters:
- version (str) – the full LLVM version to download, like X.Y.Z
- compiler_rt (bool) – whether to enable compiler-rt
- patches (typing.List[str]) – optional patches to apply before building
- build_flags (typing.List[str]) – additional build flags to pass to cmake
configure(self, ctx)[source]¶
Set LLVM toolchain programs in ctx. Should be called from the configure method of an instance.
Parameters: ctx (context.Context) – the configuration context
Return type: None
static add_plugin_flags(ctx, *flags, gold_passes=True)[source]¶
Helper to pass link-time flags to the LLVM gold plugin. Prefixes all flags with -Wl,-plugin-opt= before adding them to ctx.ldflags.
Parameters:
- ctx (context.Context) – the configuration context
- flags (typing.Iterable[str]) – flags to pass to the gold plugin
Return type: None
Dependencies¶
class infra.packages.AutoConf(version, m4)[source]¶
Identifier: autoconf-<version>
Parameters:
- version (str) – version to download
- m4 (packages.M4) – M4 package
class infra.packages.AutoMake(version, autoconf, libtool)[source]¶
Identifier: automake-<version>
Parameters:
- version (str) – version to download
- autoconf (packages.AutoConf) – autoconf package
- libtool (packages.LibTool or None) – optional libtool package to install .m4 files from
class infra.packages.Bash(version)[source]¶
Identifier: bash-<version>
Parameters: version (str) – version to download
class infra.packages.BinUtils(version, gold=True)[source]¶
Identifier: binutils-<version>[-gold]
Parameters:
- version (str) – version to download
- gold (bool) – whether to build the gold linker
class infra.packages.CMake(version)[source]¶
Identifier: cmake-<version>
Parameters: version (str) – version to download
class infra.packages.CoreUtils(version)[source]¶
Identifier: coreutils-<version>
Parameters: version (str) – version to download
class infra.packages.LibElf(version)[source]¶
Identifier: libelf-<version>
Parameters: version (str) – version to download
class infra.packages.LibTool(version)[source]¶
Identifier: libtool-<version>
Parameters: version (str) – version to download
class infra.packages.M4(version)[source]¶
Identifier: m4-<version>
Parameters: version (str) – version to download
LLVM passes¶
class infra.packages.LLVMPasses(llvm, srcdir, build_suffix, use_builtins, debug=False, gold_passes=True)[source]¶
LLVM passes dependency. Use this to add your own passes as a dependency to your own instances. In your own passes directory, your Makefile should look like this (see the skeleton for an example):
BUILD_SUFFIX = <build_suffix>
LLVM_VERSION = <llvm_version>
SETUP_SCRIPT = <path_to_setup.py>
SUBDIRS = <optional list of subdir names containing passes>
include <path_to_infra>/infra/packages/llvm_passes/Makefile
The Makefile can be run as-is using make in your passes directory during development, without invoking the setup script directly. It creates two shared objects in build/packages/llvm-passes-<build_suffix>/install:
- libpasses-gold.so: used to load the passes at link time in Clang. This is the default usage.
- libpasses-opt.so: used to run the passes with LLVM's opt utility. Can be used in a customized build system or for debugging.
The passes are invoked at link time by a patched LLVM gold plugin. The gold-plugins patch of the LLVM package adds an option to load custom passes into the plugin. Passes are invoked by adding their registered names to the flags passed to the LLVM gold plugin by the linker; in other words, by adding -Wl,-plugin-opt=<passname> to ctx.ldflags in the configure method of your instance. The LLVM.add_plugin_flags() helper does exactly that. Before using passes, you must call llvm_passes.configure(ctx) to load the passes into the plugin. See the skeleton LibcallCount instance for an example.
For the pkg-config command of this package, the --objdir option points to the build directory.
Identifier: llvm-passes-<build_suffix>
Parameters: - llvm (packages.LLVM) – LLVM package to link against
- srcdir (str) – source directory containing your LLVM passes
- build_suffix (str) – identifier for this set of passes
- use_builtins (bool) – whether to include built-in LLVM passes in the shared object
- debug (bool) – enable to compile passes with
-O0 -ggdb
Todo: extend this to support compile-time plugins
configure(self, ctx, *, linktime=True, compiletime=True)[source]¶
Set build/link flags in ctx. Should be called from the configure method of an instance.
linktime and compiletime can be set to false to avoid loading the pass libraries at link time and at compile time, respectively. Loading passes at link time requires LLVM to be built with the gold-plugin patch.
Parameters:
- ctx (context.Context) – the configuration context
- linktime (bool) – are the passes used at link time?
- compiletime (bool) – are the passes used at compile time?
Return type: None
runtime_cflags(self, ctx)[source]¶
Returns a list of CFLAGS to pass to a runtime library that depends on features from passes. These set include directories for header includes of built-in pass functionality such as the NOINSTRUMENT macro.
Parameters: ctx (context.Context) – the configuration context
Return type: typing.Iterable[str]
class infra.packages.BuiltinLLVMPasses(llvm, gold_passes=True)[source]¶
Subclass of LLVMPasses for built-in passes. Use this if you don't have any custom passes and just want to use the built-in passes. Configuration happens in the same way as described above: by calling the configure() method.
In addition to the shared objects listed above, this package also produces a static library called libpasses-builtin.a, which is used by LLVMPasses to include built-in passes when use_builtins is True.
For the pkg-config command of this package, the following options are added in addition to --root/--prefix/--objdir:
- --cxxflags lists compilation flags for custom passes that depend on built-in analysis passes (sets include path for headers).
- --runtime-cflags prints the value of LLVMPasses.runtime_cflags().
Identifier: llvm-passes-builtin-<llvm.version>
Parameters: llvm (packages.LLVM) – LLVM package to link against
Address space shrinking¶
class infra.packages.LibShrink(addrspace_bits, commit='master', debug=False)[source]¶
Dependency package for libshrink.
Libshrink shrinks the application address space to a maximum number of bits. It moves the stack and TLS to a memory region that is within the allowed bit range, and prelinks all shared libraries as well so that they do not exceed the address space limitations. It also defines a run_wrapper() that should be put in ctx.target_run_wrapper by an instance that uses libshrink.
Identifier: libshrink-<addrspace_bits>
Parameters:
- addrspace_bits (int) – maximum number of address space bits
- commit (str) – git branch/commit of libshrink to check out
- debug (bool) – enable debugging
configure(self, ctx, static=True)[source]¶
Set build/link flags in ctx. Should be called from the configure method of an instance. Uses post-build hooks, so any target compiled with this library must implement infra.Target.binary_paths().
Parameters:
- ctx (context.Context) – the configuration context
- static (bool) – use the static library? (shared library otherwise)
Return type: None
Raises: NotImplementedError – if static is not True (TODO)
run_wrapper(self, ctx)[source]¶
Run wrapper for targets. Links to a script that sets the rpath before any libraries are loaded, so that any dependencies of shared libraries loaded by the application are also loaded from the directory of prelinked libraries (which is created by a post-build hook).
Parameters: ctx (context.Context) – the configuration context
Return type: str
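For example, an instance that applies libshrink to its targets might look like this sketch (32 address space bits is just an example value):

import infra
from infra.packages import LibShrink

class ShrinkInstance(infra.Instance):
    name = 'shrink32'

    def __init__(self):
        self.libshrink = LibShrink(addrspace_bits=32)

    def dependencies(self):
        yield self.libshrink

    def configure(self, ctx):
        self.libshrink.configure(ctx, static=True)  # set build/link flags
        ctx.target_run_wrapper = self.libshrink.run_wrapper(ctx)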
TCmalloc¶
class infra.packages.Gperftools(commit, libunwind_version='1.4-rc1', patches=[])[source]¶
Identifier: gperftools-<version>
Parameters:
- commit (str) – git branch/commit to check out after cloning
- libunwind_version (str) – libunwind version to use
- patches (typing.List[str]) – optional patches to apply before building

configure(self, ctx)[source]¶
Set build/link flags in ctx. Should be called from the configure method of an instance.
Sets the necessary -I/-L/-l flags, and additionally adds -fno-builtin-{malloc,calloc,realloc,free} to CFLAGS.
Parameters: ctx (context.Context) – the configuration context
Return type: None
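For example, an instance that links its targets against tcmalloc might look like this sketch (the commit value is a placeholder):

import infra
from infra.packages import Gperftools

class TCMallocInstance(infra.Instance):
    name = 'tcmalloc'

    def __init__(self):
        self.gperftools = Gperftools(commit='master')

    def dependencies(self):
        yield self.gperftools

    def configure(self, ctx):
        # adds the -I/-L/-l flags and -fno-builtin-{malloc,calloc,realloc,free}
        self.gperftools.configure(ctx)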
Tools¶
class infra.packages.RusageCounters(*args, **kwargs)[source]¶
Utility library for targets that want to measure resource counters:
- memory (max resident set size)
- page faults
- I/O operations
- context switches
- runtime (estimated by gettimeofday in constructor+destructor)

The target only needs to depend on this package and configure() it to link the static library, which will then log a reportable result in a destructor. See SPEC2006 for a usage example.
Identifier: rusage-counters

classmethod parse_results(ctx, path, allow_missing=False)[source]¶
Parse any results containing counters by this package.
Parameters:
- ctx (context.Context) – the configuration context
- path (str) – path to file to parse
Returns: counter results
Return type: typing.MutableMapping[str, typing.Union[bool, int, float, str]]

configure(self, ctx)[source]¶
Set build/link flags in ctx. Should be called from the build method of a target to link in the static library.
Parameters: ctx (context.Context) – the configuration context
Return type: None
pkg_config_options(self, ctx)[source]¶
Yield options for the pkg-config command. Each option is an (option, description, value) triple. The defaults are --root, which returns the root directory build/packages/<ident>, and --prefix, which returns the install directory populated by install(): build/packages/<ident>/install.
When reimplementing this method in a derived package class, it is recommended to end the implementation with yield from super().pkg_config_options(ctx) to add the two default options.
Parameters: ctx (context.Context) – the configuration context
Return type: typing.Iterator[typing.Tuple[str, str, typing.Union[str, typing.Iterable[str]]]]
Apache benchmark (ab)¶
class infra.packages.ApacheBench(httpd_version, apr, apr_util)[source]¶
Apache's ab benchmark.
Identifier: ab-<version>
Parameters:
- httpd_version (str) – httpd version
- apr (packages.APR) – APR package to depend on
- apr_util (packages.APRUtil) – APR utilities package to depend on
Dependencies¶
class infra.packages.APR(version)[source]¶
The Apache Portable Runtime.
Identifier: apr-<version>
Parameters: version (str) – version to download

class infra.packages.APRUtil(version, apr)[source]¶
The Apache Portable Runtime utilities.
Identifier: apr-util-<version>
Parameters:
- version (str) – version to download
- apr (packages.APR) – APR package to depend on
Built-in LLVM passes¶
The framework features a number of useful analysis/transformation passes that you can use in your own instances/passes. The passes are listed below, with the supported LLVM versions in parentheses.
Transform passes¶
-dump-ir
(3.8.0/4.0.0): Dumps the current module IR of the program that is
being linked to a human-readable bitcode file with the “.ll” extension. Prints the
location of the created file to stderr
. Optionally, the target filename can
be set by calling DEBUG_MODULE_NAME("myname");
after including
dump-ir-helper.h
from the built-in passes.
-custominline
(3.8.0/4.0.0): Custom inliner for helper functions from
statically linked runtime libraries. Inlines calls to functions that have
__attribute__((always_inline))
and functions whose name starts with
__noinstrument__inline_
.
-defer-global-init
(3.8.0): Changes all global initializers to
zero-initializers and adds a global constructor function that initializes the
globals instead. In combination with -expand-const-global-uses
, this is
useful for instrumenting globals without having to deal with constant
expressions (but only with instructions).
-expand-const-global-uses
(3.8.0): Expands all uses of constant expressions
(ConstExpr
) in functions to equivalent instructions. This limits edge cases
during instrumentation, and can be undone with -instcombine
.
TODO: Combine -defer-global-init
and -expand-const-global-uses
into a
single -expand-constexprs
pass that expands all constant expressions to
instructions.
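To run one of these passes at link time, add its name to the plugin flags from an instance's configure method, for example (a sketch; assumes an LLVM package built with the gold-plugins patch and a passes package, as described earlier):

from infra.packages import LLVM, BuiltinLLVMPasses

llvm = LLVM(version='3.8.0', compiler_rt=False, patches=['gold-plugins'])
passes = BuiltinLLVMPasses(llvm)

def configure(ctx):
    llvm.configure(ctx)
    passes.configure(ctx)                   # load the built-in passes
    LLVM.add_plugin_flags(ctx, '-dump-ir')  # i.e., -Wl,-plugin-opt=-dump-ir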
Analysis passes¶
-sizeof-types
(3.8.0): Finds allocated types for calls to malloc, based on the sizeof expressions in the source code. Must be used in conjunction with
the accompanying compiler wrapper and compile-time pass. See header file
for usage.
Utility headers¶
Utilities to be used in custom LLVM pass implementations. These require
use_builtins=True
to be passed to infra.packages.LLVMPasses
. See the
source code
for a complete reference.
builtin/Common.h
(3.8.0/4.0.0): Includes a number of commonly used LLVM headers and
defines some helper functions.
builtin/Allocation.h
(3.8.0/4.0.0): Helpers to populate an AllocationSite
struct with standardized information about any stack/heap allocations.
TODO: rewrite builtin/Allocation.h
to an -allocs
analysis pass.
builtin/CustomFunctionPass.h
(3.8.0/4.0.0): Defines the CustomFunctionPass
class which serves as a drop-in replacement for LLVM’s FunctionPass
, but
really is a ModulePass
. This is necessary because the link-time passes
plugin does not support function passes because of things and reasons.
SPEC CPU benchmarking¶
SPEC benchmarking 101¶
The SPEC CPU benchmarking suites contain a number of C, C++ and Fortran benchmarks. Each benchmark is based on an existing, real-world program (e.g., the Perl interpreter or the GCC compiler), and has different characteristics: some programs are very CPU/FPU intensive, some are memory intensive, and so on. The suite is widely used for paper evaluations because of this.
The latest version is SPEC CPU2017, although SPEC CPU2006 is also still in wide use (partly to compare against older systems and papers). SPEC CPU2000 has mostly fallen out of use except for comparing against very old papers, and CPU95/CPU92 are not used at all anymore. The infra currently supports SPEC CPU2006 and CPU2017. The concepts are mostly the same between these two; most information here applies to both versions unless otherwise stated. This guide refers to both as “SPEC” for convenience.
Benchmarks in each SPEC version are often grouped in several (overlapping) sets. For example, CPU2006 has the CINT and CFP sets (for integer and floating-point, respectively), but also sets like all_c, all_cpp, all_fortran and all_mixed (grouping the benchmarks per language). When running and reporting SPEC results, you should pick a suitable/established set, and you should not cherry-pick or leave out certain benchmarks. Typically, you'll want to run the full suite, although running only CINT or CFP is acceptable in some cases. However, Fortran support is currently still lacking in compilers such as LLVM, so most papers omit the (pure or mixed) Fortran benchmarks. For CPU2006, running all C and C++ benchmarks (19 in total) is the most common configuration, and the default for the infra.
Adding to the infra¶
While the infra contains SPEC targets, it does not include SPEC itself, as it is a commercial product that we are not allowed to redistribute. Therefore, step one is to acquire a copy of SPEC and point the infra to it.
Note
If you are a student in VUSec, you should contact your supervisor for access to a copy of SPEC.
The infra supports several different formats of the SPEC installation: the raw .iso file, an extracted version of the .iso file, or a manually installed version. A single SPEC installation can be shared between different infra projects, and this is generally the preferred setup:
mkdir speciso
sudo mount -o loop spec2006.iso speciso
cd speciso
./install.sh -f -d /path/to/install/spec2006 # E.g., /home/$USER/spec2006
Then, open setup.py and add the following at the bottom (but before the call to setup.main()):
setup.add_target(infra.targets.SPEC2006(
source = '/path/to/spec',
source_type = 'installed'
))
If you use any other source_type (isofile, mounted, tarfile, git), the infra will install SPEC for you inside its own build directory.
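For example, to let the infra install SPEC from the raw .iso instead (a sketch; the path is a placeholder):

setup.add_target(infra.targets.SPEC2006(
    source = '/path/to/spec2006.iso',
    source_type = 'isofile'
))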
Building and running¶
You can build and run SPEC in the infra like any other target, e.g.:
./setup.py run spec2006 baseline deltapointers --build
However, some special flags are relevant here:
--benchmark BENCHMARK[,BENCHMARK]
- This option allows you to run only a subset of the benchmarks. It is especially useful when testing or debugging a single benchmark, e.g.,
--benchmark 400.perlbench
--test
- SPEC comes with multiple “input sets” – inputs that are fed into the aforementioned benchmark programs. By default it uses the “ref” set, whose inputs are pretty big and run for a long time. With the --test option it instead uses the “test” input set, which consists of smaller inputs, so all of SPEC can run within a minute. This cannot be used for benchmarking, but is useful for checking if everything is OK before starting a full ref run. Note that a system might work fine on one input set but not on the other, because each input set might stress different parts of the programs. One common example is the test set of 400.perlbench, which is the only SPEC benchmark that executes a fork().
--iterations=N
- To reduce noise when benchmarking, you want to do multiple runs of each benchmark and take the median runtime. On most systems 3 or 5 runs are sufficient, but if high standard deviations are observed, more runs are required.
--parallel=proc --parallelmax=1
- By passing --parallel=<something>, the infra runs each benchmark as a separate job and processes its output so it can later be reported. Here, --parallel=proc means the jobs run as processes on the local machine (instead of being distributed over a cluster of remote machines). --parallelmax=1 means only one benchmark runs at a time, so they don't interfere with each other. For testing runs, where you don't care about measuring performance, you can set --parallelmax to your CPU count, for example.
So overall, for running full SPEC and measuring overhead, you'd use:
./setup.py run spec2006 baseline --iterations=3 --parallel=proc --parallelmax=1
This will produce a new directory in the results/
directory. To keep track
of different runs, it’s convenient to rename these directories manually after
a run finishes (e.g., from results/run-2020-04-16.10-15-55 to
to
results/baseline
).
Note
You need to pass the --parallel=proc
argument to actually generate
results that can be reported.
Parsing the results¶
The infra can produce tables of the results for you with the normal report command:
./setup.py report spec2006 results/baseline -f runtime:median:stdev_percent
The -f argument at the end means “give me the median and standard deviation of the
runtimes per benchmark”. You can similarly do -f maxrss:median
to print the
memory overhead. You can give it multiple result directories. If you pass in
--overhead baseline
it will calculate everything as normalized overheads
relative to the baseline instance.
SPEC CPU2017¶
SPEC CPU2017 comes with two distinct sets of benchmarks: the speed and the
rate suites. The speed set is similar to older versions of SPEC, where a
single benchmark is started and its execution time is measured. The new rate
metric, on the other hand, launches multiple binaries at the same time (matching
the number of CPU cores) and measures throughput. More information is available
in the SPEC documentation. Each of these two sets has its own list of benchmark programs: speed benchmarks start with 6xx
,
whereas rate benchmarks start with 5xx
.
Typically we only use the speed set for our papers.
Running on a cluster¶
Note
The following information is specific to the DAS clusters offered by Dutch universities,
although it can be used on any cluster that uses prun
to issue jobs to
nodes. The DAS clusters can generally be used by any (BSc, MSc or PhD)
student at the VU, LU, UvA, and TUD.
On a cluster, it is possible to run multiple SPEC benchmarks in parallel for
much faster end-to-end benchmarking. The infra has full support for clusters
that utilize the prun
command to issue jobs, as described on the usage
page. For running SPEC we recommend the DAS-5 over the DAS-6
cluster, as it features more nodes (instead of fewer, more powerful nodes).
You will first need to request an account. When doing so as a student, you should mention your supervisor.
Some additional notes on using the DAS cluster:
- Your homedir is limited in space, so use /var/scratch/$USER instead (for both the infra and the SPEC install dir).
- Use --parallel=prun. You can omit --parallelmax, since it defaults to 64 to match the DAS-5 cluster.
- By default, jobs are killed after 15 minutes. This is usually fine (the longest benchmark at baseline, 464.h264ref, takes 8 minutes), but if you have a very slow defense it might exceed this limit. For those cases, you can use --prun-opts="-asocial -t 30:00" outside office hours.
- The results on the DAS-5 are much noisier, since we cannot control things like CPU frequency scaling. Therefore you should do 11 iterations (instead of 5) and take the median. Do also take note of the stddev: if it is crazy high, it might indicate some defective nodes. Contact the DAS sysadmin or your supervisor if that becomes a serious problem, since a reboot fixes these issues. Note that we have scripts to find defective nodes based on benchmarking results.
So overall, in most cases you’d simply use something like:
./setup.py run spec2006 baseline asan --iterations=11 --parallel=prun
Debugging¶
When debugging issues with a particular instance, it is often required to run a
SPEC benchmark under a debugger such as GDB. The infra itself launches SPEC
benchmarks via SPEC's runspec driver, which in turn invokes the binary of the
command, which in turn invokes the binary of the
particular benchmark several times with different command line arguments. For
example, 400.perlbench
runs the perl binary several times with different
perl scripts. In this example we use 400.perlbench
from CPU2006, but this
procedure is the same for any benchmark of any SPEC version.
To run one of these tests manually with gdb, we bypass both the infra and
specrun
. To determine the correct command line arguments for the benchmark
(and to set up the relevant input files), the first step is to run the
particular benchmark via the infra normally (see above). This will set up the
correct run directory, for example in
$SPEC/benchspec/CPU2006/400.perlbench/run/run_base_ref_infra-baseline.0000
,
where the last directory name depends on the instance (here baseline
) and
input set (ref
or test
).
Inside this directory should be a speccmds.cmd
file, which contains the run
environment and arguments of the binary, and is normally parsed by
specinvoke
. Lines starting with -E
and -C
define the environment
variables and working directory, respectively, and can be ignored. The lines
starting with -o
define the actual runs of the binary, and might for example
look like:
-o checkspam.2500.5.25.11.150.1.1.1.1.out -e checkspam.2500.5.25.11.150.1.1.1.1.err ../run_base_ref_infra-baseline.0000/perlbench_base.infra-baseline -I./lib checkspam.pl 2500 5 25 11 150 1 1 1 1
The first two bits (-o
and -e
) tell specinvoke
where to redirect
stdout/stderr, which we don't need. Then comes the binary (given as a relative
path into the current directory), which is perlbench_base.infra-baseline
in our case. After that follow all actual arguments, which we need to pass.
If we want to run this under gdb, we can thus call it as follows:
gdb --args perlbench_base.infra-baseline -I./lib checkspam.pl 2500 5 25 11 150 1 1 1 1
Webserver benchmarking¶
The infra has built-in support for benchmarking webserver applications like
Nginx. In such setups, the infra runs an instrumented version of the server
application, and then runs an (uninstrumented) client program to benchmark the
performance of the server (typically, wrk
).
The setup to run webserver benchmarks, however, is more complicated than it is for targets like SPEC. In particular, two machines are required (one for the server and one for the client), with a fast network connection between them (e.g., 40 Gbit). The key goal of webserver benchmarks is to reach CPU saturation already on the baseline. If saturation is not reached, any measured overhead is practically meaningless (since it’s hidden by the spare CPU cycles). While far from ideal, it is preferable to use a loopback setup (running client and server on a single machine, dividing the cores evenly) rather than use a setup where no saturation is reached (e.g., 1 Gbit connection).
For benchmarks, the saturation/peak performance point should be determined for the baseline, and that point is then used to measure the overhead (both as throughput decrease and latency increase). To do so, we typically construct a graph as shown below: the pressure from the client is increased by raising its number of connections (X-axis), while measuring both the throughput (in requests/second) and the CPU utilization. In this graph, we see a peak at 256 connections, at which point the throughput overhead for “DangZero” is 18% (623 kReqs/s -> 516 kReqs/s). Not shown in this graph is the latency: that should be measured at the same saturation point, and reported separately in a table (as percentiles, e.g., 99th percentile).

The infra has several options for running this setup automatically on separate
machines. The recommended way is to use the SSH method (using
--parallel=ssh
). This guide follows this method. Note that this setup can
use localhost as an SSH target, meaning one (or even both, for loopback
experiments) of the nodes can be the same as the one running the infra.
This whole process currently requires a lot of arguments to setup.py
. Below,
we show a script that provides good defaults for most arguments.
#!/bin/bash
set -euo pipefail
servers="nginx"
instances="baseline dangzero"
# Sweep over connection count, decreasing in density as we go higher
connections="`seq 16 16 256` `seq 256 128 1024` `seq 1024 256 1536`"
# SSH names - used as `ssh <host>`, so can be a host from the SSH config
sshclient=father
sshserver=son
# Local hosts - how to connect to each node via TCP
hostclient=localhost
hostserver=192.168.0.10
# Benchmark host (100G NIC) - how the client connects to server
serverip=10.0.0.10
serverport=20000
iterations=3 # Repeat experiments a few times
filesize=64 # Data per request, in bytes
duration=30 # Time per experiment in seconds
wait_time=1 # Time to wait between experiments
client_threads=`nproc` # Threads - should always be max, i.e., nproc
server_workers=`nproc` # Worker processes on server - should be max
server_worker_connections=1024 # Max connections per worker - do not change
# Statistics to collect of server
stats="cpu rss" # Space-separated list of {cpu, cpu-proc, rss, vms}
stats_interval=1 # Time between measurements, in seconds
for server in $servers; do
python3 ./setup.py run $server $instances \
-t bench \
--parallel=ssh \
--ssh-nodes $sshclient $sshserver \
--remote-client-host $hostclient \
--remote-server-host $hostserver \
--server-ip $serverip \
--port $serverport \
--duration $duration \
--threads $client_threads \
--iterations $iterations \
--workers $server_workers \
--worker-connections $server_worker_connections \
--filesize $filesize \
--collect-stats $stats \
--collect-stats-interval $stats_interval \
--connections $connections \
--restart-server-between-runs
done
Options you may want to have a look at:
- connections should cover a range so that you can observe the growth to saturation, and after the peak point a drop-off in throughput (with lower numbers more densely sampled).
- iterations repeats each experiment N times to reduce noise. A value of 3 or 5 is recommended, unless high standard deviations are observed.
- filesize is the size (in bytes) of the file that the benchmark retrieves. Higher values put more pressure on the network link without increasing CPU pressure, so lower values are recommended to reach CPU saturation.
- duration is the length of each experiment in seconds. Normally 30-second runs are fine, but if you are benchmarking something with increasing memory pressure over time you may need longer benchmarks (e.g., 10 minutes).
Finally, there are the SSH, host and server IP settings, which require some explanation:
- sshclient and sshserver describe how the setup.py script can reach the machines running the client (wrk) and the server (the webserver). These are SSH hostnames, and can be an IP or a hostname from the .ssh/config file.
- The setup.py script spawns a Python script (remoterunner.py) on both the client and server machines via SSH. After that it connects to these scripts directly over TCP; hostclient and hostserver describe the IP addresses used for these connections. If you used IP addresses for the SSH client/server fields, these fields probably hold the same values.
- Finally, once the benchmark starts, the client machine runs wrk against the webserver on the server machine. The IP address that the client uses to connect to the server is configured via serverip. This might be the same IP as hostserver, but it might also be different: the SSH and host connections can go over any link (localhost, built-in 1 Gbit NIC, QEMU virtual NIC, etc.). For the serverip field, however, the IP associated with the fast NIC (e.g., 40 or 100 Gbit) should be used to ensure CPU saturation.
The setup.py script can run on one of the two machines (client or server): in
the example above, the setup.py script runs on the client machine (the one that
will also run wrk
). It furthermore assumes the father
(client) and
son
(server) hosts are in .ssh/config
and can be used without a
passphrase (e.g., via an SSH agent). The machines are in a LAN in the
192.168.0.0/24
range, whereas the 100 Gbit NICs use the 10.0.0.0/24
range. This is configured manually via:
father $ ifconfig ens4 10.0.0.20 up
son $ ifconfig ens4 10.0.0.10 up
Finally, the infra can collect statistics during the execution of each test on the server. One of these statistics is the CPU usage, which is used to ensure saturation was reached. These statistics can be sampled every N seconds, and the following are supported:
- cpu: total CPU load of the system.
- cpu-proc: CPU load per process.
- rss: RSS (resident set size) of the server, i.e., physical memory usage.
- vms: VMS (virtual memory size) of the server.
In a VM¶
Some mitigations, especially those featuring kernel or hypervisor modifications, require running the target webserver in a VM. Running benchmarks in a VM is fine, but care has to be taken to ensure a proper setup.
As a basis for any reasonable benchmark, the VM should be hardware accelerated (e.g., using KVM with Intel VMX or AMD-V), with sufficient memory and CPU cores assigned. Additionally, a VM may optionally be backed by hugepages.
As with the experiments on bare-metal (as described above), the VM also needs
direct access to a fast NIC. Using something like virtio
is, in our
experience, not fast enough. Instead, a fast NIC should be directly
assigned to the VM. This can be achieved through either SR-IOV (for devices
that support virtualization and assigning part of it to a VM), or full PCI
passthrough of the device. For this guide, we assume the latter as it is more
generically applicable.
Enabling IOMMU¶
Passing the NIC to the guest requires an IOMMU to be enabled in the system. For
this, ensure the IOMMU (VT-d or AMD-Vi) is enabled in the BIOS settings. Add
intel_iommu=on
or amd_iommu=on
to the kernel boot parameters (e.g., by
modifying GRUB_CMDLINE_LINUX_DEFAULT
in /etc/default/grub
and then
running update-grub
).
After this, running dmesg
after boot should show messages related to
IOMMU/DMAR being enabled.
Next we need to check the IOMMU groups. It is only possible to pass a whole IOMMU
group to a VM, not just part of its devices. First ensure
/sys/kernel/iommu_groups/
exists and has a few directories. Then, run the
following command in your terminal:
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
echo "IOMMU Group ${g##*/}:"
for d in $g/devices/*; do
echo -e "\t$(lspci -nns ${d##*/})"
done;
done;
If the NIC does not have its own IOMMU group, try plugging it into a different slot on the main board. Typically, the “primary” or first slot of a mainboard has its own IOMMU group at least.
VFIO¶
To assign the device to the VM, we need to unbind its original driver (e.g.,
mlx5_core
for Mellanox cards), and bind it to the vfio-pci
driver.
First, find the BDF (bus:device.function, basically the physical slot of the PCI card) and vendor:device pair of the card:
$ lspci -nn
...
b3:00.0 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]
...
We can see here that its BDF is b3:00.0 (in full form, 0000:b3:00.0), and the vendor:device pair is 15b3:1013.
Now, check which driver is in use for this device:
$ lspci -d 15b3:1013 -k
b3:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
Subsystem: Mellanox Technologies MT27700 Family [ConnectX-4]
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
This is the mlx5_core Mellanox driver. We need to unbind it from the card:
echo 0000:b3:00.0 | sudo tee /sys/bus/pci/drivers/mlx5_core/unbind
Then, allow vfio-pci
to bind to this device:
echo 15b3 1013 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
When running lspci -d 15b3:1013 -k
again, it should report Kernel driver
in use: vfio-pci
. If this is not already the case, execute the following
command to force the binding:
echo 0000:b3:00.0 | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
QEMU¶
To pass the device to the VM, we add the -device vfio-pci,host=<BDF>
option
to qemu:
sudo qemu-system-x86_64 -m 8G -enable-kvm -cpu host -device vfio-pci,host=b3:00.0 -nographic -serial mon:stdio debian.img
We run this with sudo
, otherwise we get errors about mapping memory and
such.
Inside the VM, we should see the card show up like it did on the host before:
vm $ lspci -d 15b3:1013 -k
00:04.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
Subsystem: Mellanox Technologies MT27700 Family [ConnectX-4]
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
Note it now has the same vendor:device identifier, but a different BDF
(00:04.0
). We can now check which network interface is associated with this
NIC:
vm $ ls /sys/bus/pci/devices/0000\:00\:04.0/net/
ens2
We can then configure this interface as normal:
vm $ ifconfig ens2 10.0.0.10 up
Hugepage backing for VM¶
Forcing hugepage backing for the VM is not required: in most cases we have noticed no significant effect for webserver applications. However, it might be required if the instrumentation of the target increases memory or TLB pressure a lot. In this case, you might notice significant performance differences between runs, depending on when the THP (transparent huge pages) on the host kick in.
You can follow the guide from RedHat: https://access.redhat.com/solutions/36741
When using QEMU directly instead of libvirt, add the following command line options (instead of the modifications to guest.xml):
-mem-prealloc
-mem-path /hugepages/libvirt/qemu