About

instrumentation-infra is an infrastructure for program instrumentation. It builds benchmark programs with custom instrumentation flags (e.g., LLVM passes) and runs them. The design is modular and is meant to be extended by users.

Overview

The infrastructure uses three high-level concepts to specify benchmarks and build flags:

  1. A target is a benchmark program (or a collection of programs) that is to be instrumented. An example is SPEC-CPU2006.
  2. An instance specifies how to build a target. An example is infra.instances.Clang which builds targets using the Clang compiler. For SPEC2006, one of the resulting binaries would be called 400.perlbench-clang.
  3. Targets and instances can specify dependencies in the form of packages, which are built automatically before the target is built.

The infrastructure provides a number of common targets and their dependencies as packages. It also defines baseline instances for LLVM, along with packages for its build dependencies. There are some utility passes and a source patch for LLVM that lets you develop instrumentation passes in a shared object, without having to link them into the compiler after every rebuild.

A typical use case is a programmer who has implemented some security feature in an LLVM pass and wants to apply this pass to real-world benchmarks to measure its performance impact. They would create an instance that adds the relevant arguments to CFLAGS, create a setup script that registers this instance in the infrastructure, and run the setup script with the build and run commands to quickly see if things work on the built-in targets (e.g., SPEC).
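A minimal sketch of such an instance might look as follows; the instance name and flags are hypothetical, and configure() is documented in the API docs below:

import infra

class MyPass(infra.Instance):
    # hypothetical instance that enables a custom LLVM pass at link time
    name = 'mypass'

    def configure(self, ctx):
        # use += so that flags set by dependencies are preserved
        ctx.cflags += ['-O2', '-flto']
        ctx.ldflags += ['-flto', '-Wl,-plugin-opt=-mypass']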

Getting started

The easiest way to get started with the framework is to clone and adapt our skeleton repository, which creates an example target and instrumentation instance. Consult the API docs for extensive documentation on the functions used. Otherwise, read the usage guide to find out how to set up your own project, and for examples of how to invoke the build and run commands.

Usage

instrumentation-infra is meant to be used as a submodule in a git repository. To use it, you must create a setup script that specifies which targets and instances are used by the current project, including any custom targets and instances. An example can be found in our skeleton repository. The setup script (which we will call setup.py from now on) is an executable Python script that calls Setup.main(). The script has a number of subcommands whose basic usage is discussed below. Each subcommand has an extensive --help option that shows all of its bells and whistles.
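A minimal setup script might look as follows (the registered target and its source arguments are placeholders; see the Setup API docs below):

#!/usr/bin/env python3
import infra

setup = infra.Setup(__file__)
setup.add_instance(infra.instances.Clang(infra.packages.LLVM('7.0.0', compiler_rt=False)))
setup.add_target(infra.targets.SPEC2006(source_type='tarfile', source='spec2006.tar.gz'))
setup.main()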

Installing dependencies

The infrastructure’s only hard dependency is Python 3.5. If you intend to use LLVM, however, there are some build dependencies. This is what you need for LLVM on a fresh Ubuntu 16.04 installation:

sudo apt-get install bison build-essential gettext git pkg-config python ssh

For nicer command-line usage, install the following Python packages (optional):

# in user space:
pip3 install --user coloredlogs argcomplete
# OR, system-wide:
sudo pip3 install coloredlogs argcomplete

argcomplete enables command-line argument completion, but it needs to be activated first (optional):

# in user space (add to ~/.bashrc, works for files called "setup.py"):
eval "$(register-python-argcomplete --complete-arguments -o nospace -o default -- setup.py)"
# OR, use global activation (only needed once, works for any file/user):
sudo activate-global-python-argcomplete --complete-arguments -o nospace -o default

Note: if you’re using zsh, you first need to load and run bashcompinit (see the argcomplete documentation).

Cloning the framework in your project

First add the infrastructure as a git submodule. This creates a .gitmodules file that you should commit:

git submodule add -b master git@github.com:vusec/instrumentation-infra.git infra
git add infra .gitmodules
git commit -m "Clone instrumentation infrastructure"

Next, create a setup script (recommended name setup.py) in your project root that invokes the infrastructure’s main function. Consult the skeleton example and API docs for this step.

Finally, write any target, instance and package definitions needed for your project so that you can use them in the commands below.

The build and pkg-build commands

./setup.py build TARGET INSTANCE ... [-j JOBS] [--iterations=N] [<target-options>]
./setup.py pkg-build PACKAGE [-j JOBS]

build builds one or more instances of a target program. Only registered targets/instances are valid; the API docs explain how to register them. Each target and instance specifies which packages it depends on. For example, an instance that runs LLVM passes depends on LLVM, which in turn depends on some libraries, depending on the version used. Before building a target program, build lists its dependencies, downloads and builds them, and adds their installation directories to the PATH. All generated build files are put in the build/ directory in the root of your project.

Each package specifies a simple test for the setup script to see if it has already been built (e.g., it checks if install/bin/<binary> exists). If so, the build is skipped. This avoids having to run make all the time for each dependency, but sometimes you do want to force-run make, for example while debugging a custom package, or when you hackfixed the source code of a package. In this case, you can use --force-rebuild-deps to skip the checks and rebuild everything, and optionally --clean to first remove all generated files for the target (this behaves as if you just cloned the project, so use it with care).
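For example, a forced rebuild of a target and all of its dependencies might look like this (target and instance names as in the examples below):

./setup.py build spec2006 clang --force-rebuild-deps --clean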

The -j option is forwarded to make commands, allowing parallel builds of object files. It defaults to the number of cores available on the machine, with a maximum of 16 (but you can manually set it to larger values if you think enough RAM is available).

pkg-build builds a single package and its dependencies. It is useful for debugging new packages or force-building a patched dependency.

The clean command

./setup.py clean [--targets TARGET ...] [--packages PACKAGE ...]

clean removes all generated files for a target program or package; it is the opposite of build. You can override the behavior for your own targets and packages (see the API docs), but by default it removes the entire build/{targets,packages}/<name> directory.

clean is particularly useful for cleaning build files of a custom package, such as a runtime library with source code embedded in your project, before running build on a target that depends on the runtime library.
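For example, to remove the build files of a custom runtime library package (the package name is hypothetical) before rebuilding a target that depends on it:

./setup.py clean --packages my-runtime-lib
./setup.py build spec2006 myinst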

The run command

./setup.py run TARGET INSTANCE ... [--build] [--iterations=N] [<target-options>]

run runs one or more instances of a single target program. When --build is passed, it first runs the build command for that target. Valid values for <target-options> differ per target; the API docs explain how to add options for your own targets.

The example below builds and runs the test workload of 401.bzip2 from the SPEC2006 suite, both compiled with Clang, but with link-time optimizations disabled and enabled, respectively:

./setup.py run --build spec2006 clang clang-lto --test --benchmarks 401.bzip2

The --iterations option specifies the number of times to run the target, to be able to compute a median and standard deviation for the runtime.

Parallel builds and runs

build and run both have a --parallel option that divides the workload over multiple cores or machines. The degree of parallelism is controlled with --parallelmax=N. There are two types:

  • --parallel=proc spawns jobs as processes on the current machine. N is the number of parallel processes running at any given time, and defaults to the number of cores. This is particularly useful for local development of link-time passes where single-threaded linking is the bottleneck. Do use this in conjunction with -j to limit the number of forked processes per job.
  • --parallel=prun schedules jobs as prun jobs on different machines on the DAS-5 cluster. Here N indicates the maximum number of simultaneously scheduled jobs (both running and pending), defaulting to 64 (tailored to the VU cluster). Additional options such as the job time limit can be passed directly to prun using --prun-opts.

The example below builds and runs the C/C++ subset of SPEC2006 with the test workload, in order to test if the myinst instance breaks anything. The machine has 8 cores, so we limit the number of parallel program builds to 8 (which is also the default) and limit the number of build processes per program using -j 2 to avoid excessive context switching:

./setup.py run --build --parallel proc --parallelmax 8 -j 2 \
    spec2006 myinst --test --benchmarks all_c all_cpp
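Similarly, a run on the DAS-5 cluster using prun might look like this (the job time limit passed through --prun-opts is illustrative):

./setup.py run --build --parallel prun --parallelmax 64 --prun-opts '-t 30:00' \
    spec2006 myinst --test --benchmarks all_c all_cpp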

The report command

./setup.py report TARGET RUNDIRS -i INSTANCE ... [--field FIELD:AGGREGATION ...] [--overhead BASELINE]
./setup.py report TARGET RUNDIRS -i INSTANCE --raw
./setup.py report TARGET RUNDIRS --help-fields

report displays a table with benchmark results for the specified target, gathered from a given list of run directories that have been populated by a (parallel) run invocation. Each target defines a number of reportable fields that are measured during benchmarks, which are listed by --help-fields.

The report aggregates results by default, grouping them on the default field set by infra.Target.aggregation_field. This can be overridden using the --groupby option. The user must specify an aggregation function for each reported field in the -f|--field option. For instance, suppose we ran the clang and myinst instances of the spec2006 target and want to report the results. First we report the mean runtime and standard deviation to see if the results are stable (“count” shows the number of results):

./setup.py report spec2006 results/run.* -f runtime:count:mean:stdev_percent

Let’s assume the standard deviations are low and the runtimes look believable, so we now want to compute the runtime and memory overheads of the instrumentation added in the myinst instance, compared to the clang instance:

./setup.py report spec2006 results/run.* -i myinst -f runtime:median maxrss:median --overhead clang

Alternatively, the --raw option makes the command output all results without aggregation. This can be useful when creating scatter plots, for example:

./setup.py report spec2006 results/run.* -i myinst -f benchmark runtime maxrss --raw

The config command

./setup.py config --targets
./setup.py config --instances
./setup.py config --packages

config prints information about the setup configuration, such as the registered targets, instances and packages (for packages, the union of all dependencies of registered targets and instances).

The pkg-config command

./setup.py pkg-config PACKAGE <package-options>

pkg-config prints information about a single package, such as its installation prefix or, in the case of a library package, the CFLAGS needed to compile a program that uses the library. Each package can define its own options here (see API docs), but there are two defaults:

  • --root returns build/packages/<package>.
  • --prefix returns build/packages/<package>/install.

pkg-config is intended to be used by build systems of targets that need to call into the setup script from a different process than the ./setup.py build ... invocation. For example, our skeleton repository uses this to make the Makefile for its LLVM passes stand-alone, allowing developers to run make directly in the llvm-passes/ directory rather than ../setup.py pkg-build llvm-passes-skeleton.
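A hypothetical Makefile fragment illustrating this pattern (the --objdir option comes from the LLVMPasses package described below):

PKG    := llvm-passes-skeleton
PREFIX := $(shell ../setup.py pkg-config $(PKG) --prefix)
OBJDIR := $(shell ../setup.py pkg-config $(PKG) --objdir)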

API documentation

Setup

class infra.Setup(setup_path)[source]

Defines framework commands.

The setup takes care of complicated things like command-line parsing, logging, parallelism, environment setup and generating build paths. You should only need to use the methods documented here. To use the setup, you must first populate it with targets and instances using add_target() and add_instance(), and then call main() to run the command issued in the command-line arguments:

setup = infra.Setup(__file__)
setup.add_instance(MyAwesomeInstance())
setup.add_target(MyBeautifulTarget())
setup.main()

main() creates a context that it passes to methods of targets/instances/packages. You can see it being used as ctx by many API methods below. The context contains setup configuration data, such as absolute build paths, and environment variables for build/run commands, such as which compiler and CFLAGS to use to build the current target. Your own targets and instances should read/write to the context.

The job of an instance is to manipulate the context such that a target is built in the desired way. This manipulation happens in predefined API methods which you must override (see below). Hence, these methods receive the context as a parameter.

Parameters:setup_path (str) – Path to the script running Setup.main(). Needed to allow build scripts to call back into the setup script for build hooks.
add_command(self, command)[source]

Register a setup command.

Parameters:command (Command) – The command to register.
Return type:None
add_instance(self, instance)[source]

Register an instance. Only registered instances can be referenced in commands, so built-in instances must be registered as well.

Parameters:instance (Instance) – The instance to register.
Return type:None
add_target(self, target)[source]

Register a target. Only registered targets can be referenced in commands, so built-in targets must be registered as well.

Parameters:target (Target) – The target to register.
Return type:None
main(self)[source]

Run the configured setup:

  1. Parse command-line arguments.
  2. Create build directories and log files.
  3. Run the issued command.
Return type:None

Context

class infra.context.Context(paths, log, loglevel=0, args=<factory>, hooks=<factory>, runenv=<factory>, starttime=<factory>, target_run_wrapper='', runlog_file=None, runtee=None, jobs=8, arch='unknown', cc='cc', cxx='cxx', fc='fc', ar='ar', nm='nm', ranlib='ranlib', cflags=<factory>, cxxflags=<factory>, fcflags=<factory>, ldflags=<factory>, lib_ldflags=<factory>)[source]

The global configuration context, used by all targets, instances, etc.

For example, an instance can configure its compiler flags in this context, which are then used by targets.

Return type:None
paths = None

Absolute paths to be used (readonly) throughout the framework.

Type: context.ContextPaths

log = None

The logging object used for status updates.

Type: logging.Logger

loglevel = 0

The logging level as requested by the user.

Note that it differs from the logging object’s log level, since all debug output is written to a file regardless of the requested loglevel.

Type: int

args = None

Populated with processed command-line arguments. Targets and instances can add additional command-line arguments, which can be accessed through this object.

Type: argparse.Namespace

hooks = None

An object with hooks for various points in the building/running process.

Type: context.ContextHooks

runenv = None

Environment variables that are used when running a target.

Type: typing.Dict[str, typing.Union[str, typing.List[str]]]

starttime = None

When the current run of the infra was started.

Type: datetime.datetime

target_run_wrapper = ''

Command(s) to prepend in front of the target’s run command (executed directly on the command line). This can be set to a custom shell script, or for example perf or valgrind.

Type: str

runlog_file = None

File object used for writing all executed commands, if enabled.

Type: _io.TextIOWrapper or None

runtee = None

Object used to redirect the output of executed commands to a file and stdout.

Type: io.IOBase or None

jobs = 8

The number of parallel jobs to use. Contains the value of the -j command-line option, defaulting to the number of CPU cores returned by multiprocessing.cpu_count().

Type: int

arch = 'unknown'

Architecture to build targets for. Initialized to platform.machine(). Valid values include x86_64 and arm64/aarch64; for more, refer to uname -m and platform.machine().

Type: str

cc = 'cc'

C compiler to use when building targets.

Type: str

cxx = 'cxx'

C++ compiler to use for building targets.

Type: str

fc = 'fc'

Fortran compiler to use for building targets.

Type: str

ar = 'ar'

Command for creating static library archives.

Type: str

nm = 'nm'

Command to read an object’s symbols.

Type: str

ranlib = 'ranlib'

Command to generate the index of an archive.

Type: str

cflags = None

C compilation flags to use when building targets.

Type: typing.List[str]

cxxflags = None

C++ compilation flags to use when building targets.

Type: typing.List[str]

fcflags = None

Fortran compilation flags to use when building targets.

Type: typing.List[str]

ldflags = None

Linker flags to use when building targets.

Type: typing.List[str]

lib_ldflags = None

Special set of linker flags set by some packages, passed when linking target libraries that will later be (statically) linked into the binary.

In practice it is either empty or ['-flto'] when compiling with LLVM.

Type: typing.List[str]

copy(self)[source]

Make a partial deepcopy of this Context, copying only fields of type ContextPaths|list|dict.

Return type:Context
class infra.context.ContextPaths(infra, setup, workdir)[source]

Absolute, read-only, paths used throughout the infra.

Normally instances, targets, and packages do not need to consult these paths directly, but instead use their respective path method.

Return type:None
infra = None

Root dir of the infra itself.

Type: str

setup = None

Path to the user’s script that invoked the infra.

Type: str

workdir = None

Working directory when the infra was started.

Type: str

root

Root directory, which contains the user’s script invoking the infra.

buildroot

Build directory.

log

Directory containing all logs.

debuglog

Path to the debug log.

runlog

Path to the log of all executed commands.

packages

Build directory for packages.

targets

Build directory for targets.

pool_results

Directory containing all results of running targets.

class infra.context.ContextHooks(pre_build=<factory>, post_build=<factory>)[source]

Hooks (i.e., functions) that are executed at various stages during the building and running of targets.

Return type:None
pre_build = None

Hooks to execute before building a target.

Type: typing.List[typing.Callable]

post_build = None

Hooks to execute after a target is built.

This can be used to do additional post-processing on the generated binaries.

Type: typing.List[typing.Callable]

Targets

class infra.Target(*args, **kwargs)[source]

Abstract base class for target definitions. Built-in derived classes are listed here.

Each target must define a name attribute that is used to reference the target on the command line. The name must be unique among all registered targets. Each target must also implement a number of methods that are called by Setup when running commands.

The build command performs the following steps for each target:

  1. It calls add_build_args() to include any custom command-line arguments for this target, and then parses the command-line arguments.
  2. It calls is_fetched() to see if the source code for this target has been downloaded yet.
  3. If is_fetched() == False, it calls fetch().
  4. It calls Instance.configure() on the instance that will be passed to build().
  5. All packages listed by dependencies() are built and installed into the environment (i.e., PATH and such are set).
  6. It calls build() to build the target binaries.
  7. If any post-build hooks are installed by the current instance, it calls binary_paths() to get paths to all built binaries. These are then passed directly to the build hooks.

For the run command:

  1. It calls add_run_args() to include any custom command-line arguments for this target.
  2. If --build was specified, it performs all build steps above.
  3. It calls Instance.prepare_run() on the instance that will be passed to run().
  4. It calls run() to run the target binaries.

For the clean command:

  1. It calls is_clean() to see if any build files exist for this target.
  2. If is_clean() == False, it calls clean().

For the report command:

  1. It calls parse_outfile() for every log file before creating the report.

Naturally, when defining your own target, all the methods listed above must have working implementations. Some implementations are optional and some have a default implementation that works for almost all cases (see docs below), but the following are mandatory to implement for each new target: is_fetched(), fetch(), build() and run().
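A minimal sketch of a custom target implementing the four mandatory methods (the program, file names, and target name are hypothetical):

import os
import infra
from infra.util import run

class HelloWorld(infra.Target):
    name = 'hello-world'

    def is_fetched(self, ctx):
        return os.path.exists(self.path(ctx, 'src', 'hello.c'))

    def fetch(self, ctx):
        # a real target would download a tarball here; we just write the source
        os.makedirs(self.path(ctx, 'src'), exist_ok=True)
        with open(self.path(ctx, 'src', 'hello.c'), 'w') as f:
            f.write('#include <stdio.h>\nint main() { printf("hello\\n"); }\n')

    def build(self, ctx, instance):
        os.chdir(self.path(ctx, 'src'))
        # respect the compiler and flags configured by the instance
        run(ctx, [ctx.cc] + ctx.cflags + ['hello.c'] + ctx.ldflags +
                 ['-o', 'hello-' + instance.name])

    def run(self, ctx, instance):
        os.chdir(self.path(ctx, 'src'))
        run(ctx, ['./hello-' + instance.name], teeout=True)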

name = None

The target’s name, which must be unique.

Type: str

reportable_fields(self)[source]

Run-time statistics reported by this target. Examples include the runtime of the benchmark, its memory or CPU utilization, or benchmark-specific measurements such as the throughput and latency of requests.

The format is a dictionary mapping the name of the statistic to a (human-readable) description. For each entry, the name is looked up in the logs and saved per run.

Return type:typing.Mapping[str, str]
add_build_args(self, parser)[source]

Extend the command-line arguments for the build command with custom arguments for this target. These arguments end up in the global context, so it is a good idea to prefix them with the target name to avoid collisions with other targets and instances.

For example, SPEC2006 defines --spec2006-benchmarks (rather than --benchmarks).

Parameters:parser (argparse.ArgumentParser) – the argument parser to extend
Return type:None
add_run_args(self, parser)[source]

Extend the command-line arguments for the run command with custom arguments for this target. Since only a single target can be run at a time, prefixing to avoid naming conflicts with other targets is not necessary here.

For example, SPEC2006 defines --benchmarks and --test.

Parameters:parser (argparse.ArgumentParser) – the argument parser to extend
Return type:None
dependencies(self)[source]

Specify dependencies that should be built and installed in the run environment before building this target.

Return type:typing.Iterator[infra.package.Package]
path(self, ctx, *args)[source]

Get the absolute path to the build directory of this target, optionally suffixed with a subpath.

Parameters:
  • ctx (context.Context) – the configuration context
  • args (str) – optional subpath components
Returns:

build/targets/<name>[/<subpath>]

Return type:

str

is_fetched(self, ctx)[source]

Returns True if the source code for this target has been fetched, in which case fetch() is skipped before building.

Parameters:ctx (context.Context) – the configuration context
Return type:bool
fetch(self, ctx)[source]

Fetches the source code for this target. This step is separated from build() because the build command first fetches all packages and targets before starting the build process.

Parameters:ctx (context.Context) – the configuration context
Return type:None
build(self, ctx, instance, pool=None)[source]

Build the target object files. Called some time after fetch() (see above).

ctx.runenv will have been populated with the exported environments of all packages returned by dependencies() (i.e., Package.install_env() has been called for each dependency). This means that when you call util.run() here, the programs and libraries from the dependencies are available in PATH and LD_LIBRARY_PATH, so you don’t need to reference them with absolute paths.

The build function should respect variables set in the configuration context such as ctx.cc and ctx.cflags, passing them to the underlying build system as required. Setup.ctx shows default variables in the context that should at least be respected, but complex instances may optionally overwrite them to be used by custom targets.

Any custom command-line arguments set by add_build_args() are available here in ctx.args.

If pool is defined (i.e., when --parallel is passed), the target is expected to use pool.run() instead of util.run() to invoke build commands.

Parameters:
  • ctx (context.Context) – the configuration context
  • instance (Instance) – the instance being built
  • pool (parallel.Pool or None) – parallel pool, set when --parallel is passed
Return type:

None

run(self, ctx, instance, pool=None)[source]

Run the target binaries. This should be done using util.run() so that ctx.runenv is used (which can be set by an instance or dependencies). It is recommended to pass teeout=True to make the output of the process stream to stdout.

Any custom command-line arguments set by add_run_args() are available here in ctx.args.

If pool is defined (i.e., when --parallel is passed), the target is expected to use pool.run() instead of util.run() to launch runs.

Implementations of this method should respect the --iterations option of the run command.

Parameters:
  • ctx (context.Context) – the configuration context
  • instance (Instance) – the instance being run
  • pool (parallel.Pool or None) – parallel pool, set when --parallel is passed
Return type:

None

parse_outfile(self, ctx, outfile)[source]

Callback method for commands.report.parse_logs(). Used by report command to get reportable results.

Parameters:
  • ctx (context.Context) – the configuration context
  • outfile (str) – path to outfile to parse
Return type:

typing.Iterator[typing.MutableMapping[str, typing.Union[bool, int, float, str]]]

Raises:

NotImplementedError – unless implemented

is_clean(self, ctx)[source]

Returns True if the target is already clean, in which case clean() is skipped.

Parameters:ctx (context.Context) – the configuration context
Return type:bool
clean(self, ctx)[source]

Clean generated files for this target, called by the clean command. By default, this removes build/targets/<name>.

Parameters:ctx (context.Context) – the configuration context
Return type:None
binary_paths(self, ctx, instance)[source]

If implemented, this should return a list of absolute paths to binaries created by build() for the given instance. This is only used if the instance specifies post-build hooks. Each hook is called for each of the returned paths.

Parameters:
  • ctx (context.Context) – the configuration context
  • instance (Instance) – the instance for which the binaries were built
Returns:

paths to binaries

Return type:

typing.Iterable[str]

Raises:

NotImplementedError – unless implemented

Instances

class infra.Instance(*args, **kwargs)[source]

Abstract base class for instance definitions. Built-in derived classes are listed here.

Each instance must define a name attribute that is used to reference the instance on the command line. The name must be unique among all registered instances.

An instance changes variables in the configuration context that are used to apply instrumentation while building a target by Target.build() and Target.link(). This is done by configure().

Additionally, instances that need runtime support, such as a shared library, can implement prepare_run() which is called by the run command just before running the target with Target.run().

name

The instance’s name, must be unique.

add_build_args(self, parser)[source]

Extend the command-line arguments for the build command with custom arguments for this instance. These arguments end up in the global context, so it is a good idea to prefix them with the instance name to avoid collisions with other instances and targets.

Use this to enable build flags for your instance on the command line, rather than having to create separate instances for every option when experimenting.

Parameters:parser (argparse.ArgumentParser) – the argument parser to extend
Return type:None
dependencies(self)[source]

Specify dependencies that should be built and installed in the run environment before building a target with this instance. Called before configure() and prepare_run().

Return type:typing.Iterator[infra.package.Package]
configure(self, ctx)[source]

Modify context variables to change how a target is built.

Typically, this would set one or more of ctx.{cc,cxx,cflags,cxxflags,ldflags,hooks.post_build}. It is recommended to use += rather than = when assigning to lists in the context to avoid undoing changes by dependencies.

Any custom command-line arguments set by add_build_args() are available here in ctx.args.

Parameters:ctx (context.Context) – the configuration context
Return type:None
prepare_run(self, ctx)[source]

Modify context variables to change how a target is run.

Typically, this would change ctx.runenv, e.g., by setting ctx.runenv.LD_LIBRARY_PATH. Target.run() is expected to call util.run() which will inherit the modified environment.

Parameters:ctx (context.Context) – the configuration context
Return type:None
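A minimal sketch of prepare_run for an instance whose runtime support library is built by a package (self.runtime is an assumed package dependency):

def prepare_run(self, ctx):
    libdir = self.runtime.path(ctx, 'install', 'lib')
    # prepend the library directory so the target picks up our shared object
    ctx.runenv.setdefault('LD_LIBRARY_PATH', []).insert(0, libdir)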

Packages

class infra.Package(*args, **kwargs)[source]

Abstract base class for package definitions. Built-in derived classes are listed here.

Each package must define an ident() method that returns a unique ID for the package instance. This is similar to Target.name, except that each instantiation of a package can return a different ID depending on its parameters. For example, a Bash package might be initialized with a version number and be identified as bash-4.1 or bash-4.3, which are different packages with different build directories.

A dependency is built in three steps:

  1. fetch() downloads the source code, typically to build/packages/<ident>/src.
  2. build() builds the code, typically in build/packages/<ident>/obj.
  3. install() installs the built binaries/libraries, typically into build/packages/<ident>/install.

The functions above are only called if is_fetched(), is_built() and is_installed() return False respectively. Additionally, if is_installed() returns True, fetching and building are skipped altogether. All these methods are abstract and thus require an implementation in a package definition.

clean() removes all generated package files when the clean command is run. By default, this removes build/packages/<ident>.

The package needs to be able to install itself into ctx.runenv so that it can be used by targets/instances/packages that depend on it. This is done by install_env(), which by default adds build/packages/<ident>/install/bin to the PATH and build/packages/<ident>/install/lib to the LD_LIBRARY_PATH.

Finally, the setup script has a pkg-config command that prints package information such as the installation prefix or compilation flags required to build software that uses the package. These options are configured by pkg_config_options().
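A minimal sketch of a library package following these steps (the project name, URL, and build system are hypothetical):

import os
import infra
from infra.util import run, download

class LibFoo(infra.Package):
    def __init__(self, version):
        self.version = version

    def ident(self):
        return 'libfoo-' + self.version

    def is_fetched(self, ctx):
        return os.path.exists(self.path(ctx, 'src'))

    def fetch(self, ctx):
        os.makedirs(self.path(ctx), exist_ok=True)
        os.chdir(self.path(ctx))
        download(ctx, 'https://example.com/libfoo-%s.tar.gz' % self.version)
        run(ctx, ['tar', '-xf', 'libfoo-%s.tar.gz' % self.version])
        os.rename('libfoo-' + self.version, 'src')

    def is_built(self, ctx):
        return os.path.exists(self.path(ctx, 'obj', 'libfoo.a'))

    def build(self, ctx):
        os.makedirs(self.path(ctx, 'obj'), exist_ok=True)
        os.chdir(self.path(ctx, 'obj'))
        run(ctx, ['../src/configure', '--prefix=' + self.path(ctx, 'install')])
        run(ctx, ['make', '-j%d' % ctx.jobs])

    def is_installed(self, ctx):
        return os.path.exists(self.path(ctx, 'install', 'lib', 'libfoo.a'))

    def install(self, ctx):
        os.chdir(self.path(ctx, 'obj'))
        run(ctx, ['make', 'install'])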

ident(self)[source]

Returns a unique identifier to this package instantiation.

Two packages are considered identical if their identifiers are equal. This means that if multiple targets/instances/packages return different instantiations of a package as dependency that share the same identifier, they are assumed to be equal and only the first will be built. This way, different implementations of dependencies() can instantiate the same class in order to share a dependency.

Return type:str
dependencies(self)[source]

Specify dependencies that should be built and installed in the run environment before building this package.

Return type:typing.Iterator[ForwardRef(‘Package’)]
is_fetched(self, ctx)[source]

Returns True if the source code for this package has been fetched, in which case fetch() is skipped.

Parameters:ctx (context.Context) – the configuration context
Return type:bool
is_built(self, ctx)[source]

Returns True if the package has been built, in which case build() is skipped before installing.

Parameters:ctx (context.Context) – the configuration context
Return type:bool
is_installed(self, ctx)[source]

Returns True if the package has been installed, in which case fetching and building are skipped altogether.

Parameters:ctx (context.Context) – the configuration context
Return type:bool
fetch(self, ctx)[source]

Fetches the source code for this package. This step is separated from build() because the build command first fetches all packages and targets before starting the build process.

Parameters:ctx (context.Context) – the configuration context
Return type:None
build(self, ctx)[source]

Build the package. Usually amounts to running make -j<ctx.jobs> using util.run().

Parameters:ctx (context.Context) – the configuration context
Return type:None
install(self, ctx)[source]

Install the package. Usually amounts to running make install using util.run(). It is recommended to install to self.path(ctx, 'install'), which results in build/packages/<ident>/install. Assuming that bin and/or lib directories are generated in the install directory, the default behaviour of install_env() will automatically add those to [LD_LIBRARY_]PATH.

Parameters:ctx (context.Context) – the configuration context
Return type:None
is_clean(self, ctx)[source]

Returns True if the package is already clean, in which case clean() is skipped.

Parameters:ctx (context.Context) – the configuration context
Return type:bool
clean(self, ctx)[source]

Clean generated files for this package, called by the clean command. By default, this removes build/packages/<ident>.

Parameters:ctx (context.Context) – the configuration context
Return type:None
path(self, ctx, *args)[source]

Get the absolute path to the build directory of this package, optionally suffixed with a subpath.

Parameters:
  • ctx (context.Context) – the configuration context
  • args (str) – optional subpath components
Returns:

build/packages/<ident>[/<subpath>]

Return type:

str

install_env(self, ctx)[source]

Install the package into ctx.runenv so that it can be used in subsequent calls to util.run(). By default, it adds build/packages/<ident>/install/bin to the PATH and build/packages/<ident>/install/lib to the LD_LIBRARY_PATH (but only if the directories exist).

Parameters:ctx (context.Context) – the configuration context
Return type:None
pkg_config_options(self, ctx)[source]

Yield options for the pkg-config command. Each option is an (option, description, value) triple. The defaults are --root which returns the root directory build/packages/<ident>, and --prefix which returns the install directory populated by install(): build/packages/<ident>/install.

When reimplementing this method in a derived package class, it is recommended to end the implementation with yield from super().pkg_config_options(ctx) to add the two default options.

Parameters:ctx (context.Context) – the configuration context
Return type:typing.Iterator[typing.Tuple[str, str, typing.Union[str, typing.Iterable[str]]]]
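A sketch of a derived package adding a custom option (self.version is an assumed attribute):

def pkg_config_options(self, ctx):
    yield ('--version', 'the package version', self.version)
    yield from super().pkg_config_options(ctx)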

Utility functions

infra.util.add_cflag(ctx, flag)[source]

Add flag to ctx.cflags if new

Return type:None
infra.util.add_cxxflag(ctx, flag)[source]

Add flag to ctx.cxxflags if new

Return type:None
infra.util.add_c_cxxflag(ctx, flag)[source]

Add a flag both to ctx.cflags & ctx.cxxflags if new

Return type:None
infra.util.add_ldflag(ctx, flag)[source]

Add flag to ctx.ldflags if new

Return type:None
infra.util.add_lib_ldflag(ctx, flag, also_ldflag=False)[source]

Add flag to ctx.lib_ldflags if new

Return type:None
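These helpers only add a flag if it is not present yet, which avoids duplicates when multiple dependencies request the same option. A sketch of their use in an instance’s configure method:

from infra.util import add_c_cxxflag, add_ldflag

def configure(self, ctx):
    add_c_cxxflag(ctx, '-fno-omit-frame-pointer')  # skipped if already present
    add_ldflag(ctx, '-flto')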
class infra.util.Index(thing_name)[source]
keys(self)[source]
Return type:typing.KeysView[str]
values(self)[source]
Return type:typing.ValuesView[~T]
items(self)[source]
Return type:typing.ItemsView[str, ~T]
class infra.util.LazyIndex(thing_name, find_value)[source]
exception infra.util.FatalError[source]

Raised for errors that should stop the execution immediately, but do not need a backtrace. Results in only the exception message being logged. This typically means there is an error in the user input, rather than in the code that raises the error.

infra.util.apply_patch(ctx, path, strip_count)[source]

Applies a patch in the current directory by calling patch -p<strip_count> < <path>.

Afterwards, a stamp file called .patched-<basename> is created to indicate that the patch has been applied. If the stamp file is already present, the patch is not applied at all. <basename> is generated from the patch file name: path/to/my-patch.patch becomes my-patch.

Parameters:
  • ctx (context.Context) – the configuration context
  • path (str) – path to the patch file
  • strip_count (int) – number of leading elements to strip from patch paths
Returns:

True if the patch was applied, False if it was already applied before

Return type:

bool
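A sketch of calling apply_patch from a package’s build step (the patch location is illustrative; strip_count=1 matches patches generated with git diff):

import os
from infra.util import apply_patch

def build(self, ctx):
    os.chdir(self.path(ctx, 'src'))
    apply_patch(ctx, os.path.join(ctx.paths.root, 'patches', 'fix-build.patch'), 1)
    # ... continue building as usual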

class infra.util.Process(proc, cmd_str, teeout, stdout_override=None)[source]
Return type:None
infra.util.run(ctx, cmd, allow_error=False, silent=False, teeout=False, defer=False, env={}, **kwargs)[source]

Wrapper for subprocess.run() that does environment/output logging and provides a few useful options. The log file is build/log/commands.txt. Where possible, use this wrapper in favor of subprocess.run() to facilitate easier debugging.

It is useful to permanently have a terminal window open running tail -f build/log/commands.txt. This way, command output is available in case of errors but does not clobber the setup’s progress log.

The run environment is based on os.environ, first adding ctx.runenv (populated by packages/instances, see also Setup) and then the env parameter. The combination of ctx.runenv and env is logged to the log file. Any lists of strings in environment values are joined with a ‘:’ separator.

If the command exits with a non-zero status code, the corresponding output is logged to the command line and the setup aborts with sys.exit(-1).

Parameters:
  • ctx (context.Context) – the configuration context
  • cmd (str or typing.Iterable[typing.Any]) – command to run, can be a string or a list of strings like in subprocess.run()
  • allow_error (bool) – avoids calling sys.exit(-1) if the command returns an error
  • silent (bool) – disables output logging (only logs the invocation and environment)
  • teeout (bool) – streams command output to sys.stdout as well as to the log file
  • defer (bool) – Do not wait for the command to finish. Similar to ./program & in Bash. Returns a subprocess.Popen instance.
  • env (typing.Mapping[str, typing.Union[str, typing.List[str]]]) – variables to add to the environment
  • kwargs (Any) – passed directly to subprocess.run() (or subprocess.Popen if defer==True)
Returns:

a handle to the completed or running process

Return type:

util.Process

infra.util.qjoin(args)[source]

Join command-line arguments into a single string that is safe to paste into a shell. Basically this adds quotes to each element containing spaces (using shlex.quote()). Arguments are stringified by str before joining.

Parameters:args (typing.Iterable[typing.Any]) – arguments to join
Return type:str
infra.util.download(ctx, url, outfile=None)[source]

Download a file (logs to the debug log).

Parameters:
  • ctx (context.Context) – the configuration context
  • url (str) – URL to the file to download
  • outfile (str or None) – optional path/filename to download to
Return type:

None

infra.util.require_program(ctx, name, error=None)[source]

Require a program to be available in PATH or ctx.runenv.PATH.

Parameters:
  • ctx (context.Context) – the configuration context
  • name (str) – name of required program
  • error (str or None) – optional error message
Return type:

None

Raises:

FatalError – if program is not found

infra.util.untar(ctx, tarname, dest=None, *, remove=True, basename=None)[source]

TODO: docs

Return type:None
class infra.parallel.Pool(logger, parallelmax)[source]

A pool is used to run processes in parallel as jobs when --parallel is specified on the command line. The pool is created automatically by Setup and passed to Target.build() and Target.run(). However, the pool is only passed if the method implementation defines a parameter for the pool, i.e.:

class MyTarget(Target):
    def build(self, ctx, instance, pool): # receives Pool instance
       ...
    def run(self, ctx, instance):         # does not receive it
       ...

The maximum number of parallel jobs is controlled by --parallelmax. For --parallel=proc this is simply the number of parallel processes on the current machine. For --parallel=prun it is the maximum number of simultaneous jobs in the job queue (pending or running).

Parameters:
  • logger (logging.Logger) – logging object for status updates (set to ctx.log)
  • parallelmax (int) – value of --parallelmax
wait_all(self)[source]

Block (busy-wait) until all jobs in the queue have been completed. Called automatically by Setup after the build and run commands.

Return type:None
run(self, ctx, cmd, jobid, outfile, nnodes, onsuccess=None, onerror=None, **kwargs)[source]

A non-blocking wrapper for util.run(), to be used when --parallel is specified.

Parameters:
  • ctx (context.Context) – the configuration context
  • cmd (str or typing.Iterable[str]) – the command to run
  • jobid (str) – a human-readable ID for status reporting
  • outfile (str) – full path to target file for command output
  • nnodes (int) – number of cores or machines to run the command on
  • onsuccess (typing.Callable[[infra.parallel.Job], NoneType] or None) – callback when the job finishes successfully
  • onerror (typing.Callable[[infra.parallel.Job], NoneType] or None) – callback when the job exits with (typically I/O) error
  • kwargs (Any) – passed directly to util.run()
Returns:

handles to created job processes

Return type:

typing.Iterable[infra.parallel.Job]

Built-in targets

SPEC

class infra.targets.SPEC2006(source_type, source, patches=[], toolsets=[], nothp=True, force_cpu=0, default_benchmarks=['all_c', 'all_cpp'], reporters=[<class 'infra.packages.tools.RusageCounters'>])[source]

The SPEC-CPU2006 benchmarking suite.

Since SPEC may not be redistributed, you need to provide your own copy in source. We support the following types for source_type:

  • isofile: ISO file to mount (requires fuseiso to be installed)
  • mounted: mounted/extracted ISO directory
  • installed: pre-installed SPEC directory in another project
  • tarfile: compressed tarfile with ISO contents
  • git: git repo containing extracted ISO

The --spec2006-benchmarks command-line argument is added for the build and run commands. It supports full individual benchmark names such as ‘400.perlbench’, as well as the benchmark sets defined by SPEC (e.g., all_c and all_cpp).

Multiple sets and individual benchmarks can be specified; duplicates are removed and the list is sorted automatically. When unspecified, the benchmarks default to all_c all_cpp.

The following options are added only for the run command:

  • --benchmarks: alias for --spec2006-benchmarks
  • --test: run the test workload
  • --measuremem: use an alternative runscript that bypasses runspec to measure memory usage
  • --runspec-args: passed directly to runspec

Parallel builds and runs using the --parallel option are supported. Command output will end up in the results/ directory in that case. Note that even though the parallel job may finish successfully, you still need to check the output for errors manually using the report command.

The --iterations option of the run command is translated into the number of nodes per job when --parallel is specified, and to --runspec-args -n <iterations> otherwise.

The report command analyzes logs in the results directory and reports the aggregated data in a table. It receives a list of run directories (results/run.X) as positional arguments to traverse for log files. By default, the columns list runtimes, memory usages, overheads, standard deviations and iterations. The computed values are appended to each log file with the prefix [setup-report], and read from there by subsequent report commands if available (see also RusageCounters). This makes log files portable to different machines without copying over the entire SPEC directory. The script depends on a couple of Python libraries for its output:

pip3 install [--user] terminaltables termcolor

Some useful command-line options change what is displayed by report:

TODO: move some of these from below to general report command docs

  1. --fields changes which data fields are printed. A column is added for each instance for each field. The options are autocompleted and default to status, overheads, runtime, memory usage, stddevs and iterations. Custom counter fields from runtime libraries can also be specified (but are not autocompleted).
  2. --baseline changes the baseline for overhead computation. By default, the script looks for baseline, clang-lto or clang.
  3. --csv/--tsv change the output from human-readable to comma/tab-separated for script processing. E.g., use in conjunction with cut to obtain a column of values.
  4. --nodes adds a (possibly very large) table of runtimes of individual nodes. This is useful for identifying bad nodes on the DAS-5 when some standard deviations are high while using --parallel prun.
  5. --ascii disables UTF-8 output so that output can be saved to a log file or piped to less.

Finally, you may specify a list of patches to apply before building. These may be paths to .patch files that will be applied with patch -p1, or choices from the following built-in patches:

  • dealII-stddef Fixes error in dealII compilation on recent compilers when ptrdiff_t is used without including stddef.h. (you basically always want this)
  • asan applies the AddressSanitizer patch, needed to make -fsanitize=address work on LLVM.
  • gcc-init-ptr zero-initializes a pointer on the stack so that type analysis at LTO time does not get confused.
  • omnetpp-invalid-ptrcheck fixes a code copy-paste bug in an edge case of a switch statement, where a pointer from a union is used while it is initialized as an int.
Name:

spec2006

Parameters:
  • source_type (str) – see above
  • source (str) – where to install spec from
  • patches (typing.List[str]) – patches to apply after installing
  • toolsets (typing.List[str]) – approved toolsets to add additionally
  • nothp (bool) – run without transparent huge pages (they tend to introduce noise in performance measurements), implies Nothp dependency if True
  • force_cpu (int) – bind runspec to this cpu core (-1 to disable)
  • default_benchmarks (typing.List[str]) – specify benchmarks run by default
custom_allocs_flags = ['-allocs-custom-funcs=Perl_safesysmalloc:malloc:0.Perl_safesyscalloc:calloc:1:0.Perl_safesysrealloc:realloc:1.Perl_safesysfree:free:-1.ggc_alloc:malloc:0.alloc_anon:malloc:1.xmalloc:malloc:0.xcalloc:calloc:1:0.xrealloc:realloc:1']

Command-line arguments for the built-in -allocs pass; registers custom allocation function wrappers in SPEC benchmarks.

class infra.targets.SPEC2017(source_type, source, patches=[], nothp=True, force_cpu=0, default_benchmarks=['intspeed_pure_c', 'intspeed_pure_cpp', 'fpspeed_pure_c'], reporters=[<class 'infra.packages.tools.RusageCounters'>])[source]

The SPEC-CPU2017 benchmarking suite.

Since SPEC may not be redistributed, you need to provide your own copy in source. We support the following types for source_type:

  • isofile: ISO file to mount (requires fuseiso to be installed)
  • mounted: mounted/extracted ISO directory
  • installed: pre-installed SPEC directory in another project
  • tarfile: compressed tarfile with ISO contents
  • git: git repo containing extracted ISO

The following options are added only for the run command:

  • --benchmarks: alias for --spec2017-benchmarks
  • --test: run the test workload
  • --measuremem: use an alternative runscript that bypasses runspec to measure memory usage
  • --runspec-args: passed directly to runspec

Parallel builds and runs using the --parallel option are supported. Command output will end up in the results/ directory in that case. Note that even though the parallel job may finish successfully, you still need to check the output for errors manually using the report command.

The --iterations option of the run command is translated into the number of nodes per job when --parallel is specified, and to --runspec-args -n <iterations> otherwise.

The report command analyzes logs in the results directory and reports the aggregated data in a table. It receives a list of run directories (results/run.X) as positional arguments to traverse for log files. By default, the columns list runtimes, memory usages, overheads, standard deviations and iterations. The computed values are appended to each log file with the prefix [setup-report], and read from there by subsequent report commands if available (see also RusageCounters). This makes log files portable to different machines without copying over the entire SPEC directory. The script depends on a couple of Python libraries for its output:

pip3 install [--user] terminaltables termcolor

Some useful command-line options change what is displayed by report:

TODO: move some of these from below to general report command docs

  1. --fields changes which data fields are printed. A column is added for each instance for each field. The options are autocompleted and default to status, overheads, runtime, memory usage, stddevs and iterations. Custom counter fields from runtime libraries can also be specified (but are not autocompleted).
  2. --baseline changes the baseline for overhead computation. By default, the script looks for baseline, clang-lto or clang.
  3. --csv/--tsv change the output from human-readable to comma/tab-separated for script processing. E.g., use in conjunction with cut to obtain a column of values.
  4. --nodes adds a (possibly very large) table of runtimes of individual nodes. This is useful for identifying bad nodes on the DAS-5 when some standard deviations are high while using --parallel prun.
  5. --ascii disables UTF-8 output so that output can be saved to a log file or piped to less.
Name:

spec2017

Parameters:
  • source_type (str) – see above
  • source (str) – where to install spec from
  • patches (typing.List[str]) – patches to apply after installing
  • nothp (bool) – run without transparent huge pages (they tend to introduce noise in performance measurements), implies Nothp dependency if True
  • force_cpu (int) – bind runspec to this cpu core (-1 to disable)
  • default_benchmarks (typing.List[str]) – specify benchmarks run by default

Web servers

class infra.targets.Nginx(version, build_flags=[])[source]

The Nginx web server.

Name:nginx
Parameters:version (str) – which (open source) version to download
class infra.targets.ApacheHttpd(version, apr_version, apr_util_version, modules=['few'], build_flags=[])[source]

Apache web server. Builds APR and APR Util libraries as binary dependencies.

Name:

apache

Parameters:
  • version (str) – apache httpd version
  • apr_version (str) – APR version
  • apr_util_version (str) – APR Util version
  • modules – a list of modules to enable (default: “few”; any modules will be statically linked)
class infra.targets.Lighttpd(version)[source]

TODO: docs

Juliet

class infra.targets.Juliet(mitigation_return_code=None)[source]

The Juliet Test Suite for C/C++.

This test suite contains a large number of programs, categorized by vulnerability type (CWE). Most programs include both a “good” and a “bad” version, where the good version should succeed (no bug) whereas the bad version should be detected by the applied mitigation. In other words, the good version tests for false positives, and the bad version for false negatives.

The --cwe command-line argument specifies which CWEs to build and/or run, and can be a CWE-ID (416 or CWE416) or an alias (e.g., uaf). A mix of CWE-IDs and aliases is allowed.

The Juliet suite contains multiple flow variants per test case. These are different control flows in the program that all arrive at the same bug in the end. They are only relevant for static analysis tools and are unsuitable for run-time mitigations. In particular, some flow variants (e.g., 12) do not (always) trigger or reach the bug at runtime. Therefore, by default only flow variant 01 is used, but others can be specified with the --variants command-line argument.

By default, a good test is counted as successful (true negative) if its returncode is 0, and a bad test is counted as successful (true positive) if its returncode is non-zero. The latter behavior can be fine-tuned via the mitigation_return_code argument to this class, which can be set to match the returncode of the mitigation.

Each test receives a fixed string to stdin. Tests that are based on sockets are currently not supported, as this requires running two tests at the same time (a client and a server).

Tests can be built in parallel (using --parallel=proc), since this process might take a while when multiple CWEs or variants are selected. Running tests in parallel is not supported (yet).

Name:juliet
Parameters:mitigation_return_code (int or None) – Return code the mitigation exits with, to distinguish true positives for the bad version of testcases. If None, any non-zero value is considered a success.

Built-in instances

Clang

class infra.instances.Clang(llvm, *, optlevel=2, lto=False, alloc='system')[source]

Sets clang as the compiler. The version of clang used is determined by the LLVM package passed to the constructor.

By default, -O2 optimization is set in CFLAGS and CXXFLAGS. This can be customized by setting optlevel to 0/1/2/3/s.

alloc can be system (the default) or tcmalloc. For custom tcmalloc hackery, overwrite the gperftools property of this package with a custom Gperftools object.

Name:

clang[-O<optlevel>][-lto][-tcmalloc]

Parameters:
  • llvm (packages.LLVM) – an LLVM package containing the relevant clang version
  • optlevel (int or str) – optimization level for -O (default: 2)
  • lto (bool) – whether to apply link-time optimizations
  • alloc (str) – which allocator to use (default: system)
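For example, registering a Clang instance with LTO enabled might look like this in a setup script (the LLVM version is a placeholder):

llvm = infra.packages.LLVM('7.0.0', compiler_rt=False)
setup.add_instance(infra.instances.Clang(llvm, lto=True))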

AddressSanitizer

class infra.instances.ASan(llvm, temporal=True, stack=True, glob=True, check_writes=True, check_reads=True, redzone=None, optlevel=2, lto=False)[source]

AddressSanitizer instance. Adds -fsanitize=address plus any configuration options at compile time and link time, and sets ASAN_OPTIONS at runtime.

Runtime options are currently hard-coded to the following:

  • alloc_dealloc_mismatch=0
  • detect_odr_violation=0
  • detect_leaks=0
Name:

asan[-heap|-nostack|-noglob][-wo][-lto]

Parameters:
  • llvm (packages.LLVM) – an LLVM package with compiler-rt included
  • stack (bool) – toggle stack instrumentation
  • temporal (bool) – toggle temporal safety (False sets quarantine size to 0)
  • glob (bool) – toggle globals instrumentation
  • check_writes (bool) – toggle checks on stores
  • check_reads (bool) – toggle checks on loads
  • lto (bool) – perform link-time optimizations
  • redzone (int or None) – minimum heap redzone size (default 16, always 32 for stack)
Return type:

None

Built-in packages

LLVM

class infra.packages.LLVM(version, compiler_rt, commit=None, lld=False, patches=[], build_flags=[])[source]

LLVM dependency package. Includes the Clang compiler and optionally compiler-rt (which contains runtime support for ASan).

Supports a number of patches to be passed as arguments, which are applied (with patch -p1) before building. A patch in the list can either be a full path to a patch file, or the name of a built-in patch. Available built-in patches are:

  • gold-plugins (for 3.8.0/3.9.1/4.0.0/5.0.0/7.0.0): adds a -load option to load passes from a shared object file during link-time optimizations, best used in combination with LLVMPasses
  • statsfilter (for 3.8.0/3.9.1/5.0.0/7.0.0): adds -stats-only option, which relates to -stats like -debug-only relates to -debug
  • lto-nodiscard-value-names (for 7.0.0): preserves value names when producing bitcode for LTO, this is very useful when debugging passes
  • safestack (for 3.8.0): adds -fsanitize=safestack for old LLVM
  • compiler-rt-typefix (for 4.0.0): fixes a compiler-rt-4.0.0 bug to make it compile for recent glibc, is applied automatically if compiler_rt is set
Identifier:

llvm-<version>

Parameters:
  • version (str) – the full LLVM version to download, like X.Y.Z
  • compiler_rt (bool) – whether to enable compiler-rt
  • patches (typing.List[str]) – optional patches to apply before building
  • build_flags (typing.List[str]) – additional build flags to pass to cmake
configure(self, ctx)[source]

Set LLVM toolchain programs in ctx. Should be called from the configure method of an instance.

Parameters:ctx (context.Context) – the configuration context
Return type:None
static add_plugin_flags(ctx, *flags, gold_passes=True)[source]

Helper to pass link-time flags to the LLVM gold plugin. Prefixes all flags with -Wl,-plugin-opt= before adding them to ctx.ldflags.

Parameters:
Return type:

None

Dependencies
class infra.packages.AutoConf(version, m4)[source]
Identifier:

autoconf-<version>

Parameters:
  • version (str) – version to download
  • m4 (packages.M4) – M4 package
class infra.packages.AutoMake(version, autoconf, libtool)[source]
Identifier:

automake-<version>

Parameters:
  • version (str) – version to download
  • autoconf (packages.AutoConf) – AutoConf package
  • libtool (packages.LibTool) – LibTool package
classmethod default(automake_version='1.16.5', autoconf_version='2.71', m4_version='1.4.19', libtool_version='2.4.6')[source]

Create a package with default versions for all autotools.

Parameters:
  • automake_version (str) – automake version
  • autoconf_version (str) – autoconf version
  • m4_version (str) – m4 version
  • libtool_version (str or None) – optional libtool version
Return type:

AutoMake

class infra.packages.Bash(version)[source]
Identifier:bash-<version>
Parameters:version (str) – version to download
class infra.packages.BinUtils(version, gold=True)[source]
Identifier:

binutils-<version>[-gold]

Parameters:
  • version (str) – version to download
  • gold (bool) – whether to use the gold linker
class infra.packages.CMake(version)[source]
Identifier:cmake-<version>
Parameters:version (str) – version to download
class infra.packages.CoreUtils(version)[source]
Identifier:coreutils-<version>
Parameters:version (str) – version to download
class infra.packages.LibElf(version)[source]
Identifier:libelf-<version>
Parameters:version (str) – version to download
class infra.packages.LibTool(version)[source]
Identifier:libtool-<version>
Parameters:version (str) – version to download
class infra.packages.M4(version)[source]
Identifier:m4-<version>
Parameters:version (str) – version to download
class infra.packages.Make(version)[source]
Identifier:make-<version>
Parameters:version (str) – version to download
class infra.packages.Ninja(version)[source]
Identifier:ninja-<version>
Parameters:version (str) – version to download

LLVM passes

class infra.packages.LLVMPasses(llvm, srcdir, build_suffix, use_builtins, debug=False, gold_passes=True)[source]

LLVM passes dependency. Use this to add your own passes as a dependency to your own instances. In your own passes directory, your Makefile should look like this (see the skeleton for an example):

BUILD_SUFFIX = <build_suffix>
LLVM_VERSION = <llvm_version>
SETUP_SCRIPT = <path_to_setup.py>
SUBDIRS      = <optional list of subdir names containing passes>
include <path_to_infra>/infra/packages/llvm_passes/Makefile

The makefile can be run as-is using make in your passes directory during development, without invoking the setup script directly. It creates two shared objects in build/packages/llvm-passes-<build_suffix>/install:

  • libpasses-gold.so: used to load the passes at link time in Clang. This is the default usage.
  • libpasses-opt.so: used to run the passes with LLVM’s opt utility. Can be used in a customized build system or for debugging.

The passes are invoked at link time by a patched LLVM gold plugin. The gold-plugin patch of the LLVM package adds an option to load custom passes into the plugin. Passes are invoked by adding their registered names to the flags passed to the LLVM gold plugin by the linker. In other words, by adding -Wl,-plugin-opt=<passname> to ctx.ldflags in the configure method of your instance. The LLVM.add_plugin_flags() helper does exactly that. Before using passes, you must call llvm_passes.configure(ctx) to load the passes into the plugin. See the skeleton LibcallCount instance for an example.
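Putting this together, an instance using custom passes might look roughly like the sketch below (names are hypothetical, modeled on the skeleton's LibcallCount instance, and assuming the usual Instance interface with name, dependencies() and configure(ctx)):

import infra

class LibcallCount(infra.Instance):
    name = 'libcallcount'

    def __init__(self):
        # LLVM needs the gold-plugins patch to load passes at link time
        self.llvm = infra.packages.LLVM(version='7.0.0', compiler_rt=False,
                                        patches=['gold-plugins'])
        self.passes = infra.packages.LLVMPasses(self.llvm, srcdir='llvm-passes',
                                                build_suffix='libcallcount',
                                                use_builtins=False)

    def dependencies(self):
        yield self.llvm
        yield self.passes

    def configure(self, ctx):
        self.llvm.configure(ctx)    # set the Clang toolchain in the context
        self.passes.configure(ctx)  # load libpasses-gold.so into the gold plugin
        infra.packages.LLVM.add_plugin_flags(ctx, '-libcall-count')  # run the pass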

For the pkg-config command of this package, the --objdir option points to the build directory.

Identifier:

llvm-passes-<build_suffix>

Parameters:
  • llvm (packages.LLVM) – LLVM package to link against
  • srcdir (str) – source directory containing your LLVM passes
  • build_suffix (str) – identifier for this set of passes
  • use_builtins (bool) – whether to include built-in LLVM passes in the shared object
  • debug (bool) – enable to compile passes with -O0 -ggdb
Todo:

extend this to support compile-time plugins

configure(self, ctx, *, linktime=True, compiletime=True)[source]

Set build/link flags in ctx. Should be called from the configure method of an instance.

linktime and compiletime can be set to false to avoid loading the pass libraries at link time and at compile time, respectively. Loading passes at link time requires LLVM to be built with the gold-plugin patch.

Parameters:
  • ctx (context.Context) – the configuration context
  • linktime (bool) – are the passes used at link time?
  • compiletime (bool) – are the passes used at compile time?
Return type:

None

runtime_cflags(self, ctx)[source]

Returns a list of CFLAGS to pass to a runtime library that depends on features from passes. These flags set include directories for headers of built-in pass functionality, such as the NOINSTRUMENT macro.

Parameters:ctx (context.Context) – the configuration context
Return type:typing.Iterable[str]
class infra.packages.BuiltinLLVMPasses(llvm, gold_passes=True)[source]

Subclass of LLVMPasses for built-in passes. Use this if you don’t have any custom passes and just want to use the built-in passes. Configuration happens in the same way as described above: by calling the configure() method.

In addition to the shared objects listed above, this package also produces a static library called libpasses-builtin.a, which is used by LLVMPasses to include the built-in passes when use_builtins is True.

For the pkg-config command of this package, the following options are added in addition to --root/--prefix/--objdir:

  • --cxxflags lists compilation flags for custom passes that depend on built-in analysis passes (sets include path for headers).
  • --runtime-cflags prints the value of LLVMPasses.runtime_cflags().
Identifier:llvm-passes-builtin-<llvm.version>
Parameters:llvm (packages.LLVM) – LLVM package to link against

Address space shrinking

class infra.packages.LibShrink(addrspace_bits, commit='master', debug=False)[source]

Dependency package for libshrink.

Libshrink shrinks the application address space to a maximum number of bits. It moves the stack and TLS to a memory region within the allowed bit range, and prelinks all shared libraries so that they do not exceed the address space limitations. It also defines a run_wrapper() that should be put in ctx.target_run_wrapper by any instance that uses libshrink.

Identifier:

libshrink-<addrspace_bits>

Parameters:
  • addrspace_bits (int) – maximum number of nonzero bits in any pointer
  • commit (str) – branch or commit to clone
  • debug (bool) – whether to compile with debug symbols
configure(self, ctx, static=True)[source]

Set build/link flags in ctx. Should be called from the configure method of an instance. Uses post-build hooks, so any target compiled with this library must implement infra.Target.binary_paths().

Parameters:
  • ctx (context.Context) – the configuration context
  • static (bool) – use the static library? (shared library otherwise)
Return type:

None

Raises:

NotImplementedError – if static is not True (TODO)

run_wrapper(self, ctx)[source]

Run wrapper for targets. Links to a script that sets the rpath before any libraries are loaded, so that any dependencies of shared libraries loaded by the application are also loaded from the directory of prelinked libraries (which is created by a post-build hook).

Parameters:ctx (context.Context) – the configuration context
Return type:str
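For example, an instance using libshrink could do something along these lines in its configure method (a sketch, assuming a self.libshrink dependency):

def configure(self, ctx):
    # add libshrink's build/link flags (static library variant)
    self.libshrink.configure(ctx, static=True)
    # wrap target invocations in libshrink's run wrapper
    ctx.target_run_wrapper = self.libshrink.run_wrapper(ctx)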
Dependencies
class infra.packages.PatchElf(version)[source]
Identifier:patchelf-<version>
Parameters:version (str) – version to download
class infra.packages.Prelink(version)[source]
Identifier:prelink-<version>
Parameters:version (str) – version to download
class infra.packages.PyElfTools(version, python_version)[source]
Identifier:

pyelftools-<version>

Parameters:
  • version (str) – version to download
  • python_version (str) – which Python version to install the package for

TCMalloc

class infra.packages.Gperftools(commit, libunwind_version='1.4-rc1', patches=[])[source]
Identifier:

gperftools-<version>

Parameters:
  • commit (str) – git branch/commit to check out after cloning
  • libunwind_version (str) – libunwind version to use
  • patches (typing.List[str]) – optional patches to apply before building
configure(self, ctx)[source]

Set build/link flags in ctx. Should be called from the configure method of an instance.

Sets the necessary -I/-L/-l flags, and additionally adds -fno-builtin-{malloc,calloc,realloc,free} to CFLAGS.

Parameters:ctx (context.Context) – the configuration context
Return type:None
Dependencies
class infra.packages.LibUnwind(version, patches=[])[source]
Identifier:libunwind-<version>
Parameters:
  • version (str) – version to download
  • patches (typing.List[str]) – optional patches to apply before building

Tools

class infra.packages.Nothp(*args, **kwargs)[source]
Identifier:nothp
class infra.packages.RusageCounters(*args, **kwargs)[source]

Utility library for targets that want to measure resource counters:

  • memory (max resident set size)
  • page faults
  • I/O operations
  • context switches
  • runtime (estimated by gettimeofday in constructor+destructor)

The target only needs to depend on this package and configure() it to link the static library which will then log a reportable result in a destructor. See SPEC2006 for a usage example.

Identifier:rusage-counters
classmethod parse_results(ctx, path, allow_missing=False)[source]

Parse any results containing counters reported by this package.

Parameters:
  • ctx (context.Context) – the configuration context
  • path (str) – path to file to parse
Returns:

counter results

Return type:

typing.MutableMapping[str, typing.Union[bool, int, float, str]]

configure(self, ctx)[source]

Set build/link flags in ctx. Should be called from the build method of a target to link in the static library.

Parameters:ctx (context.Context) – the configuration context
Return type:None
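For example, in a target (a sketch, assuming the target declares a self.rusage_counters dependency):

def build(self, ctx, instance):
    # link the static rusage-counters library into the target binary
    self.rusage_counters.configure(ctx)
    # ... the target's normal build steps follow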
pkg_config_options(self, ctx)[source]

Yield options for the pkg-config command. Each option is an (option, description, value) triple. The defaults are --root which returns the root directory build/packages/<ident>, and --prefix which returns the install directory populated by install(): build/packages/<ident>/install.

When reimplementing this method in a derived package class, it is recommended to end the implementation with yield from super().pkg_config_options(ctx) to add the two default options.

Parameters:ctx (context.Context) – the configuration context
Return type:typing.Iterator[typing.Tuple[str, str, typing.Union[str, typing.Iterable[str]]]]
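For example, a derived package could add one custom option and keep the defaults (a sketch; the --my-cflags option is hypothetical, and Package.path() is assumed to resolve paths inside the package's build directory):

def pkg_config_options(self, ctx):
    yield ('--my-cflags',
           'compiler flags for my runtime library',
           ['-I' + self.path(ctx, 'install', 'include')])
    yield from super().pkg_config_options(ctx)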

Apache benchmark (ab)

class infra.packages.ApacheBench(httpd_version, apr, apr_util)[source]

Apache’s ab benchmark.

Identifier:

ab-<version>

Parameters:
  • httpd_version (str) – Apache httpd version to download (ab is shipped with httpd)
  • apr (packages.APR) – APR package to depend on
  • apr_util (packages.APRUtil) – APR utilities package to depend on
Return type:

None
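For example (version numbers are illustrative):

apr = infra.packages.APR('1.7.0')
apr_util = infra.packages.APRUtil('1.6.1', apr)
ab = infra.packages.ApacheBench('2.4.54', apr, apr_util)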

Dependencies
class infra.packages.APR(version)[source]

The Apache Portable Runtime.

Identifier:apr-<version>
Parameters:version (str) – version to download
class infra.packages.APRUtil(version, apr)[source]

The Apache Portable Runtime utilities.

Identifier:

apr-util-<version>

Parameters:
  • version (str) – version to download
  • apr (packages.APR) – APR package to depend on
Return type:

None

Wrk benchmark

class infra.packages.Wrk(version='master')[source]

The wrk benchmark.

Identifier:wrk-<version>
Parameters:version (str) – version to download
class infra.packages.Wrk2(version='master')[source]

The wrk2 benchmark.

Identifier:wrk2-<version>
Parameters:version (str) – version to download

Scons

class infra.packages.Scons(version)[source]

The scons build tool (replacement for make).

Identifier:scons-<version>
Parameters:version (str) – version to download

Built-in LLVM passes

The framework features a number of useful analysis/transformation passes that you can use in your own instances/passes. The passes are listed below, with the supported LLVM versions in parentheses.

Transform passes

-dump-ir (3.8.0/4.0.0): Dumps the current module IR of the program that is being linked to a human-readable bitcode file with the “.ll” extension. Prints the location of the created file to stderr. Optionally, the target filename can be set by calling DEBUG_MODULE_NAME("myname"); after including dump-ir-helper.h from the built-in passes.

-custominline (3.8.0/4.0.0): Custom inliner for helper functions from statically linked runtime libraries. Inlines calls to functions that have __attribute__((always_inline)) and functions whose name starts with __noinstrument__inline_.

-defer-global-init (3.8.0): Changes all global initializers to zero-initializers and adds a global constructor function that initializes the globals instead. In combination with -expand-const-global-uses, this is useful for instrumenting globals without having to deal with constant expressions (but only with instructions).

-expand-const-global-uses (3.8.0): Expands all uses of constant expressions (ConstExpr) in functions to equivalent instructions. This limits edge cases during instrumentation, and can be undone with -instcombine.

TODO: Combine -defer-global-init and -expand-const-global-uses into a single -expand-constexprs pass that expands all constant expressions to instructions.

Analysis passes

-sizeof-types (3.8.0): Finds allocated types for calls to malloc based on sizeof expressions in the source code. Must be used in conjunction with the accompanying compiler wrapper and compile-time pass. See the header file for usage.

Utility headers

Utilities to be used in custom LLVM pass implementations. These require use_builtins=True to be passed to infra.packages.LLVM. See the source code for a complete reference.

builtin/Common.h (3.8.0/4.0.0): Includes a number of commonly used LLVM headers and defines some helper functions.

builtin/Allocation.h (3.8.0/4.0.0): Helpers to populate an AllocationSite struct with standardized information about any stack/heap allocations.

TODO: rewrite builtin/Allocation.h to an -allocs analysis pass.

builtin/CustomFunctionPass.h (3.8.0/4.0.0): Defines the CustomFunctionPass class which serves as a drop-in replacement for LLVM’s FunctionPass, but really is a ModulePass. This is necessary because the link-time passes plugin does not support function passes because of things and reasons.

SPEC CPU benchmarking

SPEC benchmarking 101

The SPEC CPU benchmarking suites contain a number of C, C++ and Fortran benchmarks. Each benchmark is based on an existing, real-world program (e.g., the Perl interpreter, the GCC compiler, etc.), and has different characteristics: some programs are very CPU/FPU intensive, some are memory intensive, and so on. Because of this, SPEC is widely used for paper evaluations.

The latest version is SPEC CPU2017, although SPEC CPU2006 is also still in wide use (partly to compare against older systems and papers). SPEC CPU2000 has mostly fallen out of use except for comparing against very old papers, and CPU95/CPU92 are not used at all anymore. The infra currently supports SPEC CPU2006 and CPU2017. The concepts are mostly the same between these two versions, and most information here applies to both unless otherwise stated. This guide will refer to both as “SPEC” for convenience.

Benchmarks in each SPEC version are often grouped into several (overlapping) sets. For example, CPU2006 has the CINT and CFP sets (for integer and floating-point respectively), but also sets like all_c, all_cpp, all_fortran and all_mixed (grouping the benchmarks by language). When running and reporting SPEC results, you should pick a suitable, established set, and you should not cherry-pick or leave out certain benchmarks. Typically, you’ll want to run the full suite, although running only CINT or CFP is acceptable in some cases. However, Fortran support is still lacking in compilers such as LLVM, so most papers omit the (pure or mixed) Fortran benchmarks. For CPU2006, running all C and C++ benchmarks (19 in total) is the most common configuration, and the default for the infra.

Adding to the infra

While the infra contains SPEC targets, it does not include SPEC itself, as it is a commercial product that we are not allowed to redistribute. Therefore, step one is to acquire a copy of SPEC and point the infra to it.

Note

If you are a student in VUSec, you should contact your supervisor for access to a copy of SPEC.

The infra supports several formats of the SPEC installation: the raw .iso file, an extracted version of the .iso file, or a manually installed version. A single SPEC installation can be shared between different infra projects, and this is generally the preferred setup:

mkdir speciso
sudo mount -o loop spec2006.iso speciso
cd speciso
./install.sh -f -d /path/to/install/spec2006  # E.g., /home/$USER/spec2006

Then, open setup.py and add the following at the bottom (but before setup.main()):

setup.add_target(infra.targets.SPEC2006(
    source = '/path/to/spec',
    source_type = 'installed'
))

If you use any other source_type (isofile, mounted, tarfile, git), the infra will install SPEC for you inside its own build directory.
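For example, to let the infra install SPEC from the raw .iso:

setup.add_target(infra.targets.SPEC2006(
    source = '/path/to/spec2006.iso',
    source_type = 'isofile'
))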

Building and running

You can build and run SPEC in the infra like any other target, e.g.:

./setup.py run spec2006 baseline deltapointers --build

However, some special flags are relevant here:

  • --benchmark BENCHMARK[,BENCHMARK]
    This option allows you to run only a subset of the benchmarks, which is especially useful when testing or debugging a single benchmark. E.g., --benchmark 400.perlbench
  • --test
    SPEC comes with multiple “input sets” – inputs that are fed into the aforementioned benchmark programs. By default it uses the “ref” set, whose inputs are pretty big and run for a long time. With the --test option it instead uses the “test” input set, which consists of smaller inputs, so all of SPEC can run within a minute. This cannot be used for benchmarking, but is useful for checking if everything is OK before starting a full ref run. Note that a system might work fine on one input set but not on the other, because the input sets stress different parts of the programs. One common example is the test set of 400.perlbench, which is the only SPEC benchmark that executes a fork().
  • --iterations=N
    To reduce noise when benchmarking, you want to do multiple runs of each benchmark and take the median runtime. On most systems 3 or 5 runs are sufficient, but if high standard deviations are observed, more are required.
  • --parallel=proc --parallelmax=1
    By passing --parallel=<method> the infra will run the benchmark jobs in parallel mode and produce/process the output of SPEC. Here, --parallel=proc means running the jobs as processes on the local machine (instead of distributing them over a cluster or remote machines). --parallelmax=1 means only one benchmark runs at a time, so they don’t interfere with each other. For testing runs, where you don’t care about measuring performance, you can set --parallelmax to your CPU count, for example.

So overall, for running full SPEC and measuring overhead, you’d use:

./setup.py run spec2006 baseline --iterations=3 --parallel=proc --parallelmax=1

This will produce a new directory in the results/ directory. To keep track of different runs, it’s convenient to rename these directories manually after it’s done (e.g., from results/run-2020-04-16.10-15-55 to results/baseline).

Note

You need to pass the --parallel=proc argument to actually generate results that can be reported.

Parsing the results

The infra can produce tables of the results for you with the normal report command:

./setup.py report spec2006 results/baseline -f runtime:median:stdev_percent

The -f argument at the end means “give me the median and standard deviation of the runtimes per benchmark”. You can similarly do -f maxrss:median to print the memory overhead. You can pass multiple result directories. If you pass --overhead baseline, everything is calculated as normalized overhead relative to the baseline instance.
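For example, assuming you also renamed an instrumented run’s results to results/asan, the following reports the runtime overhead relative to the baseline:

./setup.py report spec2006 results/baseline results/asan -f runtime:median --overhead baseline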

SPEC CPU2017

SPEC CPU2017 comes with two distinct sets of benchmarks: the speed and the rate suites. The speed set is similar to older versions of SPEC, where a single benchmark is started and its execution time is measured. The new rate metric, on the other hand, launches multiple binaries at the same time (matching the number of CPU cores) and measures throughput. More information is available in the SPEC documentation. Each of these two sets has its own list of benchmark programs: speed benchmarks start with 6xx, whereas rate benchmarks start with 5xx.

Typically we only use the speed set for our papers.

Running on a cluster

Note

The following information is specific to the DAS clusters offered by Dutch universities, although it can be applied to any cluster that uses prun to issue jobs to nodes. The DAS clusters can generally be used by any (BSc, MSc or PhD) student at the VU, LU, UvA, and TUD.

On a cluster, it is possible to run multiple SPEC benchmarks in parallel for much faster end-to-end benchmarking. The infra has full support for clusters that use the prun command to issue jobs, as described on the usage page. For running SPEC we recommend the DAS-5 over the DAS-6 cluster, as it features more nodes (rather than fewer, more powerful nodes).

You will first need to request an account. When doing so as a student, you should mention your supervisor.

Some additional notes on using the DAS cluster:

  • Your homedir is limited in space, so use /var/scratch/$USER instead (for both the infra and the SPEC install dir).
  • Use --parallel=prun. You can omit --parallelmax, since it defaults to 64 to match the DAS-5 cluster.
  • By default, jobs are killed after 15 minutes. This is usually fine (the longest baseline benchmark, 464.h264ref, takes 8 minutes), but a very slow defense might exceed it. In those cases, you can use --prun-opts="-asocial -t 30:00" outside office hours.
  • The results on the DAS-5 are much noisier, since we cannot control things like CPU frequency scaling. Therefore you should do 11 iterations (instead of 5) and take the median. Also take note of the stddev: if it is unusually high, it might indicate defective nodes. Contact the DAS sysadmin or your supervisor if this becomes a serious problem, since a reboot fixes these issues. Note that we have scripts to find these defective nodes based on benchmarking results.

So overall, in most cases you’d simply use something like:

./setup.py run spec2006 baseline asan --iterations=11 --parallel=prun

Debugging

When debugging issues with a particular instance, it is often necessary to run a SPEC benchmark under a debugger such as GDB. The infra itself launches SPEC benchmarks via the specinvoke command, which in turn invokes the binary of the particular benchmark several times with different command line arguments. For example, 400.perlbench runs the perl binary several times with different perl scripts. In this example we use 400.perlbench from CPU2006, but the procedure is the same for any benchmark of any SPEC version.

To run one of these tests manually with gdb, we bypass both the infra and specinvoke. To determine the correct command line arguments for the benchmark (and to set up the relevant input files), the first step is to run the particular benchmark via the infra normally (see above). This sets up the correct run directory, for example $SPEC/benchspec/CPU2006/400.perlbench/run/run_base_ref_infra-baseline.0000, where the last directory name depends on the instance (here baseline) and input set (ref or test).

Inside this directory should be a speccmds.cmd file, which contains the run environment and arguments of the binary, and is normally parsed by specinvoke. Lines starting with -E and -C define the environment variables and working directory, respectively, and can be ignored. The lines starting with -o define the actual runs of the binary, and might for example look like:

-o checkspam.2500.5.25.11.150.1.1.1.1.out -e checkspam.2500.5.25.11.150.1.1.1.1.err ../run_base_ref_infra-baseline.0000/perlbench_base.infra-baseline -I./lib checkspam.pl 2500 5 25 11 150 1 1 1 1

The first two bits (-o and -e) tell specinvoke where to redirect stdout/stderr, which we don’t need. Then comes the binary (including a relative path into the current directory), which is perlbench_base.infra-baseline in our case. After that follow the actual arguments, which we do need to pass along.

If we want to run this under gdb, we can thus call it as follows:

gdb --args perlbench_base.infra-baseline -I./lib checkspam.pl 2500 5 25 11 150 1 1 1 1

Webserver benchmarking

The infra has built-in support for benchmarking webserver applications like Nginx. In such setups, the infra runs an instrumented version of the server application, and then runs an (uninstrumented) client program to benchmark the performance of the server (typically wrk).

The setup to run webserver benchmarks, however, is more complicated than it is for targets like SPEC. In particular, two machines are required (one for the server and one for the client), with a fast network connection between them (e.g., 40 Gbit). The key goal of webserver benchmarks is to reach CPU saturation already on the baseline. If saturation is not reached, any measured overhead is practically meaningless (since it is hidden by the spare CPU cycles). While far from ideal, it is preferable to use a loopback setup (running client and server on a single machine, dividing the cores evenly) rather than a setup where no saturation is reached (e.g., a 1 Gbit connection).

For benchmarks, the saturation/peak performance point should be determined for the baseline, and that point is then used to measure the overhead (both in throughput decrease and latency increase). To do so, we typically construct a graph as shown below, which increases the pressure of the client by increasing its number of connections (X-axis), and measures both the throughput (in requests/second) and CPU utilization. In this graph, we see a peak at 256 connections, at which point the throughput overhead for “DangZero” is 18% (623 kReqs/s -> 516 kReqs/s). Not shown in this graph is the latency: that should be measured at the same saturation point, and reported separately in a table (as percentiles, e.g., 99th percentile).

[Image: nginx-througput-example.png – example Nginx throughput/CPU-saturation graph]

The infra has several options for running this setup automatically on separate machines. The recommended way is the SSH method (using --parallel=ssh), which this guide follows. Note that this setup can use localhost as an SSH target, meaning one (or even both, for loopback experiments) of the nodes can be the same machine as the one running the infra.

This whole process currently requires a lot of arguments to setup.py. Below, we show a script that provides good defaults for most arguments.

#!/bin/bash
set -euo pipefail

servers="nginx"
instances="baseline dangzero"

# Sweep over connection count, decreasing in density as we go higher
connections="`seq 16 16 256` `seq 256 128 1024` `seq 1024 256 1536`"

# SSH names - used as `ssh <host>`, so can be a host entry from the SSH config
sshclient=father
sshserver=son
# Local hosts - how to connect to each node via TCP
hostclient=localhost
hostserver=192.168.0.10
# Benchmark host (100G NIC) - how the client connects to server
serverip=10.0.0.10
serverport=20000

iterations=3  # Repeat experiments a few times
filesize=64  # Data per request, in bytes
duration=30  # Time per experiment in seconds
wait_time=1  # Time to wait between experiments


client_threads=`nproc`  # Threads - should always be max, i.e., nproc

server_workers=`nproc`  # Worker processes on server - should be max
server_worker_connections=1024  # Max connections per worker - do not change

# Statistics to collect of server
stats="cpu rss"  # Space-separated list of {cpu, cpu-proc, rss, vms}
stats_interval=1  # Time between measurements, in seconds


for server in $servers; do
    python3 ./setup.py run $server $instances \
        -t bench \
        --parallel=ssh \
        --ssh-nodes $sshclient $sshserver \
        --remote-client-host $hostclient \
        --remote-server-host $hostserver \
        --server-ip $serverip \
        --port $serverport \
        --duration $duration \
        --threads $client_threads \
        --iterations $iterations \
        --workers $server_workers \
        --worker-connections $server_worker_connections \
        --filesize $filesize \
        --collect-stats $stats \
        --collect-stats-interval $stats_interval \
        --connections $connections \
        --restart-server-between-runs
done

Options you may want to have a look at:

  • connections should cover a range so that you can observe the growth to saturation, and after the peak point a drop-off in throughput (with lower numbers more densely sampled).
  • iterations repeats each experiment N times to reduce noise. A value of 3 or 5 is recommended, unless high standard deviations are observed.
  • filesize is the size of the file that the benchmark retrieves. Higher values put more pressure on the network link without increasing CPU pressure, and thus lower values are recommended for CPU saturation.
  • duration is the length of each experiment in seconds. Normally 30 second runs are fine, but if you are benchmarking something with increased memory pressure over time you may need to run longer benchmarks (e.g., 10 minutes).

Finally, there are the SSH, host and server IP settings which require some explanation:

  • The sshclient and sshserver describe how the setup.py script can reach the machines running the client (wrk) and server (the webserver). These are SSH hostnames, and can be an IP or a hostname from the .ssh/config file.
  • The setup.py script spawns a Python script (remoterunner.py) on both the client and server machines via SSH. After that it connects to these scripts directly via TCP; hostclient and hostserver describe the IP addresses used to connect to them. If you used IP addresses for the SSH client/server fields, these fields probably hold the same values.
  • Finally, once the benchmark starts, the client machine will run wrk against the webserver on the host. The IP address that the client machine uses to connect to the server machine is configured via serverip. This might be the same IP as hostserver, but it might also be different: for the SSH and host fields these connections can go over any link (localhost, built-in 1 Gbit NIC, QEMU virtual NIC, etc.). For the serverip field, however, the IP associated with the fast NIC (e.g., 40 or 100 Gbit) should be used to ensure CPU saturation.

The setup.py script can run on one of the two machines (client or server): in the example above, the setup.py script runs on the client machine (the one that will also run wrk). It furthermore assumes the father (client) and son (server) hosts are in .ssh/config and can be used without a passphrase (e.g., via an SSH agent). The machines are in a LAN in the 192.168.0.0/24 range, whereas the 100 Gbit NICs use the 10.0.0.0/24 range. This is configured manually via:

father $ ifconfig ens4 10.0.0.20 up
son $ ifconfig ens4 10.0.0.10 up

Finally, the infra can collect statistics during the execution of each test on the server. One of these statistics is the CPU usage, which is used to ensure saturation was reached. These statistics can be sampled every N seconds, and the following are supported:

  • cpu: total CPU load of the system.
  • cpu-proc: CPU load per process.
  • rss: RSS (resident set size) of the server. I.e., physical memory usage.
  • vms: VMS (virtual memory size) of the server.

In a VM

Some mitigations, especially those featuring kernel or hypervisor modifications, require running the target webserver in a VM. Running benchmarks in a VM is fine, but care has to be taken to ensure a proper setup.

As a basis for any reasonable benchmark, the VM should be hardware accelerated (e.g., using KVM with Intel VMX or AMD-V), with sufficient memory and CPU cores assigned. Additionally, a VM may optionally be backed by hugepages.

As with the experiments on bare metal (described above), the VM also needs direct access to a fast NIC. Using something like virtio is, in our experience, not fast enough. Instead, a fast NIC should be directly assigned to the VM. This can be achieved either through SR-IOV (for devices that support virtualization, assigning part of the device to a VM) or through full PCI passthrough of the device. For this guide, we assume the latter, as it is more generically applicable.

Enabling IOMMU

Passing the NIC to the guest requires an IOMMU to be enabled in the system. For this, ensure the IOMMU (VT-d or AMD-Vi) is enabled in the BIOS settings. Add intel_iommu=on or amd_iommu=on to the kernel boot parameters (e.g., by modifying GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running update-grub).

After this, running dmesg after boot should show messages related to IOMMU/DMAR being enabled.

Next we need to check the IOMMU groups. It is only possible to pass a whole IOMMU group to a VM, not only part of its devices. First ensure /sys/kernel/iommu_groups/ exists and has a few directories. Then, run the following command in your terminal:

for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;

If the NIC does not have its own IOMMU group, try plugging it into a different slot on the main board. Typically, at least the “primary” or first slot of a mainboard has its own IOMMU group.

VFIO

To assign the device to the VM, we need to unbind its original driver (e.g., mlx5_core for Mellanox cards), and bind it to the vfio-pci driver.

First, find the BDF (bus:device.function, basically the physical slot of the PCI card) and vendor:device pair of the card:

$ lspci -nn
...
b3:00.0 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]
...

We can see here that its BDF is b3:00.0 (in full form, 0000:b3:00.0), and the vendor:device pair is 15b3:1013.

Now, check which driver is in use for this device:

$ lspci -d 15b3:1013 -k
b3:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
    Subsystem: Mellanox Technologies MT27700 Family [ConnectX-4]
    Kernel driver in use: mlx5_core
    Kernel modules: mlx5_core

This is the mlx5_core Mellanox driver, which we need to unbind from the card:

echo 0000:b3:00.0 | sudo tee /sys/bus/pci/drivers/mlx5_core/unbind

Then, allow vfio-pci to bind to this device:

echo 15b3 1013 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id

When running lspci -d 15b3:1013 -k again, it should report Kernel driver in use: vfio-pci. If this is not already the case, execute the following command to force the binding:

echo 0000:b3:00.0 | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
QEMU

To pass the device to the VM, we add the -device vfio-pci,host=<BDF> option to qemu:

sudo qemu-system-x86_64 -m 8G -enable-kvm -cpu host -device vfio-pci,host=b3:00.0 -nographic -serial mon:stdio debian.img

We run this with sudo; otherwise we get errors about mapping memory and such.

Inside the VM, we should see the card show up like it did on the host before:

vm $ lspci -d 15b3:1013 -k
00:04.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
Subsystem: Mellanox Technologies MT27700 Family [ConnectX-4]
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core

Note it now has the same vendor:device identifier, but a different BDF (00:04.0). We can now check which network interface is associated with this NIC:

vm $ ls /sys/bus/pci/devices/0000\:00\:04.0/net/
ens2

We can then configure it as normal:

vm $ ifconfig ens2 10.0.0.10 up
Hugepage backing for VM

Forcing hugepage backing for the VM is not required: in most cases we have noticed no significant effect for webserver applications. However, it might be required if the instrumentation of the target significantly increases memory or TLB pressure. In that case, you might notice significant performance differences between runs, depending on when THP (transparent huge pages) kicks in on the host.

You can follow the guide from RedHat: https://access.redhat.com/solutions/36741

When using QEMU directly instead of libvirt, add the following command line options (instead of the modifications to guest.xml):

-mem-prealloc
-mem-path /hugepages/libvirt/qemu
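Combined with the PCI passthrough example above, a full invocation might look like:

sudo qemu-system-x86_64 -m 8G -enable-kvm -cpu host \
    -mem-prealloc -mem-path /hugepages/libvirt/qemu \
    -device vfio-pci,host=b3:00.0 -nographic -serial mon:stdio debian.img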