
Artifact Repository for “A Quantitative Cache Evaluation of Select PolyBench Kernels”

This repository contains the full submission for the course “Arquitetura de Computadores III” (Instituto de Ciências Exatas e Informática, Pontifícia Universidade Católica de Minas Gerais), 2026/1, Prof. Matheus Alcântara Souza.

It packages a complete, reproducible pipeline to study cache hierarchy sensitivities on a selected set of PolyBench kernels using the gem5 simulator. The pipeline is expressed in a single Zig build graph that: checks host prerequisites; pins and bootstraps Python via uv; initializes and builds the vendored gem5 submodule; compiles statically linked workloads; runs an exhaustive, parameterized simulation sweep; generates figures; and builds the final LaTeX report.

Target platform: Linux x86_64 only. While gem5 itself is portable, parts of the automation (uv/Python setup and LaTeX/report build) are written for Linux.

Full Demonstration

[asciicast recording of the full pipeline]

What's Here

  • build.zig: End-to-end build graph (Zig 0.15.2)
  • cache_config.py: gem5 SE-mode system and cache configuration (CLI-tunable)
  • run_all_simulations.py: Orchestrates the full parameter sweep, with parallel workers and idempotent resumption
  • visualize_results.py: Turns results/ into publication figures under figures/
  • analyze.zig: Small helper to inspect/aggregate gem5 stats (optional)
  • workloads/: C sources for PolyBench kernels and small microbenchmarks
    • atax.c, floyd-warshall.c, gemm.c, jacobi-2d.c, seidel-2d.c (PolyBench v4.2.1 kernels)
    • array_stride.c, matrix_multiply.c, random_access.c (handwritten)
    • polybench.c (shared runtime)
  • include/polybench.h: PolyBench configuration header
  • report/: IEEEtran paper sources; report/main.tex is the manuscript
  • gem5/: gem5 submodule (initialized by the build)

Who Made This

See AUTHORS for the full list and contact emails. If you use these artifacts, please cite the work described in report/main.tex. A machine-readable CITATION.cff is provided.

Summary Of The Experiment

  • Workloads (PolyBench v4.2.1): atax, floyd-warshall, gemm, jacobi-2d, seidel-2d
  • Core model: X86TimingSimpleCPU (in-order), 4 GHz; memory mode: timing; 8 GiB address space
  • Cache hierarchy: private L1I/L1D, shared L2, shared L3 (see cache_config.py)
  • Parameter sweep per workload (31 configs):
    • 11 realistic multi-level size presets (e.g., baseline i7-6700K; Ryzen; Apple; server)
    • cache line size ∈ {32, 64, 128, 256} bytes
    • associativity sweep for L1, L2, and L3 ∈ {1, 2, 4, 8, 16} (one level varied at a time)
  • Total runs: 5 workloads × 31 configurations = 155 simulations
  • Dataset sizes per kernel were chosen to balance fidelity vs. run time (see table below)
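As a rough illustration, the one-at-a-time part of the sweep can be enumerated as below. The exact labels live in run_all_simulations.py; the names here are assumptions modeled on the result-directory names used later in this README (e.g. atax_cache_line_128), and the 11 size presets are defined separately.

```python
from itertools import product

# Hypothetical variant labels for the one-at-a-time part of the sweep;
# the 11 multi-level size presets are handled separately by the orchestrator.
LINE_SIZES = [32, 64, 128, 256]
ASSOCIATIVITIES = [1, 2, 4, 8, 16]
LEVELS = ["l1", "l2", "l3"]

def one_at_a_time_variants():
    # Cache line size varied with everything else at baseline…
    for size in LINE_SIZES:
        yield f"cache_line_{size}"
    # …then associativity varied for exactly one cache level at a time.
    for level, ways in product(LEVELS, ASSOCIATIVITIES):
        yield f"{level}_assoc_{ways}"
```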

Dataset choices compiled into the real-run workloads:

Kernel           Dataset   Notes/Dimensions (from report)
atax             LARGE     M=1900, N=2100
gemm             MEDIUM    NI=200, NJ=220, NK=240
floyd-warshall   SMALL     N=180
jacobi-2d        MEDIUM    N=250, T=100
seidel-2d        MEDIUM    N=400, T=100
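PolyBench selects dataset sizes at compile time via preprocessor macros (MINI_DATASET, SMALL_DATASET, MEDIUM_DATASET, LARGE_DATASET, EXTRALARGE_DATASET). A minimal sketch of how the table above maps to compile flags; the mapping comes from the table, but the helper itself is hypothetical, not code from this repository:

```python
# Dataset preset per kernel, as in the table above.
DATASETS = {
    "atax": "LARGE",
    "gemm": "MEDIUM",
    "floyd-warshall": "SMALL",
    "jacobi-2d": "MEDIUM",
    "seidel-2d": "MEDIUM",
}

def dataset_flag(kernel: str) -> str:
    # PolyBench's standard compile-time dataset selector, e.g. -DLARGE_DATASET.
    return f"-D{DATASETS[kernel]}_DATASET"
```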

Prerequisites (Linux)

Required (checked by zig build check-deps):

  • Zig 0.15.2 (ZVM/mise recommended)
  • uv (the build uses it to pin Python 3.14.3 inside .venv)
  • A system-wide GCC or Clang installation, since gem5 unfortunately will not accept Zig's internal Clang
  • git, just, and m4 to fetch and build gem5, as well as the report
  • A TeX distribution to generate plots and the report (a full TeX Live is the easiest option, but TinyTeX may also work).
  • Graphviz (dot on PATH) to render gem5 config.dot to PDF (the Python binding pydot is installed via requirements.txt)
  • Optionally, gperftools for its tcmalloc implementation, which significantly speeds up gem5.
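Conceptually, zig build check-deps amounts to a PATH lookup for each tool above. The sketch below is illustrative only (the tool names are taken from the prerequisites list; the helper is not the actual implementation, which lives in build.zig):

```python
import shutil

def missing_tools(required, which=shutil.which):
    # Report which required executables are absent from PATH.
    return [tool for tool in required if which(tool) is None]

# Tools named in the prerequisites above; `dot` ships with Graphviz.
REQUIRED = ["zig", "uv", "gcc", "git", "just", "m4", "dot"]
```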

Quick Start

Clone the release tag with the gem5 submodule (a shallow clone is recommended to save disk space):

git clone https://github.com/lucca-pellegrini/AC3-TP1.git --branch=v0.1.1 --depth=1 --recursive --shallow-submodules
cd AC3-TP1

Sanity-check your host tools:

zig build check-deps

Reproduce everything end-to-end (very long: build gem5, run 155 sims, make figures, build the paper):

zig build report

The gem5 build usually takes ~50 minutes with decent parallelism, depending on the hardware. The full simulation sweep usually takes from several hours to a few days, since the build caps parallel workers at 9 to reduce the risk of out-of-memory failures. When it finishes, the figures and the PDF report appear under figures/ and report/, respectively.

Using mise-en-place

Set up requirements

On Debian Trixie:

sudo apt install curl gcc g++ m4 git zlib1g-dev libgoogle-perftools-dev graphviz
curl https://mise.run | sh && export PATH="$HOME/.local/bin:$PATH"

On Fedora 43/RHEL 10/CentOS Stream 10/Rocky 10/Alma 10 (run as superuser):

dnf install 'dnf-command(copr)'
dnf copr enable jdxcode/mise
dnf install gcc gcc-c++ glibc-devel glibc-static libstdc++ libstdc++-devel libstdc++-static m4 git zlib-devel graphviz mise

On Arch Linux (run as superuser):

pacman -S --needed gcc m4 git graphviz gperftools zlib mise

Clone repo, trust config, and run

git clone https://github.com/lucca-pellegrini/AC3-TP1.git --depth=1 --shallow-submodules --recursive --branch=v0.1.1
cd AC3-TP1
mise trust
mise run # Or `mise report` to immediately run the entire build/simulation pipeline

Manual Workflow

Running the pipeline up to workload compilation from a fresh clone

Each major step is addressable. You can run them individually and resume safely.

# 1) Prepare Python (uv + venv + requirements.txt)
zig build setup-python

# 2) Initialize gem5 submodule
zig build init-gem5

# 3) Build gem5 simulator (gem5/build/X86/gem5.fast)
zig build gem5

# 4) Build the m5 control library (libm5.a)
zig build m5

# 5) Build workloads (default step)
zig build

# 6) Run the full sweep for all 5 PolyBench kernels
zig build simulations

# 7) Generate publication figures from results/
zig build visualize

# 8) Build the LaTeX report
zig build report

All steps are idempotent. You can interrupt long runs and rerun the same step later; remaining items will continue.

Running One-Off Simulations

Running all atax simulations with the MINI dataset via the orchestrator

You can run gem5 directly with a specific parameter and workload. Examples:

# Baseline config for jacobi-2d
./gem5/build/X86/gem5.fast \
  -d results/jacobi-2d_baseline -- \
  cache_config.py ./zig-out/bin/jacobi-2d

# Try a different cache line size for atax
./gem5/build/X86/gem5.fast \
  -d results/atax_cache_line_128 -- \
  cache_config.py --cache-line-size=128 ./zig-out/bin/atax

# Use a tiny debug binary (MINI dataset) to smoke-test the flow
./gem5/build/X86/gem5.fast \
  -d results/seidel-2d_testline_64 -- \
  cache_config.py --cache-line-size=64 ./zig-out/bin/seidel-2d-test

Or invoke the orchestrator for just one workload:

# Run all 31 configs for gemm with 4 workers and CPU pinning (requires psutil)
./.venv/bin/python run_all_simulations.py \
  --results-dir=results ./gem5/build/X86/gem5.fast ./zig-out/bin/gemm \
  -j 4 --pin-workers

# Dry-run to see what would execute (no simulations are launched)
./.venv/bin/python run_all_simulations.py \
  --dry-run --results-dir=results ./gem5/build/X86/gem5.fast ./zig-out/bin/jacobi-2d
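The --dry-run behavior can be pictured with a small sketch. This is a hypothetical shape, not the repository's actual code; the real flag handling lives in run_all_simulations.py:

```python
import shlex
import subprocess

def launch(cmd, dry_run=False):
    # With --dry-run, print the command that would run instead of spawning gem5.
    if dry_run:
        print("DRY-RUN:", shlex.join(cmd))
        return 0
    return subprocess.run(cmd).returncode
```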

Results Layout

Each simulation stores its outputs under results/<workload>_<variant>/. The directory always contains stats.txt (the counters used by the analysis) and a complete snapshot of the simulated system in config.ini, config.json, and config.dot along with a rendered config.dot.pdf. When a run finishes, the orchestrator drops a .completed marker so subsequent invocations can resume cleanly without redoing work. The plotting stage reads all runs from results/, writes publication figures to figures/, and the paper (report/main.tex) imports those figures directly.
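The resume logic described above reduces to a single existence check on the marker file. A minimal sketch (illustrative; the real orchestrator is run_all_simulations.py):

```python
from pathlib import Path

def needs_run(results_dir: Path, workload: str, variant: str) -> bool:
    # A run is redone only if results/<workload>_<variant>/ lacks the
    # .completed marker dropped by the orchestrator on success.
    return not (results_dir / f"{workload}_{variant}" / ".completed").exists()
```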

Reproducibility Choices

To minimize drift, the build pins Python 3.14.3 via uv and installs all Python tooling (SCons, plotting libraries, and friends) into .venv. The gem5 source is vendored as a Git submodule at a fixed commit and is always built through that virtual environment's SCons. Workloads target x86_64-linux-musl and are linked statically to reduce host-dependency variance (see musl). The simulation runner is deterministic and resumable; it caps parallelism at min(9, nproc) to avoid oversubscription and out-of-memory failures.
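The parallelism cap stated above is a one-liner; a sketch of the policy (not the repository's actual code):

```python
import os

def worker_count(cap: int = 9) -> int:
    # Cap parallel gem5 workers at min(cap, available CPUs) to avoid
    # oversubscription and OOM failures on the host.
    return min(cap, os.cpu_count() or 1)
```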

Licensing

Unless a file states otherwise, source code in this repository is licensed under the ISC license (see the SPDX headers). The report (report/main.tex and the figures it includes) is distributed under CC BY-SA 4.0. The gem5 submodule remains under its own upstream license, and the PolyBench workloads in workloads/ remain under the Ohio State University Software Distribution License.

Citing

If you use the code, figures, or methodology, please cite the accompanying paper in report/main.tex: A Quantitative Cache Evaluation of Select PolyBench Kernels, Amanda Canizela Guimarães, Ariel Inácio Jordão, Lucca Pellegrini, Paulo Dimas Junior, Pedro Vitor Andrade, ICEI/PUC Minas, 2026. Machine-readable citation metadata is available in CITATION.cff.