# Artifact Repository for “A Quantitative Cache Evaluation of Select PolyBench Kernels”
This repository contains the full submission for the course “Arquitetura de Computadores III” (Instituto de Ciências Exatas e Informática, Pontifícia Universidade Católica de Minas Gerais), 2026/1, Prof. Matheus Alcântara Souza.
It packages a complete, reproducible pipeline to study cache hierarchy sensitivities on a selected set of PolyBench kernels using the gem5 simulator. The pipeline is expressed in a single Zig build graph that: checks host prerequisites; pins and bootstraps Python via uv; initializes and builds the vendored gem5 submodule; compiles statically linked workloads; runs an exhaustive, parameterized simulation sweep; generates figures; and builds the final LaTeX report.
Target platform: Linux x86_64 only. While gem5 itself is portable, parts of the automation (uv/Python setup and LaTeX/report build) are written for Linux.
## Full Demonstration
## What’s Here
- `build.zig`: End-to-end build graph (Zig 0.15.2)
- `cache_config.py`: gem5 SE-mode system and cache configuration (CLI-tunable)
- `run_all_simulations.py`: Orchestrates the full parameter sweep, with parallel workers and idempotent resumption
- `visualize_results.py`: Turns `results/` into publication figures under `figures/`
- `analyze.zig`: Small helper to inspect/aggregate gem5 stats (optional)
- `workloads/`: C sources for PolyBench kernels and small microbenchmarks
  - `atax.c`, `floyd-warshall.c`, `gemm.c`, `jacobi-2d.c`, `seidel-2d.c` (PolyBench v4.2.1 kernels)
  - `array_stride.c`, `matrix_multiply.c`, `random_access.c` (handwritten)
  - `polybench.c` (shared runtime)
- `include/polybench.h`: PolyBench configuration header
- `report/`: IEEEtran paper sources; `report/main.tex` is the manuscript
- `gem5/`: gem5 submodule (initialized by the build)
## Who Made This
See `AUTHORS` for the full list and contact emails. If you use these artifacts, please cite the work described in `report/main.tex`. A machine-readable `CITATION.cff` is provided.
## Summary Of The Experiment
- Workloads (PolyBench v4.2.1): atax, floyd-warshall, gemm, jacobi-2d, seidel-2d
- Core model: X86TimingSimpleCPU (in-order), 4 GHz; memory mode: timing; 8 GiB address space
- Cache hierarchy: private L1I/L1D, shared L2, shared L3 (see cache_config.py)
- Parameter sweep per workload (31 configs):
  - 11 realistic multi-level size presets (e.g., baseline i7-6700K; Ryzen; Apple; server)
  - cache line size ∈ {32, 64, 128, 256} bytes
  - associativity sweep for L1, L2, and L3 ∈ {1, 2, 4, 8, 16} (one level varied at a time)
- Total runs: 5 workloads × 31 configurations = 155 simulations
- Dataset sizes per kernel were chosen to balance fidelity vs. run time (see table below)
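The overall shape of the sweep above can be sketched in a few lines of Python. The preset names below are hypothetical placeholders; the authoritative lists (and the exact bookkeeping that yields 31 configurations, versus the 30 named variants this simplification produces) live in `run_all_simulations.py`:

```python
from itertools import product

# Hypothetical variant names -- only the shape of the sweep is shown here.
SIZE_PRESETS = [f"size_preset_{i}" for i in range(11)]   # 11 realistic presets
LINE_SIZES = [32, 64, 128, 256]                          # bytes
ASSOC_WAYS = [1, 2, 4, 8, 16]                            # one level varied at a time
CACHE_LEVELS = ["l1", "l2", "l3"]

configs = []
configs += [("size_preset", p) for p in SIZE_PRESETS]
configs += [("cache_line_size", s) for s in LINE_SIZES]
configs += [(f"{lvl}_assoc", w) for lvl, w in product(CACHE_LEVELS, ASSOC_WAYS)]

# 11 + 4 + 15 = 30 variants in this sketch; the real runner arrives at 31
# configurations per workload (its handling of baseline/default values
# differs slightly from this simplification).
print(f"{len(configs)} variants per workload, "
      f"{5 * len(configs)} runs over 5 workloads in this sketch")
```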
Dataset choices compiled into the real-run workloads:
| Kernel | Dataset | Notes/Dimensions (from report) |
|---|---|---|
| atax | LARGE | M=1900, N=2100 |
| gemm | MEDIUM | NI=200, NJ=220, NK=240 |
| floyd-warshall | SMALL | N=180 |
| jacobi-2d | MEDIUM | N=250, T=100 |
| seidel-2d | MEDIUM | N=400, T=100 |
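PolyBench fixes problem sizes at compile time through `-D<SIZE>_DATASET` preprocessor macros. A minimal sketch of how the table above maps to compiler flags (the `dataset_flag` helper is illustrative, not part of the repository):

```python
# Per-kernel dataset choices, copied from the table above.
DATASETS = {
    "atax": "LARGE",
    "gemm": "MEDIUM",
    "floyd-warshall": "SMALL",
    "jacobi-2d": "MEDIUM",
    "seidel-2d": "MEDIUM",
}

def dataset_flag(kernel: str) -> str:
    """Return the PolyBench preprocessor flag for a kernel, e.g. -DLARGE_DATASET."""
    return f"-D{DATASETS[kernel]}_DATASET"

for kernel in DATASETS:
    print(kernel, dataset_flag(kernel))
```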
## Prerequisites (Linux)

Required (checked by `zig build check-deps`):
- Zig 0.15.2 (ZVM/mise recommended)
- uv (the build pins Python to 3.14.3 inside a local `.venv`)
- A system-wide GCC or Clang toolchain, since gem5 will not build with Zig's bundled Clang
- git, just, and m4 to fetch and build gem5, as well as the report
- A TeX distribution to generate plots and the report (a full TeX Live is the easiest option, but TinyTeX may also work)
- Graphviz (`dot` on PATH) to render gem5's `config.dot` to PDF (the Python binding `pydot` is installed via `requirements.txt`)
- Optionally, gperftools for the `tcmalloc` implementation, which speeds up gem5 considerably
## Quick Start

Clone the release tag with the gem5 submodule (a shallow clone is recommended to save disk space):
```sh
git clone https://github.com/lucca-pellegrini/AC3-TP1.git --branch=v0.1.1 --depth=1 --recursive --shallow-submodules
cd AC3-TP1
```
Sanity-check your host tools:
```sh
zig build check-deps
```
Reproduce everything end-to-end (very long: build gem5, run 155 sims, make figures, build the paper):
```sh
zig build report
```
The gem5 build usually takes around 50 minutes with decent parallelism,
depending on the hardware. The full simulation sweep takes anywhere from hours
to a few days, since the build caps parallel workers at 9 to reduce the risk of
out-of-memory failures. When everything finishes, the figures and the PDF
report appear under `figures/` and `report/`.
## Using mise-en-place

### Set up requirements
On Debian Trixie:
```sh
sudo apt install curl gcc g++ m4 git zlib1g-dev libgoogle-perftools-dev graphviz
curl https://mise.run | sh && export PATH="$HOME/.local/bin:$PATH"
```
On Fedora 43/RHEL 10/CentOS Stream 10/Rocky 10/Alma 10 (run as superuser):
```sh
dnf install 'dnf-command(copr)'
dnf copr enable jdxcode/mise
dnf install gcc gcc-c++ glibc-devel glibc-static libstdc++ libstdc++-devel libstdc++-static m4 git zlib-devel graphviz mise
```
On Arch Linux (run as superuser):
```sh
pacman -S --needed gcc m4 git graphviz gperftools zlib mise
```
### Clone repo, trust config, and run
```sh
git clone https://github.com/lucca-pellegrini/AC3-TP1.git --depth=1 --shallow-submodules --recursive --branch=v0.1.1
cd AC3-TP1
mise trust
mise run  # Or `mise report` to immediately run the entire build/simulation pipeline
```
## Manual Workflow
Each major step is addressable. You can run them individually and resume safely.
```sh
# 1) Prepare Python (uv + venv + requirements.txt)
zig build setup-python

# 2) Initialize gem5 submodule
zig build init-gem5

# 3) Build gem5 simulator (gem5/build/X86/gem5.fast)
zig build gem5

# 4) Build the m5 control library (libm5.a)
zig build m5

# 5) Build workloads (default step)
zig build

# 6) Run the full sweep for all 5 PolyBench kernels
zig build simulations

# 7) Generate publication figures from results/
zig build visualize

# 8) Build the LaTeX report
zig build report
```
All steps are idempotent. You can interrupt long runs and rerun the same step later; remaining items will continue.
## Running One-Off Simulations
You can run gem5 directly with a specific parameter and workload. Examples:
```sh
# Baseline config for jacobi-2d
./gem5/build/X86/gem5.fast \
    -d results/jacobi-2d_baseline -- \
    cache_config.py ./zig-out/bin/jacobi-2d

# Try a different cache line size for atax
./gem5/build/X86/gem5.fast \
    -d results/atax_cache_line_128 -- \
    cache_config.py --cache-line-size=128 ./zig-out/bin/atax

# Use a tiny debug binary (MINI dataset) to smoke-test the flow
./gem5/build/X86/gem5.fast \
    -d results/seidel-2d_testline_64 -- \
    cache_config.py --cache-line-size=64 ./zig-out/bin/seidel-2d-test
```
Or invoke the orchestrator for just one workload:
```sh
# Run all 31 configs for gemm with 4 workers and CPU pinning (requires psutil)
./.venv/bin/python run_all_simulations.py \
    --results-dir=results ./gem5/build/X86/gem5.fast ./zig-out/bin/gemm \
    -j 4 --pin-workers

# Dry-run to see what would execute (no simulations are launched)
./.venv/bin/python run_all_simulations.py \
    --dry-run --results-dir=results ./gem5/build/X86/gem5.fast ./zig-out/bin/jacobi-2d
```
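The `--pin-workers` option relies on psutil for CPU affinity. On Linux the same idea can be sketched with the standard library alone; the round-robin core assignment below is a simplified assumption, not the runner's actual policy:

```python
import os

def pin_worker(worker_index: int, cores_per_worker: int = 1) -> set[int]:
    """Pin the calling process to a slice of the available CPU cores (Linux only)."""
    available = sorted(os.sched_getaffinity(0))          # cores we may use
    start = (worker_index * cores_per_worker) % len(available)
    chosen = set(available[start:start + cores_per_worker])
    os.sched_setaffinity(0, chosen)                      # restrict this process
    return chosen

# Example: pin "worker 0" to the first available core.
print(pin_worker(0))
```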
## Results Layout

Each simulation stores its outputs under `results/<workload>_<variant>/`. The
directory always contains `stats.txt` (the counters used by the analysis) and a
complete snapshot of the simulated system in `config.ini`, `config.json`, and
`config.dot`, along with a rendered `config.dot.pdf`. When a run finishes, the
orchestrator drops a `.completed` marker so subsequent invocations can resume
cleanly without redoing work. The plotting stage reads all runs from
`results/`, writes publication figures to `figures/`, and the paper
(`report/main.tex`) imports those figures directly.
## Reproducibility Choices

To minimize drift, the build pins Python 3.14.3 via uv and installs all Python
tooling (SCons, plotting libraries, and friends) into `.venv`. The gem5 source
is vendored as a Git submodule at a fixed commit and is always built through
that virtual environment’s SCons. Workloads target x86_64-linux-musl and are
statically linked against musl to reduce host-dependency variance. The
simulation runner is deterministic and resumable; it caps parallelism at
min(9, nproc) to avoid oversubscription and out-of-memory failures.
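The parallelism cap reduces to a one-liner; the constant 9 comes from the text, while the `or 1` fallback for hosts where the CPU count is unknown is a defensive assumption:

```python
import os

# Cap parallel gem5 workers at min(9, nproc) to limit memory pressure.
MAX_WORKERS = min(9, os.cpu_count() or 1)
print(MAX_WORKERS)
```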
## Licensing

Unless a file states otherwise, source code in this repository is licensed under the ISC license (see the SPDX headers). The report (`report/main.tex` and the figures it includes) is distributed under CC BY-SA 4.0. The gem5 submodule remains under its own upstream license, and the PolyBench workloads in `workloads/` remain under the Ohio State University Software Distribution License.
## Citing

If you use the code, figures, or methodology, please cite the accompanying paper in `report/main.tex`: *A Quantitative Cache Evaluation of Select PolyBench Kernels*, Amanda Canizela Guimarães, Ariel Inácio Jordão, Lucca Pellegrini, Paulo Dimas Junior, Pedro Vitor Andrade, ICEI/PUC Minas, 2026. Machine-readable citation metadata is available in `CITATION.cff`.

