
Admin-Owned Sandbox Installation

An admin-owned installation solves two problems:

  1. Config protection — the admin config is tamper-proof (root-owned, outside the user's sandbox dir). The sandbox scripts are also root-owned at /app/lib/agent-sandbox/, and ~/.config/agent-sandbox/ contains only user data (config, agent profiles), not scripts. With bwrap/firejail the scripts are additionally read-only inside the sandbox via mount namespace.
  2. Policy enforcement — the admin sets a security baseline that users cannot weaken. Users can customize within bounds (add data mounts, extra blocked paths) but cannot remove admin-enforced protections.

Quick Setup

A secure admin installation has two parts: a root-owned config (tamper-proof policy) and protected scripts (the code that enforces it). A root-owned config alone is not sufficient — if the agent can modify sandbox-lib.sh, it can bypass any config.

Step 1: Install scripts and config

# Install to the default admin prefix (/app — matches _ADMIN_DIR
# in sandbox-lib.sh). Puts agent-sandbox in /app/bin/ and runtime
# files in /app/lib/agent-sandbox/.
sudo make install PREFIX=/app

# Start from the minimal admin skeleton (not the full user config).
# Only set what you want to ENFORCE — users control everything else.
sudo cp /app/lib/agent-sandbox/sandbox-admin.conf /app/lib/agent-sandbox/sandbox.conf
sudo $EDITOR /app/lib/agent-sandbox/sandbox.conf

The install ships three config files:

| File | Purpose |
| --- | --- |
| sandbox.conf | Admin enforcement baseline (replace with sandbox-admin.conf skeleton) |
| sandbox.conf.template | Full user config, auto-deployed to ~/.config/agent-sandbox/sandbox.conf on first run. Never modified by admins, which ensures users always get the complete documented config. |
| sandbox-admin.conf | Minimal skeleton with only enforcement knobs. Copy over sandbox.conf to use as admin baseline. |

The Makefile handles directory creation, permissions, and the agent-sandbox symlink. If your site uses a different prefix (e.g. /usr/local), change _ADMIN_DIR in sandbox-lib.sh to match.
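After installing, it can be worth confirming that the installed files really are root-owned and not writable by anyone else. The following is a minimal sketch rather than part of the shipped tooling: the helper name `is_tamper_proof` is hypothetical, and it assumes GNU `stat` (the `-c` format option).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if FILE is owned by root (uid 0)
# and is not writable by group or others. Assumes GNU stat.
is_tamper_proof() {
    local file=$1 uid mode
    uid=$(stat -c '%u' "$file") || return 2
    mode=$(stat -c '%a' "$file") || return 2
    [[ $uid -eq 0 ]] || return 1
    # Octal mask 022 = group-write | other-write
    (( (8#$mode & 8#022) == 0 ))
}

# Check the files the install is expected to create (skipped if absent).
for f in /app/lib/agent-sandbox/sandbox.conf /app/lib/agent-sandbox/sandbox-lib.sh; do
    if [[ -e $f ]]; then
        is_tamper_proof "$f" && echo "OK   $f" || echo "WEAK $f"
    fi
done
```

This only inspects the files themselves. A complete check would also walk the parent directories, since a writable parent allows a file to be replaced.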

On first run, users automatically get sandbox.conf (from sandbox.conf.template) and agent templates (agent.md, settings.json) in ~/.config/agent-sandbox/. On upgrade, unmodified copies are silently updated; user-edited files are preserved with a message pointing to the new version. Users customize via sandbox.conf, user.conf, and conf.d/*.conf. Agent overlays run in subshells, so any change they make to permission globals is discarded when the subshell exits and never reaches the parent shell. Per-agent profiles therefore cannot bypass admin-enforced policy.

Each agent profile directory (agents/<name>/) follows a file contract:

| File | Purpose |
| --- | --- |
| config.conf | Declarative metadata only: env vars, auth markers, and paths the agent uses. Read for startup warnings; MUST NOT modify sandbox permissions. |
| overlay.sh | Mechanical config merge (e.g., merge CLAUDE.md, create settings.json) and env-var exports. Runs in a subshell, so mutations to permission globals cannot reach the parent. |
| agent.md | Sandbox-awareness instructions injected into the agent's context |
| settings.json | Agent-specific settings template (optional, agent-dependent) |
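The subshell guarantee for overlay.sh is a plain bash property and can be illustrated in a few lines. This is a sketch only: `hostile_overlay` is a hypothetical stand-in for an overlay script, and the array contents are made-up examples.

```bash
#!/usr/bin/env bash
# Parent state, standing in for a permission global.
BLOCKED_FILES=(".ssh/id_rsa" ".aws/credentials")

# Hypothetical stand-in for an overlay.sh that tries to clear the list.
hostile_overlay() {
    BLOCKED_FILES=()
    echo "inside overlay: ${#BLOCKED_FILES[@]} blocked files"
}

# Parentheses start a subshell: the overlay works on a copy of the state.
( hostile_overlay )

echo "after overlay:  ${#BLOCKED_FILES[@]} blocked files"
```

The first line reports 0 entries and the second reports 2, because the assignment happened in the child process and was discarded with it.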

Permissions live in the sandbox configuration layer. HOME_WRITABLE, HOME_READONLY, BLOCKED_FILES, BLOCKED_ENV_VARS, and ALLOWED_ENV_VARS are set by the admin config (/app/lib/agent-sandbox/sandbox.conf), the user config (~/.config/agent-sandbox/sandbox.conf or user.conf), and per-project overrides (conf.d/*.conf) — each layer adds to the previous. Admin-enforced entries cannot be weakened by user config or by any agent profile.

What this protects

| Component | bwrap/firejail | Landlock |
| --- | --- | --- |
| Admin config (/app/lib/agent-sandbox/sandbox.conf) | Protected (root-owned, enforced via subprocess isolation) | Protected (root-owned, enforced via subprocess isolation) |
| Sandbox scripts (sandbox-lib.sh, backends/) | Protected (root-owned + read-only inside sandbox via mount namespace) | Protected (root-owned at /app/lib/agent-sandbox/). ~/.config/agent-sandbox/ contains only user data, not scripts. |
| User config (user.conf, conf.d/) | Cannot weaken admin policy (subprocess isolation + policy merge) | Cannot weaken admin policy (subprocess isolation + policy merge) |

The admin path is set in _ADMIN_DIR in sandbox-lib.sh (not configurable via environment variable). To use a different path, change this single line.

Config Hierarchy

The sandbox loads config in layers, each adding to the previous:

1. Defaults           (built into sandbox-lib.sh)
2. Admin config       (/app/lib/agent-sandbox/sandbox.conf)  ← security baseline (if present)
3. User config        (~/.config/agent-sandbox/user.conf)          ← additive customization
4. Per-project config (~/.config/agent-sandbox/conf.d/*.conf)      ← project-specific additions

Without an admin config, the sandbox loads a single sandbox.conf from ~/.config/agent-sandbox/, identical to the user-only install (layers 2 and 3 collapse into one). When an admin config is present but the user has not yet created user.conf, the sandbox accepts ~/.config/agent-sandbox/sandbox.conf as user config — this eases the transition when an admin install is deployed after users have already customized sandbox.conf.
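The transition rule for the user layer can be sketched as a small lookup. This is an illustration under assumptions, not the code in sandbox-lib.sh: the function name `select_user_config` is hypothetical, and it assumes user.conf wins when both files exist, which the text implies but does not state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the file that would act as the user layer
# in CONFIG_DIR when an admin config is present. Prefers user.conf and
# falls back to a pre-existing sandbox.conf.
select_user_config() {
    local config_dir=$1
    if [[ -f $config_dir/user.conf ]]; then
        printf '%s\n' "$config_dir/user.conf"
    elif [[ -f $config_dir/sandbox.conf ]]; then
        printf '%s\n' "$config_dir/sandbox.conf"
    else
        return 1   # no user layer: only defaults and admin config apply
    fi
}

select_user_config "$HOME/.config/agent-sandbox" || echo "no user config found"
```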

What users can customize

| Setting | User can add entries | User can remove admin entries |
| --- | --- | --- |
| ALLOWED_PROJECT_PARENTS | No (narrowing-only). User can only restrict admin's list (entries whose canonical path is under an admin entry). Paths outside admin's tree are stripped with a warning. | Implicitly yes, by listing fewer or narrower entries. |
| READONLY_MOUNTS | Yes: mount more data read-only | N/A (additive) |
| EXTRA_WRITABLE_PATHS | Yes: add writable directories (subject to DENIED_WRITABLE_PATHS) | N/A (additive) |
| DENIED_WRITABLE_PATHS | No | No: admin-only deny-list |
| HOME_READONLY | Yes: expose more dotfiles | N/A (additive) |
| HOME_WRITABLE | Yes (but not items in admin's HOME_READONLY) | N/A (additive) |
| BLOCKED_FILES | Yes: block more files | No: restored with warning |
| BLOCKED_ENV_VARS | Yes: block more env vars | No: restored with warning |
| BLOCKED_ENV_PATTERNS | Yes: add more glob patterns | No: restored with warning |
| EXTRA_BLOCKED_PATHS | Yes: block more paths | No: restored with warning |
| ALLOWED_ENV_VARS | Yes: unblock specific env vars | N/A (additive) |
| PRIVATE_TMP | Yes | Yes |
| BIND_DEV_PTS | Yes | Yes |
| FILTER_PASSWD | Yes | Yes |
| SANDBOX_BACKEND | Yes | Yes |
| SLURM_SCOPE | Yes | Yes |
| HOME_ACCESS | Yes | Yes |

Admin enforcement: subprocess isolation + policy merge

User configs (user.conf, conf.d/*.conf) are loaded in an isolated subprocess (/bin/bash --norc --noprofile). Only known config variables are extracted via declare -p and validated before being applied in the parent. After each untrusted config layer, _enforce_admin_policy() compares the resulting values against the admin snapshot, restores admin entries, and merges user additions on top.

This eliminates entire attack classes: function overrides (source, eval, builtin), DEBUG/RETURN traps, exit/return escapes, IFS manipulation, and background processes — none can escape the subprocess boundary. The merge logic runs in the parent shell, unreachable from user config.
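The subprocess boundary can be sketched as follows. This is a simplified illustration, not the implementation in sandbox-lib.sh: the names `load_untrusted_config` and `enforce_policy` are hypothetical, the admin baseline is not passed into the child, and the validation step that the real loader applies to the extracted text is omitted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: source an untrusted config in a clean child bash
# and print only the named variables in `declare -p` form.
load_untrusted_config() {
    local conf=$1; shift
    /bin/bash --norc --noprofile -c '
        conf=$1; shift
        source "$conf" >/dev/null 2>&1
        for name in "$@"; do
            declare -p "$name" 2>/dev/null
        done
    ' _ "$conf" "$@"
}

# The parent defines the function the config will try to replace.
enforce_policy() { echo "real policy check"; }

conf=$(mktemp)
cat > "$conf" <<'EOF'
enforce_policy() { echo "bypassed"; }   # function override attempt
READONLY_MOUNTS=("/shared/reference_genomes")
EOF

extracted=$(load_untrusted_config "$conf" READONLY_MOUNTS)
rm -f "$conf"

echo "$extracted"   # untrusted text: validate it, never eval it unchecked
enforce_policy      # still the parent's definition
```

The override only ever exists in the child process. What comes back is text, and because the config also controls the child (it could redefine `declare` or install traps there), that text has to be treated as untrusted input and checked before anything is applied in the parent.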

Enforced arrays (BLOCKED_FILES, BLOCKED_ENV_VARS, BLOCKED_ENV_PATTERNS, EXTRA_BLOCKED_PATHS): admin entries are always present. User additions are preserved, but user removals are undone with a warning:

WARNING: User config removed admin-enforced BLOCKED_ENV_VARS entry 'GITHUB_TOKEN' — restored.
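A merge with this behaviour might look like the sketch below. It is an assumption-laden illustration, not the body of _enforce_admin_policy(): the helper names `contains` and `merge_enforced` are hypothetical, the warning wording is simplified, and it needs bash 4.3+ for namerefs.

```bash
#!/usr/bin/env bash
# contains NEEDLE ITEM... : true if NEEDLE equals one of the items.
contains() {
    local needle=$1 item; shift
    for item in "$@"; do
        [[ $item == "$needle" ]] && return 0
    done
    return 1
}

# merge_enforced ARRAY_NAME ADMIN_ENTRY...
# Rebuilds the named array as: admin entries first, then user additions.
# The array name must not collide with the locals used here.
merge_enforced() {
    local array_name=$1; shift
    local -n target=$array_name
    local -a admin=("$@") result=("$@")
    local entry
    for entry in "${admin[@]}"; do
        contains "$entry" "${target[@]}" ||
            echo "WARNING: $array_name lost admin entry '$entry'; restored." >&2
    done
    for entry in "${target[@]}"; do
        contains "$entry" "${admin[@]}" || result+=("$entry")
    done
    target=("${result[@]}")
}

# Admin snapshot, then a user layer that replaced the array outright.
admin_blocked=(GITHUB_TOKEN AWS_SECRET_ACCESS_KEY)
BLOCKED_ENV_VARS=(MY_PRIVATE_VAR)

merge_enforced BLOCKED_ENV_VARS "${admin_blocked[@]}"
printf '%s\n' "${BLOCKED_ENV_VARS[@]}"
```

After the call the array holds both admin entries followed by the user's own addition, so replacing the array in user config cannot drop anything the admin set.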

HOME_READONLY → HOME_WRITABLE escalation: if a user config moves an admin read-only entry to writable, the escalation is reverted with a warning:

WARNING: User config moved admin HOME_READONLY entry '.gnupg' to HOME_WRITABLE — reverted.
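The escalation check amounts to filtering HOME_WRITABLE against the admin's read-only list. A minimal sketch, assuming exact string matches between entries; `revert_escalations` is a hypothetical name and the warning text is simplified:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: drop from HOME_WRITABLE any entry that the admin
# listed in HOME_READONLY. Arguments are the admin's read-only entries.
revert_escalations() {
    local -a kept=()
    local entry ro escalated
    for entry in "${HOME_WRITABLE[@]}"; do
        escalated=0
        for ro in "$@"; do
            if [[ $entry == "$ro" ]]; then escalated=1; break; fi
        done
        if (( escalated )); then
            echo "WARNING: '$entry' is admin read-only; writable access reverted." >&2
        else
            kept+=("$entry")
        fi
    done
    HOME_WRITABLE=("${kept[@]}")
}

admin_home_readonly=(.gnupg .ssh)
HOME_WRITABLE=(.cache .gnupg)    # user layer tried to make .gnupg writable

revert_escalations "${admin_home_readonly[@]}"
printf '%s\n' "${HOME_WRITABLE[@]}"
```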

DENIED_WRITABLE_PATHS: any EXTRA_WRITABLE_PATHS entry matching or under a denied path is stripped with a warning:

WARNING: User config added EXTRA_WRITABLE_PATHS entry '/etc/cron.d' under denied path '/etc' — removed.
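"Matching or under" has to be a path-component comparison, otherwise /etcetera would count as being under /etc. The sketch below shows one way to do that; `is_under` is a hypothetical helper, the comparison is purely lexical (no symlink or `..` resolution), and the paths are examples.

```bash
#!/usr/bin/env bash
# Hypothetical helper: true if the first path equals the second or lies
# below it, comparing whole path components.
is_under() {
    local path=${1%/} parent=${2%/}
    [[ $path == "$parent" || $path == "$parent"/* ]]
}

DENIED_WRITABLE_PATHS=(/etc /usr)
EXTRA_WRITABLE_PATHS=(/etc/cron.d /etcetera/data /fh/scratch/out)

kept=()
for candidate in "${EXTRA_WRITABLE_PATHS[@]}"; do
    denied_by=""
    for denied in "${DENIED_WRITABLE_PATHS[@]}"; do
        if is_under "$candidate" "$denied"; then denied_by=$denied; break; fi
    done
    if [[ -n $denied_by ]]; then
        echo "WARNING: '$candidate' is under denied path '$denied_by'; removed." >&2
    else
        kept+=("$candidate")
    fi
done
EXTRA_WRITABLE_PATHS=("${kept[@]}")

printf '%s\n' "${EXTRA_WRITABLE_PATHS[@]}"
```

Only /etc/cron.d is stripped; /etcetera/data survives because the match stops at a component boundary.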

ALLOWED_PROJECT_PARENTS (narrowing-only): the user can only restrict admin's allow-list, never expand it. A user-supplied entry is admissible iff its canonical path (via realpath, with all symlinks followed) is identical to or a path-component subdir of one of admin's entries. Inadmissible entries are stripped with a warning:

WARNING: User config ALLOWED_PROJECT_PARENTS entry '/tmp/foo' is not under any admin-allowed parent — rejected.
  Admin-allowed: /home/dotto

If every user-requested entry is rejected (or the user's array is empty), the sandbox refuses to start rather than continuing with no admissible project locations.
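Resolving symlinks before comparing is what stops a link inside an allowed tree from pointing somewhere else. A minimal sketch of the admissibility test, assuming GNU coreutils `realpath`; the function name `is_admissible` and the scratch layout are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: true if CANDIDATE, after resolving symlinks with
# realpath, equals ADMIN_PARENT or is a subdirectory of it.
is_admissible() {
    local candidate admin_parent
    candidate=$(realpath -- "$1") || return 1
    admin_parent=$(realpath -- "$2") || return 1
    [[ $candidate == "$admin_parent" || $candidate == "${admin_parent%/}"/* ]]
}

# Demo in a scratch directory: a symlink inside the allowed tree that
# points outside it must be rejected.
scratch=$(mktemp -d)
mkdir -p "$scratch/allowed/project" "$scratch/outside"
ln -s "$scratch/outside" "$scratch/allowed/escape"

for p in "$scratch/allowed/project" "$scratch/allowed/escape"; do
    if is_admissible "$p" "$scratch/allowed"; then
        echo "admitted: ${p#"$scratch"}"
    else
        echo "rejected: ${p#"$scratch"}"
    fi
done
rm -rf "$scratch"
```

The real directory is admitted, while the symlink is rejected because its canonical path lies outside the allowed parent.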

Admin-config error handling: a malformed admin file fails closed; a missing one does not. A missing admin file is treated as the narrowing default / (no narrowing: the user's list passes through). A present-but-malformed admin file refuses sandbox startup with a clear error in any of these cases: bash -n reports a parse error, source hits a runtime error, ALLOWED_PROJECT_PARENTS is not an indexed array, an entry is not absolute, or an entry contains command substitution. The boundary is explicit and security-relevant: missing admin → permissive default; malformed admin → fail-closed (no fall-through).

Admin Config Skeleton

The sandbox-admin.conf shipped with the install is a minimal starting point. It contains only the enforcement-only knobs (DENIED_WRITABLE_PATHS, BLOCKED_*, ALLOWED_PROJECT_PARENTS, etc.) with commented-out examples. Uncomment and edit what you need.

See sandbox-admin.conf for the full skeleton.

Environment overrides: Users can override SLURM_SCOPE and HOME_ACCESS at launch time without editing any config file: SLURM_SCOPE=session agent-sandbox claude or HOME_ACCESS=tmpwrite agent-sandbox bash. Environment values take precedence over both admin and user configs.

Example User Config

Users create ~/.config/agent-sandbox/user.conf to add project-specific mounts and tools:

# ~/.config/agent-sandbox/user.conf — User customization
# Adds to admin baseline. Cannot remove admin-enforced entries.

# Additional data I need
READONLY_MOUNTS+=(
    "/shared/reference_genomes"
)

# My editor config
HOME_READONLY+=(
    ".vimrc"
    ".vim"
)

# Extra scratch space
EXTRA_WRITABLE_PATHS+=(
    "/fh/scratch/delete30/mylab/pipeline-output"
)

Note the += syntax: it appends to the admin's arrays. Using =() to replace an enforced array does not remove the admin's entries; they are restored with a warning, and only your own entries are kept as additions.

Per-Project Overrides

Per-project configs in ~/.config/agent-sandbox/conf.d/*.conf are user-controlled and subject to the same post-merge validation as user.conf (cannot remove admin-enforced entries).

Example — a project-specific config that conditionally activates based on project path:

# conf.d/genomics.conf
[[ "$_PROJECT_DIR" == /fh/fast/mylab/genomics/* ]] || return 0

READONLY_MOUNTS+=(
    "/fh/fast/shared/reference_genomes"
)
EXTRA_WRITABLE_PATHS+=(
    "/fh/scratch/delete30/mylab/pipeline-output"
)

These configs also run in isolated subprocesses and go through admin enforcement — they cannot remove admin-set entries.

Chaperon: Slurm Proxy

The chaperon is a zero-trust Slurm proxy that sits between the sandboxed agent and the real Slurm commands. Inside the sandbox, Slurm binaries (sbatch, srun, scancel, squeue, etc.) are replaced with stubs that communicate with a chaperon process running outside the sandbox via FIFO IPC. The chaperon validates every request against a flag whitelist, wraps submitted jobs to re-enter the sandbox on compute nodes, and scopes squeue/scancel to the agent's own jobs.

Key security properties:

  • Real Slurm binaries are blocked inside the sandbox (bind-mounted to /dev/null on bwrap, blacklisted on firejail). The munge socket is blocked on bwrap/firejail. On Landlock, neither the Slurm binaries nor the munge socket are blocked, so the chaperon is fully bypassable; use bwrap or firejail for any deployment that needs a hard Slurm boundary.
  • Dangerous flags (--uid, --prolog, --bcast, --container, --get-user-env) are rejected.
  • Job wrapping: sbatch scripts are inlined via heredoc into a wrapper that calls sandbox-exec.sh on the compute node, so no temp files are written to NFS.
  • Job scoping via --comment tags: squeue/scancel only see jobs submitted by this sandbox session/project (configurable via SLURM_SCOPE).
  • Scope-widening flags (squeue --me, scancel --all, scancel -u <user>) are silently mapped to "all jobs in your scope", which is transparent to the user.
  • All denials include prompt-injection recovery messages that re-anchor the agent to its instructions.

See Chaperon for the full protocol, supported commands, and flag whitelists.
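The idea of flag validation can be sketched with a tiny allowlist check. This is illustrative only: `validate_sbatch_args` and the five-entry `ALLOWED_SBATCH_FLAGS` list are hypothetical and much smaller than the real whitelist, and the parsing is simplified (only long `--flag` and `--flag=value` forms, with every other argument treated as positional).

```bash
#!/usr/bin/env bash
# Hypothetical, deliberately tiny allowlist; the real one lives in the
# Chaperon documentation.
ALLOWED_SBATCH_FLAGS=(--job-name --time --partition --cpus-per-task --mem)

validate_sbatch_args() {
    local arg flag allowed ok
    for arg in "$@"; do
        [[ $arg == -* ]] || continue      # positional: script name, values
        flag=${arg%%=*}                   # strip "=value"
        ok=0
        for allowed in "${ALLOWED_SBATCH_FLAGS[@]}"; do
            if [[ $flag == "$allowed" ]]; then ok=1; break; fi
        done
        if (( ! ok )); then
            echo "denied: flag '$flag' is not on the allowlist" >&2
            return 1
        fi
    done
    return 0
}

validate_sbatch_args --time=01:00:00 --mem=4G job.sh && echo "accepted"
validate_sbatch_args --uid=0 job.sh || echo "rejected"
```

An allowlist rejects --uid, --prolog and the other dangerous flags without naming them, and it also rejects any flag added to Slurm later until someone reviews it.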

Testing

Two test suites validate the sandbox:

  • test.sh — run on every install to verify backend isolation: filesystem, env blocking, agent overlays, chaperon Slurm proxy (flag validation, job submission, scoped cancel, transparent squeue/scancel, comment tag stripping), security hardening, symlink/hardlink attacks, /proc escapes, FD inheritance, signal isolation, TIOCSTI, cgroup/userns restrictions, deterministic isolation, concurrent instances. Tests all available backends (bwrap, firejail, landlock).
  • test-admin.sh — run on admin installs to verify config enforcement: admin entries survive user tampering, DENIED_WRITABLE_PATHS, HOME_READONLY escalation blocking, scalar protection, HOME override resistance, conf.d enforcement, subprocess isolation of escape attempts, admin Slurm wrappers. Skips automatically if no admin config is found.
bash test.sh                          # all backends
bash test.sh --backend bwrap          # single backend
bash test-admin.sh                    # admin enforcement (needs admin config)
bash test-admin.sh --verbose          # show output on failure

Choosing a Backend on Ubuntu 24.04+

On Ubuntu 24.04+, AppArmor blocks unprivileged user namespaces, so bwrap doesn't work out of the box. The admin has three options:

| Option | Effort | Result |
| --- | --- | --- |
| Enable bwrap via AppArmor | Low | Strongest backend: mount namespace, PID namespace, /tmp isolation, self-protection |
| Install firejail | Low | Strong: setuid binary bypasses AppArmor; mount namespace, PID namespace, seccomp |
| Do nothing | None | Sandbox falls back to Landlock (weakest; see Landlock fallback below) |

Recommendation: Enable bwrap. It provides the strongest isolation, is fully unprivileged (no setuid binary on the system), and has a significantly better security track record (4 CVEs with zero root exploits vs firejail's 18 CVEs with 12 root exploits). Firejail is a fallback if bwrap's AppArmor profile is not desired, but installing it adds a setuid-root binary to every node. See the full CVE comparison for details.

Enabling bwrap via AppArmor profile

Install bwrap and create an AppArmor profile that allows it to create user namespaces:

# 1. Install bwrap
sudo apt install bubblewrap

# 2. Create AppArmor profile
BWRAP_PATH=$(command -v bwrap)   # typically /usr/bin/bwrap

# Writing under /etc/apparmor.d needs root, so pipe the heredoc through sudo tee
sudo tee /etc/apparmor.d/bwrap-sandbox > /dev/null << EOF
abi <abi/4.0>,
include <tunables/global>

profile bwrap-sandbox $BWRAP_PATH flags=(unconfined) {
  userns,
}
EOF

sudo apparmor_parser -r /etc/apparmor.d/bwrap-sandbox

This allows bwrap to create user namespaces. Other programs remain restricted. The profile survives reboots. Verify:

# As a regular user — should work after the profile is loaded
bwrap --ro-bind / / -- id
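If the bwrap check fails, it helps to know whether the AppArmor user-namespace restriction is what is blocking it. The sketch below reads the sysctl that Ubuntu uses for this restriction (`kernel.apparmor_restrict_unprivileged_userns`); the function name is hypothetical, and on kernels without this AppArmor feature the file does not exist.

```bash
#!/usr/bin/env bash
# Reports whether AppArmor's unprivileged-userns restriction is active.
# Prints "unknown" when the sysctl file is absent or unreadable.
userns_restriction_state() {
    local knob=/proc/sys/kernel/apparmor_restrict_unprivileged_userns
    if [[ ! -r $knob ]]; then
        echo "unknown"
    elif [[ $(<"$knob") == 1 ]]; then
        echo "restricted"
    else
        echo "unrestricted"
    fi
}

userns_restriction_state
```

A result of "restricted" together with a failing bwrap usually means the profile is missing, not loaded, or attached to a different bwrap path.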

The sandbox auto-detects bwrap from $PATH, or admins can set BWRAP=/path/to/bwrap in sandbox.conf to pin a specific binary. Users can also install bwrap via Homebrew (brew install bubblewrap) — the AppArmor profile would need to include that path too (~/.linuxbrew/bin/bwrap) or a second profile entry.

Note: The AppArmor profile grants userns to any invocation of bwrap at the profiled path, not just sandbox-initiated ones. This is acceptable — bwrap user namespaces are unprivileged and cannot escalate beyond what the calling user already has access to. The sandbox adds filesystem restrictions on top.

Firejail backend (alternative to bwrap)

Firejail installs setuid root, so it can create mount namespaces regardless of AppArmor settings. The sandbox auto-detects firejail when bwrap is unavailable (priority: bwrap > firejail > landlock).

# 1. Install firejail
sudo apt install firejail

# 2. The sandbox auto-detects firejail — no user config needed.
#    Force firejail for testing:
SANDBOX_BACKEND=firejail ./sandbox-exec.sh -- bash
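The detection priority can be pictured as the following sketch. It is an assumption about how such a probe could work, not the logic in sandbox-lib.sh: `detect_backend` is a hypothetical name, and treating bwrap as available only when a trivial sandbox actually starts is a design choice made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the stated priority: bwrap > firejail > landlock.
# bwrap counts as available only if it can actually start a sandbox,
# which fails on Ubuntu 24.04+ without the AppArmor profile.
detect_backend() {
    if command -v bwrap >/dev/null 2>&1 &&
       bwrap --ro-bind / / true >/dev/null 2>&1; then
        echo bwrap
    elif command -v firejail >/dev/null 2>&1; then
        echo firejail
    else
        echo landlock
    fi
}

# An explicit SANDBOX_BACKEND setting skips detection.
echo "backend: ${SANDBOX_BACKEND:-$(detect_backend)}"
```

Probing by execution rather than by presence on $PATH avoids choosing a bwrap binary that is installed but blocked by AppArmor.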

The sandbox uses --allusers to disable firejail's built-in /etc/passwd filtering, which would otherwise remove UIDs >= UID_MIN (typically 1000) and break Slurm if the slurm user has a UID in that range. User enumeration prevention is handled separately by FILTER_PASSWD=true (default), which blocks NSS daemon sockets to prevent LDAP/AD enumeration. Caveat: on LDAP/AD clusters where the current user exists only in LDAP (not in local /etc/passwd), FILTER_PASSWD=true breaks user resolution and should be set to false. The bwrap backend handles LDAP users correctly via /etc/passwd overlay.

/tmp isolation (--private-tmp): Enabled by default for both bwrap and firejail (controlled by PRIVATE_TMP in sandbox.conf). Breaks MPI shared-memory transport (OpenMPI, MPICH) and NCCL inter-GPU sockets. Set PRIVATE_TMP=false in sandbox.conf for multi-rank MPI or multi-GPU workloads.

Supplementary groups: Preserved (no --nogroups). HPC file access relies on supplementary groups for lab data directories.

bwrap vs firejail comparison

| Capability | bwrap | firejail |
| --- | --- | --- |
| Privilege model | Unprivileged (user namespaces) | Setuid root binary |
| Mount namespace | ✓ | ✓ |
| PID namespace | ✓ | ✓ |
| /tmp isolation | ✓ (--tmpfs /tmp) | ✓ (--private-tmp) |
| Sandbox self-protection | ✓ (scripts read-only via bind mount) | ✓ (scripts hidden via mount namespace) |
| User enumeration filtering | ✓ (overlays /etc/passwd + nsswitch.conf, LDAP-safe) | Partial (blacklists NSS sockets, but breaks LDAP-only users) |
| Slurm binary relocation | ✓ (overlays /usr/bin/sbatch with redirector) | PATH-based only (no overlay) |
| Seccomp | Generated BPF filter (generate-seccomp.py); see below | Built-in (--seccomp + --caps.drop=all) |
| Internal state exposure | None | /run/firejail/mnt/seccomp/ readable (reveals BPF filter) |
| Attack surface | Minimal, no setuid | Setuid root binary on every node |
| CVE history | 4 CVEs, 0 root exploits, none since 2020 | 18 CVEs, 12 local root exploits (details) |
| Supplementary groups | Display as nogroup (user namespace limitation; file perms unaffected) | Correct display (setuid avoids user namespace) |
| AppArmor on Ubuntu 24.04+ | Requires admin AppArmor profile | Works without admin action |

Seccomp Filter — HPC Compatibility

All three backends include seccomp filters that block dangerous syscalls. Firejail and Landlock have built-in filters; bwrap loads a generated BPF filter via --seccomp FD (see generate-seccomp.py).

What is blocked

The filters block three groups of syscalls:

  1. Core attack-surface denials: io_uring_{setup,enter,register}, userfaultfd, kexec_load/kexec_file_load. The io_uring block provides the main security value; it has a large kernel attack surface and Docker's default seccomp profile blocks it since version 25.0.

  2. Defense-in-depth set: bpf, mount, umount2, pivot_root, reboot, swapon/swapoff, personality, acct, quotactl, kcmp. Each of these is already rejected at the capability layer for an unprivileged sandboxed process; denying them at the seccomp layer too is belt-and-suspenders in case a kernel bug or misconfiguration ever leaks the gating capability. Zero observable effect on HPC/ML workloads (see SECURITY.md §Seccomp Filter for the per-syscall justification).

  3. Argument-filtered ioctl denials (bwrap): ioctl(TIOCSTI) and ioctl(TIOCLINUX). The BPF program inspects the ioctl cmd argument and returns EPERM for the keystroke-injection requests that drove CVE-2017-5226 (bwrap) and CVE-2023-1523 (Snap). Other ioctl requests (TIOCGWINSZ, FIONBIO, GPU ioctls, …) are unaffected. See SECURITY.md §Argument-filtered ioctl denials for detail.

The Landlock backend additionally denies ptrace and process_vm_readv/writev because it has no PID namespace to prevent sibling-process inspection. bwrap and firejail rely on PID namespacing for that.

| Tool | Uses io_uring | When blocked | Impact |
| --- | --- | --- | --- |
| Node.js / libuv | Yes: async file I/O (libuv PR #3952) | Falls back to epoll + threadpool | None (transparent fallback) |
| RocksDB | Yes: parallel SST reads (io_posix.cc) | Falls back to synchronous pread | Minor: slightly slower bulk reads |
| QEMU | Yes: block I/O backend (block/io_uring.c) | Falls back to aio=threads | Minor: slightly slower disk I/O |
| Rust tokio-uring | Yes: io_uring-only runtime (io-uring crate) | No fallback; fails | Breaking, but standard tokio (epoll) is unaffected |
| DuckDB, SQLite | No | N/A | None |

userfaultfd lets a process intercept page faults in userspace, pausing the faulting kernel thread indefinitely. Attackers exploit this to create arbitrary-width race windows for TOCTOU and use-after-free exploits (e.g. CVE-2021-22555, CVE-2024-1086). Docker blocks it, and the kernel restricts unprivileged access by default since 5.11 (vm.unprivileged_userfaultfd=0). No HPC tools use it — only QEMU postcopy live migration and CRIU lazy restore are affected, both of which fall back gracefully.
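To confirm from inside a running sandbox that a filter is actually installed, the kernel exposes the seccomp mode of each process in /proc. A minimal sketch (the function name `seccomp_mode` is hypothetical; the `Seccomp:` field of /proc/self/status is standard on Linux):

```bash
#!/usr/bin/env bash
# Prints the seccomp mode of the current shell, read from
# /proc/self/status: 0 = disabled, 1 = strict, 2 = filter (a BPF filter
# is installed). Prints "unknown" if the field cannot be read.
seccomp_mode() {
    local key value
    if [[ -r /proc/self/status ]]; then
        while read -r key value; do
            if [[ $key == Seccomp: ]]; then
                echo "$value"
                return 0
            fi
        done < /proc/self/status
    fi
    echo "unknown"
}

echo "seccomp mode: $(seccomp_mode)"
```

Seccomp filters are inherited by child processes, so running this in a shell started through the sandbox should report mode 2 when a filter is loaded.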

What is intentionally allowed

| Syscall | Used by | Security risk (accepted) |
| --- | --- | --- |
| memfd_create | CUDA / ROCm GPU drivers, PyTorch shared memory, Numba JIT, JAX/XLA compiler, OpenJDK ZGC | Anonymous executable memory regions |
| process_vm_readv/writev | OpenMPI CMA transport, strace, gdb | Cross-process memory access (mitigated by PID namespace in bwrap/firejail) |

Blocking memfd_create would silently break CUDA, PyTorch DataLoader, Numba, and JAX/XLA. Docker's default seccomp profile also allows it. Blocking process_vm_readv would break MPI CMA shared-memory transport (Docker allows it on kernel >= 4.8). The filesystem sandbox remains the primary isolation mechanism; seccomp is defense-in-depth.

Seccomp for bwrap

bwrap's mount namespace + PID namespace + no_new_privs already provide strong containment. A seccomp filter closes the remaining gaps:

| Syscall | Already mitigated by | Impact of blocking |
| --- | --- | --- |
| io_uring_setup / io_uring_enter | Nothing: real attack surface reduction | Node.js falls back to epoll, RocksDB falls back to pread. tokio-uring (Rust) would fail, but standard tokio is unaffected |
| userfaultfd | Kernel restricts unprivileged use since 5.11, but user-mode faults still allowed | No HPC tools use it |
| kexec_load / kexec_file_load | no_new_privs (requires CAP_SYS_BOOT) | None: already ineffective without capabilities |
| ioctl(TIOCSTI) / ioctl(TIOCLINUX) | Kernel CONFIG_LEGACY_TIOCSTI=n (6.2+) or dev.tty.legacy_tiocsti=0, but only on opted-in hosts; LTS HPC kernels (5.4, 5.15) leave it open | None: no legitimate workload simulates terminal input or pastes from the Linux console |

Adding a seccomp filter to bwrap is reasonable and aligns with Docker's precedent.


Landlock Fallback

If neither bwrap nor firejail is available (e.g. Ubuntu 24.04+ without an AppArmor profile or firejail installed), the sandbox falls back to the Landlock backend. Landlock provides kernel-enforced filesystem isolation but lacks the mount namespace that bwrap and firejail use for stronger containment.

Gaps compared to bwrap/firejail

| Gap | Impact |
| --- | --- |
| No mount namespace | Blocked paths return EACCES instead of ENOENT; no file overlays (passwd filtering, Slurm binary relocation) |
| No PID namespace | Host processes visible via /proc; agent can read /proc/PID/environ of same-UID processes |
| No /tmp isolation | Shared host /tmp, so cross-session data leakage is possible |
| No sandbox self-protection | In user-only install, scripts writable under ~/.config/agent-sandbox/. Admin install avoids this: scripts are root-owned, ~/.config/agent-sandbox/ contains only user data. |
| Unix socket connect() not blocked | systemd-run --user escape viable (see below) |
| User enumeration (LDAP) | Cannot overlay /etc/passwd or block NSS sockets |

Disable systemd user instances

Landlock cannot block Unix domain socket connect() (not available in any Landlock ABI version as of kernel 6.11). A sandboxed process can connect to /run/user/<UID>/systemd/private and use systemd-run --user to execute commands outside the sandbox. Both bwrap and firejail are unaffected — they replace /run with a tmpfs.

What is affected by disabling: gpg-agent socket activation (users doing GPG signing would need to start gpg-agent --daemon manually) and systemctl --user commands.

# Option A: Mask the user@ template service (recommended; needs root)
sudo systemctl mask user@.service

# Option B: Limit via logind. This is a config file, not a command.
# Put the following in /etc/systemd/logind.conf.d/no-user-sessions.conf
[Login]
UserTasksMax=0
KillUserProcesses=yes

Option A prevents the user systemd instance from starting at all. Verify with systemd-run --user -- id (should fail with "Failed to connect to bus").
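A quick way to audit a node is to test whether the per-user systemd socket exists for the current user. This is a minimal sketch with a hypothetical function name; it checks only for the socket path named above and does not attempt a connection.

```bash
#!/usr/bin/env bash
# True if the per-user systemd private socket exists, meaning the escape
# path described above is potentially reachable from a Landlock sandbox.
systemd_user_socket_present() {
    [[ -S /run/user/$(id -u)/systemd/private ]]
}

if systemd_user_socket_present; then
    echo "user systemd socket present: Landlock sandbox may be escapable"
else
    echo "no user systemd socket found"
fi
```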