Admin-Owned Sandbox Installation

An admin-owned installation solves two problems:

Config protection — the admin config is tamper-proof (root-owned, outside the user's sandbox dir). The sandbox scripts are also root-owned at /app/lib/agent-sandbox/, and ~/.config/agent-sandbox/ contains only user data (config, agent profiles), not scripts. With bwrap/firejail the scripts are additionally read-only inside the sandbox via mount namespace.
Policy enforcement — the admin sets a security baseline that users cannot weaken. Users can customize within bounds (add data mounts, extra blocked paths) but cannot remove admin-enforced protections.

Quick Setup

A secure admin installation has two parts: a root-owned config (tamper-proof policy) and protected scripts (the code that enforces it). A root-owned config alone is not sufficient — if the agent can modify sandbox-lib.sh, it can bypass any config.

Step 1: Install scripts and config

# Install to the default admin prefix (/app — matches _ADMIN_DIR
# in sandbox-lib.sh). Puts agent-sandbox in /app/bin/ and runtime
# files in /app/lib/agent-sandbox/.
sudo make install PREFIX=/app

# Start from the minimal admin skeleton (not the full user config).
# Only set what you want to ENFORCE — users control everything else.
sudo cp /app/lib/agent-sandbox/sandbox-admin.conf /app/lib/agent-sandbox/sandbox.conf
sudo $EDITOR /app/lib/agent-sandbox/sandbox.conf

The install ships three config files:

File	Purpose
`sandbox.conf`	Admin enforcement baseline (replace with `sandbox-admin.conf` skeleton)
`sandbox.conf.template`	Full user config, auto-deployed to `~/.config/agent-sandbox/sandbox.conf` on first run. Never modified by admins — ensures users always get the complete documented config.
`sandbox-admin.conf`	Minimal skeleton with only enforcement knobs. Copy over `sandbox.conf` to use as admin baseline.

The Makefile handles directory creation, permissions, and the agent-sandbox symlink. If your site uses a different prefix (e.g. /usr/local), change _ADMIN_DIR in sandbox-lib.sh to match.

On first run, users automatically get sandbox.conf (from sandbox.conf.template) and agent templates (agent.md, settings.json) in ~/.config/agent-sandbox/. On upgrade, unmodified copies are silently updated; user-edited files are preserved with a message pointing to the new version. Users customize via sandbox.conf, user.conf, and conf.d/*.conf. Agent overlays run in subshells, so mutations to permission globals are structurally impossible — per-agent profiles cannot bypass admin-enforced policy.

Each agent profile directory (agents/<name>/) follows a file contract:

File	Purpose
`config.conf`	Declarative metadata only — env vars, auth markers, and paths the agent uses. Read for startup warnings; MUST NOT modify sandbox permissions.
`overlay.sh`	Mechanical config merge (e.g., merge `CLAUDE.md`, create `settings.json`) and env-var exports. Runs in a subshell — mutations to permission globals cannot reach the parent.
`agent.md`	Sandbox-awareness instructions injected into the agent's context
`settings.json`	Agent-specific settings template (optional, agent-dependent)

Permissions live in the sandbox configuration layer. HOME_WRITABLE, HOME_READONLY, BLOCKED_FILES, BLOCKED_ENV_VARS, and ALLOWED_ENV_VARS are set by the admin config (/app/lib/agent-sandbox/sandbox.conf), the user config (~/.config/agent-sandbox/sandbox.conf or user.conf), and per-project overrides (conf.d/*.conf) — each layer adds to the previous. Admin-enforced entries cannot be weakened by user config or by any agent profile.
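
The subshell guarantee relied on for overlays is ordinary bash behavior and can be checked in isolation. The following is a minimal sketch, not the project's overlay runner: run_overlay is a hypothetical name and the overlay body is made up for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: assignments made inside ( ... ) never reach the parent.
HOME_WRITABLE=(".cache")

run_overlay() {   # hypothetical stand-in for executing an agent's overlay.sh
    (
        # A buggy or hostile overlay tries to widen permissions.
        HOME_WRITABLE+=(".ssh")
        BLOCKED_FILES=()
    )
}

run_overlay
printf 'HOME_WRITABLE after overlay: %s\n' "${HOME_WRITABLE[*]}"
```

The parent's HOME_WRITABLE still holds only its original entry after the call, because the parenthesized body runs in a child process with its own copy of the variables.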

What this protects

Component	bwrap/firejail	Landlock
Admin config (`/app/lib/agent-sandbox/sandbox.conf`)	Protected (root-owned, enforced via subprocess isolation)	Protected (root-owned, enforced via subprocess isolation)
Sandbox scripts (`sandbox-lib.sh`, backends/)	Protected (root-owned + read-only inside sandbox via mount namespace)	Protected (root-owned at `/app/lib/agent-sandbox/`). `~/.config/agent-sandbox/` contains only user data, not scripts.
User config (`user.conf`, `conf.d/`)	Cannot weaken admin policy (subprocess isolation + policy merge)	Cannot weaken admin policy (subprocess isolation + policy merge)

The admin path is set in _ADMIN_DIR in sandbox-lib.sh (not configurable via environment variable). To use a different path, change this single line.
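
After installing, it can be worth confirming that the files really are root-owned and not writable by anyone else. The check below is a sketch under stated assumptions: is_root_protected is a hypothetical helper, it relies on GNU stat (coreutils), and it inspects only the file itself, not its parent directories.

```shell
#!/usr/bin/env bash
# is_root_protected PATH
# Returns 0 only if PATH exists, is owned by uid 0, and is not writable
# by group or other. Requires GNU stat.
is_root_protected() {
    local path=$1 owner mode
    [ -e "$path" ] || return 1
    owner=$(stat -c '%u' -- "$path") || return 1
    mode=$(stat -c '%a' -- "$path") || return 1
    [ "$owner" -eq 0 ] || return 1
    # Octal mask 022 = group-write | other-write.
    (( (8#$mode & 8#022) == 0 ))
}

for f in /app/lib/agent-sandbox/sandbox.conf /app/lib/agent-sandbox/sandbox-lib.sh; do
    if is_root_protected "$f"; then
        echo "ok:   $f"
    else
        echo "WARN: $f is missing, not root-owned, or group/other-writable"
    fi
done
```

A complete audit would apply the same test to every directory on the path, since a writable parent directory lets a user replace the file.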

Config Hierarchy

The sandbox loads config in layers, each adding to the previous:

1. Defaults           (built into sandbox-lib.sh)
2. Admin config       (/app/lib/agent-sandbox/sandbox.conf)  ← security baseline (if present)
3. User config        (~/.config/agent-sandbox/user.conf)          ← additive customization
4. Per-project config (~/.config/agent-sandbox/conf.d/*.conf)      ← project-specific additions

Without an admin config, the sandbox loads a single sandbox.conf from ~/.config/agent-sandbox/, identical to the user-only install (layers 2 and 3 collapse into one). When an admin config is present but the user has not yet created user.conf, the sandbox accepts ~/.config/agent-sandbox/sandbox.conf as user config — this eases the transition when an admin install is deployed after users have already customized sandbox.conf.
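
The fallback rule can be written as a small decision function. This is a sketch of the rule as described here, not code from sandbox-lib.sh: select_user_config is a hypothetical name, and it assumes exactly one file is chosen for the user layer.

```shell
#!/usr/bin/env bash
# select_user_config ADMIN_DIR USER_DIR
# Prints the file that would serve as the user layer.
select_user_config() {
    local admin_dir=$1 user_dir=$2
    if [ ! -f "$admin_dir/sandbox.conf" ]; then
        # User-only install: one sandbox.conf stands in for layers 2 and 3.
        printf '%s\n' "$user_dir/sandbox.conf"
    elif [ -f "$user_dir/user.conf" ]; then
        printf '%s\n' "$user_dir/user.conf"
    else
        # Admin install, no user.conf yet: accept the pre-existing sandbox.conf.
        printf '%s\n' "$user_dir/sandbox.conf"
    fi
}

select_user_config /app/lib/agent-sandbox "$HOME/.config/agent-sandbox"
```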

What users can customize

Setting	User can add entries	User can remove admin entries
`ALLOWED_PROJECT_PARENTS`	No — narrowing-only. User can only restrict admin's list (entries whose canonical path is under an admin entry). Paths outside admin's tree are stripped with a warning.	Implicitly yes — by listing fewer or narrower entries.
`READONLY_MOUNTS`	Yes — mount more data read-only	N/A (additive)
`EXTRA_WRITABLE_PATHS`	Yes — add writable directories (subject to `DENIED_WRITABLE_PATHS`)	N/A (additive)
`DENIED_WRITABLE_PATHS`	No	No — admin-only deny-list
`HOME_READONLY`	Yes — expose more dotfiles	N/A (additive)
`HOME_WRITABLE`	Yes (but not items in admin's `HOME_READONLY`)	N/A (additive)
`BLOCKED_FILES`	Yes — block more files	No — restored with warning
`BLOCKED_ENV_VARS`	Yes — block more env vars	No — restored with warning
`BLOCKED_ENV_PATTERNS`	Yes — add more glob patterns	No — restored with warning
`EXTRA_BLOCKED_PATHS`	Yes — block more paths	No — restored with warning
`ALLOWED_ENV_VARS`	Yes — unblock specific env vars	N/A (additive)
`PRIVATE_TMP`	Yes	Yes
`BIND_DEV_PTS`	Yes	Yes
`FILTER_PASSWD`	Yes	Yes
`SANDBOX_BACKEND`	Yes	Yes
`SLURM_SCOPE`	Yes	Yes
`HOME_ACCESS`	Yes	Yes

Admin enforcement: subprocess isolation + policy merge

User configs (user.conf, conf.d/*.conf) are loaded in an isolated subprocess (/bin/bash --norc --noprofile). Only known config variables are extracted via declare -p and validated before being applied in the parent. After each untrusted config layer, _enforce_admin_policy() compares the resulting values against the admin snapshot, restores admin entries, and merges user additions on top.

This eliminates entire attack classes: function overrides (source, eval, builtin), DEBUG/RETURN traps, exit/return escapes, IFS manipulation, and background processes — none can escape the subprocess boundary. The merge logic runs in the parent shell, unreachable from user config.
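
The subprocess boundary can be illustrated with a short loader. This is a sketch, not the project's implementation: load_array_isolated is a hypothetical name, and instead of the declare -p transport described above it passes array elements back as NUL-delimited strings, so the parent treats everything the child prints as data and never evaluates it. It needs bash 4.4 or newer (mapfile -d, namerefs).

```shell
#!/usr/bin/env bash
# load_array_isolated CONF NAME
# Sources CONF in a separate bash process that is seeded with the current
# contents of the global indexed array NAME, then copies the resulting
# elements back into NAME.
load_array_isolated() {
    local _conf=$1 _name=$2
    local -n _cur=$_name
    local -a _out=()
    mapfile -d '' -t _out < <(
        /bin/bash --norc --noprofile -c '
            _c=$1 _n=$2; shift 2
            declare -a "$_n"
            declare -n _r=$_n
            _r=("$@")                      # seed, so that += in CONF appends
            source "$_c" >/dev/null 2>&1   # untrusted code runs only here
            (( ${#_r[@]} )) && printf "%s\0" "${_r[@]}"
        ' _ "$_conf" "$_name" "${_cur[@]}"
    )
    _cur=("${_out[@]}")
}

conf=$(mktemp)
printf '%s\n' 'BLOCKED_FILES+=(".netrc")' 'LEAKED=1' > "$conf"
BLOCKED_FILES=(".ssh/id_rsa")
load_array_isolated "$conf" BLOCKED_FILES
printf 'BLOCKED_FILES: %s\n' "${BLOCKED_FILES[*]}"
printf 'LEAKED in parent: %s\n' "${LEAKED:-unset}"
rm -f "$conf"
```

Functions, traps, and stray variables defined by the config die with the child. A config that calls exit simply yields an empty array in this sketch, which is why a merge step must run in the parent afterwards.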

Enforced arrays (BLOCKED_FILES, BLOCKED_ENV_VARS, BLOCKED_ENV_PATTERNS, EXTRA_BLOCKED_PATHS): admin entries are always present. User additions are preserved, but user removals are undone with a warning:

WARNING: User config removed admin-enforced BLOCKED_ENV_VARS entry 'GITHUB_TOKEN' — restored.
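
A restore-and-merge step for one enforced array might look like the sketch below. The names contains and enforce_array are hypothetical, and the warning text is simplified; the real logic lives in _enforce_admin_policy().

```shell
#!/usr/bin/env bash
# contains NEEDLE ITEM...: true if NEEDLE equals one of the items.
contains() {
    local needle=$1 item
    shift
    for item in "$@"; do
        if [[ $item == "$needle" ]]; then return 0; fi
    done
    return 1
}

# enforce_array NAME ADMIN_ENTRY...
# Rebuilds the global array NAME as the admin entries followed by any user
# entries not already present, warning about admin entries that went missing.
enforce_array() {
    local _name=$1
    shift
    local -n _arr=$_name
    local -a _merged=("$@")
    local _e
    for _e in "$@"; do
        if ! contains "$_e" "${_arr[@]}"; then
            echo "WARNING: user config removed admin-enforced $_name entry '$_e' (restored)" >&2
        fi
    done
    for _e in "${_arr[@]}"; do
        if ! contains "$_e" "${_merged[@]}"; then _merged+=("$_e"); fi
    done
    _arr=("${_merged[@]}")
}

BLOCKED_ENV_VARS=("MY_SECRET")                # what the user layer produced
enforce_array BLOCKED_ENV_VARS GITHUB_TOKEN   # admin snapshot
printf '%s\n' "${BLOCKED_ENV_VARS[@]}"
```

Because the function runs in the parent shell after the untrusted layer has finished, nothing in the user config can redefine it.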

HOME_READONLY → HOME_WRITABLE escalation: if a user config moves an admin read-only entry to writable, the escalation is reverted with a warning:

WARNING: User config moved admin HOME_READONLY entry '.gnupg' to HOME_WRITABLE — reverted.
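
The revert can be sketched as a filter over HOME_WRITABLE. This is an illustration under assumptions: revert_escalation is a hypothetical name, and entries are compared by exact string match, which may be stricter or looser than the real check.

```shell
#!/usr/bin/env bash
# revert_escalation ADMIN_READONLY_ENTRY...
# Drops admin read-only entries from the global HOME_WRITABLE and makes sure
# each of them is still listed in the global HOME_READONLY.
revert_escalation() {
    local -a kept=()
    local w a escalated present
    for w in "${HOME_WRITABLE[@]}"; do
        escalated=0
        for a in "$@"; do
            if [[ $w == "$a" ]]; then escalated=1; fi
        done
        if (( escalated )); then
            echo "WARNING: user config moved admin HOME_READONLY entry '$w' to HOME_WRITABLE (reverted)" >&2
        else
            kept+=("$w")
        fi
    done
    HOME_WRITABLE=("${kept[@]}")
    for a in "$@"; do
        present=0
        for w in "${HOME_READONLY[@]}"; do
            if [[ $w == "$a" ]]; then present=1; fi
        done
        if (( ! present )); then HOME_READONLY+=("$a"); fi
    done
}

HOME_READONLY=(".vimrc")
HOME_WRITABLE=(".cache" ".gnupg")   # user layer tried to make .gnupg writable
revert_escalation ".gnupg"
printf 'writable: %s\nreadonly: %s\n' "${HOME_WRITABLE[*]}" "${HOME_READONLY[*]}"
```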

DENIED_WRITABLE_PATHS: any EXTRA_WRITABLE_PATHS entry matching or under a denied path is stripped with a warning:

WARNING: User config added EXTRA_WRITABLE_PATHS entry '/etc/cron.d' under denied path '/etc' — removed.
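
The "matching or under" test has to be component-wise, otherwise /etcetera would be caught by a deny entry for /etc. A minimal sketch follows; is_under and filter_writable are hypothetical names, and the comparison is purely lexical (whether the real code canonicalizes paths first is not stated here).

```shell
#!/usr/bin/env bash
# is_under PATH PARENT: true if PATH equals PARENT or lies below it.
is_under() {
    local path=${1%/} parent=${2%/}
    [[ $path == "$parent" || $path == "$parent"/* ]]
}

# filter_writable DENIED_PATH...
# Removes every global EXTRA_WRITABLE_PATHS entry at or under a denied path.
filter_writable() {
    local -a kept=()
    local p d denied_by
    for p in "${EXTRA_WRITABLE_PATHS[@]}"; do
        denied_by=
        for d in "$@"; do
            if is_under "$p" "$d"; then denied_by=$d; break; fi
        done
        if [[ -n $denied_by ]]; then
            echo "WARNING: EXTRA_WRITABLE_PATHS entry '$p' is under denied path '$denied_by' (removed)" >&2
        else
            kept+=("$p")
        fi
    done
    EXTRA_WRITABLE_PATHS=("${kept[@]}")
}

EXTRA_WRITABLE_PATHS=("/etc/cron.d" "/etcetera/data" "/scratch/out")
filter_writable "/etc"
printf '%s\n' "${EXTRA_WRITABLE_PATHS[@]}"
```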

ALLOWED_PROJECT_PARENTS (narrowing-only): the user can only restrict admin's allow-list, never expand it. A user-supplied entry is admissible iff its canonical path (via realpath, with all symlinks followed) is identical to or a path-component subdir of one of admin's entries. Inadmissible entries are stripped with a warning:

WARNING: User config ALLOWED_PROJECT_PARENTS entry '/tmp/foo' is not under any admin-allowed parent — rejected.
  Admin-allowed: /home/dotto

If every user-requested entry is rejected (or the user's array is empty), the sandbox refuses to start rather than continuing with no admissible project locations.
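
The admissibility rule can be sketched with GNU realpath. Assumptions in this example: narrow_parents is a hypothetical name, realpath -m is used so that paths which do not exist yet can still be canonicalized, and surviving entries are stored in canonical form.

```shell
#!/usr/bin/env bash
# narrow_parents ADMIN_PARENT...
# Rewrites the global ALLOWED_PROJECT_PARENTS to the canonical form of every
# entry that equals an admin entry or lies below one. Returns 1 when nothing
# survives, which the caller should treat as "refuse to start".
narrow_parents() {
    local -a admin=() kept=()
    local a u cu ok
    for a in "$@"; do
        admin+=("$(realpath -m -- "$a")")
    done
    for u in "${ALLOWED_PROJECT_PARENTS[@]}"; do
        cu=$(realpath -m -- "$u")
        ok=0
        for a in "${admin[@]}"; do
            # Component-wise prefix test: /home/dotto2 is not under /home/dotto.
            if [[ $cu == "$a" || $cu == "${a%/}"/* ]]; then ok=1; fi
        done
        if (( ok )); then
            kept+=("$cu")
        else
            echo "WARNING: ALLOWED_PROJECT_PARENTS entry '$u' is not under any admin-allowed parent (rejected)" >&2
        fi
    done
    ALLOWED_PROJECT_PARENTS=("${kept[@]}")
    (( ${#kept[@]} > 0 ))
}

ALLOWED_PROJECT_PARENTS=("/tmp/foo" "/home/dotto/projects")
if narrow_parents "/home/dotto"; then
    printf 'admissible: %s\n' "${ALLOWED_PROJECT_PARENTS[@]}"
else
    echo "no admissible project parents; refusing to start" >&2
fi
```

Canonicalizing both sides before comparing is what defeats a symlink or a ../ sequence that points outside the admin's tree.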

Admin-config error handling: fail-closed on malformed input. A missing admin file is treated as an admin allow-list of / (no narrowing: the user's list passes through unchanged). A present-but-malformed admin file refuses sandbox startup with a clear error. Malformed means any of the following: a parse error during bash -n, a runtime error during source, ALLOWED_PROJECT_PARENTS not being an indexed array, an entry that is not absolute, or an entry that contains command substitution. The boundary is explicit and security-relevant: missing admin → permissive default; malformed admin → fail-closed (no fall-through).

Admin Config Skeleton

The sandbox-admin.conf shipped with the install is a minimal starting point. It contains only the enforcement-only knobs (DENIED_WRITABLE_PATHS, BLOCKED_*, ALLOWED_PROJECT_PARENTS, etc.) with commented-out examples. Uncomment and edit what you need.

See sandbox-admin.conf for the full skeleton.

Environment overrides: Users can override SLURM_SCOPE and HOME_ACCESS at launch time without editing any config file: SLURM_SCOPE=session agent-sandbox claude or HOME_ACCESS=tmpwrite agent-sandbox bash. Environment values take precedence over both admin and user configs.

Example User Config

Users create ~/.config/agent-sandbox/user.conf to add project-specific mounts and tools:

# ~/.config/agent-sandbox/user.conf — User customization
# Adds to admin baseline. Cannot remove admin-enforced entries.

# Additional data I need
READONLY_MOUNTS+=(
    "/shared/reference_genomes"
)

# My editor config
HOME_READONLY+=(
    ".vimrc"
    ".vim"
)

# Extra scratch space
EXTRA_WRITABLE_PATHS+=(
    "/fh/scratch/delete30/mylab/pipeline-output"
)

Note the += syntax: it appends to the arrays inherited from the admin layer. Replacing an enforced array with =() does not remove admin entries; they are restored with a warning and the user's own entries are merged on top.

Per-Project Overrides

Per-project configs in ~/.config/agent-sandbox/conf.d/*.conf are user-controlled and subject to the same post-merge validation as user.conf (cannot remove admin-enforced entries).

Example — a project-specific config that conditionally activates based on project path:

# conf.d/genomics.conf
[[ "$_PROJECT_DIR" == /fh/fast/mylab/genomics/* ]] || return 0

READONLY_MOUNTS+=(
    "/fh/fast/shared/reference_genomes"
)
EXTRA_WRITABLE_PATHS+=(
    "/fh/scratch/delete30/mylab/pipeline-output"
)

These configs also run in isolated subprocesses and go through admin enforcement — they cannot remove admin-set entries.

Chaperon: Slurm Proxy

The chaperon is a zero-trust Slurm proxy that sits between the sandboxed agent and the real Slurm commands. Inside the sandbox, Slurm binaries (sbatch, srun, scancel, squeue, etc.) are replaced with stubs that communicate with a chaperon process running outside the sandbox via FIFO IPC. The chaperon validates every request against a flag whitelist, wraps submitted jobs to re-enter the sandbox on compute nodes, and scopes squeue/scancel to the agent's own jobs.

Key security properties:

- Real Slurm binaries are blocked inside the sandbox (bind-mounted to /dev/null on bwrap, blacklisted on firejail). The munge socket is blocked on bwrap/firejail. On Landlock, neither the Slurm binaries nor the munge socket are blocked, so the chaperon is fully bypassable; use bwrap or firejail for any deployment that needs a hard Slurm boundary.
- Dangerous flags (--uid, --prolog, --bcast, --container, --get-user-env) are rejected.
- Job wrapping: sbatch scripts are inlined via heredoc into a wrapper that calls sandbox-exec.sh on the compute node, so no temp files are written to NFS.
- Job scoping via --comment tags: squeue/scancel only see jobs submitted by this sandbox session/project (configurable via SLURM_SCOPE).
- Scope-widening flags (squeue --me, scancel --all, scancel -u <user>) are silently mapped to "all jobs in your scope", which is transparent to the user.
- All denials include prompt-injection recovery messages that re-anchor the agent to its instructions.

See Chaperon for the full protocol, supported commands, and flag whitelists.
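
To make the flag rejection concrete, here is a small sketch that refuses the dangerous flags named above. It is deliberately not how the chaperon works: the chaperon validates against a whitelist, while this example uses a denylist because it is shorter to show. check_slurm_args is a hypothetical name.

```shell
#!/usr/bin/env bash
# Flags listed as rejected in this section.
DANGEROUS_FLAGS=(--uid --prolog --bcast --container --get-user-env)

# check_slurm_args ARG...
# Returns 1 if any argument is a denied flag, in either "--flag value" or
# "--flag=value" form.
check_slurm_args() {
    local arg flag
    for arg in "$@"; do
        for flag in "${DANGEROUS_FLAGS[@]}"; do
            if [[ $arg == "$flag" || $arg == "$flag="* ]]; then
                echo "denied: $flag" >&2
                return 1
            fi
        done
    done
    return 0
}

if check_slurm_args --partition=gpu --prolog=/tmp/evil.sh job.sh; then
    echo "request accepted"
else
    echo "request rejected"
fi
```

A denylist like this misses any spelling it does not anticipate (for example, abbreviated long options if the option parser accepts them), which is one reason a whitelist is the safer design.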

Testing

Two test suites validate the sandbox:

test.sh — run on every install to verify backend isolation: filesystem, env blocking, agent overlays, chaperon Slurm proxy (flag validation, job submission, scoped cancel, transparent squeue/scancel, comment tag stripping), security hardening, symlink/hardlink attacks, /proc escapes, FD inheritance, signal isolation, TIOCSTI, cgroup/userns restrictions, deterministic isolation, concurrent instances. Tests all available backends (bwrap, firejail, landlock).
test-admin.sh — run on admin installs to verify config enforcement: admin entries survive user tampering, DENIED_WRITABLE_PATHS, HOME_READONLY escalation blocking, scalar protection, HOME override resistance, conf.d enforcement, subprocess isolation of escape attempts, admin Slurm wrappers. Skips automatically if no admin config is found.

bash test.sh                          # all backends
bash test.sh --backend bwrap          # single backend
bash test-admin.sh                    # admin enforcement (needs admin config)
bash test-admin.sh --verbose          # show output on failure

Choosing a Backend on Ubuntu 24.04+

On Ubuntu 24.04+, AppArmor blocks unprivileged user namespaces, so bwrap doesn't work out of the box. The admin has three options:

Option	Effort	Result
Enable bwrap via AppArmor	Low	Strongest backend — mount namespace, PID namespace, `/tmp` isolation, self-protection
Install firejail	Low	Strong — setuid binary bypasses AppArmor; mount namespace, PID namespace, seccomp
Do nothing	None	Sandbox falls back to Landlock (weakest — see Landlock fallback below)

Recommendation: Enable bwrap. It provides the strongest isolation, is fully unprivileged (no setuid binary on the system), and has a significantly better security track record (4 CVEs with zero root exploits vs firejail's 18 CVEs with 12 root exploits). Firejail is a fallback if bwrap's AppArmor profile is not desired, but installing it adds a setuid-root binary to every node. See the full CVE comparison for details.

Enabling bwrap via AppArmor profile

Install bwrap and create an AppArmor profile that allows it to create user namespaces:

# 1. Install bwrap
sudo apt install bubblewrap

# 2. Create AppArmor profile
BWRAP_PATH=$(command -v bwrap)   # typically /usr/bin/bwrap

# Writing under /etc/apparmor.d needs root, so pipe the heredoc through sudo tee.
sudo tee /etc/apparmor.d/bwrap-sandbox > /dev/null << EOF
abi <abi/4.0>,
include <tunables/global>

profile bwrap-sandbox $BWRAP_PATH flags=(unconfined) {
  userns,
}
EOF

sudo apparmor_parser -r /etc/apparmor.d/bwrap-sandbox

This allows bwrap to create user namespaces. Other programs remain restricted. The profile survives reboots. Verify:

# As a regular user — should work after the profile is loaded
bwrap --ro-bind / / -- id

The sandbox auto-detects bwrap from $PATH, or admins can set BWRAP=/path/to/bwrap in sandbox.conf to pin a specific binary. Users can also install bwrap via Homebrew (brew install bubblewrap) — the AppArmor profile would need to include that path too (~/.linuxbrew/bin/bwrap) or a second profile entry.

Note: The AppArmor profile grants userns to any invocation of bwrap at the profiled path, not just sandbox-initiated ones. This is acceptable — bwrap user namespaces are unprivileged and cannot escalate beyond what the calling user already has access to. The sandbox adds filesystem restrictions on top.

Firejail backend (alternative to bwrap)

Firejail installs setuid root, so it can create mount namespaces regardless of AppArmor settings. The sandbox auto-detects firejail when bwrap is unavailable (priority: bwrap > firejail > landlock).

# 1. Install firejail
sudo apt install firejail

# 2. The sandbox auto-detects firejail — no user config needed.
#    Force firejail for testing:
SANDBOX_BACKEND=firejail ./sandbox-exec.sh -- bash
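
The detection priority can be sketched as follows. This is an illustration, not the sandbox's detection code: pick_backend is a hypothetical name, and probing bwrap with a trivial command is an assumption about how "unavailable" might be determined when the binary exists but user namespaces are blocked.

```shell
#!/usr/bin/env bash
# pick_backend: prints the backend to use.
# An explicit SANDBOX_BACKEND wins; otherwise bwrap > firejail > landlock.
pick_backend() {
    if [[ -n ${SANDBOX_BACKEND:-} ]]; then
        printf '%s\n' "$SANDBOX_BACKEND"
        return 0
    fi
    # bwrap must both exist and be able to set up a namespace.
    if command -v bwrap >/dev/null 2>&1 &&
       bwrap --ro-bind / / -- true >/dev/null 2>&1; then
        printf 'bwrap\n'
        return 0
    fi
    if command -v firejail >/dev/null 2>&1; then
        printf 'firejail\n'
        return 0
    fi
    printf 'landlock\n'
}

pick_backend
```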

The sandbox uses --allusers to disable firejail's built-in /etc/passwd filtering, which would otherwise remove UIDs >= UID_MIN (typically 1000) and break Slurm if the slurm user has a UID in that range. User enumeration prevention is handled separately by FILTER_PASSWD=true (default), which blocks NSS daemon sockets to prevent LDAP/AD enumeration. Caveat: on LDAP/AD clusters where the current user exists only in LDAP (not in local /etc/passwd), FILTER_PASSWD=true breaks user resolution and should be set to false. The bwrap backend handles LDAP users correctly via /etc/passwd overlay.

/tmp isolation (--private-tmp): Enabled by default for both bwrap and firejail (controlled by PRIVATE_TMP in sandbox.conf). Breaks MPI shared-memory transport (OpenMPI, MPICH) and NCCL inter-GPU sockets. Set PRIVATE_TMP=false in sandbox.conf for multi-rank MPI or multi-GPU workloads.

Supplementary groups: Preserved (no --nogroups). HPC file access relies on supplementary groups for lab data directories.

bwrap vs firejail comparison

Capability	bwrap	firejail
Privilege model	Unprivileged (user namespaces)	Setuid root binary
Mount namespace	✓	✓
PID namespace	✓	✓
`/tmp` isolation	✓ (`--tmpfs /tmp`)	✓ (`--private-tmp`)
Sandbox self-protection	✓ (scripts read-only via bind mount)	✓ (scripts hidden via mount namespace)
User enumeration filtering	✓ (overlays `/etc/passwd` + `nsswitch.conf`, LDAP-safe)	Partial (blacklists NSS sockets, but breaks LDAP-only users)
Slurm binary relocation	✓ (overlays `/usr/bin/sbatch` with redirector)	PATH-based only (no overlay)
Seccomp	Generated BPF filter (`generate-seccomp.py`) — see below	Built-in (`--seccomp` + `--caps.drop=all`)
Internal state exposure	None	`/run/firejail/mnt/seccomp/` readable (reveals BPF filter)
Attack surface	Minimal, no setuid	Setuid root binary on every node
CVE history	4 CVEs, 0 root exploits, none since 2020	18 CVEs, 12 local root exploits (details)
Supplementary groups	Display as `nogroup` (user namespace limitation — file perms unaffected)	Correct display (setuid avoids user namespace)
AppArmor on Ubuntu 24.04+	Requires admin AppArmor profile	Works without admin action

Seccomp Filter — HPC Compatibility

All three backends include seccomp filters that block dangerous syscalls. Firejail and Landlock have built-in filters; bwrap loads a generated BPF filter via --seccomp FD (see generate-seccomp.py).

What is blocked

The filters block three groups of syscalls (the third applies to the bwrap filter only):

Core attack-surface denials — io_uring_{setup,enter,register}, userfaultfd, kexec_load/kexec_file_load. The io_uring block provides the main security value; it has a large kernel attack surface and Docker's default seccomp profile blocks it since version 25.0.
Defense-in-depth set — bpf, mount, umount2, pivot_root, reboot, swapon/swapoff, personality, acct, quotactl, kcmp. Each of these is already rejected at the capability layer for an unprivileged sandboxed process; denying them at the seccomp layer too is belt-and-suspenders in case a kernel bug or misconfiguration ever leaks the gating capability. Zero observable effect on HPC/ML workloads — see SECURITY.md §Seccomp Filter for the per-syscall justification.
Argument-filtered ioctl denials (bwrap) — ioctl(TIOCSTI) and ioctl(TIOCLINUX). The BPF program inspects the ioctl cmd argument and returns EPERM for the keystroke-injection requests that drove CVE-2017-5226 (bwrap) and CVE-2023-1523 (Snap). Other ioctl requests (TIOCGWINSZ, FIONBIO, GPU ioctls, …) are unaffected. See SECURITY.md §Argument-filtered ioctl denials for detail.

The Landlock backend additionally denies ptrace and process_vm_readv/writev because it has no PID namespace to prevent sibling-process inspection. bwrap and firejail rely on PID namespacing for that.

Tool	Uses `io_uring`	When blocked	Impact
Node.js / libuv	Yes — async file I/O (libuv PR #3952)	Falls back to epoll + threadpool	None — transparent fallback
RocksDB	Yes — parallel SST reads (io_posix.cc)	Falls back to synchronous `pread`	Minor — slightly slower bulk reads
QEMU	Yes — block I/O backend (block/io_uring.c)	Falls back to `aio=threads`	Minor — slightly slower disk I/O
Rust tokio-uring	Yes — io_uring-only runtime (io-uring crate)	No fallback — fails	Breaking — but standard tokio (epoll) is unaffected
DuckDB, SQLite	No	—	None

userfaultfd lets a process intercept page faults in userspace, pausing the faulting kernel thread indefinitely. Attackers exploit this to create arbitrary-width race windows for TOCTOU and use-after-free exploits (e.g. CVE-2021-22555, CVE-2024-1086). Docker blocks it, and the kernel restricts unprivileged access by default since 5.11 (vm.unprivileged_userfaultfd=0). No HPC tools use it — only QEMU postcopy live migration and CRIU lazy restore are affected, both of which fall back gracefully.

What is intentionally allowed

Syscall	Used by	Security risk (accepted)
`memfd_create`	CUDA / ROCm GPU drivers, PyTorch shared memory, Numba JIT, JAX/XLA compiler, OpenJDK ZGC	Anonymous executable memory regions
`process_vm_readv/writev`	OpenMPI CMA transport, strace, gdb	Cross-process memory access (mitigated by PID namespace in bwrap/firejail)

Blocking memfd_create would silently break CUDA, PyTorch DataLoader, Numba, and JAX/XLA. Docker's default seccomp profile also allows it. Blocking process_vm_readv would break MPI CMA shared-memory transport (Docker allows it on kernel >= 4.8). The filesystem sandbox remains the primary isolation mechanism; seccomp is defense-in-depth.

Seccomp for bwrap

bwrap's mount namespace + PID namespace + no_new_privs already provide strong containment. A seccomp filter closes the remaining gaps; the table shows which of them are already partly mitigated elsewhere:

Syscall	Already mitigated by	Impact of blocking
`io_uring_setup` / `io_uring_enter`	Nothing — real attack surface reduction	Node.js falls back to epoll, RocksDB falls back to `pread`. tokio-uring (Rust) would fail, but standard tokio is unaffected
`userfaultfd`	Kernel restricts unprivileged use since 5.11, but user-mode faults still allowed	No HPC tools use it
`kexec_load` / `kexec_file_load`	`no_new_privs` (requires `CAP_SYS_BOOT`)	None — already ineffective without capabilities
`ioctl(TIOCSTI)` / `ioctl(TIOCLINUX)`	Kernel `CONFIG_LEGACY_TIOCSTI=n` (6.2+) or `dev.tty.legacy_tiocsti=0` — but only on opted-in hosts; LTS HPC kernels (5.4, 5.15) leave it open	None — no legitimate workload simulates terminal input or pastes from the Linux console
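
Whether a given host already has the kernel-side mitigations named in the table can be read from /proc/sys. The helper below is a sketch; sysctl_value is a hypothetical name, and its optional second argument exists only so the function can be pointed at a fake tree for testing.

```shell
#!/usr/bin/env bash
# sysctl_value KEY [ROOT]
# Prints the value of a sysctl given in dotted form, or "unavailable" if the
# running kernel does not expose it.
sysctl_value() {
    local key=$1 root=${2:-/proc/sys} file
    file=$root/${key//.//}        # vm.foo -> vm/foo
    if [[ -r $file ]]; then
        printf '%s\n' "$(<"$file")"
    else
        printf 'unavailable\n'
    fi
}

for key in vm.unprivileged_userfaultfd dev.tty.legacy_tiocsti; do
    printf '%s = %s\n' "$key" "$(sysctl_value "$key")"
done
```

A value of 0 for either key means the kernel already restricts that feature; "unavailable" or 1 means the seccomp filter is the only barrier on that host.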

Adding a seccomp filter to bwrap is reasonable and aligns with Docker's precedent.

Landlock Fallback

If neither bwrap nor firejail is available (e.g. Ubuntu 24.04+ without an AppArmor profile or firejail installed), the sandbox falls back to the Landlock backend. Landlock provides kernel-enforced filesystem isolation but lacks the mount namespace that bwrap and firejail use for stronger containment.

Gaps compared to bwrap/firejail

Gap	Impact
No mount namespace	Blocked paths return EACCES instead of ENOENT; no file overlays (passwd filtering, Slurm binary relocation)
No PID namespace	Host processes visible via `/proc`; agent can read `/proc/PID/environ` of same-UID processes
No `/tmp` isolation	Shared host `/tmp` — cross-session data leakage possible
No sandbox self-protection	In user-only install, scripts writable under `~/.config/agent-sandbox/`. Admin install avoids this — scripts are root-owned, `~/.config/agent-sandbox/` contains only user data.
Unix socket `connect()` not blocked	`systemd-run --user` escape viable (see below)
User enumeration (LDAP)	Cannot overlay `/etc/passwd` or block NSS sockets

Disable systemd user instances

Landlock cannot block Unix domain socket connect() (not available in any Landlock ABI version as of kernel 6.11). A sandboxed process can connect to /run/user/<UID>/systemd/private and use systemd-run --user to execute commands outside the sandbox. Both bwrap and firejail are unaffected — they replace /run with a tmpfs.

What is affected by disabling: gpg-agent socket activation (users doing GPG signing would need to start gpg-agent --daemon manually) and systemctl --user commands.

# Option A: Mask the user@ template service (recommended)
sudo systemctl mask user@.service

# Option B: Limit via logind. Put the following in
# /etc/systemd/logind.conf.d/no-user-sessions.conf
[Login]
UserTasksMax=0
KillUserProcesses=yes

Option A prevents the user systemd instance from starting at all. Verify with systemd-run --user -- id (should fail with "Failed to connect to bus").
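
On nodes that run the Landlock backend, a quick way to see whether the escape path exists is to look for the private socket itself. A minimal sketch, with user_systemd_socket_present as a hypothetical helper whose second argument overrides /run/user for testing only:

```shell
#!/usr/bin/env bash
# user_systemd_socket_present [UID] [BASE]
# Returns 0 if BASE/UID/systemd/private exists and is a socket.
user_systemd_socket_present() {
    local uid=${1:-$(id -u)} base=${2:-/run/user}
    [[ -S $base/$uid/systemd/private ]]
}

if user_systemd_socket_present; then
    echo "systemd user socket found: a Landlock-only sandbox can reach systemd-run --user"
else
    echo "no systemd user socket for uid $(id -u)"
fi
```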