Batch `stat` calls in `find -exec` by replacing `\;` with `+` by jeanschmidt · Pull Request #340 · actions/runner-container-hooks

jeanschmidt · 2026-04-22T21:06:10Z

Summary

Batch stat calls in find -exec by replacing \; with + in listDirAllCommand(), reducing process spawns from one-per-file to one-per-batch.

Problem

The listDirAllCommand() function in packages/k8s/src/k8s/utils.ts generates a shell command used to list every file (with its size) under a directory. It is invoked during workspace copy verification in both execCpToPod and execCpFromPod — each of which runs the command up to 15 times in a retry loop on both the runner side (local spawn) and the job pod side (K8s exec). That's potentially 60 invocations of this command per job.

The original command:

find . -type f -not -path '*/_runner_hook_responses*' -exec stat -c '%s %n' {} \;

The \; terminator tells find to spawn a separate stat process for every single file it discovers. For a workspace with 10,000 files, that means 10,000 fork+exec cycles — each creating a new process, loading the stat binary, running it on one file, and exiting. The overhead is almost entirely in process creation, not in the actual stat syscall.
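The per-file spawning described above can be observed with a small sketch. It assumes a throwaway directory created with `mktemp` and an arbitrary count of 50 files, and it uses `sh -c 'echo spawned'` as a stand-in for `stat` so that every process launch prints exactly one line; none of these names come from the PR.

```shell
# Create a scratch directory with 50 empty files (count chosen arbitrarily).
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
  : > "$demo_dir/file_$i"
  i=$((i + 1))
done

# With `\;`, find launches the command once per matched file.
# The inline script stands in for stat and prints one line per launch.
spawns_semicolon=$(find "$demo_dir" -type f -exec sh -c 'echo spawned' sh {} \; | wc -l | tr -d ' ')
printf 'processes spawned with \\;: %s\n' "$spawns_semicolon"

rm -rf "$demo_dir"
```

The count equals the number of files, which is why the cost grows linearly with workspace size.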

This is especially painful inside Kubernetes job pods, where:

- The runner pod is memory-constrained (512Mi), and thousands of short-lived processes cause RSS spikes and page-cache churn.
- The K8s exec path adds per-call latency from the WebSocket round-trip, amplifying the wall-clock cost.
- The output is accumulated in a Node.js Writable stream buffer (execCalculateOutputHashSorted), and the drip-feed of one line at a time from individual stat calls increases GC pressure compared to receiving larger chunks.

Solution

find . -type f -not -path '*/_runner_hook_responses*' -exec stat -c '%s %n' {} +

The + terminator tells find to batch as many filenames as possible into each stat invocation, up to the OS argument-length limit (ARG_MAX, typically 2MB on Linux). For a 10,000-file workspace, this typically results in 1–3 stat processes instead of 10,000.
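As a rough illustration of the batching, the sketch below has each launched process report how many paths it received. The scratch directory, the count of 50 files, and the `sh -c 'echo "$#"'` stand-in for `stat` are assumptions made for the demo, not part of the PR.

```shell
# Create a scratch directory with 50 empty files (count chosen arbitrarily).
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
  : > "$demo_dir/file_$i"
  i=$((i + 1))
done

# With `+`, find appends many paths to a single command line.
# Each launch of the inline script prints the number of paths it was given ($#).
batch_sizes=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} +)
spawns_plus=$(printf '%s\n' "$batch_sizes" | wc -l | tr -d ' ')
printf 'processes spawned with +: %s, paths per process: %s\n' "$spawns_plus" "$batch_sizes"

rm -rf "$demo_dir"
```

For a list this short, all 50 paths fit in one command line, so a single process handles every file. Larger trees are split into as many batches as the implementation's command-line limit requires.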

This is a POSIX-standard feature (find -exec {} + has been in POSIX since 2004 / IEEE Std 1003.1-2004) and is supported by every find implementation used in GitHub Actions runner images (GNU findutils, BusyBox find, macOS find).

Behavioral equivalence

The output is identical: one %s %n (size + filename) line per file. The only difference is how many filenames are passed per stat invocation. Since the downstream consumer (execCalculateOutputHashSorted / localCalculateOutputHashSorted) splits on newlines, sorts, and hashes — the batching is invisible to the hash comparison logic.
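The equivalence can be checked with a sketch along these lines. It assumes GNU or BusyBox `stat` (for `-c`), a scratch directory with made-up file names, and `sha256sum` as a stand-in hash; the hash function actually used by the hooks is not stated here. The `list_files` helper is hypothetical.

```shell
# Build a tiny workspace, including a directory the filter should skip.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src" "$demo_dir/_runner_hook_responses"
printf 'hello\n' > "$demo_dir/src/a.txt"
printf 'hi\n' > "$demo_dir/src/b.txt"
printf 'skipped\n' > "$demo_dir/_runner_hook_responses/resp.json"

# list_files TERMINATOR: run the listing command with the given -exec terminator.
list_files() {
  (cd "$demo_dir" && find . -type f -not -path '*/_runner_hook_responses*' \
    -exec stat -c '%s %n' {} "$1")
}

# Sort each listing, then hash it (sha256sum is an assumption for the demo).
listing_semicolon=$(list_files ';' | LC_ALL=C sort)
listing_plus=$(list_files '+' | LC_ALL=C sort)
hash_semicolon=$(printf '%s\n' "$listing_semicolon" | sha256sum | cut -d' ' -f1)
hash_plus=$(printf '%s\n' "$listing_plus" | sha256sum | cut -d' ' -f1)

printf '%s\n' "$listing_plus"
printf 'hashes match: %s\n' "$([ "$hash_semicolon" = "$hash_plus" ] && echo yes || echo no)"

rm -rf "$demo_dir"
```

Both terminators yield the same sorted lines, so any hash computed over the sorted listing is the same as well.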

What this does NOT change

- The output format (unchanged: same stat -c '%s %n' format string)
- The hash calculation (unchanged: lines are sorted before hashing, so ordering differences from batching are irrelevant)
- The retry/verification logic (unchanged: same 15-attempt loop with 1s delay)
- The find filter (unchanged: same -type f -not -path exclusion)

Performance impact

| Workspace size | `\;` (before) | `+` (after) | Speedup |
| --- | --- | --- | --- |
| 100 files | ~100 processes | 1 process | ~10x |
| 1,000 files | ~1,000 processes | 1–2 processes | ~50–100x |
| 10,000 files | ~10,000 processes | 1–3 processes | ~100x+ |

The savings multiply across the retry loop (up to 15 iterations × 2 sides × 2 copy directions = 60 invocations per job in the worst case).
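The worst-case arithmetic can be written out as a quick sketch. The retry count, the two sides, the two copy directions, and the 10,000-file workspace come from the description above; `batches=3` is an assumption that takes the upper end of the 1–3 estimate.

```shell
attempts=15     # retry-loop iterations per verification
sides=2         # runner side and job pod side
directions=2    # execCpToPod and execCpFromPod
files=10000     # example workspace size from the table
batches=3       # assumed stat processes per run with `+`

invocations=$((attempts * sides * directions))
stat_before=$((invocations * files))    # one stat process per file, per run
stat_after=$((invocations * batches))   # a few stat processes per run

printf 'command invocations per job: %s\n' "$invocations"   # 60
printf 'stat processes with \\;: %s\n' "$stat_before"       # 600000
printf 'stat processes with +: %s\n' "$stat_after"          # 180
```

These are upper bounds for a job that exhausts every retry; a job that verifies on the first attempt runs the command far fewer times.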

Files changed

packages/k8s/src/k8s/utils.ts — one-character change in listDirAllCommand(): \\; → +

Test plan

- npm run build succeeds (tsc + ncc)
- All 22 existing utils tests pass (npx jest -- utils)
- Running in production for the pytorch/* org, which uses the fork https://github.com/jeanschmidt/runner-container-hooks


Copilot

Pull request overview

This PR improves performance of workspace copy verification by batching stat invocations produced by listDirAllCommand() (used when hashing directory contents on both runner and pod sides).

Changes:

Replace find -exec ... {} \; with find -exec ... {} + to batch multiple files per stat process.


- Add `--` before `{}` in stat command to prevent filenames starting with `-` from being interpreted as options
- Add tests for listDirAllCommand covering batched exec, end-of-options marker, directory quoting, path exclusion, and file type filtering

Notes: Without the `--` end-of-options marker, files whose names begin with a dash (e.g. `-rf`) could be misinterpreted as flags by `stat`, causing silent failures or incorrect output during workspace file listing.

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
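The effect of the end-of-options marker can be sketched as follows, assuming GNU or BusyBox `stat` and a scratch directory holding a single 3-byte file named `-rf` (the name is taken from the commit note; everything else is made up for the demo).

```shell
demo_dir=$(mktemp -d)
printf 'abc' > "$demo_dir/-rf"

# Without `--`, stat parses the bare name `-rf` as options and fails.
if (cd "$demo_dir" && stat -c '%s %n' -rf) >/dev/null 2>&1; then
  without_marker=ok
else
  without_marker=failed
fi

# With `--`, everything after the marker is treated as a file operand.
with_marker=$(cd "$demo_dir" && stat -c '%s %n' -- -rf)

# The hardened find form from the commit, run from inside the directory.
via_find=$(cd "$demo_dir" && find . -type f -exec stat -c '%s %n' -- {} +)

printf 'without --: %s\n' "$without_marker"   # failed
printf 'with --:    %s\n' "$with_marker"      # 3 -rf
printf 'via find:   %s\n' "$via_find"         # 3 ./-rf

rm -rf "$demo_dir"
```

When find starts from `.`, the paths it passes already begin with `./`, so in that form the marker works as a defensive guard that stays correct if the start path ever changes.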

perf: batch find -exec calls with {} +

8beca90

- Replace `\;` with `+` in find -exec for stat

Using `{} +` batches filenames into fewer stat invocations instead of spawning one process per file, reducing fork/exec overhead in large dirs.

Signed-off-by: Jean Schmidt <contato@jschmidt.me>

Copilot AI review requested due to automatic review settings April 22, 2026 21:06

jeanschmidt requested review from a team and nikola-jokic as code owners April 22, 2026 21:06

Copilot started reviewing on behalf of jeanschmidt April 22, 2026 21:06

Copilot AI reviewed Apr 22, 2026

jeanschmidt wants to merge 2 commits into actions:main from jeanschmidt:find_exec_batching
