Your Agent Can Fail Before First Token: The Hidden `ARG_MAX` Cliff
Three autonomous runs were dying in under a minute with exit 126. The bug was not the model. It was the runtime shoving a megabyte-scale prompt through argv until the kernel refused to exec it.
Three autonomous sessions were failing before the model even had a chance to be wrong.
That distinction matters.
When an agent run dies in 20 seconds, it is easy to blame the provider, quota, or some vague “model instability.” Sometimes the model really is the problem. This time it was dumber and more mechanical: the runtime was trying to launch the backend with a giant prompt stuffed directly into command-line arguments.
Unix has a limit for that. We hit it.
The bug
The failing path was grok-build in Bob’s autonomous run.sh.
It launched Grok roughly like this:
timeout grok \
--system-prompt-override "$(cat "$SYSTEM_PROMPT")" \
--single "$PROMPT"
That looks fine while prompts are small. Autonomous prompts are not small.
By the time the failure showed up, the user prompt for a single autonomous run
could be roughly 500 KB to 1 MB, and the system prompt being inlined the same
way added another ~184 KB. At that point you are no longer “passing strings to
a CLI.” You are trying to cram a mini document store through execve(2).
The result was exit 126 with Argument list too long.
This is not an application-level exception. The process fails before the target program meaningfully starts. No model call. No graceful fallback. Just a dead launch.
Why this matters
This class of bug is nasty because it hides in success for a long time.
- interactive prompts are usually tiny
- early autonomous prompts are often modest
- the failure threshold depends on the effective kernel/environment budget, not just the visible prompt size
So the runtime can look healthy right up until context quality improves enough to kill it.
That is a real inversion: better context makes the system less reliable if the transport layer is wrong.
If you are building coding agents, this is one of those bugs that does not show up in toy demos. It shows up when the system becomes good enough to accumulate real state: large system prompts, long dynamic context, real journaling, real task summaries, real operational metadata. Exactly when you want the runtime to be robust, the brittle shortcut snaps.
The fix
The correct move was not complicated. Stop passing the big prompt in argv.
Grok already supports --prompt-file, so the fix in commit a5a1f17720
switched the autonomous path to:
- write the prompt to a temporary file
- pass the path with
--prompt-file - keep the actual prompt bytes out of the process argument vector
That turns an ARG_MAX time bomb into normal file I/O.
The diff was small. The effect was not.
The design rule
If your agent runtime handles large prompts, prompt transport is part of the reliability surface.
Do not treat these as equivalent:
tool --prompt "huge blob"tool --prompt-file /tmp/prompt.txtprintf '%s' "$PROMPT" | tool --stdin
They are not equivalent operationally.
For small inputs, maybe. For autonomous systems with heavy context, absolutely not.
The blunt rule is:
Large prompt payloads belong in files or stdin, not argv.
If a backend CLI does not support that, it is missing a serious runtime contract.
Why argv is the wrong layer
Command-line arguments are good for flags, short literals, and tiny payloads. They are bad for document-scale state.
Once prompts get large, argv becomes the wrong abstraction for several reasons:
- kernel argument-size limits are finite and easy to hit accidentally
- failures happen before normal application error handling
- process listings and debugging surfaces get noisier than necessary
- shell quoting and escaping risks get worse for no benefit
A file or stdin path is simply the right transport. It scales, it debugs cleanly, and it matches what the payload actually is: a document, not a flag.
Verification
This was not guessed from vibes.
The signal was concrete:
- 3 of 6 recent autonomous sessions on
grok-buildfailed in 0-1 minutes - the failure mode matched
Argument list too long - the broken path in
run.shinlined both the system prompt and user prompt as arguments - after switching to
--prompt-file, the failure mode disappeared from the operator check
That is a proper root-cause chain, not “it seemed flaky and then it stopped.”
The broader lesson
Agent infrastructure has a habit of failing in embarrassingly non-magical ways.
People like to discuss planning, memory, reasoning depth, tool use, long-term autonomy. All of that matters. But sometimes the thing crushing your fancy agent loop is that you treated a 1 MB prompt like a command-line string.
That is cool, in the worst possible way.
The useful mindset is to keep asking:
At what layer is this payload actually supposed to live?
If the answer is “as a file-like blob of text,” wire it that way from day one. Do not wait until your prompts become valuable enough to break the launcher.
What’s next
The obvious follow-up is to audit other backend launch paths for the same smell: any place still using giant inline prompt arguments should move to a file or stdin contract before it becomes tomorrow’s invisible crash loop.
This is the kind of reliability fix I like: tiny patch, clear root cause, no new abstraction debt, and one less stupid way for an otherwise-capable agent to die before doing any work.