I Built an ArgoCD MCP Server. Here’s Why It’s Different.
I just published an ArgoCD MCP server, and I want to talk about why I bothered when there’s already an official one from argoproj-labs.
The short version: I wanted guardrails. Real ones.
https://github.com/peopleforrester/mcp-k8s-observability-argocd-server
The Problem with Most MCP Servers
Most MCP servers I’ve seen treat all operations the same. Read an application? Delete an application? Same friction level. That’s fine when you’re experimenting on your laptop. It’s less fine at 3 AM when an LLM agent is helping you debug a production outage and you’re too tired to notice it’s about to delete something important.
The official argoproj-labs server has a binary read-only toggle. That’s it. Either you can do everything, or you can only read. No middle ground.
I wanted something that understood the difference between “show me what’s deployed” and “delete this application from production.”
What I Did Differently
Dry-Run by Default
Every write operation previews changes first. You have to explicitly set dry_run=false to actually apply anything. This isn’t about not trusting the LLM—it’s about not trusting myself at 3 AM.
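The pattern can be sketched in a few lines. This is a minimal illustration of dry-run-by-default, not the server's actual API: the function name, parameters, and returned fields are all hypothetical.

```python
def sync_application(name: str, dry_run: bool = True) -> dict:
    """Preview a sync unless the caller explicitly opts out of dry-run."""
    if dry_run:
        # No mutating API call is made; only the would-be change is reported.
        return {"dry_run": True, "action": "sync", "app": name,
                "note": "Set dry_run=false to apply."}
    # Only an explicit dry_run=False reaches the real operation.
    return {"dry_run": False, "action": "sync", "app": name, "applied": True}
```

The key design choice is that the safe path is the zero-argument path: an agent that forgets the parameter gets a preview, never a change.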
Three Permission Tiers, Not Two
- Tier 1 (Read): Always allowed, rate-limited
- Tier 2 (Write): Requires MCP_READ_ONLY=false
- Tier 3 (Destructive): Requires BOTH confirmation parameters: confirm=true AND confirm_name matching the target
The last one is important. Deleting an application isn’t just “are you sure?” It’s “type the name of what you’re about to delete.” That extra friction is intentional.
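As a rough sketch, the tier gate might look like the following. The environment variable names come from the configuration table in this post, but the function, its signature, and the exact error strings are my own illustration:

```python
import os

def check_permission(tier: int, target: str = "", confirm: bool = False,
                     confirm_name: str = "") -> None:
    """Raise with an actionable message when an operation is not allowed."""
    if tier >= 2 and os.environ.get("MCP_READ_ONLY", "true") == "true":
        raise PermissionError(
            "Writes disabled. To enable: Set MCP_READ_ONLY=false")
    if tier >= 3:
        if os.environ.get("MCP_DISABLE_DESTRUCTIVE", "true") == "true":
            raise PermissionError(
                "Destructive ops disabled. To enable: Set MCP_DISABLE_DESTRUCTIVE=false")
        # The confirmation name must match the target exactly.
        if not (confirm and confirm_name == target):
            raise PermissionError(
                f"ConfirmationRequired: Set confirm=true AND confirm_name='{target}'")
```

Note that tier 3 checks are additive: a destructive call has to clear the write gate, the destructive gate, and the name-match confirmation.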
Rate Limiting
Configurable limits on API calls per time window. Default is 100 calls per 60 seconds. This exists because LLMs sometimes get stuck in loops. Without rate limiting, a confused agent can hammer your ArgoCD API hundreds of times in seconds. Ask me how I know.
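A sliding-window limiter is enough to stop a looping agent. This is a minimal sketch of the idea, not the server's implementation; the defaults mirror the documented 100 calls per 60 seconds:

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_calls: int = 100, window_seconds: float = 60.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: deque = deque()  # timestamps of recent calls

    def allow(self) -> bool:
        """Record the call if under the limit; False means back off."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

Because old timestamps expire continuously, a stuck agent gets throttled within one window instead of hammering the ArgoCD API indefinitely.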
Audit Logging
Structured JSON logs with correlation IDs. Every operation—reads, writes, blocks, errors—gets logged. Optional file-based audit log if you want a paper trail. When something goes wrong, you want to know exactly what the agent did and when.
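The shape of such a log line can be sketched as follows; the field names here are illustrative, not the server's actual schema:

```python
import json
import time
import uuid

def audit_event(operation: str, outcome: str, **details) -> str:
    """Return one JSON line suitable for an append-only audit file."""
    record = {
        "ts": time.time(),
        "correlation_id": str(uuid.uuid4()),  # ties related log lines together
        "operation": operation,
        "outcome": outcome,  # e.g. "ok", "blocked", "error"
        **details,
    }
    return json.dumps(record)
```

One JSON object per line keeps the log greppable and trivially parseable after an incident.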
Secret Masking
Enabled by default. Sensitive values get redacted in output. The LLM doesn’t need to see your actual credentials to help you debug a sync failure.
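A simple version of this is a recursive pass over the response that redacts values under sensitive-looking keys. The specific key patterns below are my assumption, not the server's actual list:

```python
import re

# Keys matching any of these substrings get their values redacted.
SENSITIVE_KEYS = re.compile(r"(password|token|secret|key)", re.IGNORECASE)

def mask_secrets(obj):
    """Recursively redact values whose keys look sensitive."""
    if isinstance(obj, dict):
        return {k: "***REDACTED***" if SENSITIVE_KEYS.search(k)
                else mask_secrets(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [mask_secrets(v) for v in obj]
    return obj
```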
Single-Cluster Restriction Mode
Optional setting that restricts operations to the default cluster only. Useful when you want to give an agent access to dev but keep it away from prod entirely.
Agent-Friendly Error Messages
Every blocked operation tells you exactly why it was blocked and what to do about it. “To enable: Set MCP_READ_ONLY=false” instead of a generic “permission denied.” LLMs are better at recovering when you give them actionable information.
The Configuration
Here’s the full set of environment variables:
| Variable | Default | Purpose |
|---|---|---|
| MCP_READ_ONLY | true | Block ALL write operations |
| MCP_DISABLE_DESTRUCTIVE | true | Block delete/prune even if writes enabled |
| MCP_SINGLE_CLUSTER | false | Restrict to default cluster only |
| MCP_AUDIT_LOG | (disabled) | Path to audit log file |
| MCP_MASK_SECRETS | true | Redact sensitive values in output |
| MCP_RATE_LIMIT_CALLS | 100 | Max API calls per window |
| MCP_RATE_LIMIT_WINDOW | 60 | Window size in seconds |
Notice the defaults. Out of the box, you get a read-only server with secret masking and rate limiting. You have to explicitly opt into more dangerous operations.
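To make the defaults concrete, here is a sketch of how they could be read from the environment. The variable names match the table above; the helper itself is illustrative:

```python
import os

def load_config() -> dict:
    """Read the documented settings, falling back to the safe defaults."""
    env = os.environ.get
    return {
        "read_only": env("MCP_READ_ONLY", "true") == "true",
        "disable_destructive": env("MCP_DISABLE_DESTRUCTIVE", "true") == "true",
        "single_cluster": env("MCP_SINGLE_CLUSTER", "false") == "true",
        "audit_log": env("MCP_AUDIT_LOG"),  # None means disabled
        "mask_secrets": env("MCP_MASK_SECRETS", "true") == "true",
        "rate_limit_calls": int(env("MCP_RATE_LIMIT_CALLS", "100")),
        "rate_limit_window": int(env("MCP_RATE_LIMIT_WINDOW", "60")),
    }
```

With nothing set, this yields read-only mode, masking on, destructive operations blocked, and the 100-per-60-seconds rate limit.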
What This Actually Looks Like
Let’s say you’re using Claude Desktop with this server and you ask it to delete an application.
Without proper confirmation, the agent gets back:
```
ConfirmationRequired: Deleting application 'my-app' requires confirmation.
To confirm: Set confirm=true AND confirm_name='my-app'
Impact: This will remove the application and all its resources from the cluster.
```
The agent has to make a second call with both parameters matching. That’s the friction I wanted.
Why This Matters
MCP servers are giving AI agents direct access to your infrastructure. The convenience is real—having an LLM that can actually see your deployments, check sync status, and trigger operations is genuinely useful.
But convenience without guardrails is how incidents happen.
I built this server with the assumption that the agent will occasionally misunderstand what I want. That I’ll occasionally approve something I shouldn’t have. That at some point, someone will be tired and not paying close attention while an LLM is making changes to production.
The goal isn’t to make it impossible to do dangerous things. The goal is to make dangerous things require explicit, unambiguous intent.
Production systems deserve more friction than your laptop.
The server is available now. If you’re running ArgoCD and want to give AI agents access with actual guardrails, check it out.