Monitor and rate-limit your MCP server: analytics & logs
Once agents are calling your tools, you need visibility and guardrails. Here's how to read tool-call logs, watch analytics, and apply rate limits to keep things safe.
Once AI clients start calling your tools, two questions become urgent: what are they doing, and how do I stop one from doing too much? Observability and rate limiting are what turn a working MCP server into a production one. This guide covers reading tool-call logs, watching analytics, and applying limits to keep your upstream API — and your bill — safe.
- Every client connection is tracked as a session, with its transport, geography, duration, and tool-call count.
- Logs record every tool call with its arguments, prompt context, and outcome.
- Cast mines recurring tool sequences across sessions into reusable heuristics and drafts skill suggestions from them.
- Those patterns tell you which tools to add next and which to package together.
- Rate limiting (distributed, not per-instance) protects your upstream from runaway agent loops.
Why observability comes first
An AI deciding which tools to call is, by nature, less predictable than code you wrote. Logs answer the questions that come up constantly: Why did that call fail? Which tool did the model actually pick? Is an agent stuck in a loop? Without them, you're debugging blind.
Reading tool-call logs
Every call through your MCP server is recorded: the tool name, the arguments the model supplied, the upstream response status, and timing. Open the Logs tab to inspect them.
Use the logs to:
- Debug failures — see the exact arguments and the upstream error for any failed call.
- Spot misuse — catch an agent calling the same tool hundreds of times.
- Tune descriptions — if the model keeps picking the wrong tool, the logs show it, and you can fix the tool's description.
- Audit — keep a record of what was accessed and when.
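The first two uses in the list above can be sketched in a few lines of Python. This is a minimal illustration, not Cast's log format: the `ToolCall` record shape, its field names, and the tool names `getCustomer` and `listInvoices` are assumptions made up for the example, based only on the fields the text says a log entry contains.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical shape of one exported log record (not Cast's schema)."""
    session_id: str
    tool: str
    arguments: dict
    status: int          # upstream response status
    duration_ms: float   # timing


def failed_calls(calls):
    """Return calls whose upstream status indicates an error (>= 400)."""
    return [c for c in calls if c.status >= 400]


def repeated_calls(calls, threshold):
    """Return {(session_id, tool): count} for pairs seen at least `threshold` times."""
    counts = Counter((c.session_id, c.tool) for c in calls)
    return {key: n for key, n in counts.items() if n >= threshold}


# Made-up sample records for illustration only.
calls = [
    ToolCall("s1", "getCustomer", {"id": "42"}, 200, 81.0),
    ToolCall("s1", "listInvoices", {"customer": "42"}, 500, 240.5),
    ToolCall("s2", "getCustomer", {"id": "7"}, 200, 75.2),
    ToolCall("s2", "getCustomer", {"id": "7"}, 200, 70.9),
    ToolCall("s2", "getCustomer", {"id": "7"}, 200, 72.3),
]

for call in failed_calls(calls):
    print("failed:", call.tool, call.arguments, call.status)

print("possible loops:", repeated_calls(calls, threshold=3))
```

In practice the threshold would be far higher than 3; the point is that a per-session, per-tool count is enough to surface an agent that is stuck repeating itself.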
Watching analytics
Where logs are per-call, analytics are the aggregate view. The Analytics tab shows call volume over time, the most-used tools, and error rates — the trends that tell you whether the server is healthy and which capabilities deliver value.
The most-used-tools breakdown is also a product signal: it tells you which integrations matter and which you could retire.
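To make the relationship between logs and analytics concrete, here is a rough sketch of how per-call records roll up into the three aggregate views mentioned above. The tuple layout, the sample data, and the `summarize` helper are assumptions for illustration; Cast computes these views for you in the Analytics tab.

```python
from collections import Counter
from datetime import datetime


def summarize(calls):
    """Aggregate (iso_timestamp, tool, status) tuples into analytics views."""
    per_hour = Counter()
    per_tool = Counter()
    errors = Counter()
    for ts, tool, status in calls:
        # Bucket each call into the hour it occurred in.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        per_hour[hour.isoformat()] += 1
        per_tool[tool] += 1
        if status >= 400:
            errors[tool] += 1
    return {
        "calls_per_hour": dict(per_hour),
        "most_used": per_tool.most_common(),
        "error_rate": {tool: errors[tool] / n for tool, n in per_tool.items()},
    }


# Made-up sample data; tool names are hypothetical.
calls = [
    ("2025-01-01T09:05:00", "getCustomer", 200),
    ("2025-01-01T09:40:00", "getCustomer", 200),
    ("2025-01-01T10:02:00", "listInvoices", 500),
    ("2025-01-01T10:15:00", "listInvoices", 200),
    ("2025-01-01T10:30:00", "getCustomer", 200),
]

report = summarize(calls)
print(report["most_used"])
print(report["error_rate"])
```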
Session tracking: who's connected and from where
Beyond individual calls, Cast tracks each client connection as a session. A session captures the transport the client used (SSE or streamable HTTP), the geography resolved from the connection (country and country code via GeoIP), the user agent, when it connected and disconnected, and how many tool calls it made. That gives you a connection-level view, not just a call-level one.
Example sessions:
- United States: 2m 31s, 14 tool calls
- Germany: 48s, 6 tool calls
- United Kingdom: 5m 02s, 22 tool calls
- India: 19s, 3 tool calls
Sessions answer questions analytics-in-aggregate can't:
- Where is the demand? Geography tells you which regions actually use the server — useful once you're on a branded custom domain.
- How deep is each session? A high tool-call count per session signals real workflows; lots of one-call sessions may mean clients can't find what they need.
- Which transport do clients use? Helps you decide what to support and document.
- Is something still connected? Live vs. ended sessions show current activity at a glance.
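The questions above map onto simple aggregations over session records. The sketch below assumes a hypothetical `Session` record with the fields the text describes (transport, country code, user agent, connect and disconnect times, tool-call count); the field names and sample values are illustrative, not Cast's export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Session:
    """Hypothetical session record; field names are assumptions."""
    transport: str                      # "sse" or "streamable-http"
    country_code: str                   # resolved via GeoIP
    user_agent: str
    connected_at: datetime
    disconnected_at: Optional[datetime]  # None while the session is live
    tool_calls: int

    @property
    def is_live(self) -> bool:
        return self.disconnected_at is None


def session_report(sessions):
    """Answer the four questions: where, how deep, which transport, who is live."""
    n = len(sessions)
    return {
        "by_country": Counter(s.country_code for s in sessions),
        "by_transport": Counter(s.transport for s in sessions),
        "one_call_share": sum(s.tool_calls == 1 for s in sessions) / n if n else 0.0,
        "live": sum(s.is_live for s in sessions),
    }


# Made-up sample sessions for illustration only.
t0 = datetime(2025, 1, 1, 12, 0, 0)
sessions = [
    Session("streamable-http", "US", "client-a", t0, t0 + timedelta(minutes=2, seconds=31), 14),
    Session("sse", "DE", "client-b", t0, t0 + timedelta(seconds=48), 6),
    Session("streamable-http", "GB", "client-a", t0, None, 22),
    Session("sse", "IN", "client-c", t0, t0 + timedelta(seconds=19), 1),
]

report = session_report(sessions)
print(report)
```

A rising `one_call_share` is the number to watch: it is the signal described above that clients connect, try one tool, and leave.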
Heuristics: learning which tools to add next
This is where the data becomes a feedback loop. Cast doesn't just log calls in isolation — it analyzes the order in which tools are called within a session and finds recurring sequences across many sessions. Each pattern records the tool sequence, how many sessions contained it, and a representative prompt that triggered it.
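One plausible way to find such recurring sequences is to count contiguous runs of tool names (n-grams), counting each run at most once per session. The sketch below is an assumption about the general technique, not a description of Cast's actual algorithm; the input shape, the `mine_sequences` helper, and the tool names are hypothetical.

```python
from collections import defaultdict


def mine_sequences(sessions, length=2, min_sessions=2):
    """Find tool sequences of `length` that occur in at least `min_sessions` sessions.

    sessions: list of dicts with 'prompt' (str) and 'tools' (ordered tool names).
    """
    seen_in = defaultdict(int)   # sequence -> number of sessions containing it
    sample_prompt = {}           # sequence -> first prompt that produced it
    for session in sessions:
        tools = session["tools"]
        # A set ensures a sequence is counted once per session, even if repeated.
        grams = {tuple(tools[i:i + length]) for i in range(len(tools) - length + 1)}
        for gram in grams:
            seen_in[gram] += 1
            sample_prompt.setdefault(gram, session["prompt"])
    patterns = [
        {"sequence": gram, "sessions": n, "prompt": sample_prompt[gram]}
        for gram, n in seen_in.items()
        if n >= min_sessions
    ]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(patterns, key=lambda p: (-p["sessions"], p["sequence"]))


# Made-up sample sessions for illustration only.
sessions = [
    {"prompt": "show this customer's latest unpaid invoice",
     "tools": ["getCustomer", "listInvoices"]},
    {"prompt": "any unpaid invoices for this account?",
     "tools": ["getCustomer", "listInvoices", "getInvoice"]},
    {"prompt": "what's the price of the Pro plan?",
     "tools": ["listPlans"]},
]

patterns = mine_sequences(sessions)
print(patterns)
```

Each result carries the same three things the text describes: the sequence, the number of sessions that contained it, and a representative prompt.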
"show this customer's latest unpaid invoice"
seen in 42 sessions
"what's the price of the Pro plan?"
seen in 28 sessions
Two kinds of insight fall out of this, and both tell you what to do next.
1. Gaps → introduce more tools
When a sequence keeps hitting a tool that isn't enabled — or a call repeatedly fails because the operation an agent wants doesn't exist yet — that's a signal to expose more. The pattern view highlights these gaps so you can turn on the missing tool instead of guessing:
"is this customer's subscription active?"
seen in 19 sessions
Here, agents repeatedly try to follow a customer lookup with a subscription check, but getSubscription was never enabled. The fix is one toggle in the Configure tab, and it's driven by real usage rather than a hunch.
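The gap check itself is easy to reason about. As a rough sketch, assume each session is available as an ordered list of the tools the agent attempted, including attempts at tools that are not enabled (how Cast records those attempts is not specified here, so that input shape is an assumption, as are `find_gaps` and every tool name other than `getSubscription`).

```python
from collections import Counter


def find_gaps(sessions, enabled):
    """Count sessions in which an enabled-or-not tool was followed by a disabled one.

    sessions: list of ordered tool-name lists (attempted calls).
    enabled:  set of tool names currently enabled on the server.
    Returns [((previous_tool, missing_tool), session_count), ...], most common first.
    """
    gaps = Counter()
    for tools in sessions:
        hits = set()  # count each (previous, missing) pair once per session
        for previous, current in zip(tools, tools[1:]):
            if current not in enabled:
                hits.add((previous, current))
        gaps.update(hits)
    return gaps.most_common()


# Made-up sample data for illustration only.
enabled = {"getCustomer", "listInvoices"}
sessions = [
    ["getCustomer", "getSubscription"],
    ["getCustomer", "getSubscription", "listInvoices"],
    ["getCustomer", "listInvoices"],
]

gaps = find_gaps(sessions, enabled)
print(gaps)
```

Keeping the preceding tool alongside the missing one preserves the context: the output shows which workflow is blocked, not only which tool is absent.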
2. Strong patterns → package them as a skill
When a sequence is common and succeeds, it's a candidate to package so clients run it in one step. Cast turns a frequent pattern into a draft skill suggestion — a ready-to-review SKILL.md built from the observed sequence and sample prompts:
Given a customer name or ID, find the customer, list their invoices, and return the most recent unpaid one with amount and due date.
The result of running Cast isn't just a server — it's accumulated knowledge about how AI clients actually use your API. Each session sharpens the next decision about what to expose.
Rate limiting: your safety valve
Agents act in loops, and a misbehaving one can fire requests far faster than a human ever would. Rate limiting caps how many calls can happen in a window, protecting your upstream API from overload and your account from surprise costs.
Why it must be distributed
A naïve in-memory counter breaks the moment your server runs on more than one instance — each instance keeps its own count, so the real limit is multiplied by the number of instances, and counts reset on restart. Production rate limiting uses shared state (such as Redis) so the limit holds across every instance.
Cast enforces limits with shared, self-expiring counters rather than per-instance memory, so a limit means what it says regardless of how many servers are running.
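The difference between per-instance and shared counters is easiest to see side by side. Below is a minimal fixed-window sketch under stated assumptions: `FixedWindowLimiter` and `FakeSharedStore` are hypothetical names, the fake store stands in for a shared service such as Redis so the example runs on its own, and none of this is Cast's implementation. The limiter only needs a store with `incr(key)` and `expire(key, seconds)`, which is the shape of the corresponding Redis commands.

```python
import time


class FakeSharedStore:
    """In-memory stand-in for a shared store such as Redis (demo only)."""

    def __init__(self):
        self._values = {}

    def incr(self, key):
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, seconds):
        # A real shared store would delete the key after `seconds`.
        # The demo relies on window-scoped keys, so this is a no-op.
        pass


class FixedWindowLimiter:
    """Allow at most `limit` calls per client per window, counted in `store`."""

    def __init__(self, store, limit, window_seconds, clock=time.time):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock

    def allow(self, client_id):
        window = int(self.clock() // self.window_seconds)
        key = f"ratelimit:{client_id}:{window}"
        count = self.store.incr(key)
        if count == 1:
            # First hit in this window: make the counter self-expiring.
            self.store.expire(key, self.window_seconds)
        return count <= self.limit


def fixed_clock():
    return 1000.0  # frozen time so every call lands in the same window


# Two server instances sharing one store: the limit of 5 holds globally.
shared = FakeSharedStore()
a = FixedWindowLimiter(shared, limit=5, window_seconds=60, clock=fixed_clock)
b = FixedWindowLimiter(shared, limit=5, window_seconds=60, clock=fixed_clock)
shared_results = [(a if i % 2 == 0 else b).allow("agent-1") for i in range(8)]

# Two instances with private stores: each allows 5, so 10 calls get through.
c = FixedWindowLimiter(FakeSharedStore(), limit=5, window_seconds=60, clock=fixed_clock)
d = FixedWindowLimiter(FakeSharedStore(), limit=5, window_seconds=60, clock=fixed_clock)
private_allowed = sum((c if i % 2 == 0 else d).allow("agent-1") for i in range(12))

print(shared_results)   # first 5 allowed, the rest rejected
print(private_allowed)  # 10: the effective limit doubled with two instances
```

With a real shared store, the increment and the expiry are two separate round trips, so a production version would typically make them atomic (for example with a transaction or a server-side script). Fixed windows also permit short bursts across a window boundary; a sliding window or token bucket trades a little complexity for smoother limits.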
A practical setup
- Turn on logging from day one — you'll want the history when something breaks.
- Set a sensible rate limit before connecting any autonomous agent.
- Review session patterns to find gaps (tools to add) and strong sequences (skills to package).
- Watch analytics and geography weekly to spot error spikes, unused tools, and where demand is.
- Iterate on tool descriptions using what the logs reveal about wrong tool choices.
Run your MCP server with full visibility
Logs, analytics, and rate limiting built in — expose tools with confidence.
Frequently asked questions
What's logged for each tool call?
Why not just use an in-memory rate limiter?
Can rate limiting break legitimate use?
How do I find out why a tool call failed?
What is a session in Cast?
How does Cast know which tools to add next?
What are the skill suggestions?