Integration · analytics · rate-limiting

Monitor and rate-limit your MCP server: analytics & logs

Once agents are calling your tools, you need visibility and guardrails. Here's how to read tool-call logs, watch analytics, and apply rate limits to keep things safe.

June 9, 2026 · 7 min read

Once AI clients start calling your tools, two questions become urgent: what are they doing, and how do I stop one from doing too much? Observability and rate limiting are what turn a working MCP server into a production one. This guide covers reading tool-call logs, watching analytics, and applying limits to keep your upstream API — and your bill — safe.

Key takeaways

→ Every client connection is tracked as a session: transport, geography, duration, and tool-call count.
→ Logs record every tool call with its arguments, prompt context, and outcome.
→ Cast mines recurring tool sequences across sessions into reusable heuristics, and even drafts skill suggestions.
→ Those patterns tell you which tools to add next and which to package together.
→ Rate limiting (distributed, not per-instance) protects your upstream from runaway agent loops.

Why observability comes first

An AI deciding which tools to call is, by nature, less predictable than code you wrote. Logs answer the questions that come up constantly: Why did that call fail? Which tool did the model actually pick? Is an agent stuck in a loop? Without them, you're debugging blind.

Reading tool-call logs

Every call through your MCP server is recorded — the tool name, the arguments the model supplied, the upstream response status, and timing. Open the Logs tab to inspect them:

[Figure: workspace navigation (actual UI) with the tabs overview, upload, configure, connect, analytics, logs]

Use the logs to:

Debug failures — see the exact arguments and the upstream error for any failed call.
Spot misuse — catch an agent calling the same tool hundreds of times.
Tune descriptions — if the model keeps picking the wrong tool, the logs show it, and you can fix the tool's description.
Audit — keep a record of what was accessed and when.
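
The debugging and misuse checks in the list above can be sketched over exported log records. This is a minimal illustration, not Cast's log schema: the `ToolCallLog` fields, both helper names, and the threshold of 100 calls are assumptions chosen to mirror what the article says is recorded (tool name, arguments, upstream status, timing).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolCallLog:
    # Hypothetical record shape; field names are assumptions, not Cast's schema.
    session_id: str
    tool: str
    arguments: dict[str, Any]
    upstream_status: int  # HTTP status returned by the upstream API
    duration_ms: float


def failed_calls(logs: list[ToolCallLog]) -> list[ToolCallLog]:
    """Return calls whose upstream status indicates an error (4xx or 5xx)."""
    return [log for log in logs if log.upstream_status >= 400]


def suspected_loops(
    logs: list[ToolCallLog], threshold: int = 100
) -> dict[tuple[str, str], int]:
    """Return (session, tool) pairs that were called at least `threshold` times."""
    counts = Counter((log.session_id, log.tool) for log in logs)
    return {key: n for key, n in counts.items() if n >= threshold}


# Illustrative data: one failed lookup, and one session hammering a single tool.
logs = [
    ToolCallLog("s1", "getInvoice", {"id": "inv_1"}, 200, 120.0),
    ToolCallLog("s1", "getInvoice", {"id": "inv_missing"}, 404, 95.0),
] + [ToolCallLog("s2", "listInvoices", {"page": i}, 200, 80.0) for i in range(150)]

print([log.arguments for log in failed_calls(logs)])
print(suspected_loops(logs))
```

The failed call surfaces with the exact arguments the model supplied, and the second session stands out because it called one tool 150 times.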

Watching analytics

Where logs are per-call, analytics are the aggregate view. The Analytics tab shows call volume over time, the most-used tools, and error rates — the trends that tell you whether the server is healthy and which capabilities deliver value.

The most-used-tools breakdown is also a product signal: it tells you which integrations matter and which you could retire.
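
To make the aggregate view concrete, here is a small sketch of how a most-used-tools ranking and per-tool error rates could be derived from raw call records. The row format and the function names are assumptions for illustration; the Analytics tab computes these for you.

```python
from collections import Counter

# Hypothetical log rows: (tool name, upstream HTTP status). Illustrative data only.
calls = [
    ("getCustomer", 200),
    ("getCustomer", 200),
    ("listInvoices", 200),
    ("listInvoices", 500),
    ("getInvoice", 404),
    ("getCustomer", 200),
]


def most_used(calls, top=3):
    """Tools ranked by call count, most-used first."""
    return Counter(tool for tool, _ in calls).most_common(top)


def error_rates(calls):
    """Fraction of calls per tool that returned a 4xx or 5xx status."""
    totals = Counter()
    errors = Counter()
    for tool, status in calls:
        totals[tool] += 1
        if status >= 400:
            errors[tool] += 1
    return {tool: errors[tool] / totals[tool] for tool in totals}


print(most_used(calls))
print(error_rates(calls))
```

A tool with high volume and a low error rate is delivering value; one with low volume and a high error rate is a candidate for a better description or for retirement.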

Session tracking: who's connected and from where

Beyond individual calls, Cast tracks each client connection as a session. A session captures the transport the client used (SSE or streamable HTTP), the geography resolved from the connection (country and country code via GeoIP), the user agent, when it connected and disconnected, and how many tool calls it made. That gives you a connection-level view, not just a call-level one.

Analytics → Sessions (actual UI)

🇺🇸 United States · 2m 31s · 14 tool calls · sse · live
🇩🇪 Germany · 48s · 6 tool calls · http · ended
🇬🇧 United Kingdom · 5m 02s · 22 tool calls · sse · ended
🇮🇳 India · 19s · 3 tool calls · http · ended

Sessions answer questions that aggregate analytics can't:

Where is the demand? Geography tells you which regions actually use the server — useful once you're on a branded custom domain.
How deep is each session? A high tool-call count per session signals real workflows; lots of one-call sessions may mean clients can't find what they need.
Which transport do clients use? Helps you decide what to support and document.
Is something still connected? Live vs. ended sessions show current activity at a glance.
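
The four questions above map onto simple aggregations over session records. The sketch below assumes a hypothetical `Session` shape built from the fields the article lists (transport, geography, user agent, connect and disconnect times, tool-call count); the field names and the `summarize` helper are illustrative, not Cast's API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Session:
    # Hypothetical shape mirroring the fields described above; not Cast's schema.
    transport: str                       # "sse" or "http"
    country_code: str                    # resolved via GeoIP
    user_agent: str
    connected_at: datetime
    disconnected_at: Optional[datetime]  # None while the session is live
    tool_calls: int

    @property
    def is_live(self) -> bool:
        return self.disconnected_at is None


def summarize(sessions: list[Session]) -> dict:
    """Aggregate demand by country and transport, session depth, and live count."""
    n = len(sessions)
    return {
        "by_country": dict(Counter(s.country_code for s in sessions)),
        "by_transport": dict(Counter(s.transport for s in sessions)),
        "one_call_share": sum(s.tool_calls <= 1 for s in sessions) / n if n else 0.0,
        "live": sum(s.is_live for s in sessions),
    }


t0 = datetime(2026, 6, 9, 12, 0, 0)
agent = "example-client/1.0"  # placeholder user agent
sessions = [
    Session("sse", "US", agent, t0, None, 14),
    Session("http", "DE", agent, t0, t0 + timedelta(seconds=48), 6),
    Session("sse", "GB", agent, t0, t0 + timedelta(minutes=5, seconds=2), 22),
    Session("http", "IN", agent, t0, t0 + timedelta(seconds=19), 3),
]
summary = summarize(sessions)
print(summary)
```

A rising `one_call_share` is the number to watch: it is the signal, described above, that clients connect but cannot find what they need.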

Heuristics: learning which tools to add next

This is where the data becomes a feedback loop. Cast doesn't just log calls in isolation — it analyzes the order in which tools are called within a session and finds recurring sequences across many sessions. Each pattern records the tool sequence, how many sessions contained it, and a representative prompt that triggered it.

Analytics → Patterns (actual UI)

getCustomer → listInvoices → getInvoice · "show this customer's latest unpaid invoice" · seen in 42 sessions
listProducts → getProduct · "what's the price of the Pro plan?" · seen in 28 sessions
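
One simple way to find recurring sequences like these is to count contiguous runs of tool calls (n-grams) by the number of sessions that contain them. The sketch below uses that technique as an assumption; the article does not say which algorithm Cast uses, and `mine_patterns` and its parameters are hypothetical.

```python
from collections import defaultdict


def mine_patterns(sessions, min_len=2, max_len=3, min_sessions=2):
    """Count contiguous tool sequences by the number of sessions containing them.

    `sessions` is a list of (prompt, [tool, tool, ...]) pairs. Returns a list of
    (sequence, session_count, representative_prompt), most frequent first.
    """
    support = defaultdict(int)  # sequence -> number of sessions containing it
    example = {}                # sequence -> first prompt that produced it
    for prompt, tools in sessions:
        seen = set()            # count each sequence at most once per session
        for n in range(min_len, max_len + 1):
            for i in range(len(tools) - n + 1):
                seen.add(tuple(tools[i:i + n]))
        for seq in seen:
            support[seq] += 1
            example.setdefault(seq, prompt)
    patterns = [
        (seq, count, example[seq])
        for seq, count in support.items()
        if count >= min_sessions
    ]
    # Most sessions first, then longer sequences, then alphabetical for stability.
    return sorted(patterns, key=lambda p: (-p[1], -len(p[0]), p[0]))


sessions = [
    ("show this customer's latest unpaid invoice", ["getCustomer", "listInvoices", "getInvoice"]),
    ("find the newest open invoice for Acme", ["getCustomer", "listInvoices", "getInvoice"]),
    ("what's the price of the Pro plan?", ["listProducts", "getProduct"]),
    ("how much is the Team plan?", ["listProducts", "getProduct"]),
    ("who is customer 42?", ["getCustomer"]),
]
patterns = mine_patterns(sessions)
for seq, count, prompt in patterns:
    print(" → ".join(seq), f"| seen in {count} sessions |", prompt)
```

Note that sub-sequences of a longer pattern are reported too (for example `getCustomer → listInvoices`); a fuller miner would likely collapse those into the longest pattern with the same support.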

Two kinds of insight fall out of this, and both tell you what to do next.

1. Gaps → introduce more tools

When a sequence keeps hitting a tool that isn't enabled — or a call repeatedly fails because the operation an agent wants doesn't exist yet — that's a signal to expose more. The pattern view highlights these gaps so you can turn on the missing tool instead of guessing:

Analytics → Patterns (actual UI)

getCustomer → getSubscription? · "is this customer's subscription active?" · seen in 19 sessions

Here, agents repeatedly try to follow a customer lookup with a subscription check — but getSubscription was never enabled. The fix is one toggle in the Configure tab, and it's driven by real usage rather than a hunch.
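
Gap detection of this kind can be sketched as a scan for enabled tools that are immediately followed by an attempt on a tool that is not enabled. This assumes attempted calls to disabled tools are visible in the session data; `find_gaps` and its inputs are hypothetical names for illustration.

```python
from collections import Counter


def find_gaps(sessions, enabled):
    """Count sessions in which an enabled tool is followed by a tool that is not enabled.

    `sessions` is a list of attempted tool-name lists in call order; `enabled` is
    the set of tools currently exposed. Returns {(previous_tool, missing_tool): sessions}.
    """
    gaps = Counter()
    for tools in sessions:
        found = set()  # count each gap at most once per session
        for prev, nxt in zip(tools, tools[1:]):
            if prev in enabled and nxt not in enabled:
                found.add((prev, nxt))
        gaps.update(found)
    return dict(gaps)


enabled = {"getCustomer", "listInvoices", "getInvoice"}
attempts = [
    ["getCustomer", "getSubscription"],
    ["getCustomer", "getSubscription", "getSubscription"],
    ["getCustomer", "listInvoices"],
]
gaps = find_gaps(attempts, enabled)
print(gaps)
```

Counting per session rather than per call keeps one retry-happy agent from inflating a gap that only a single client cares about.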

2. Strong patterns → package them as a skill

When a sequence is common and succeeds, it's a candidate to package so clients run it in one step. Cast turns a frequent pattern into a draft skill suggestion — a ready-to-review SKILL.md built from the observed sequence and sample prompts:

Analytics → Suggested skill (auto-generated)

✦ latest-unpaid-invoice
getCustomer → listInvoices → getInvoice
Given a customer name or ID, find the customer, list their invoices, and return the most recent unpaid one with amount and due date.
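
As a rough picture of what drafting a skill from a pattern involves, the sketch below renders a SKILL.md string from a name, a description, the observed sequence, and sample prompts. The file layout (front matter followed by steps and example prompts) and the `draft_skill` helper are assumptions; the article does not specify the format Cast generates.

```python
def draft_skill(name, description, sequence, sample_prompts):
    """Render a draft SKILL.md as a string.

    The layout is an assumption for illustration. Values are inserted as-is,
    so names or descriptions containing YAML-special characters need quoting.
    """
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        "---",
        "",
        "## Steps",
    ]
    lines += [f"{i}. Call `{tool}`." for i, tool in enumerate(sequence, start=1)]
    lines += ["", "## Example prompts"]
    lines += [f"- {prompt}" for prompt in sample_prompts]
    return "\n".join(lines) + "\n"


draft = draft_skill(
    name="latest-unpaid-invoice",
    description=(
        "Given a customer name or ID, find the customer, list their invoices, "
        "and return the most recent unpaid one with amount and due date."
    ),
    sequence=["getCustomer", "listInvoices", "getInvoice"],
    sample_prompts=["show this customer's latest unpaid invoice"],
)
print(draft)
```

The output is a starting point for review, which matches the article's framing of the suggestion as a draft rather than a finished skill.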

The result of running Cast isn't just a server — it's accumulated knowledge about how AI clients actually use your API. Each session sharpens the next decision about what to expose.

Rate limiting: your safety valve

Agents act in loops, and a misbehaving one can fire requests far faster than a human ever would. Rate limiting caps how many calls can happen in a window, protecting your upstream API from overload and your account from surprise costs.

Why it must be distributed

A naïve in-memory counter breaks the moment your server runs on more than one instance — each instance keeps its own count, so the real limit is multiplied by the number of instances, and counts reset on restart. Production rate limiting uses shared state (such as Redis) so the limit holds across every instance.

Cast enforces limits with shared, self-expiring counters rather than per-instance memory, so a limit means what it says regardless of how many servers are running.
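
A common way to build shared, self-expiring counters is a fixed-window counter on two Redis commands, INCR and EXPIRE. The sketch below shows that technique under stated assumptions: it is not Cast's implementation, the key format is made up, and `InMemoryStore` is a stand-in so the example runs without a server. In production the store would be a real shared service; a redis-py client exposes `incr` and `expire` methods that fit the same calls.

```python
import time


class InMemoryStore:
    """Stand-in for a shared store, mimicking the Redis INCR and EXPIRE commands."""

    def __init__(self):
        self._values = {}
        self._expires_at = {}

    def incr(self, key):
        # Drop the key first if its time-to-live has passed.
        deadline = self._expires_at.get(key)
        if deadline is not None and time.time() >= deadline:
            self._values.pop(key, None)
            self._expires_at.pop(key, None)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, seconds):
        self._expires_at[key] = time.time() + seconds


class FixedWindowLimiter:
    """Allow at most `limit` calls per client in each `window_seconds` window."""

    def __init__(self, store, limit, window_seconds):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        key = f"ratelimit:{client_id}:{window}"  # hypothetical key format
        count = self.store.incr(key)
        if count == 1:
            # First call in this window: let the counter clean itself up later.
            self.store.expire(key, self.window_seconds)
        return count <= self.limit


# Two "instances" sharing one store enforce a single limit between them.
shared = InMemoryStore()
instance_a = FixedWindowLimiter(shared, limit=5, window_seconds=60)
instance_b = FixedWindowLimiter(shared, limit=5, window_seconds=60)

results = []
for i in range(8):
    limiter = instance_a if i % 2 == 0 else instance_b
    results.append(limiter.allow("agent-1", now=1_000_000.0))
print(results)  # five allowed calls, then three rejected
```

If each instance had its own store instead, each would see only four of the eight calls and allow all of them, which is the multiplication problem described above. Because the window index is part of the key, a counter that somehow misses its EXPIRE cannot leak into a later window; it only lingers until cleaned up.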

A practical setup

Turn on logging from day one — you'll want the history when something breaks.
Set a sensible rate limit before connecting any autonomous agent.
Review session patterns to find gaps (tools to add) and strong sequences (skills to package).
Watch analytics and geography weekly to spot error spikes, unused tools, and where demand is.
Iterate on tool descriptions using what the logs reveal about wrong tool choices.

Frequently asked questions

What's logged for each tool call?

The tool name, the arguments the model supplied, the upstream response status, and timing — enough to debug failures and audit access.

Why not just use an in-memory rate limiter?

In-memory counters are per-instance and reset on restart, so across multiple servers the effective limit balloons. Distributed counters (e.g. Redis-backed) enforce a real, shared limit.

Can rate limiting break legitimate use?

It can if the limit is set too low. Set it above normal human and expected agent usage: the goal is to catch runaway loops, not to throttle real work. Tune it from your analytics.
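
One way to tune a limit from analytics is to take a high percentile of observed per-minute call counts and add headroom. This is a heuristic offered as an assumption, not a Cast feature; `suggest_limit`, the 99th percentile, and the 2x headroom are illustrative choices.

```python
import math


def suggest_limit(calls_per_minute, percentile=0.99, headroom=2.0):
    """Suggest a per-minute limit from observed per-minute call counts.

    Takes the nearest-rank percentile of the observations and multiplies it by
    `headroom`, so normal peaks pass and only far-outlying bursts are capped.
    """
    if not calls_per_minute:
        raise ValueError("need at least one observation")
    ordered = sorted(calls_per_minute)
    rank = max(1, math.ceil(percentile * len(ordered)))  # nearest-rank method
    return math.ceil(ordered[rank - 1] * headroom)


# Illustrative per-minute counts from a period of healthy usage.
observed = [3, 5, 4, 8, 6, 12, 7, 5, 9, 10]
print(suggest_limit(observed))
```

Feed it data from healthy periods only: including a known runaway loop in the sample would raise the percentile and defeat the purpose of the limit.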

How do I find out why a tool call failed?

Open the Logs tab and inspect that call's arguments and upstream response. The error status and payload usually point straight at the cause.

What is a session in Cast?

A session is one client connection to your MCP server. Cast records its transport, geography (via GeoIP), duration, user agent, and how many tool calls it made — a connection-level view on top of per-call logs.

How does Cast know which tools to add next?

It analyzes the order of tool calls within sessions and finds recurring sequences across many of them. Sequences that keep hitting a disabled or missing operation flag a gap — a concrete signal to enable another tool.

What are the skill suggestions?

When a tool sequence is both frequent and successful, Cast drafts a SKILL.md from the observed pattern and sample prompts, so you can package a multi-step workflow into a single reusable skill.
