Observability Metrics

Metrics are runtime measurements that capture indicators of a service’s availability and performance. Unlike tracing—which helps you understand the sequence of operations for a single request—metrics provide an aggregated statistical view across multiple requests or conversations. Typical examples include average response time, throughput, and CPU/memory consumption. Monitoring these helps you:

- Track the health of your service.
- Quickly detect and alert on outages or anomalies.
- Quantify the impact of code or infrastructure changes on performance.

Tracing must also be enabled for metrics to be recorded: metric measurements are emitted by Rasa's instrumentation layer, which is activated only when tracing is configured. With both enabled, metrics and traces together give you a complete view of how your deployment behaves, making it easier to debug issues and optimize resource usage.

How To Use Metrics

Enabling Metrics in Rasa

Rasa exports metrics over OTLP to an OpenTelemetry (OTEL) Collector, which forwards them to your chosen backend (for example, Prometheus or Datadog).
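On the collector side, you need a pipeline with an OTLP receiver and an exporter for your backend. The following is a minimal sketch for Prometheus, not a configuration shipped with Rasa. It assumes Rasa exports OTLP over gRPC (which the insecure and root_certificates options suggest), so the receiver listens for gRPC on port 4318 to match the endpoint used in the endpoints.yml examples below, even though 4318 is conventionally the OTLP/HTTP port. The file name, certificate paths, and exporter port 8889 are placeholders, and the prometheus exporter ships in the collector's contrib distribution. Because tracing must also be enabled, the sketch includes a traces pipeline that only logs spans through the debug exporter.

otel-collector-config.yaml
```
receivers:
  otlp:
    protocols:
      grpc:
        # Port chosen to match endpoint: my-otlp-host:4318 in endpoints.yml.
        endpoint: 0.0.0.0:4318
        # Serve TLS because the Rasa examples set insecure: false.
        # Placeholder paths; the CA in Rasa's root_certificates must verify this cert.
        tls:
          cert_file: /etc/otelcol/certs/server.crt
          key_file: /etc/otelcol/certs/server.key

processors:
  # Batches data before export to reduce outgoing requests.
  batch: {}

exporters:
  # Exposes received metrics at http://<collector>:8889/metrics for Prometheus to scrape.
  prometheus:
    endpoint: 0.0.0.0:8889
  # Writes received spans to the collector's own log; replace with your tracing backend.
  debug: {}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Other backends such as Datadog follow the same pattern: keep the OTLP receiver and swap the exporter in the metrics pipeline.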

Tracing is required

Configuring the metrics block alone does not activate metric collection. Rasa records measurements through the same instrumentation layer that powers tracing, and that layer starts only when a tracing block is present in your endpoints configuration. You must enable tracing in addition to metrics.

Enable tracing in your endpoints file (or Helm values):

Add a tracing block to activate Rasa's instrumentation layer. For example, using an OTLP collector:
endpoints.yml
```
tracing:
  - type: otlp
    endpoint: my-otlp-host:4318
    insecure: false
    service_name: rasa
    root_certificates: ./path/to/ca.pem
```
See Tracing for other supported backends (Jaeger, Langfuse, and more).
Configure metrics in the same endpoints file (or Helm values):
endpoints.yml
```
metrics:
  type: otlp
  endpoint: my-otlp-host:4318
  insecure: false
  service_name: rasa
  root_certificates: ./path/to/ca.pem
```
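- type: otlp selects export over the OpenTelemetry Protocol (OTLP).
- endpoint is the host and port of the OTEL Collector (or another OTLP-compatible backend) that receives the metrics.
- service_name identifies your Rasa Pro service in the exported data.
- insecure and root_certificates control TLS: insecure: false requires a secure connection, and root_certificates points to the CA certificate used to verify the collector.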
Tracing surfaces the sequence of internal method calls for individual requests; metrics aggregate their performance across many requests. Enabling both gives you the most complete observability picture.
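For local development against a collector that does not use TLS, a combined endpoints.yml might look like the following. This is a sketch that mirrors the shapes of the tracing and metrics blocks above; it assumes the collector listens on localhost:4318 without TLS and that root_certificates can be omitted when insecure is true. Keep TLS enabled in production.

endpoints.yml
```
# Local development only: both blocks point at an unencrypted collector on localhost.
tracing:
  - type: otlp
    endpoint: localhost:4318
    insecure: true
    service_name: rasa

metrics:
  type: otlp
  endpoint: localhost:4318
  insecure: true
  service_name: rasa
```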

Custom Metrics Collected by Rasa

Once tracing and metrics are both configured, Rasa automatically collects several custom metrics relevant to large language model (LLM) usage and overall assistant performance:

CPU and Memory Usage of any LLM-based command generator (e.g., CompactLLMCommandGenerator, SearchReadyLLMCommandGenerator) at the time of making an LLM call.
Prompt Token Usage for LLM-based command generators, provided the trace_prompt_tokens config property is enabled.
Method Call Durations for LLM-specific components, such as:
- EnterpriseSearchPolicy
- ContextualResponseRephraser
- CompactLLMCommandGenerator
- SearchReadyLLMCommandGenerator
HTTP Request Metrics for the Rasa client:
- Duration of requests to external services (action server, NLG server, etc.).
- Request size in bytes.
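The prompt token metric only appears when trace_prompt_tokens is enabled on the command generator. As a sketch, assuming the property sits in the generator's own entry in config.yml (check the reference for the command generator you use), enabling it looks like this:

config.yml
```
pipeline:
  - name: CompactLLMCommandGenerator
    # Turns on the prompt token usage metric for this command generator.
    trace_prompt_tokens: true
```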

Sub-agents (ReAct and A2A)

When a flow uses autonomous steps to hand off to a ReAct or A2A sub-agent, Rasa emits extra OpenTelemetry histograms (in addition to the spans described in Tracing):

agent_execution_duration — Wall time for each sub-agent run (_call_agent_with_retry in the agent executor). Useful for end-to-end latency and error rates per sub-agent. Attribute labels include:
- agent_name — Sub-agent id from configuration.
- protocol_type — How the sub-agent is connected: mcp_open or mcp_task for ReAct-style MCP sub-agents, or a2a for A2A sub-agents.
- status — Final status of the sub-agent result.
mcp_tool_execution_duration — Time spent inside an MCP tool call. The same metric name is used in two execution paths; use the execution_context label to distinguish them:
- execution_context = flow — Tool invoked from a flow MCP tool step. Labels include tool_id, mcp_server, and success.
- execution_context = agent — Tool invoked while a ReAct MCP sub-agent runs (_execute_tool_call). Labels include tool_name, agent_name, protocol_type, and success.
ReAct MCP sub-agent LLM usage — For MCP-based sub-agents, each LLM send_message round-trip also records resource-style histograms (CPU and memory sampled like other LLM components, plus estimated prompt size and response duration):
- mcp_agent_llm_cpu_usage
- mcp_agent_llm_memory_usage
- mcp_agent_llm_prompt_token_usage
- mcp_agent_llm_response_duration
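Once the collector exposes these histograms to Prometheus, recording rules can turn them into per-sub-agent latency and throughput series. The rules below are an illustrative sketch: the file and rule names are hypothetical, and the series names assume the Prometheus exporter adds no unit suffix (it may append one, such as _milliseconds, depending on the instrument's unit), so check the collector's /metrics output for the exact names before using them.

rasa-subagent-rules.yml
```
groups:
  - name: rasa-subagents
    rules:
      # 95th percentile wall time per sub-agent and connection type over 5 minutes.
      - record: rasa:agent_execution_duration:p95_5m
        expr: >
          histogram_quantile(0.95,
            sum by (le, agent_name, protocol_type) (rate(agent_execution_duration_bucket[5m])))
      # Sub-agent runs per second, split by final status (basis for error rates).
      - record: rasa:agent_execution:rate_5m
        expr: sum by (agent_name, status) (rate(agent_execution_duration_count[5m]))
      # MCP tool latency, kept separate for flow tool steps (flow) and ReAct sub-agents (agent).
      - record: rasa:mcp_tool_execution_duration:p95_5m
        expr: >
          histogram_quantile(0.95,
            sum by (le, execution_context) (rate(mcp_tool_execution_duration_bucket[5m])))
```

Grouping by execution_context keeps the two execution paths of mcp_tool_execution_duration apart, so a slow tool inside a sub-agent does not hide behind fast flow tool steps.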

By collecting these metrics, you can see how your assistant performs under real-world usage, detect issues proactively, understand resource consumption, and tailor your assistant's architecture to provide the best possible experience for your users.
