Observability Metrics
Metrics are runtime measurements that capture indicators of a service’s availability and performance. Unlike tracing—which helps you understand the sequence of operations for a single request—metrics provide an aggregated statistical view across multiple requests or conversations. Typical examples include average response time, throughput, and CPU/memory consumption. Monitoring these helps you:
- Track the health of your service.
- Quickly detect and alert on outages or anomalies.
- Quantify the impact of code or infrastructure changes on performance.
Tracing must also be enabled for metrics to be recorded. Metric measurements are emitted from Rasa's instrumentation layer, which is only activated when tracing is configured. With both enabled, metrics and traces together give you a complete view of your deployment behavior, making it easier to debug issues and optimize resource usage.
How To Use Metrics
Enabling Metrics in Rasa
Rasa uses an OpenTelemetry (OTEL) Collector to collect metrics and send them to your desired backend (for example, Prometheus or Datadog).
Configuring the metrics block alone does not activate metric collection. Rasa records measurements through the same instrumentation layer that powers tracing, and that layer starts only when a tracing block is present in your endpoints configuration. You must enable tracing in addition to metrics.
- Enable tracing in your endpoints file (or Helm values):
  Add a tracing block to activate Rasa's instrumentation layer. For example, using an OTLP collector:
  endpoints.yml
    tracing:
      - type: otlp
        endpoint: my-otlp-host:4318
        insecure: false
        service_name: rasa
        root_certificates: ./path/to/ca.pem
  See Tracing for other supported backends (Jaeger, Langfuse, and more).
- Configure metrics in the same endpoints file (or Helm values):
  endpoints.yml
    metrics:
      type: otlp
      endpoint: my-otlp-host:4318
      insecure: false
      service_name: rasa
      root_certificates: ./path/to/ca.pem
  In this block:
  - type: otlp indicates you are using OpenTelemetry’s OTLP format.
  - endpoint is the URL of the OTEL Collector or metrics backend.
  - service_name is an identifier for your Rasa Pro service.
  - insecure / root_certificates specify how TLS is handled.
Tracing surfaces the sequence of internal method calls for individual requests; metrics aggregate their performance across many requests. Enabling both gives you the most complete observability picture.
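The requirement that both blocks be present can be checked before deployment. The following is a minimal Python sketch, not part of Rasa: the function name missing_observability_blocks is hypothetical, and it assumes endpoints.yml has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load). It only tests that each block exists and is non-empty, so it works whether the tracing block is written as a mapping or as a list.

```python
from typing import Any, Mapping


def missing_observability_blocks(endpoints: Mapping[str, Any]) -> list[str]:
    """Return the top-level blocks that still need to be configured.

    `endpoints` is the parsed content of endpoints.yml. Metrics are only
    recorded when both `tracing` and `metrics` are present and non-empty.
    This helper is hypothetical and only illustrates the rule.
    """
    required = ("tracing", "metrics")
    return [name for name in required if not endpoints.get(name)]


# Parsed equivalent of an endpoints.yml that configures metrics only.
metrics_only = {
    "metrics": {
        "type": "otlp",
        "endpoint": "my-otlp-host:4318",
        "insecure": False,
        "service_name": "rasa",
    }
}

# The same configuration with a tracing block added next to it.
both = {
    **metrics_only,
    "tracing": [{"type": "otlp", "endpoint": "my-otlp-host:4318"}],
}

print(missing_observability_blocks(metrics_only))  # ['tracing']
print(missing_observability_blocks(both))          # []
```

A check like this could run in CI so that a metrics-only configuration is caught before it silently records nothing.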
Custom Metrics Collected by Rasa
Once tracing and metrics are both configured, Rasa automatically collects several custom metrics relevant to large language model (LLM) usage and overall assistant performance:
- CPU and Memory Usage of any LLM-based command generator (e.g., CompactLLMCommandGenerator, SearchReadyLLMCommandGenerator) at the time of making an LLM call.
- Prompt Token Usage for LLM-based command generators, provided the trace_prompt_tokens config property is enabled.
- Method Call Durations for LLM-specific components, such as:
  - EnterpriseSearchPolicy
  - ContextualResponseRephraser
  - CompactLLMCommandGenerator
  - SearchReadyLLMCommandGenerator
- HTTP Request Metrics for the Rasa client:
  - Duration of requests to external services (action server, NLG server, etc.).
  - Request size in bytes.
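To make the two HTTP request measurements concrete, here is a minimal Python sketch of how a request duration and a request size in bytes can be measured around a single outgoing call. It is not Rasa's implementation: measure_request, fake_send, and the payload are hypothetical names introduced for illustration, and a real client would send the body over HTTP instead of calling a stand-in function.

```python
import json
import time
from typing import Any, Callable


def measure_request(send: Callable[[bytes], Any], payload: dict) -> dict:
    """Time one outgoing call and report the size of its encoded body."""
    body = json.dumps(payload).encode("utf-8")  # bytes that would go on the wire
    start = time.perf_counter()
    send(body)
    duration = time.perf_counter() - start
    return {"duration_seconds": duration, "request_size_bytes": len(body)}


def fake_send(body: bytes) -> None:
    """Stand-in for a call to an external service such as an action server."""
    time.sleep(0.01)


payload = {"next_action": "action_hello"}  # hypothetical request body
measurement = measure_request(fake_send, payload)
print(measurement)
```

Recording one such measurement per request and aggregating them is what turns individual timings into a metric.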
Sub-agents (ReAct and A2A)
When a flow uses autonomous steps to hand off to a ReAct or A2A sub-agent, Rasa emits extra OpenTelemetry histograms (in addition to the spans described in Tracing):
- agent_execution_duration: Wall time for each sub-agent run (_call_agent_with_retry in the agent executor). Useful for end-to-end latency and error rates per sub-agent. Attribute labels include:
  - agent_name: Sub-agent id from configuration.
  - protocol_type: How the sub-agent is connected (mcp_open or mcp_task for ReAct-style MCP sub-agents, or a2a for A2A sub-agents).
  - status: Final status of the sub-agent result.
- mcp_tool_execution_duration: Time spent inside an MCP tool call. The same metric name is used in two execution paths; use the execution_context label to distinguish them:
  - execution_context=flow: Tool invoked from a flow MCP tool step. Labels include tool_id, mcp_server, and success.
  - execution_context=agent: Tool invoked while a ReAct MCP sub-agent runs (_execute_tool_call). Labels include tool_name, agent_name, protocol_type, and success.
- ReAct MCP sub-agent LLM usage: For MCP-based sub-agents, each LLM send_message round-trip also records resource-style histograms (CPU and memory sampled like other LLM components, plus estimated prompt size and response duration):
  - mcp_agent_llm_cpu_usage
  - mcp_agent_llm_memory_usage
  - mcp_agent_llm_prompt_token_usage
  - mcp_agent_llm_response_duration
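Because mcp_tool_execution_duration carries different labels in its two execution paths, it helps to split on execution_context before grouping by anything else. The Python sketch below illustrates this with made-up data points. The label names come from the list above, but the label values, the durations (units are not specified here), the representation of success as a boolean, and the helper mean_duration_by are all assumptions for illustration, not output from a real deployment.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data points for mcp_tool_execution_duration: (labels, duration).
# Label names follow the documentation; all values are invented.
samples = [
    ({"execution_context": "flow", "tool_id": "lookup_order",
      "mcp_server": "orders", "success": True}, 0.25),
    ({"execution_context": "flow", "tool_id": "lookup_order",
      "mcp_server": "orders", "success": True}, 0.75),
    ({"execution_context": "agent", "tool_name": "search_docs",
      "agent_name": "helper", "protocol_type": "mcp_open", "success": True}, 1.0),
    ({"execution_context": "agent", "tool_name": "search_docs",
      "agent_name": "helper", "protocol_type": "mcp_open", "success": False}, 2.0),
]


def mean_duration_by(points, label):
    """Group data points by one label and return the mean duration per value."""
    grouped = defaultdict(list)
    for labels, duration in points:
        # Points that lack the label end up in a separate "<unset>" group.
        grouped[labels.get(label, "<unset>")].append(duration)
    return {value: mean(durations) for value, durations in grouped.items()}


print(mean_duration_by(samples, "execution_context"))  # {'flow': 0.5, 'agent': 1.5}
print(mean_duration_by(samples, "agent_name"))         # {'<unset>': 0.5, 'helper': 1.5}
```

Grouping by agent_name without first filtering to execution_context=agent lumps every flow-invoked tool call into an unlabeled group, which is why the execution_context label should be applied first in dashboards and alerts.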
Together, these metrics show how your assistant performs under real-world usage, so you can detect issues early, understand resource consumption, and adjust your assistant’s architecture where the data shows a bottleneck.