robometric_frame.efficiency.inference_latency

Inference Latency metric for robotics policy evaluation.

Inference Latency measures the time a policy takes to generate actions from visual observations and language instructions. Low latency is essential for real-time control and for responsive human-robot interaction.

Reference:

A. Brohan et al., “RT-1: Robotics transformer for real-world control at scale,” arXiv:2212.06817, 2022.

Classes

InferenceLatency([percentiles])

Compute Inference Latency for robotics policy evaluation.

class robometric_frame.efficiency.inference_latency.InferenceLatency(percentiles=None, **kwargs)[source]

Compute Inference Latency for robotics policy evaluation.

Inference Latency is calculated as:

IL = t_infer_end - t_infer_start

where t_infer_start and t_infer_end are timestamps taken immediately before and after a single model inference call. The metric accumulates these timing measurements across multiple inference calls and reports statistics including the mean, minimum, maximum, total latency, and configurable percentiles.
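
The statistics themselves are standard. As a minimal sketch, the summary could be computed from a batch of accumulated latencies roughly as follows, assuming plain PyTorch (torch.quantile defaults to linear interpolation between order statistics; the actual implementation may differ):

import torch

def summarize(latencies, percentiles=(0.5, 0.95, 0.99)):
    """Summary statistics over a 1-D tensor of per-call latencies in seconds."""
    qs = torch.tensor(percentiles, dtype=latencies.dtype)
    quants = torch.quantile(latencies, qs)  # linear interpolation
    stats = {
        "mean": latencies.mean(),
        "min": latencies.min(),
        "max": latencies.max(),
        "total": latencies.sum(),
        "count": torch.tensor(latencies.numel()),
    }
    for p, q in zip(percentiles, quants):
        stats[f"p{round(p * 100)}"] = q  # key names match the examples below, e.g. 'p50'
    return stats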

The metric is designed to be used in two ways (see the sketch after this list):

1. Manual timing: call start() before inference and stop() after.
2. Direct update: call update() with pre-measured latency values.
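
A minimal sketch of the manual-timing path, assuming start() and stop() wrap a monotonic clock (time.perf_counter) and feed each elapsed interval into the same accumulator that update() fills directly. The class below is hypothetical; the method names follow the documented API, but the internals are an assumption:

import time
from typing import Optional

import torch

class _LatencyTimerSketch:
    """Hypothetical internals for the manual-timing path (not the library's code)."""

    def __init__(self) -> None:
        self.latencies: list[float] = []  # shared accumulator for both usage modes
        self._t0: Optional[float] = None

    def start(self) -> None:
        self._t0 = time.perf_counter()  # monotonic clock, safe for measuring intervals

    def stop(self) -> None:
        if self._t0 is None:
            raise RuntimeError("stop() called before start()")
        self.latencies.append(time.perf_counter() - self._t0)
        self._t0 = None

    def update(self, values: torch.Tensor) -> None:
        # Direct-update path: pre-measured latencies land in the same accumulator.
        self.latencies.extend(values.flatten().tolist())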

Parameters:
  • percentiles (Optional[list[float]]) – List of percentile values to compute (e.g., [0.5, 0.95, 0.99]). Default: [0.5, 0.95, 0.99] for median, 95th, and 99th percentiles.

  • **kwargs (Any) – Additional keyword arguments passed to the base Metric class.

Example

>>> from robometric_frame.efficiency import InferenceLatency
>>> import torch
>>> import time
>>> metric = InferenceLatency()
>>> # Manual timing
>>> metric.start()
>>> # ... model inference ...
>>> time.sleep(0.1)  # Simulate inference
>>> metric.stop()
>>> result = metric.compute()
>>> result['mean'] > 0
tensor(True)
Example (direct update):
>>> # Direct update with measured latency
>>> metric = InferenceLatency()
>>> latencies = torch.tensor([0.1, 0.15, 0.12, 0.11])  # seconds
>>> metric.update(latencies)
>>> result = metric.compute()
>>> result['mean'].item()
0.12
Example (batched):
>>> # Multiple inference measurements
>>> metric = InferenceLatency()
>>> for _ in range(10):
...     metric.start()
...     time.sleep(0.01)  # Simulate inference
...     metric.stop()
>>> result = metric.compute()
>>> result['count']
tensor(10)
Example (distributed):
>>> # In distributed training, metrics are automatically synced
>>> metric = InferenceLatency()
>>> # On GPU 0
>>> metric.update(torch.tensor([0.1, 0.12]))
>>> # On GPU 1
>>> metric.update(torch.tensor([0.11, 0.13]))
>>> # Final result aggregates across all GPUs
>>> result = metric.compute()
>>> result['mean'].item()
0.115
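
The source does not specify which base Metric class is used, but if it follows the torchmetrics pattern, the cross-GPU aggregation would come from registering the latency measurements as synced state. A hypothetical sketch of that pattern (an assumption, not the library's actual code):

import torch
from torchmetrics import Metric
from torchmetrics.utilities import dim_zero_cat

class _SyncSketch(Metric):
    """Hypothetical state registration illustrating the torchmetrics sync pattern."""

    def __init__(self) -> None:
        super().__init__()
        # "cat" concatenates every rank's measurements at sync time, so
        # compute() sees the union of all per-process latencies.
        self.add_state("values", default=[], dist_reduce_fx="cat")

    def update(self, latencies: torch.Tensor) -> None:
        self.values.append(latencies.flatten())

    def compute(self) -> torch.Tensor:
        return dim_zero_cat(self.values).mean()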
Example (custom percentiles):
>>> # Track specific percentiles for robustness analysis
>>> metric = InferenceLatency(percentiles=[0.5, 0.9, 0.95, 0.99])
>>> latencies = torch.tensor([0.1, 0.12, 0.15, 0.11, 0.13, 0.2, 0.25, 0.3])
>>> metric.update(latencies)
>>> result = metric.compute()
>>> result['p50']  # median
tensor(0.1400)
>>> result['p95']  # 95th percentile
tensor(0.2825)
__init__(percentiles=None, **kwargs)[source]

Initialize the InferenceLatency metric.

reset()[source]

Reset the metric state.

Return type:

None
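
A typical evaluation pattern is to reset between episodes so that statistics from one run do not leak into the next; a minimal usage sketch using only the documented API:

>>> metric = InferenceLatency()
>>> for episode in range(3):
...     metric.reset()
...     metric.update(torch.tensor([0.1, 0.12]))
...     episode_stats = metric.compute()  # per-episode statistics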