robometric_frame.efficiency

Efficiency metrics for robotics policy evaluation.

This module provides metrics for evaluating the computational efficiency of robotics policies, including inference latency, computation time, and memory usage.

class robometric_frame.efficiency.EfficiencyMetric(percentiles=None, **kwargs)[source]

Abstract base class for efficiency metrics with start/stop interface.

This base class provides common functionality for metrics that track resource usage (time, memory, etc.) over intervals. It implements:

  • A start()/stop() interface for interval-based measurements

  • Percentile computation support

  • Common state management

Subclasses must implement:

  • _on_start(): Called when measurement starts

  • _on_stop(): Called when measurement stops; should return the measured value

  • _get_measurement_unit(): Returns the unit suffix for computed statistics

Parameters:
  • percentiles (Optional[list[float]]) – List of percentile values to compute (e.g., [0.5, 0.95, 0.99]). Default: [0.5, 0.95, 0.99] for median, 95th, and 99th percentiles.

  • **kwargs (Any) – Additional keyword arguments passed to the base Metric class.

full_state_update: bool = False
is_differentiable: bool = False
higher_is_better: bool = False
total_value: Tensor
min_value: Tensor
max_value: Tensor
count: Tensor
values: list[Tensor]
__init__(percentiles=None, **kwargs)[source]

Initialize the efficiency metric.

start()[source]

Start a measurement interval.

This method begins tracking. Call stop() to end the interval and record the measurement.

Raises:

RuntimeError – If start() is called while already tracking.

Return type:

None

Example

>>> metric.start()
>>> # ... perform operations ...
>>> metric.stop()
stop()[source]

Stop tracking and record the measurement.

This method ends the tracking interval and records the measured value.

Raises:

RuntimeError – If stop() is called without a preceding start().

Return type:

None

Example

>>> metric.start()
>>> # ... perform operations ...
>>> metric.stop()
update(value)[source]

Update metric state with measurements.

Parameters:

value (Tensor) –

Measurement values. Can be:

  • Scalar tensor: a single measurement

  • 1D tensor: a batch of measurements

All values must be non-negative.

Raises:

ValueError – If value contains negative values.

Return type:

None

Example

>>> metric.update(torch.tensor(0.1))  # Single measurement
>>> metric.update(torch.tensor([0.11, 0.12]))  # Batch
compute()[source]

Compute statistics including percentiles.

Returns:

Dictionary containing:

  • 'mean{unit}': Mean value

  • 'min{unit}': Minimum value

  • 'max{unit}': Maximum value

  • 'total{unit}': Total accumulated value

  • 'count': Number of measurements

  • 'p{X}{unit}': Xth percentile (e.g., 'p50' for the median)

Return type:

dict

Raises:

RuntimeError – If no measurements have been recorded.
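
The key naming scheme can be reproduced with plain torch operations. The function below is an illustrative reimplementation only, not the library's code, and the percentile interpolation it uses (torch.quantile's default linear interpolation) is an assumption about the metric's behavior:

```python
import torch


def sketch_compute(values, percentiles=(0.5, 0.95, 0.99), unit=""):
    """Illustrative reimplementation of the documented statistics dict.

    Not the library's code -- shown only to clarify the key naming
    scheme ('mean{unit}', 'p{X}{unit}', ...). The exact percentile
    interpolation used by the metric is an assumption.
    """
    if values.numel() == 0:
        raise RuntimeError("no measurements recorded")
    stats = {
        f"mean{unit}": values.mean(),
        f"min{unit}": values.min(),
        f"max{unit}": values.max(),
        f"total{unit}": values.sum(),
        "count": torch.tensor(values.numel()),
    }
    for p in percentiles:
        # round() avoids float artifacts like int(0.95 * 100) == 94
        stats[f"p{round(p * 100)}{unit}"] = torch.quantile(values, p)
    return stats
```

Passing a suffix such as unit="_mb" yields MemoryUsage-style keys like 'p95_mb'.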

reset()[source]

Reset the metric state.

This method resets all metric states to their default values and clears any internal tracking state.

Return type:

None

class robometric_frame.efficiency.InferenceLatency(percentiles=None, **kwargs)[source]

Compute Inference Latency for robotics policy evaluation.

Inference Latency is calculated as:

IL = t_infer,end - t_infer,start

This metric tracks the time elapsed during model inference operations, which is critical for real-time robotics applications. It accumulates timing measurements across multiple inference calls and provides statistics including mean, minimum, maximum, total latency, and configurable percentiles.

The metric is designed to be used in two ways:

1. Manual timing: call start() before inference and stop() after.

2. Direct update: call update() with pre-measured latency values.

Parameters:
  • percentiles (Optional[list[float]]) – List of percentile values to compute (e.g., [0.5, 0.95, 0.99]). Default: [0.5, 0.95, 0.99] for median, 95th, and 99th percentiles.

  • **kwargs (Any) – Additional keyword arguments passed to the base Metric class.

Example

>>> from robometric_frame.efficiency import InferenceLatency
>>> import torch
>>> import time
>>> metric = InferenceLatency()
>>> # Manual timing
>>> metric.start()
>>> # ... model inference ...
>>> time.sleep(0.1)  # Simulate inference
>>> metric.stop()
>>> result = metric.compute()
>>> result['mean'] > 0
tensor(True)
Example (direct update):
>>> # Direct update with measured latency
>>> metric = InferenceLatency()
>>> latencies = torch.tensor([0.1, 0.15, 0.12, 0.11])  # seconds
>>> metric.update(latencies)
>>> result = metric.compute()
>>> result['mean'].item()
0.12
Example (batched):
>>> # Multiple inference measurements
>>> metric = InferenceLatency()
>>> for _ in range(10):
...     metric.start()
...     time.sleep(0.01)  # Simulate inference
...     metric.stop()
>>> result = metric.compute()
>>> result['count']
tensor(10)
Example (distributed):
>>> # In distributed training, metrics are automatically synced
>>> metric = InferenceLatency()
>>> # On GPU 0
>>> metric.update(torch.tensor([0.1, 0.12]))
>>> # On GPU 1
>>> metric.update(torch.tensor([0.11, 0.13]))
>>> # Final result aggregates across all GPUs
>>> result = metric.compute()
>>> result['mean'].item()
0.115
Example (custom percentiles):
>>> # Track specific percentiles for robustness analysis
>>> metric = InferenceLatency(percentiles=[0.5, 0.9, 0.95, 0.99])
>>> latencies = torch.tensor([0.1, 0.12, 0.15, 0.11, 0.13, 0.2, 0.25, 0.3])
>>> metric.update(latencies)
>>> result = metric.compute()
>>> result['p50']  # median
tensor(0.1350)
>>> result['p95']  # 95th percentile
tensor(0.2875)
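Because every manual measurement is a start()/stop() pair, it can be convenient to wrap the interface in a context manager. This helper is a convenience sketch; the metric classes themselves are not assumed to support `with` directly:

```python
from contextlib import contextmanager


@contextmanager
def measure(metric):
    """Record one interval on any object exposing the documented
    start()/stop() interface (e.g. an InferenceLatency instance).

    stop() runs in a finally block, so a sample is recorded even if
    the timed code raises.
    """
    metric.start()
    try:
        yield metric
    finally:
        metric.stop()
```

Usage would then look like `with measure(metric): action = policy(obs)` (where `policy` and `obs` are hypothetical stand-ins for your model call), keeping the start/stop pairing correct on exceptions.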
__init__(percentiles=None, **kwargs)[source]

Initialize the InferenceLatency metric.

reset()[source]

Reset the metric state.

Return type:

None

class robometric_frame.efficiency.MemoryUsage(track_ram=True, track_vram=None, percentiles=None, **kwargs)[source]

Compute Memory Usage for robotics policy evaluation.

Memory Usage is calculated as:

MU = max_t(RAM_t + VRAM_t)

This metric tracks the peak memory consumption (RAM + VRAM) during model operations, which is critical for deployment on resource-constrained devices. It provides statistics including peak, mean, current, and configurable percentiles.

The metric can be used in two ways:

1. Manual tracking: call start() to begin tracking and stop() to end.

2. Direct update: call update() with pre-measured memory values.
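
A single RAM_t (+ VRAM_t) reading of the kind the formula describes can be sampled as follows. This is a sketch of the idea only (how MemoryUsage samples internally is not documented here), and it assumes either the third-party psutil package or the Unix stdlib resource module is available for the RAM reading:

```python
import sys


def sample_memory_mb(track_vram: bool = True) -> float:
    """One RAM_t (+ VRAM_t) sample in megabytes -- illustrative only;
    the metric's own sampling strategy may differ."""
    try:
        import psutil  # third-party; reports current RSS
        ram_mb = psutil.Process().memory_info().rss / 1024**2
    except ImportError:
        import resource  # Unix stdlib fallback; reports *peak* RSS
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is kilobytes on Linux, bytes on macOS
        ram_mb = rss / (1024**2 if sys.platform == "darwin" else 1024)

    vram_mb = 0.0
    if track_vram:
        try:
            import torch
            if torch.cuda.is_available():
                # tensor memory allocated on the current CUDA device
                vram_mb = torch.cuda.memory_allocated() / 1024**2
        except ImportError:
            pass
    return ram_mb + vram_mb
```

A reading like this could be fed straight into the direct-update path, e.g. `metric.update(torch.tensor(sample_memory_mb()))`.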

Parameters:
  • track_ram (bool) – Whether to track RAM usage. Default: True.

  • track_vram (Optional[bool]) – Whether to track VRAM (GPU memory) usage. Default: True if CUDA is available.

  • percentiles (Optional[list[float]]) – List of percentile values to compute (e.g., [0.5, 0.95, 0.99]). Default: [0.5, 0.95, 0.99] for median, 95th, and 99th percentiles.

  • **kwargs (Any) – Additional keyword arguments passed to the base Metric class.

Example

>>> from robometric_frame.efficiency import MemoryUsage
>>> import torch
>>> # Manual tracking
>>> metric = MemoryUsage()
>>> metric.start()
>>> # ... model operations ...
>>> _ = torch.randn(1000, 1000)  # Allocate some memory
>>> metric.stop()
>>> result = metric.compute()
>>> result['peak_mb'] > 0
tensor(True)
Example (direct update):
>>> # Direct update with measured memory (in MB)
>>> metric = MemoryUsage()
>>> memory_readings = torch.tensor([100.0, 150.0, 120.0, 180.0])  # MB
>>> metric.update(memory_readings)
>>> result = metric.compute()
>>> result['peak_mb'].item()
180.0
Example (batched tracking):
>>> # Multiple memory measurements
>>> metric = MemoryUsage()
>>> for _ in range(10):
...     metric.start()
...     _ = torch.randn(500, 500)  # Some operation
...     metric.stop()
>>> result = metric.compute()
>>> result['count']
tensor(10.)
Example (custom percentiles):
>>> # Track specific percentiles
>>> metric = MemoryUsage(percentiles=[0.5, 0.9, 0.95, 0.99])
>>> memory = torch.tensor([100.0, 120.0, 150.0, 180.0, 200.0])
>>> metric.update(memory)
>>> result = metric.compute()
>>> result['p95_mb']  # 95th percentile
tensor(191.)
Example (RAM only):
>>> # Track only RAM, not VRAM
>>> metric = MemoryUsage(track_ram=True, track_vram=False)
>>> metric.start()
>>> _ = [i for i in range(100000)]  # CPU memory allocation
>>> metric.stop()
>>> result = metric.compute()
__init__(track_ram=True, track_vram=None, percentiles=None, **kwargs)[source]

Initialize the MemoryUsage metric.

compute()[source]

Compute memory usage statistics including percentiles.

Returns:

Dictionary containing:

  • 'mean_mb': Mean memory usage in megabytes

  • 'peak_mb': Peak memory usage in megabytes (alias for 'max_mb')

  • 'min_mb': Minimum memory usage in megabytes

  • 'max_mb': Maximum memory usage in megabytes

  • 'total_mb': Total accumulated memory in megabytes

  • 'count': Number of measurements

  • 'p{X}_mb': Xth percentile in MB (e.g., 'p50_mb' for the median)

Return type:

dict

Raises:

RuntimeError – If no measurements have been recorded.

Modules

base

Base class for efficiency metrics with start/stop interface.

inference_latency

Inference Latency metric for robotics policy evaluation.

memory_usage

Memory Usage metric for robotics policy evaluation.