robometric_frame.task_performance
Task Performance Metrics for robotics policies.
This module contains metrics for evaluating task execution performance, including:
- Success Rate (SR)
- Task Completion Rate (TCR)
- Action Accuracy (MSE, AMSE, NAMSE)
- class robometric_frame.task_performance.ActionAccuracy(normalize=False, action_variance=None, **kwargs)[source]
Compute Action Accuracy metrics (MSE, AMSE, NAMSE) for robotics policy evaluation.
This metric computes three related measures of action prediction accuracy:
- MSE: Mean Squared Error per trajectory
- AMSE: Average MSE across multiple trajectories
- NAMSE: Normalized AMSE (scaled by action variance)
- Formulas:
MSE = (1/T) * sum_{t=1}^{T} ||a_t - â_t||_2^2
AMSE = (1/K) * sum_{k=1}^{K} MSE_k
NAMSE = AMSE / σ²_action
- where:
a_t is the ground truth action at timestep t
â_t is the predicted action at timestep t
T is the number of timesteps in a trajectory
K is the number of trajectories
σ²_action is the variance of ground truth actions
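To make the formulas concrete, the sketch below computes MSE, AMSE, and NAMSE directly with torch on toy trajectories, independently of the ActionAccuracy class. The helper `trajectory_mse` is hypothetical (not part of the API), and using the unbiased sample variance of the ground-truth actions for NAMSE is an assumption.

```python
import torch

def trajectory_mse(pred, target):
    # MSE = (1/T) * sum_t ||a_t - â_t||_2^2:
    # squared L2 norm per timestep, averaged over T timesteps
    return ((pred - target) ** 2).sum(dim=-1).mean()

# Two toy trajectories (T=2, action_dim=2) with known errors
t1_target = torch.tensor([[0.0, 0.0], [1.0, 1.0]])
t1_pred   = torch.tensor([[1.0, 0.0], [1.0, 1.0]])  # one unit error -> MSE = 0.5
t2_target = torch.tensor([[0.0, 0.0], [2.0, 0.0]])
t2_pred   = torch.tensor([[0.0, 0.0], [0.0, 0.0]])  # squared error 4 -> MSE = 2.0

mse_1 = trajectory_mse(t1_pred, t1_target)  # 0.5
mse_2 = trajectory_mse(t2_pred, t2_target)  # 2.0

# AMSE: average of per-trajectory MSEs
amse = torch.stack([mse_1, mse_2]).mean()   # 1.25

# NAMSE: AMSE scaled by the variance of the ground-truth actions
all_targets = torch.cat([t1_target, t2_target])
namse = amse / all_targets.var()
```

Note that the per-timestep squared L2 norm sums over action dimensions before averaging over T, so MSE here is not the element-wise mean squared error.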
- Parameters:
normalize (
bool) – Whether to compute NAMSE. If True, action variance is computed from the data. If False, only MSE and AMSE are computed. Default: False.action_variance (
Optional[float]) – Pre-computed action variance for normalization. If provided, this value is used instead of computing from data. Default: None.**kwargs (
Any) – Additional keyword arguments passed to the base Metric class.
Example
>>> from robometric_frame import ActionAccuracy
>>> import torch
>>> metric = ActionAccuracy()
>>>
>>> # Single trajectory
>>> predictions = torch.randn(10, 4)  # 10 timesteps, 4-dim actions
>>> targets = torch.randn(10, 4)
>>> metric.update(predictions, targets)
>>> results = metric.compute()
>>> print(f"MSE: {results['mse']:.4f}, AMSE: {results['amse']:.4f}")
>>>
>>> # With normalization
>>> metric = ActionAccuracy(normalize=True)
>>> metric.update(predictions, targets)
>>> results = metric.compute()
>>> print(f"NAMSE: {results['namse']:.4f}")
- Example (multiple trajectories):
>>> metric = ActionAccuracy()
>>> # Trajectory 1
>>> metric.update(torch.randn(10, 4), torch.randn(10, 4))
>>> # Trajectory 2
>>> metric.update(torch.randn(15, 4), torch.randn(15, 4))
>>> results = metric.compute()
>>> # AMSE is averaged across both trajectories
- __init__(normalize=False, action_variance=None, **kwargs)[source]
Initialize the ActionAccuracy metric.
- update(predictions, targets)[source]
Update metric state with predicted and target actions.
- Parameters:
predictions (Tensor) – Predicted actions of shape (T, action_dim).
targets (Tensor) – Ground truth actions of shape (T, action_dim).
- Raises:
ValueError – If predictions and targets have different shapes or are empty.
- Return type:
None
- compute()[source]
Compute the final Action Accuracy metrics.
- Returns:
Dictionary containing:
'mse': Mean Squared Error of the last trajectory
'amse': Average MSE across all trajectories
'namse': Normalized AMSE (only if normalize=True)
- Raises:
RuntimeError – If no trajectories have been recorded.
- class robometric_frame.task_performance.SuccessRate(threshold=None, ignore_index=None, **kwargs)[source]
Compute Success Rate for robotics policy task evaluation.
- Success Rate is calculated as:
SR = N_success / N_total
where N_success is the number of successfully completed tasks and N_total is the total number of tasks attempted.
This metric supports both binary success indicators and continuous success scores with an optional threshold.
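The thresholding behavior can be sketched in a few lines of torch. The `success_rate` function below is a hypothetical stand-in for the class, and treating scores greater than or equal to the threshold as successes is an assumption (the documented example never places a score exactly at the threshold).

```python
import torch

def success_rate(scores, threshold=None):
    # With a threshold, continuous scores are binarized first;
    # without one, inputs are assumed to already be 0/1 indicators.
    if threshold is not None:
        scores = scores >= threshold
    return scores.float().mean()

# Binary indicators: 4 successes out of 7 tasks
binary = torch.tensor([1, 1, 0, 1, 0, 0, 1])
sr_binary = success_rate(binary)                 # 4/7

# Continuous scores with threshold 0.8: 0.9, 0.85, 0.95 pass
scores = torch.tensor([0.9, 0.7, 0.85, 0.6, 0.95])
sr_scores = success_rate(scores, threshold=0.8)  # 3/5
```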
- Parameters:
threshold (Optional[float]) – Threshold for binary classification when using continuous scores. If None, assumes binary inputs (0 or 1). Default: None.
ignore_index (Optional[int]) – Value to ignore in the success tensor. Default: None.
**kwargs (Any) – Additional keyword arguments passed to the base Metric class.
Example
>>> from robometric_frame import SuccessRate
>>> import torch
>>> metric = SuccessRate()
>>> # Binary success indicators
>>> success = torch.tensor([1, 1, 0, 1, 0, 0, 1])
>>> metric(success)
tensor(0.5714)
>>> # With continuous scores and threshold
>>> metric = SuccessRate(threshold=0.8)
>>> scores = torch.tensor([0.9, 0.7, 0.85, 0.6, 0.95])
>>> metric(scores)
tensor(0.6000)
- Example (distributed):
>>> # In distributed training, metric states are synced automatically
>>> metric = SuccessRate()
>>> # On GPU 0
>>> success_gpu0 = torch.tensor([1, 1, 0])
>>> metric(success_gpu0)
>>> # On GPU 1
>>> success_gpu1 = torch.tensor([1, 0, 1])
>>> metric(success_gpu1)
>>> # Final result aggregates across all GPUs
>>> result = metric.compute()  # Returns aggregated success rate
- update(success)[source]
Update metric state with new success indicators.
- Parameters:
success (Tensor) – Tensor of shape (N,) containing binary success indicators (0 or 1) or continuous success scores if threshold is set. Values can be int, float, or bool.
- Raises:
ValueError – If success tensor is empty or contains invalid values.
- Return type:
None
- compute()[source]
Compute the final Success Rate.
- Return type:
Tensor
- Returns:
Success rate as a scalar tensor in range [0, 1].
- Raises:
RuntimeError – If no tasks have been recorded (total_tasks == 0).
- class robometric_frame.task_performance.TaskCompletionRate(threshold=None, ignore_index=None, **kwargs)[source]
Compute Task Completion Rate for robotics policy task chain evaluation.
- Task Completion Rate is calculated as:
TCR = N_completed_tasks / N_task_chains
where N_completed_tasks is the number of successfully completed task chains and N_task_chains is the total number of task chains attempted.
This metric evaluates multi-step task sequences, measuring success rates across sequential steps. Research shows that success rates drop significantly between sequential steps, indicating challenges in complex instruction following.
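One way to see how per-step success relates to chain completion is to derive chain indicators from a step matrix, as in the sketch below. The class itself consumes per-chain indicators; the step matrix and the derivation are illustrative, not part of the API.

```python
import torch

# Per-step success for 4 task chains of 3 sequential steps (rows = chains)
step_success = torch.tensor([
    [1, 1, 1],  # chain completes
    [1, 1, 0],  # fails at step 3
    [1, 0, 0],  # fails at step 2
    [1, 1, 1],  # chain completes
])

# A chain counts as completed only if every step succeeds
chain_completed = step_success.all(dim=1)
tcr = chain_completed.float().mean()  # 2/4 = 0.5

# Cumulative success after each step illustrates the drop-off
# across sequential steps: 1.00 -> 0.75 -> 0.50
cumulative = step_success.cumprod(dim=1).float().mean(dim=0)
```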
- Parameters:
threshold (Optional[float]) – Threshold for binary classification when using continuous scores. If None, assumes binary inputs (0 or 1). Default: None.
ignore_index (Optional[int]) – Value to ignore in the completion tensor. Default: None.
**kwargs (Any) – Additional keyword arguments passed to the base Metric class.
Example
>>> from robometric_frame import TaskCompletionRate
>>> import torch
>>> metric = TaskCompletionRate()
>>> # Binary completion indicators for task chains
>>> completion = torch.tensor([1, 0, 1, 1, 0])
>>> metric(completion)
tensor(0.6000)
>>> # With continuous scores and threshold
>>> metric = TaskCompletionRate(threshold=0.8)
>>> scores = torch.tensor([0.9, 0.7, 0.85, 0.95])
>>> metric(scores)
tensor(0.7500)
- Example (multi-step evaluation):
>>> # Evaluate task chains over multiple batches
>>> metric = TaskCompletionRate()
>>> # First batch: 3 task chains, 2 completed
>>> batch1 = torch.tensor([1, 0, 1])
>>> metric.update(batch1)
>>> # Second batch: 2 task chains, 1 completed
>>> batch2 = torch.tensor([0, 1])
>>> metric.update(batch2)
>>> # Overall completion rate
>>> metric.compute()
tensor(0.6000)
- __init__(threshold=None, ignore_index=None, **kwargs)[source]
Initialize the TaskCompletionRate metric.
- update(completion)[source]
Update metric state with new task chain completion indicators.
- Parameters:
completion (Tensor) – Tensor of shape (N,) containing binary completion indicators (0 or 1) or continuous completion scores if threshold is set. Values can be int, float, or bool.
- Raises:
ValueError – If completion tensor is empty or contains invalid values.
- Return type:
None
- compute()[source]
Compute the final Task Completion Rate.
- Return type:
Tensor
- Returns:
Task completion rate as a scalar tensor in range [0, 1].
- Raises:
RuntimeError – If no task chains have been recorded (total_chains == 0).
Modules
Action Accuracy metrics for robotics policy evaluation.
Success Rate metric for robotics policy evaluation.
Task Completion Rate metric for robotics policy evaluation.