robometric_frame.task_performance.task_completion_rate
Task Completion Rate metric for robotics policy evaluation.
Task Completion Rate (TCR) evaluates a policy's ability to execute multi-step task sequences. It exposes the limitations of current robotics policies on complex natural language instructions that require multiple sequential actions.
- Reference:
A. Brohan et al., “RT-1: Robotics transformer for real-world control at scale,” arXiv preprint arXiv:2212.06817, 2022.
Classes
| TaskCompletionRate | Compute Task Completion Rate for robotics policy task chain evaluation. |
- class robometric_frame.task_performance.task_completion_rate.TaskCompletionRate(threshold=None, ignore_index=None, **kwargs)
Compute Task Completion Rate for robotics policy task chain evaluation.
- Task Completion Rate is calculated as:
TCR = N_completed_tasks / N_task_chains
where N_completed_tasks is the number of successfully completed task chains and N_task_chains is the total number of task chains attempted.
This metric evaluates multi-step task sequences, measuring success rates across sequential steps. Research shows that success rates drop significantly between sequential steps, indicating challenges in complex instruction following.
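The formula can be checked by hand with a minimal pure-Python sketch. It uses plain lists rather than tensors, and the helper names `chain_completed` and `task_completion_rate` are hypothetical. Treating a chain as completed only when every one of its steps succeeds is an assumption made for illustration, not documented library behavior.

```python
def chain_completed(step_outcomes):
    """Assumption: a chain counts as completed only if every step succeeded."""
    return all(step_outcomes)


def task_completion_rate(chains):
    """TCR = N_completed_tasks / N_task_chains for a list of task chains."""
    if not chains:
        raise ValueError("at least one task chain is required")
    completed = sum(1 for steps in chains if chain_completed(steps))
    return completed / len(chains)


# Five task chains, each a list of per-step success flags.
chains = [
    [True, True, True],   # completed
    [True, False],        # failed at the second step
    [True, True],         # completed
    [True],               # completed
    [False, True, True],  # failed at the first step
]
tcr = task_completion_rate(chains)
print(tcr)  # 0.6
```

Three of the five chains finish every step, so the rate is 3 / 5 = 0.6, matching the binary example for the class.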
- Parameters:
threshold (Optional[float]) – Threshold for binary classification when using continuous scores. If None, assumes binary inputs (0 or 1). Default: None.
ignore_index (Optional[int]) – Value to ignore in the completion tensor. Default: None.
**kwargs (Any) – Additional keyword arguments passed to the base Metric class.
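How the two optional parameters might interact can be sketched in plain Python. The helper `to_binary` is hypothetical, and two details are assumptions rather than documented behavior: entries equal to `ignore_index` are dropped before thresholding, and a score counts as completed when it is greater than or equal to the threshold.

```python
def to_binary(values, threshold=None, ignore_index=None):
    """Turn raw completion values into 0/1 indicators.

    Assumptions: ignored entries are dropped before thresholding, and a
    score counts as completed when it is >= threshold.
    """
    kept = [v for v in values if ignore_index is None or v != ignore_index]
    if threshold is None:
        return [int(v) for v in kept]  # inputs are already 0 or 1
    return [1 if v >= threshold else 0 for v in kept]


scores = [0.9, 0.7, 0.85, 0.95]
binary = to_binary(scores, threshold=0.8)
print(binary)                     # [1, 0, 1, 1]
print(sum(binary) / len(binary))  # 0.75

# -1 marks chains that should not be counted (hypothetical sentinel value).
flags = [1, -1, 0, 1, -1]
kept = to_binary(flags, ignore_index=-1)
print(kept)                       # [1, 0, 1]
```

With the scores above, a strict `>` comparison would give the same result, so the example does not depend on how the library treats a score exactly equal to the threshold.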
Example
>>> import torch
>>> from robometric_frame import TaskCompletionRate
>>> metric = TaskCompletionRate()
>>> # Binary completion indicators for task chains
>>> completion = torch.tensor([1, 0, 1, 1, 0])
>>> metric(completion)
tensor(0.6000)
>>> # With continuous scores and threshold
>>> metric = TaskCompletionRate(threshold=0.8)
>>> scores = torch.tensor([0.9, 0.7, 0.85, 0.95])
>>> metric(scores)
tensor(0.7500)
- Example (multi-step evaluation):
>>> # Evaluate task chains over multiple batches
>>> metric = TaskCompletionRate()
>>> # First batch: 3 task chains, 2 completed
>>> batch1 = torch.tensor([1, 0, 1])
>>> metric.update(batch1)
>>> # Second batch: 2 task chains, 1 completed
>>> batch2 = torch.tensor([0, 1])
>>> metric.update(batch2)
>>> # Overall completion rate
>>> metric.compute()
tensor(0.6000)
- __init__(threshold=None, ignore_index=None, **kwargs)
Initialize the TaskCompletionRate metric.
- update(completion)
Update metric state with new task chain completion indicators.
- Parameters:
completion (Tensor) – Tensor of shape (N,) containing binary completion indicators (0 or 1) or continuous completion scores if threshold is set. Values can be int, float, or bool.
- Raises:
ValueError – If completion tensor is empty or contains invalid values.
- Return type:
None
- compute()
Compute the final Task Completion Rate.
- Return type:
Tensor
- Returns:
Task completion rate as a scalar tensor in range [0, 1].
- Raises:
RuntimeError – If no task chains have been recorded (total_chains == 0).
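The update/compute contract described above can be illustrated with a minimal pure-Python stand-in. The class `CompletionRateSketch` and the state name `completed_chains` are hypothetical; `total_chains` is taken from the RuntimeError note. The validation shown covers only the binary case, which is an assumption about what "invalid values" means. This is a sketch of the accumulation pattern, not the library's implementation.

```python
class CompletionRateSketch:
    """Hypothetical stand-in that mirrors the update/compute contract."""

    def __init__(self):
        self.completed_chains = 0  # assumed state name
        self.total_chains = 0      # name taken from the RuntimeError note

    def update(self, completion):
        values = list(completion)
        if not values:
            raise ValueError("completion must not be empty")
        if any(v not in (0, 1) for v in values):
            raise ValueError("completion must contain only 0 or 1")
        self.completed_chains += sum(int(v) for v in values)
        self.total_chains += len(values)

    def compute(self):
        if self.total_chains == 0:
            raise RuntimeError("no task chains have been recorded")
        return self.completed_chains / self.total_chains


metric = CompletionRateSketch()
metric.update([1, 0, 1])  # 2 of 3 completed
metric.update([0, 1])     # 1 of 2 completed
print(metric.compute())   # 0.6
```

Accumulating counts rather than averaging per-batch rates gives 3 / 5 = 0.6, which agrees with the multi-batch example for the class. Averaging the two batch rates (2/3 and 1/2) would instead give about 0.583.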
- training: bool