robometric_frame.task_performance.task_completion_rate

Task Completion Rate metric for robotics policy evaluation.

Task Completion Rate (TCR) measures how often a policy completes an entire multi-step task sequence (a task chain). It exposes limitations of current robotics policies on complex natural-language instructions that require several sequential actions.

Reference: A. Brohan et al., “RT-1: Robotics transformer for real-world control at scale,” arXiv preprint arXiv:2212.06817, 2022.

Classes

TaskCompletionRate([threshold, ignore_index])

Compute Task Completion Rate for robotics policy task chain evaluation.

class robometric_frame.task_performance.task_completion_rate.TaskCompletionRate(threshold=None, ignore_index=None, **kwargs)[source]

Compute Task Completion Rate for robotics policy task chain evaluation.

Task Completion Rate is calculated as:

\[TCR = \frac{N_{\text{completed chains}}}{N_{\text{task chains}}}\]

where \(N_{\text{completed chains}}\) is the number of successfully completed task chains and \(N_{\text{task chains}}\) is the total number of task chains attempted.

This metric evaluates multi-step task sequences, measuring success rates across sequential steps. Research shows that success rates drop significantly between sequential steps, indicating challenges in complex instruction following.
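The ratio above can be sketched in plain Python, without the library or torch. This is an illustrative sketch, not the library's implementation: the function name `task_completion_rate` is hypothetical, and two behaviors are assumptions not confirmed by this page: a score equal to `threshold` counts as completed, and entries equal to `ignore_index` are excluded from both the numerator and the denominator.

```python
from typing import Optional, Sequence


def task_completion_rate(
    values: Sequence[float],
    threshold: Optional[float] = None,
    ignore_index: Optional[int] = None,
) -> float:
    """Plain-Python sketch of TCR = completed chains / attempted chains."""
    # Assumption: entries equal to ignore_index are dropped from both counts.
    kept = [v for v in values if ignore_index is None or v != ignore_index]
    if not kept:
        raise ValueError("no task chains to evaluate")
    if threshold is None:
        # Binary inputs: a chain is completed when its indicator is 1.
        completed = sum(1 for v in kept if v == 1)
    else:
        # Assumption: a score equal to the threshold counts as completed.
        completed = sum(1 for v in kept if v >= threshold)
    return completed / len(kept)


binary_rate = task_completion_rate([1, 0, 1, 1, 0])
scored_rate = task_completion_rate([0.9, 0.7, 0.85, 0.95], threshold=0.8)
masked_rate = task_completion_rate([1, 0, -1, 1], ignore_index=-1)
```

With these inputs, three of five binary chains are completed, three of four scores clear the 0.8 threshold, and the masked case counts two completions out of the three chains that remain after dropping the `-1` entry.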

Parameters:

threshold (Optional[float]) – Threshold for binary classification when using continuous scores. If None, assumes binary inputs (0 or 1). Default: None.
ignore_index (Optional[int]) – Value to ignore in the completion tensor. Default: None.
**kwargs (Any) – Additional keyword arguments passed to the base Metric class.

Example

>>> import torch
>>> from robometric_frame import TaskCompletionRate
>>> metric = TaskCompletionRate()
>>> # Binary completion indicators for task chains
>>> completion = torch.tensor([1, 0, 1, 1, 0])
>>> metric(completion)
tensor(0.6000)

>>> # With continuous scores and threshold
>>> metric = TaskCompletionRate(threshold=0.8)
>>> scores = torch.tensor([0.9, 0.7, 0.85, 0.95])
>>> metric(scores)
tensor(0.7500)

Example (multi-step evaluation):

>>> # Evaluate task chains over multiple batches
>>> metric = TaskCompletionRate()
>>> # First batch: 3 task chains, 2 completed
>>> batch1 = torch.tensor([1, 0, 1])
>>> metric.update(batch1)
>>> # Second batch: 2 task chains, 1 completed
>>> batch2 = torch.tensor([0, 1])
>>> metric.update(batch2)
>>> # Overall completion rate
>>> metric.compute()
tensor(0.6000)

full_state_update: bool = False

total_completed: Tensor

total_chains: Tensor

__init__(threshold=None, ignore_index=None, **kwargs)[source]

Initialize the TaskCompletionRate metric.

update(completion)[source]

Update metric state with new task chain completion indicators.

Parameters: completion (Tensor) – Tensor of shape (N,) containing binary completion indicators (0 or 1), or continuous completion scores if threshold is set. Values can be int, float, or bool.
Raises: ValueError – If the completion tensor is empty or contains invalid values.
Return type: None

compute()[source]

Compute the final Task Completion Rate.

Return type: Tensor
Returns: Task completion rate as a scalar tensor in the range [0, 1].
Raises: RuntimeError – If no task chains have been recorded (total_chains == 0).
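The interplay of update() and compute() with the two state counters can be sketched as a small plain-Python accumulator. The class `CompletionAccumulator` is a hypothetical stand-in written for illustration, not the library's code; it assumes binary inputs only and mirrors the documented error conditions. It also shows why the counters are pooled: the result is the ratio of the summed counts, which generally differs from the mean of per-batch rates.

```python
from typing import Sequence


class CompletionAccumulator:
    """Hypothetical stand-in illustrating the two running counters."""

    def __init__(self) -> None:
        self.total_completed = 0  # completed chains seen so far
        self.total_chains = 0     # all chains seen so far

    def update(self, completion: Sequence[int]) -> None:
        # Mirrors the documented ValueError for an empty input.
        if len(completion) == 0:
            raise ValueError("completion must not be empty")
        self.total_completed += sum(1 for c in completion if c == 1)
        self.total_chains += len(completion)

    def compute(self) -> float:
        # Mirrors the documented RuntimeError when nothing was recorded.
        if self.total_chains == 0:
            raise RuntimeError("no task chains have been recorded")
        return self.total_completed / self.total_chains


acc = CompletionAccumulator()
acc.update([1, 0, 1])  # 3 chains, 2 completed
acc.update([0, 1])     # 2 chains, 1 completed
pooled_rate = acc.compute()                  # 3 completed / 5 chains
mean_of_batch_rates = (2 / 3 + 1 / 2) / 2    # differs from the pooled rate
```

Pooling the counts weights every task chain equally regardless of batch size, which is why the multi-batch example reports 0.6 rather than the average of the two per-batch rates.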

training: bool