World-Model-Based Evaluation of Robot Policies

A scalable evaluator that predicts real-world robot policy outcomes using a learned world model, eliminating the need for time-consuming manual rollouts.

Sentience Inc.

Abstract

Evaluating robot policies in the real world is expensive, slow, and often unsafe. We present a world-model-based policy evaluator that simulates policy execution directly in a learned dynamics model. Our approach enables rapid, automated assessment of policy success and failure modes without requiring physical rollouts. We demonstrate strong qualitative alignment between real-world executions and world-model predictions across diverse manipulation tasks, substantially reducing evaluation time while preserving fidelity.
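The abstract does not specify the evaluator's interfaces. What follows is a minimal sketch of closed-loop policy evaluation inside a learned model. It assumes the world model can be called as a one-step predictor `world_model_step(state, action)` and that a success check `is_success(state, instruction)` is available. Every name here is hypothetical, and the 1-D toy dynamics in the usage section only stand in for a learned model for illustration. The policy's actions are fed back into the model's own predictions rather than into a physical robot.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical abstract types: a real world model may operate on images or latents.
State = Sequence[float]
Action = Sequence[float]


@dataclass
class RolloutResult:
    success: bool  # whether the success check fired within the horizon
    steps: int     # number of simulated steps taken


def evaluate_in_world_model(
    policy: Callable[[State, str], Action],
    world_model_step: Callable[[State, Action], State],  # hypothetical one-step predictor
    is_success: Callable[[State, str], bool],            # hypothetical success check
    initial_states: List[State],
    instruction: str,
    horizon: int = 50,
) -> List[RolloutResult]:
    """Roll the policy out in the learned model instead of on hardware."""
    results = []
    for s0 in initial_states:
        state, success, steps = s0, False, 0
        for t in range(horizon):
            action = policy(state, instruction)        # policy sees the predicted state
            state = world_model_step(state, action)    # model predicts the next state
            steps = t + 1
            if is_success(state, instruction):
                success = True
                break
        results.append(RolloutResult(success=success, steps=steps))
    return results


def success_rate(results: List[RolloutResult]) -> float:
    """Fraction of simulated episodes judged successful."""
    return sum(r.success for r in results) / len(results) if results else 0.0


# Toy usage: the state is the remaining distance to the mug, and each action moves 1 unit.
toy_policy = lambda s, instr: [-1.0] if s[0] > 0 else [0.0]
toy_world_model = lambda s, a: [s[0] + a[0]]
toy_success = lambda s, instr: s[0] <= 0.0

results = evaluate_in_world_model(
    toy_policy,
    toy_world_model,
    toy_success,
    initial_states=[[3.0], [10.0], [60.0]],
    instruction="Pick up the white tissue ball and place it in the black mug.",
    horizon=50,
)
rate = success_rate(results)
print([(r.success, r.steps) for r in results], rate)
```

In this toy setup the third episode starts too far away to finish within 50 steps, so two of the three simulated rollouts succeed. Because the horizon caps each rollout, failures are reported as timeouts rather than as loops that never end.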

Policy Evaluation Demos

Task 1
[Side-by-side videos: Real-World Policy | World Model]
Instruction: Pick up the white tissue ball and place it in the black mug.

Task 2
[Side-by-side videos: Real-World Policy | World Model]
Instruction: Pick up the white tissue ball and place it in the black mug.

Task 3
[Side-by-side videos: Real-World Policy | World Model]
Instruction: Pick up the white tissue ball and place it in the black mug.

Task 4
[Side-by-side videos: Real-World Policy | World Model]
Instruction: Pick up the white tissue ball and place it in the black mug.