About the World Action Model MotuBrain
ShengShu Technology was among the first to define the path toward a general-purpose world model. In July 2025, we introduced Vidar, the first embodied foundation model built on video generation, and in December 2025, we introduced Motus, a general world model with a unified architecture. Two months ahead of the industry, we proposed and validated the core concept of World Action Models, laying the groundwork for this emerging paradigm.
MotuBrain is the evolutionary core connecting the digital and physical worlds. It marks a generational leap in general world models, from visual simulation to physical decision-making.
Positioned as a universal brain for embodied robots, MotuBrain supports multi-platform adaptation, strong task generalization, and long-horizon execution. It enables robots to reliably complete complex, continuous tasks across real-world environments such as homes, industrial settings, and commercial spaces.
At its core, MotuBrain unifies perception and action within a single model—bringing “what the robot sees” and “what it does” into one coherent framework. This allows robots not only to understand their environment, but also to anticipate changes and generate actionable strategies in real time.

General World Model Architecture

World Action Model Architecture
Key Capabilities of MotuBrain
One Brain. Many Skills.
The more it learns, the better it performs, showing true multitasking intelligence and broader generalization.
One Brain. Any Robot.
One model for endless embodiments. Near-instant integration of new robots within the established ecosystem.
One Brain. Long-Horizon, End to End.
Long, complex tasks are completed in one continuous flow, with no high-level VLM planning and no manual task breakdown.
One Brain. Foresight.
It does more than react to the world: it anticipates environmental changes and dynamic challenges.
MotuBrain ranks #1 on both RoboTwin 2.0 and WorldArena, two leading international benchmarks
RoboTwin 2.0 Leaderboard
On RoboTwin 2.0, MotuBrain achieves 95.8 in the Clean setting and 96.1 in the Randomized setting—ranking #1 in both. It is the only model on the leaderboard to surpass an average score of 95 in randomized environments, and it reaches 100 or near-perfect scores across most individual tasks. Compared with models such as ABot, LingBot, JEPA-VLA, and pi0.5, MotuBrain demonstrates clear, across-the-board leadership on the RoboTwin benchmark.
WorldArena Leaderboard
On WorldArena, MotuBrain ranks #1 with an overall EWM Score of 63.77. It outperforms leading models such as ABot and GigaWorld-1, and delivers top-tier results across key motion metrics—including Motion Quality, Flow Score, and Motion Smoothness—consistently leading the benchmark.
Motus: A Unified World Model Unlocking a New Paradigm for Multi-Task Generalization and Scalable Embodied Intelligence
At the architectural level, Motus is built on the UniDiffuser unified modeling framework. By fusing cross-modal priors, it integrates vision-language knowledge (from VLMs), video dynamics (from video generation models), and action expertise (from action experts) into a single model. This enables unified representation and generation of language, video, and action, laying the foundation for a truly unified world model and establishing a new paradigm for scalable, multi-task embodied intelligence.
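As a rough illustration of how a UniDiffuser-style model can serve several roles with one network, the sketch below shows the general idea from the UniDiffuser framework: each modality gets its own diffusion timestep, and the choice of timesteps selects joint, conditional, or marginal generation. This is not the Motus implementation. The mode names, the `T_MAX` value, and the `timesteps_for` helper are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

# Hypothetical maximum diffusion step; real models choose their own schedule.
T_MAX = 1000


@dataclass(frozen=True)
class Timesteps:
    """Per-modality noise levels fed to a single joint denoising network."""
    video: int
    action: int


def timesteps_for(mode: str, t: int) -> Timesteps:
    """Map a generation mode to per-modality timesteps (UniDiffuser-style).

    - A modality at timestep 0 is treated as clean conditioning input.
    - A modality at timestep T_MAX is pure noise, so it is effectively ignored.
    - A modality at timestep t is the one currently being denoised.
    """
    if not 0 <= t <= T_MAX:
        raise ValueError(f"t must be in [0, {T_MAX}], got {t}")
    if mode == "joint":
        # Denoise future video and actions together.
        return Timesteps(video=t, action=t)
    if mode == "action_given_video":
        # Video is clean conditioning; only actions are denoised.
        return Timesteps(video=0, action=t)
    if mode == "video_given_action":
        # Actions are clean conditioning; only video is denoised.
        return Timesteps(video=t, action=0)
    if mode == "video_only":
        # Marginal video generation: actions carry no information.
        return Timesteps(video=t, action=T_MAX)
    if mode == "action_only":
        # Marginal action generation: video carries no information.
        return Timesteps(video=T_MAX, action=t)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("joint", "action_given_video", "video_given_action",
                 "video_only", "action_only"):
        print(mode, timesteps_for(mode, 500))
```

Under these assumptions, a single set of weights covers world-model-style video prediction, policy-style action generation, and joint prediction, which is one way to read the "unified representation and generation" described above.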
News

Announced at Zhongguancun Forum: ShengShu Technology unveils its universal world model strategy

ShengShu Technology closes nearly RMB 2 billion Series B, defining the next-generation productivity foundation for the digital and physical worlds with universal world models

Outperforming Pi0.5 by 40% — ShengShu Technology and Tsinghua University open-source the unified world model Motus

ShengShu Technology partners with Shenpu Intelligence to build general intelligence for the physical world

Universal world models: the bridge between the digital and physical worlds

