
Meta unveils V-JEPA 2, a powerful AI “world model” for physical understanding and robot planning

V-JEPA 2 simulates reality, with potential applications in technologies such as self-driving cars and delivery robots.

Model helps machines reason like humans using raw video data—no labels required


Meta has launched a new artificial intelligence model called V-JEPA 2, a major advance in the emerging field of “world models”—AI systems designed to simulate the physical world and help machines reason, plan, and act more like humans.


Unveiled at the VivaTech conference in Paris, V-JEPA 2 can understand how objects move, predict physical outcomes, and plan actions in unfamiliar environments. It does so without relying on labelled training data, instead learning patterns of movement, interaction, and cause and effect from more than one million hours of raw video. The open-source model is now available for research and commercial use.


Understanding gravity, motion, and object permanence


The model allows AI agents to develop basic physical intuition, such as understanding that a ball falling off a table doesn’t disappear, or anticipating that a hand holding a spatula and a plate will likely use the spatula to transfer food onto the plate. These are core cognitive skills that humans acquire early in life but that have proven difficult to replicate in AI—until now.


“World models will usher in a new era for robotics,” said Meta’s chief AI scientist Yann LeCun. “They allow machines to think before acting, predict outcomes, and plan toward goals—even without having seen the exact scenario before.”


From training to action: robot performance without custom datasets


V-JEPA 2 uses a two-stage training process: an initial “actionless” phase using videos and images to learn general world dynamics, followed by a smaller, action-conditioned phase using 62 hours of robot data to link visual input with specific control actions. The result is a robot planning system that works in unfamiliar environments and with novel objects—without requiring custom training for each new deployment.
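
Meta has not released this training recipe as code in the announcement, but the two-stage idea can be sketched in a few lines of PyTorch. In this illustrative sketch, the toy MLP encoder and predictors, the random tensors standing in for video and robot data, and the 7-dimensional action are assumptions, not V-JEPA 2’s actual architecture:

```python
import torch
import torch.nn as nn

# Toy stand-ins for V-JEPA 2's components; the real model uses large video
# transformers. The dimensions and 7-dim action below are illustrative assumptions.
FRAME_DIM, LATENT_DIM, ACTION_DIM = 3 * 64 * 64, 256, 7

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

encoder = mlp(FRAME_DIM, LATENT_DIM)                         # frame -> latent state
predictor = mlp(LATENT_DIM, LATENT_DIM)                      # stage 1: latent -> future latent
action_predictor = mlp(LATENT_DIM + ACTION_DIM, LATENT_DIM)  # stage 2: (latent, action) -> next latent

# ---- Stage 1: "actionless" pre-training on raw, unlabelled video ----
# Predict the representation of a later frame from the current frame; no labels, no actions.
opt = torch.optim.AdamW(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4)
for _ in range(100):                                  # stand-in for streaming video batches
    frame_t, frame_next = torch.randn(32, FRAME_DIM), torch.randn(32, FRAME_DIM)
    target = encoder(frame_next).detach()             # target representation (stop-gradient)
    loss = nn.functional.mse_loss(predictor(encoder(frame_t)), target)
    opt.zero_grad(); loss.backward(); opt.step()

# ---- Stage 2: action-conditioned fine-tuning on a small robot dataset ----
# The encoder is frozen; the new predictor also sees the robot's action, so it
# learns how actions change the latent state of the world.
opt2 = torch.optim.AdamW(action_predictor.parameters(), lr=1e-4)
for _ in range(100):                                  # stand-in for ~62 hours of robot data
    frame_t, frame_next = torch.randn(32, FRAME_DIM), torch.randn(32, FRAME_DIM)
    action = torch.randn(32, ACTION_DIM)              # e.g. an arm command (assumption)
    with torch.no_grad():
        z_t, z_next = encoder(frame_t), encoder(frame_next)
    loss = nn.functional.mse_loss(action_predictor(torch.cat([z_t, action], dim=-1)), z_next)
    opt2.zero_grad(); loss.backward(); opt2.step()
```

In stage 1 the prediction target comes from the encoder itself (with a stop-gradient), so no human labels are needed; in stage 2 only the small action-conditioned predictor is trained, which is why a relatively modest 62 hours of robot data can suffice.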


In Meta’s lab tests, robots equipped with V-JEPA 2 successfully completed pick-and-place tasks involving unseen objects, achieving 65%–80% success rates using only visual subgoals as guidance. The system works by imagining outcomes from candidate actions and selecting the best move at each step.
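
That “imagine, then pick” loop is essentially sampling-based model-predictive control. Below is a minimal sketch of a single planning step, reusing the hypothetical encoder and action_predictor from the sketch above; the candidate count and the latent-distance scoring are illustrative assumptions rather than Meta’s published implementation:

```python
import torch

def plan_next_action(encoder, action_predictor, current_frame, goal_frame,
                     action_dim=7, num_candidates=256):
    """Pick the action whose imagined next state lands closest to a visual subgoal."""
    with torch.no_grad():
        z_t = encoder(current_frame)                        # current latent state, shape (1, D)
        z_goal = encoder(goal_frame)                        # latent of the visual subgoal, shape (1, D)
        candidates = torch.randn(num_candidates, action_dim)        # sample candidate actions
        z_pred = action_predictor(                                   # imagine each action's outcome
            torch.cat([z_t.expand(num_candidates, -1), candidates], dim=-1))
        scores = torch.cdist(z_pred, z_goal).squeeze(-1)             # distance of each outcome to the goal
    return candidates[scores.argmin()]                      # best move for this step

# In use, real camera frames would replace the random tensors; the robot executes the
# returned action, observes the new frame, and plans again until the subgoal is reached.
best_action = plan_next_action(encoder, action_predictor,
                               torch.randn(1, FRAME_DIM), torch.randn(1, FRAME_DIM))
```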


Benchmarks for physical reasoning


Alongside the model, Meta released three new benchmarks to evaluate AI understanding of physical phenomena:


  • IntPhys 2: Detects implausible physics in paired videos.
  • MVPBench: Uses minimal video pairs to test causal comprehension.
  • CausalVQA: Assesses whether models can answer “what if” and “what next” questions based on physical cause-and-effect.


Meta notes that while humans achieve up to 95% accuracy on these tasks, current video models—including V-JEPA 2—still trail significantly, highlighting room for improvement.


A broader shift toward embodied intelligence


V-JEPA 2 positions Meta at the forefront of a growing movement toward AI systems that engage directly with the physical world. Google DeepMind’s “Genie” and Fei-Fei Li’s startup “World Labs” are among the other players investing heavily in world models. Meta itself plans to invest US$14bn in Scale AI and bring its CEO, Alexandr Wang, into Meta’s AI leadership ranks to accelerate progress.


While large language models have dominated headlines, Meta’s latest announcement reflects a strategic shift: from understanding text to understanding reality.
