Accelerating AI and XR innovation with XR Blocks

The combination of artificial intelligence (AI) and extended reality (XR) has the potential to unlock a new paradigm of immersive, intelligent computing. However, there is currently a significant gap between the ecosystems of these two fields. Mature frameworks like JAX, PyTorch, and TensorFlow, and benchmarks like ImageNet and LMArena, accelerate AI research and development. Meanwhile, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction.

To fill this void, we present XR Blocks at ACM UIST 2025: a cross-platform framework designed to accelerate human-centered AI + XR innovation. It is a significant step beyond our prior research on Visual Blocks for ML, which streamlines the prototyping of machine learning pipelines with visual programming but targets non-XR use cases. As its core abstraction for AI + XR, XR Blocks provides a modular architecture with plug-and-play components for the user, world, interface, AI, and agents, and is designed specifically for rapid prototyping of perceptive AI + XR applications. Built on readily available technologies such as WebXR, three.js, LiteRT, and Gemini, the toolkit lowers the entry barrier for XR creators. We demonstrate its utility through a set of open-source templates, live demos, and source code on GitHub, with the goal of empowering the community to move quickly from concept to interactive prototype. An overview of these capabilities is available in our directional paper and teaser video.

Design principles

Our architectural and API design choices are guided by three principles:
Embrace simplicity and readability: Inspired by the Zen of Python, we prioritize clean, human-readable abstractions. A developer's script should read like a high-level description of the desired experience: simple tasks should be simple to implement, and complex logic should remain explicit and understandable.

Make the creator experience a priority: Our primary objective is to make writing intelligent, perceptive XR applications as simple as possible. Developers should be able to concentrate on the user experience rather than on the low-level “plumbing” of cross-platform interaction logic, AI model integration, or sensor fusion.

Pragmatism over completeness: Because the fields of AI and XR are evolving quickly, we follow a pragmatic design philosophy. A comprehensive, complicated framework that aims for perfection will be obsolete by the time it is released. Instead, we favor a straightforward, modular, and adaptable architecture that supports a wide range of applications and runs on desktop and Android XR devices.
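To make these principles concrete, here is a minimal sketch of how a script might read; the module path, the init pattern, and every API name below are illustrative assumptions rather than the published XR Blocks API.

```typescript
// Hypothetical sketch only: module path and API names are assumptions, meant to
// show a script that reads like a description of the experience itself.
import * as xb from 'xrblocks';

xb.init(async (script) => {
  // Describe *what* the experience is, not *how* sensors and renderers work.
  const label = script.ui.panel({ text: 'Look at an object to identify it' });

  script.user.gaze.onDwell(async (target) => {
    // Gaze tracking, depth sensing, and model serving stay behind one call.
    label.setText(await script.ai.query(`Briefly describe: ${target.label}`));
  });
});
```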

XR Blocks framework

Drawing inspiration from Visual Blocks for ML and InstructPipe, we designed the XR Blocks framework to provide a high-level, human-centered abstraction layer that separates the “what” of an interaction, expressed as a Script, from the “how” of its low-level implementation.

Abstractions

To guide the implementation of the XR Blocks framework, we propose a Reality Model composed of high-level abstractions. In contrast to a World Model designed for end-to-end unsupervised training, our Reality Model is made up of replaceable modules for XR interaction. At the center of our design is Script, an application’s narrative and logical core, which operates on six first-class primitives (described and visualized below):

User and the real world: The hands, gaze, and avatar of the User sit at the center of our model. Script can query perceptions of physical reality such as depth (demo), estimated lighting conditions (demo), and detected objects (demo). The model blends virtual UI elements into this reality, from 2D panels (demo) to fully 3D assets (demo). A perception pipeline analyzes the environment, user activities, and interaction histories; Sensible Agent (discussed more below) is an example application.

Intelligent and social entities: The model treats remote human peers and AI-driven agents as primary entities, which enables dynamic hybrid human-AI group conversations in DialogLab.
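As a rough illustration of how a Script might consume these primitives, the sketch below assumes hypothetical accessor names for the user, world, agents, and peers; only the primitive concepts themselves come from the description above.

```typescript
// Illustrative sketch: property and method names are assumptions that mirror
// the Reality Model primitives described above, not the documented API.
import * as xb from 'xrblocks';

xb.init((script) => {
  const { user, world, agents, peers } = script.reality;

  script.onFrame(() => {
    // User: hands, gaze, and avatar drive the interaction.
    if (user.hands.right?.isPinching) {
      // World: query perceived physical reality (depth, lighting, objects).
      const target = world.objects.nearest(user.gaze.ray);
      if (target) {
        // Agents and peers are first-class entities in the same model.
        agents.default.say(`The user pinched near a ${target.label}.`);
        peers.broadcast({ event: 'pinch', label: target.label });
      }
    }
  });
});
```

In this sketch, the application never talks to trackers or model runtimes directly; it only reads and reacts to the Reality Model.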

Implementation

The modular Core engine of XR Blocks turns this Reality Model into a working system. It offers high-level APIs that let developers use the following subsystems without having to know how they work internally:

Perception and input pipeline: The camera, depth, and sound modules continuously feed and update the Reality Model’s representation of physical reality. The input module normalizes user actions from various devices, providing the raw data that XR Blocks interprets.

AI as a core utility: The ai module serves as a nervous system, providing powerful yet simple functions like “.query” and “.runModel” that make large models accessible.

Experience and visualization toolkit: To facilitate rapid creation, the toolkit provides a library of common features. The ux module offers reusable interaction behaviors like .selectable and .draggable (demo), while the ui and effect modules handle the rendering of interfaces and complex visual effects like occlusion (demo).
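To ground these subsystems, the following hedged sketch composes them in one script; “.query”, “.runModel”, “.selectable”, and “.draggable” are named above, but the signatures, option objects, and module layout shown here are assumptions.

```typescript
// Hedged sketch composing the subsystems described above. Only the module and
// function names come from the text; their signatures are assumptions.
import * as xb from 'xrblocks';

xb.init(async (script) => {
  // Experience and visualization toolkit: a UI panel made grabbable by ux.
  const card = script.ui.panel({ text: 'Select me to caption the scene' });
  script.ux.draggable(card);
  script.ux.selectable(card, async () => {
    // AI as a core utility: one call hides model selection and serving details.
    card.setText(await script.ai.query('Describe what the camera currently sees.'));
  });

  // Perception and input pipeline: camera and depth keep the Reality Model
  // fresh, enabling effects such as occlusion by real-world geometry.
  script.effect.enable('occlusion');

  // Lower-level escape hatch: run an on-device model on the current camera frame.
  const detections = await script.ai.runModel('object-detector', script.camera.frame());
  console.log(detections);
});
```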