Immersion is the easiest and most fun way to learn a new language.
When your brain experiences language in context, it forms stronger, faster, and more natural connections. Linguists like Stephen Krashen have argued for decades that “comprehensible input” (hearing and interacting with language in meaningful situations) is one of the most effective drivers of real fluency.
And mixed reality finally makes true immersion possible anywhere.
Instead of memorizing flashcards or staring at screens, Language IRL uses the objects, spaces, and situations around you as living vocabulary lessons. You see it, you hear it, you say it, and your brain locks it in effortlessly.
Your tutor meets you right where you are, guiding you through spoken practice, real-life vocabulary, and interactive activities that feel natural and intuitive. No pressure. No overthinking. Just learning that flows.
The result? A more engaging, more effective, and genuinely enjoyable way to learn a new language, one that actually sticks.
Let your tutor take your skills to the next level!
Language IRL began with a question I could not shake: What if there were a better way to learn a new language? Not by memorizing flashcards, but by turning the world around you into your own personal classroom.
That question grew from an unexpected place.
Eight years earlier, when I was just getting started in VR, I made an action comedy called Kungfuscius, a story about an AI mentor who appears in augmented reality to help you level up your life. It was wild and slapstick, and I genuinely risked my life to make it, but the heart of it stayed with me: What if everyone could have a personal guide who sees the world alongside you, talks to you, and helps you grow?
For years, that idea lived only as a VR film. Then Meta opened computer vision access on the Quest 3, and suddenly an AI mentor no longer felt fictional. I finally had the tools to build a companion that could recognize your environment, understand your voice, and interact with you in mixed reality.
Language learning felt like the perfect place to begin: something millions of people attempt, yet few ever get the immersive environment they need to succeed.
Language IRL transforms your real environment into an interactive language-learning space. Your room becomes your vocabulary list. Your world becomes your lesson.
Monobonobo, your mixed reality tutor, floats through your space teaching you words tied to real objects. You respond by speaking aloud. Wherever you go there’s a new lesson to be learned.
Instead of abstract memorization, Language IRL lets you learn language the way the brain prefers, through spatial context.
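To make the core idea concrete: a recognized object label becomes a spoken-practice prompt in the target language. The sketch below is purely illustrative; the tiny vocabulary table, labels, and function names are my assumptions, not Language IRL's actual data or code.

```python
# Hypothetical sketch: turning a computer-vision object label into a
# spoken-practice prompt. The vocabulary table and function name are
# illustrative, not the app's real data or API.

# Tiny English -> German vocabulary table keyed by CV object labels.
VOCAB_DE = {
    "chair": "der Stuhl",
    "lamp": "die Lampe",
    "window": "das Fenster",
}

def lesson_prompt(detected_label: str) -> str:
    """Build the line the tutor would speak for a recognized object."""
    word = VOCAB_DE.get(detected_label)
    if word is None:
        return "Hmm, I don't know that one yet!"
    return f"You found a {detected_label}! In German we say: {word}. Now you try!"

print(lesson_prompt("chair"))
```

In the real app the labels would come from the headset's computer vision, but the shape of the loop is the same: see the object, hear the word, say it back.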
For a two-person team with less than a month, the tech stack felt massive: computer vision, spatial mapping, gesture input, multilingual text-to-speech, pronunciation grading, and a fully animated mentor.
We began by cloning Meta’s Passthrough API sample project and validating each system one by one. Meta’s tools gave us a huge head start; without them, there is no chance we could have built something this ambitious in time.
I designed the UX, UI, and lesson plan, mapping out how learners move, gesture, and speak as their environment becomes the curriculum. Once the flow felt intuitive, Riko built the core architecture: lesson logic, gesture recognition, spatial UI, event sequencing, and Azure integrations for text-to-speech and pronunciation grading.
With the foundation in place, I built Monobonobo, our Monkey Bot, blending the spirit of Kungfuscius with the charm of Sunny from our first Quest game, Monkey Tower. I wanted a joyful, expressive companion far from the uncanny valley. I modeled, rigged, and animated the character entirely in Blender.
The breakthrough moment came when everything finally ran together: Monobonobo moved and spoke in sync. Objects were recognized. Gestures registered. Speech was understood. That was when I knew we were actually going to pull this off.
The tech stack was enormous for two people in under a month.
Getting all systems to work together – CV, gestures, TTS, speech scoring, MR UI – was far harder than validating them individually.
Hand tracking required careful tuning to avoid frustration.
Mixed reality UI composition needed precision to feel integrated.
Pronunciation grading had to feel supportive, not punishing.
Designing lessons tied to physical environments introduced new UX questions we had never considered before.
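One way to make grading feel supportive rather than punishing is to phrase every score tier as encouragement and treat a low score as an invitation to retry, never a failure. A hypothetical sketch follows; the tier boundaries and wording are invented for illustration, not the shipped logic.

```python
# Hypothetical sketch: mapping a raw 0-100 pronunciation-accuracy score to
# supportive tutor feedback. Tier boundaries and phrases are illustrative.

def supportive_feedback(score: float) -> str:
    """Translate a raw accuracy score into encouragement, never punishment."""
    if score >= 85:
        return "Perfect! You sound like a local!"
    if score >= 60:
        return "Nice! That was really close. Say it once more with me!"
    # Even a rough attempt gets a warm retry prompt, not a failure message.
    return "Good try! Listen again and give it another shot."

print(supportive_feedback(92))
print(supportive_feedback(40))
```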
We built a fully functional AI MR tutor in less than a month.
Monkey Bot feels expressive, responsive, and emotionally engaging.
We created a seamless interaction loop using CV, gestures, and speech.
The experience genuinely teaches faster because it is grounded in the real world.
We proved that mixed reality opens the door to meaningful, human-centered learning.
And we turned a fictional concept from a silly action comedy into something that actually works.
Riko arrived in Germany and found the app genuinely useful while preparing for the trip!
Vocabulary grounded in real objects dramatically improves retention.
Hand tracking can feel magical when designed with forgiveness.
Cloud plus on-device AI gives the best mix of speed and intelligence.
Small touches like weekly tracking measurably boost engagement.
Most importantly: learning feels more human when tied to your space, your actions, and your voice.