Inside the Tech is a blog series that accompanies our Tech Talks Podcast. In episode 19 of the podcast, International, Roblox CEO David Baszucki spoke with Product Senior Director Zhen Fang about Roblox’s International strategy, and the technical challenges we’re solving to ensure a localized experience for tens of millions of people around the globe. In this edition of Inside the Tech, we talked with Engineering Manager Ravali Kandur to learn more about one of those technical challenges, multilingual and semantic search, and how the Growth team’s work is helping Roblox users across the globe search for—and quickly find—anything they want on our platform.
What is the biggest technical challenge your team is taking on?
Until about a year ago, Roblox search used a lexical system to match results to users’ searches, meaning it focused solely on text matching. But search behaviors are changing quickly and that approach is no longer sufficient to give users relevant content. At the same time, some Roblox users may use incorrect spelling in their queries. So, we have to be able to suggest results that match what they’re looking for, which means understanding their intent.
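To make the limitation concrete, here is a minimal sketch of why pure text matching breaks on a misspelled query and how a correction step can recover it. Everything in it is an assumption for illustration: the tiny `TITLES` catalog and the helper names are hypothetical, and Python's standard-library `difflib` is only a stand-in for whatever spelling models a production system would use.

```python
import difflib

# Hypothetical mini-catalog of experience titles (illustrative only).
TITLES = ["dragon adventure", "racing legends", "pizza tycoon"]

def lexical_match(query, titles):
    """Return titles that share at least one exact token with the query."""
    q_tokens = set(query.lower().split())
    return [t for t in titles if q_tokens & set(t.split())]

def correct_spelling(query, titles):
    """Replace each query token with its closest catalog token, if one is close enough."""
    vocab = sorted({tok for t in titles for tok in t.split()})
    fixed = []
    for tok in query.lower().split():
        close = difflib.get_close_matches(tok, vocab, n=1, cutoff=0.75)
        fixed.append(close[0] if close else tok)
    return " ".join(fixed)

misspelled = "dragn advnture"
# Exact token matching finds nothing for the misspelled query.
print(lexical_match(misspelled, TITLES))
# After correction, the same lexical matcher finds the intended title.
print(lexical_match(correct_spelling(misspelled, TITLES), TITLES))
```

Character-level correction only handles typos. It does nothing for queries that are spelled correctly but phrased differently from the catalog text, which is where understanding intent comes in.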
Another major problem in search is a lack of training data across languages. Before semantic search, our first step was to leverage machine translations within the Roblox system: we indexed the translations and then did a text match. But that isn’t always sufficient for showing users relevant content. So, we’ve adopted a more advanced ML technique called a student-teacher model, in which the teacher learns from our biggest source of context for any specific scenario.
English is the most used language on Roblox, which is why the teacher model learns as many semantic relationships as it can in English. We then distill that knowledge into the student model, which extends it to other languages. This helps us solve the training-data problem even in languages where we don’t have a lot of data. This approach has led to a 15% increase in plays originating from search in Japan.
We’ve recently been working to better support out-of-catalog queries like “đua xe (racing).” But users are more frequently submitting long, freeform queries, like, “Hey, I remember playing a game where there was a dragon and a girl fighting with it. Can you help me find that?” This presents more technical challenges, and we’re continuing to improve our systems along these lines.
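One common way to frame this kind of teacher-to-student distillation is to train the student so that both an English sentence and its translation land on the teacher's embedding of the English sentence. The sketch below is a toy version of that objective under stated assumptions: the teacher vectors are made-up numbers, the student is a simple sum of per-token vectors rather than a transformer, and `TEACHER`, `PAIRS`, and `distill` are hypothetical names, not anything from the Roblox stack.

```python
# Frozen "teacher": hypothetical English sentence embeddings (made-up numbers).
TEACHER = {"racing game": [1.0, 0.0], "cooking game": [0.0, 1.0]}

# Parallel data: (English sentence, translation). Vietnamese is used for illustration.
PAIRS = [("racing game", "trò chơi đua xe"), ("cooking game", "trò chơi nấu ăn")]

DIM = 2

def embed(student, sentence):
    """Student embedding: sum of per-token vectors (a toy bag-of-words encoder)."""
    vec = [0.0] * DIM
    for tok in sentence.split():
        weights = student.setdefault(tok, [0.0] * DIM)
        vec = [v + w for v, w in zip(vec, weights)]
    return vec

def mse(a, b):
    """Mean squared error between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill(pairs, teacher, epochs=200, lr=0.1):
    """Fit student token vectors so English AND translated text match the teacher."""
    student = {}
    for _ in range(epochs):
        for english, translation in pairs:
            target = teacher[english]  # the teacher only ever sees English
            for sentence in (english, translation):
                error = [s - t for s, t in zip(embed(student, sentence), target)]
                # Gradient step on the squared error for every token in the sentence.
                for tok in sentence.split():
                    student[tok] = [w - lr * 2 * e for w, e in zip(student[tok], error)]
    return student

student = distill(PAIRS, TEACHER)
# The translated query now sits close to the teacher's English embedding.
print(mse(embed(student, "trò chơi đua xe"), TEACHER["racing game"]))
```

The point of the design is that the student never needs labeled search data in the target language; it only needs translation pairs and a teacher that is already good in English.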
What are some of the innovative approaches to incorporating more context and more semantic search?
We’ve built a hybrid search system that combines lexical search with ML models for semantic search and for understanding a query’s intent. We’re continuously evolving our systems to build context understanding, handle complex queries, and return relevant content.
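As a rough illustration of what "hybrid" can mean, the sketch below blends a lexical score (token overlap) with a semantic score (cosine similarity between embeddings) using a single weight. This is only one possible way to combine the two signals, and it is an assumption rather than a description of the production system: the `CATALOG` entries, the two-dimensional embeddings, and the `alpha` weight are all invented for the example.

```python
import math

# Hypothetical catalog: title -> made-up 2-d embedding.
CATALOG = {
    "Dragon Slayer Quest": [0.9, 0.1],
    "Street Racing Pro": [0.1, 0.9],
    "Speed Kart Mania": [0.2, 0.8],
}

def lexical_score(query, title):
    """Fraction of query tokens that appear verbatim in the title."""
    q, t = set(query.lower().split()), set(title.lower().split())
    return len(q & t) / len(q) if q else 0.0

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(query, query_embedding, alpha=0.5):
    """Rank titles by alpha * lexical + (1 - alpha) * semantic score."""
    scored = []
    for title, embedding in CATALOG.items():
        score = (alpha * lexical_score(query, title)
                 + (1 - alpha) * cosine(query_embedding, embedding))
        scored.append((score, title))
    return [title for _, title in sorted(scored, reverse=True)]

# "Speed Kart Mania" shares no words with the query, but its embedding is close,
# so the semantic half of the score still ranks it above the dragon game.
print(hybrid_search("racing games", [0.15, 0.85]))
```

A weighted sum keeps exact title matches strong while letting semantically related results surface even when they share no words with the query.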
The magic of semantic search is in the embeddings, which are rich representations of a variety of signals we get from all across Roblox. For example, we’re incorporating signals like user demographics, a user’s query, how long it is, or what its unique aspects are.
We’re also looking at content signals, like experiences, avatar items, and engagement: how often was this game played, how many users did it have, and from how many countries? There are also things like monetization and retention, as well as metadata like an experience’s title, description, or creator. We put all of these through a BERT-based transformer architecture and use a multilayer perceptron (MLP) at the end to generate embeddings, which become our source of truth.
Another innovation is our in-house similarity search system. When someone makes a search query, we retrieve the closely related embeddings, rank them to be sure they’re relevant to what the user is looking for, and then return the results to the user.
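The shape of that pipeline (text encoder output, plus numeric signals, fed through an MLP head) can be sketched in a few lines. This is a simplified illustration under heavy assumptions: the token vectors stand in for the output of a BERT-style encoder, the signals and all weights are made-up numbers, and the function names are hypothetical. It shows the data flow only, not how the real model is built or trained.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into one text vector (stand-in for encoder pooling)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

def dense(x, weights, bias):
    """One fully connected layer: weights is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def l2_normalize(x):
    """Scale to unit length so dot products between embeddings equal cosine similarity."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm else x

def embed_item(token_vectors, signals, params):
    """Concatenate pooled text features with numeric signals, then apply a 2-layer MLP."""
    features = mean_pool(token_vectors) + signals
    hidden = relu(dense(features, params["w1"], params["b1"]))
    return l2_normalize(dense(hidden, params["w2"], params["b2"]))

# Made-up parameters: 4 input features -> 3 hidden units -> 2-d embedding.
PARAMS = {
    "w1": [[0.5, -0.2, 0.1, 0.3], [0.1, 0.4, -0.3, 0.2], [-0.6, 0.2, 0.5, 0.1]],
    "b1": [0.0, 0.1, 0.0],
    "w2": [[0.3, -0.5, 0.2], [0.4, 0.1, -0.7]],
    "b2": [0.0, 0.0],
}

title_tokens = [[0.2, 0.4], [0.6, 0.0]]  # hypothetical encoder output for a title
signals = [0.5, 0.1]                      # e.g. scaled engagement and retention
print(embed_item(title_tokens, signals, PARAMS))
```

Normalizing the final vector is a common convenience for the retrieval step, since it makes similarity comparisons independent of vector length.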
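The retrieve-then-rank flow described above can be sketched as two small functions. The internals of the in-house system are not described here, so this example assumes the simplest possible version: a brute-force cosine scan for retrieval and a hypothetical reranking score that mixes similarity with an engagement signal. The `INDEX` contents, item IDs, and `engagement_weight` are invented for illustration.

```python
import heapq
import math

# Hypothetical index: item id -> (made-up embedding, engagement score in [0, 1]).
INDEX = {
    "exp_dragon": ([0.9, 0.1], 0.4),
    "exp_racing": ([0.1, 0.9], 0.7),
    "exp_kart": ([0.2, 0.8], 0.9),
    "exp_pizza": ([0.7, 0.7], 0.5),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_embedding, k):
    """Stage 1: the k items whose embeddings are closest to the query."""
    return heapq.nlargest(
        k, INDEX, key=lambda item: cosine(query_embedding, INDEX[item][0])
    )

def rerank(query_embedding, candidates, engagement_weight=0.3):
    """Stage 2: reorder candidates with a score that also considers engagement."""
    def score(item):
        embedding, engagement = INDEX[item]
        similarity = cosine(query_embedding, embedding)
        return (1 - engagement_weight) * similarity + engagement_weight * engagement
    return sorted(candidates, key=score, reverse=True)

query = [0.15, 0.85]
candidates = retrieve(query, k=2)
print(candidates)
print(rerank(query, candidates))
```

Splitting the work this way lets the first stage stay cheap over a large catalog while the second stage applies a richer score to only a handful of candidates. A scan over every item would not scale to a real catalog, where an approximate nearest-neighbor index would typically take its place.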
What are some of the key things that you’ve learned from doing this technical work?
Every language presents its own unique challenge. And especially with search, we need to understand what users in different parts of the world are looking for so that we can show them the most relevant results. We have to understand different language elements. For example, pre-trained transformers have been essential to understanding the multiple dialects of Japanese.
Secondly, search query patterns have been changing quite a bit and we have to continuously evolve our technology stack to keep up. At the same time, we need to inform our users about what is possible on our platform, as they may not realize it. For example, we could tell our users that search can support things like freestyle queries (such as racing games or popular food games) and that it understands what people are looking for and can return appropriate results.
Which Roblox value does your team most align with?
Taking the long view is core to our team and it’s one of the reasons why I love working at Roblox.
One example from my team is our tech stack, which consists of our ML- and NLP-based search systems—semantic search, autocomplete and spelling correction using pre-trained large models.
We’ve built this with reusability in mind across different types of searches made by our tens of millions of daily active users. That means we can plug in a different type of data (for example, avatar items instead of experiences), and it should work with very minimal changes.
We’ve incorporated semantic search for experiences, and we’ve shared it with other verticals like Marketplace, and they’ve been able to just jump on the existing architecture. It’s not perfectly plug-and-play, but with some fine-tuning, we can adapt it across different use cases.
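One way to picture this kind of reuse is a search component that knows nothing about the content type and is handed a small adapter that turns each item into searchable text. The sketch below is hypothetical throughout: `SearchIndex`, `Experience`, and `AvatarItem` are illustrative names, and simple token matching stands in for the semantic models so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")

@dataclass
class Experience:
    title: str
    description: str

@dataclass
class AvatarItem:
    name: str
    category: str

class SearchIndex(Generic[T]):
    """Content-agnostic index: only the to_text adapter knows the item type."""

    def __init__(self, to_text: Callable[[T], str]) -> None:
        self._to_text = to_text
        self._items: List[T] = []

    def add(self, item: T) -> None:
        self._items.append(item)

    def search(self, query: str) -> List[T]:
        """Return items whose text shares a token with the query."""
        q_tokens = set(query.lower().split())
        return [
            item for item in self._items
            if q_tokens & set(self._to_text(item).lower().split())
        ]

# Same search code, two content types; only the adapter differs.
experiences = SearchIndex(lambda e: f"{e.title} {e.description}")
experiences.add(Experience("Street Racing Pro", "Race cars downtown"))

avatar_items = SearchIndex(lambda a: f"{a.name} {a.category}")
avatar_items.add(AvatarItem("Racing Helmet", "hat"))

print(experiences.search("racing"))
print(avatar_items.search("helmet"))
```

In a real system the adapter would more likely map each content type to model inputs than to plain text, and some fine-tuning per vertical would still be needed.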
What excites you the most about where Roblox and your team are headed?
Search is the only surface where users express their explicit intent. And that means it’s essential that we understand what they want and give them the most relevant results. So it’s really exciting to me to work on understanding that intent and educating our users about what is possible, sometimes even before the user realizes it.
A user in any country can ask for something, and we can give them exactly what they want and what’s most relevant to them. This builds trust, which, in turn, improves retention. It’s exciting to me to take on the challenge of improving search to build that trust and help Roblox achieve our goal of having a billion users.
The post Inside the Tech – Solving for Multilingual & Semantic Search appeared first on Roblox Blog.