Inside the Tech is a blog series that goes hand-in-hand with our Tech Talks Podcast. Here, we dive further into a key technical challenge we’re tackling and share the unique approaches we’re taking to do so. In this edition of Inside the Tech, we spoke with Growth group Technical Director Ivan Marcin to learn more about matchmaking on Roblox.
What technical challenges are you solving for?
Matchmaking builds the services that match Roblox users to an experience server in the join process. When someone wants to visit a Roblox experience, we look at thousands of data points from multiple Roblox engine instances and rank them to make that match. Roblox is unique because people and places are changing constantly, and the system we’re building has to account for these fluctuations.
To do this, we have to develop the technologies to solve two challenges that are key to maximizing user satisfaction. The first is determining how to track and rank the places we match people to in real-time. The second is optimizing matchmaking for efficiency at scale. This hybrid system needs to match our millions of concurrent users to experiences with minimal latency while also orchestrating Roblox engine instances across our fleet of edge data centers. That’s what drives maximum engagement.
The process has numerous complexities, but a good example of a particular challenge is what’s called the “thundering herd problem.” That’s when our systems see massive spikes of load in a short period of time, for example, when millions of people attempt to join a popular experience at the same time on a Saturday morning.
In those cases, we may see a quick 10x jump in requests. This sudden increase in pressure stresses our systems, and in the past, these types of events brought the platform down. But now, many Roblox experiences have this type of special event, limited release, or update. While it increases engagement, it also forces us to be ready to handle regular thundering herds.
Is the thundering herd problem something that other social networks and platforms have?
Any platform can face a sudden massive surge of users. But it’s particularly challenging for us because of our scale. A limited item launch may be just a one-time event for an experience, but on Roblox there are millions of experiences and many have popular events like these. So for Roblox, thundering herd incidents aren’t rare, isolated, or predictable. They can happen at any time across any of our experiences, and we need to be ready. We’ve hardened matchmaking and other systems to be more resilient to these patterns.
What are some of the innovative solutions we’re building to address these challenges?
We needed to build a custom lookup and recommender system that’s constantly indexing Roblox experiences and matching people to them in real time.
To send users to the best place and handle the thundering herds at any time, anywhere across Roblox, the system considers inputs like users’ state, location, latency, and other player properties. It also has to track and refresh the state of all Roblox experiences every few seconds.
From there, we need to generate these match recommendations in real time. With many traditional matchmaking systems, users connect and wait in a virtual lobby for the game to launch. That can take several minutes, but on Roblox, we need to send people to the right experiences the second they click the join button.
Doing this requires building an experience system that reindexes our data every few seconds. At scale, that is a key challenge because we can’t use standard distributed systems techniques, like relying solely on caching, to handle load spikes. Instead, we built a custom indexing system. Every Roblox engine instance is constantly pushing data into this system. Any experience join request scans the properties of every active place, ranks them across multiple indexes, and makes a recommendation of where to send the user based on what’s happening at that exact time.
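To make the push-then-rank flow concrete, here is a minimal Python sketch. It is not Roblox’s implementation: the `ServerState` fields, the five-second freshness window, and the scoring weights are all assumptions chosen for illustration, and the latency value stands in for whatever per-user estimate a real system would compute.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServerState:
    """Hypothetical snapshot pushed by one engine instance."""
    server_id: str
    players: int
    capacity: int
    latency_ms: float   # assumed latency estimate for the joining user
    updated_at: float   # timestamp (seconds) of the last push


class ServerIndex:
    """In-memory index that engine instances push state into."""

    def __init__(self, max_age_s: float = 5.0) -> None:
        # Assumption: snapshots older than max_age_s are treated as stale.
        self.max_age_s = max_age_s
        self._servers: Dict[str, ServerState] = {}

    def push(self, state: ServerState) -> None:
        # The latest push for a server replaces its previous snapshot.
        self._servers[state.server_id] = state

    def best_match(self, now: float) -> Optional[ServerState]:
        # Scan every known server, keeping only fresh ones with free slots.
        candidates = [
            s for s in self._servers.values()
            if now - s.updated_at <= self.max_age_s and s.players < s.capacity
        ]
        if not candidates:
            return None
        # Lower score is better.
        return min(candidates, key=self._score)

    @staticmethod
    def _score(s: ServerState) -> float:
        # Assumed weighting: prefer low latency, then fuller servers.
        fill = s.players / s.capacity
        return s.latency_ms - 50.0 * fill
```

Each join request calls `best_match` against whatever state was pushed most recently, so the answer reflects the index at that moment rather than a cached result.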
What are the key learnings from doing this technical work?
One of the key learnings is that we need to look at things from a balanced perspective. We’ve been working hard on improving our platform’s reliability, but we’re also developing new features that will improve the user experience over the long term. It’s like a pendulum swinging back and forth because change is constant. We have to be able to learn, adapt, and figure out what we can do in the short term while building for the long term.
Take, for example, how we handled the thundering herd problem. Our developer community realized they could leverage hype on weekends to attract users to their experiences. This resulted in masses of people joining experiences on Saturday mornings. So we had to shift our engineering plans, as that scaling challenge wasn’t something that could be easily solved. When content is static, you tackle this by adding caching layers on top and by provisioning capacity for peak use. But the real-time nature of our systems meant rearchitecting our indexing and scanning systems to divide the lookups and scale our concurrency.
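One way to picture “dividing the lookups” is to split the index into shards, scan the shards concurrently, and merge the per-shard winners. The Python sketch below assumes that reading; the shard count, the CRC32-based partitioning, and the precomputed score per server are hypothetical choices, not details from the interview.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[float, str]  # (score, server_id); lower score is better


class ShardedIndex:
    """Index split into shards so one lookup can be scanned in pieces."""

    def __init__(self, num_shards: int = 4) -> None:
        self._shards: List[Dict[str, float]] = [{} for _ in range(num_shards)]

    def _shard_for(self, server_id: str) -> Dict[str, float]:
        # Assumption: a stable hash of the server id picks the shard.
        return self._shards[zlib.crc32(server_id.encode()) % len(self._shards)]

    def push(self, server_id: str, score: float) -> None:
        self._shard_for(server_id)[server_id] = score

    @staticmethod
    def _best_in_shard(shard: Dict[str, float]) -> Optional[Candidate]:
        # Snapshot the items before scanning; a production system would
        # need real synchronization between writers and readers.
        items = list(shard.items())
        if not items:
            return None
        server_id, score = min(items, key=lambda kv: kv[1])
        return (score, server_id)

    def best_match(self, pool: ThreadPoolExecutor) -> Optional[str]:
        # Scan every shard concurrently, then merge the per-shard winners.
        partials = pool.map(self._best_in_shard, self._shards)
        winners = [p for p in partials if p is not None]
        return min(winners)[1] if winners else None
```

Python threads will not speed up a CPU-bound scan; the sketch only shows the shape of the scatter-and-merge step, where each shard could live on a separate process or machine.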
Which Roblox value do you think best aligns with how you and your team tackle technical challenges?
Respect the community best aligns with how our team tackles technical challenges. Our community is made up of both the users and the creators who make experiences and push our technical requirements. Both are equally important. So when we change something, we have to be very thoughtful about how it impacts everyone.
For example, if we’re considering modifying something like the APIs that impact teleporting, we have to understand how it will affect both users and developers. We spend a lot of time thinking about how we get people to play the right game, but also how to give developers more options and controls. We regularly reach out to developers to brainstorm new features with them.
What excites you most about where Roblox and your team are headed?
Three things. First, I’m impressed by our tremendous growth. The second is the potential of creation and innovation on Roblox: people are constantly coming up with new ideas and experiences, which pushes us to be just as creative in how we scale to support them. Third, AI/ML is booming, and Roblox is right at the forefront of this wave. For example, we’re integrating more ML into matchmaking and using generative AI in other unique and cutting-edge ways at Roblox. It’s truly exciting.
The post Inside the Tech – Solving for Matchmaking on Roblox appeared first on Roblox Blog.