Preface
I questioned whether to write this article as I mostly explore topics about software development and GoLang. However, this article has given me the chance to play around with the go-echarts package, and if you are a fan of charts and stats, you might find this interesting.
Background
A while back, a few friends of mine introduced me to League of Legends: Wild Rift. I had always been told to stay away, but since this is the mobile version of the game (quicker and simpler to set up) and there were friends to socialize with, I gave it a chance. Once I started getting the hang of it, I got hooked.
And while the game can be extremely fun at times, it can be just as frustrating, if not more so - lousy matchmaking (though still better than CS2, but I digress), glitches in auto-aiming where the champ won't shoot or targets an enemy out of range at a critical moment, being forced to play your worst lanes, etc.
But the most frustrating part for me has been the random system. For a while now I have mostly been playing ARAM, a 5v5 game mode where each player is assigned a random champion. A player who is not satisfied with their pick can request a new random champion that has not been picked yet by "rolling a die", and can roll again if they are still not satisfied. The dice are limited - you accumulate them over time - so players use them sparingly.
For the longest time I had been getting horrible picks. It always seemed to be the champions that I hate playing, and never the ones that I enjoy and have purchased or earned expensive skins for.
It has been known for a while that free-to-play live-service games employ tactics (some borderline immoral) to keep player engagement high. If you are curious, here are a few good starter videos on the subject, though there are more out there.
So when I managed to roll the same champion 5-6 times within 10-12 games in a row in a single day, I decided that something must be off and went on to prove it.
Tracking my Games
My main idea was to start tracking my picks in a ledger in order to verify whether they were fair. So at the start of every game, I would record my picks.
CAVEAT: The game features a wild-card mechanic where a special die lets you select one of 5 random champions. Since those dice are rare and hard to track, I decided to consistently ignore them, regardless of what they offered.
In the end I managed to record my picks for 163 games, without gaps. At the time, the game had 136 unique champions. That sample size should be sufficient for a meaningful comparison, and I have no plans on tracking any further.
Creating a Simulation
Just by looking at the ledger, it is hard to tell whether clusters in the picks are intentional or genuinely random (unlike blue noise, plain random noise does produce patterns).
What I ended up doing was to implement the champion selection process in Go the way I believe a fair one would work (how I would build it if I were developing a similar game) - using cryptographically secure randomness - and run simulations to see how they compare.
CAVEAT: As I mentioned, players use their dice sparingly, so I had to model that somehow. From experience, players most often use one die per game. As such, I have chosen that ~68% of the time a player uses one die, ~16% of the time no dice, and ~16% of the time two dice. These percentages are modelled after a normal distribution centred at one die per game.
You can find the code for this in the internal/game package of my fairmoba project. I have opted to keep my ledger and champion ratings (discussed later) private. Furthermore, I have left it generic (haven't hardcoded the champions) so that it can potentially be used for other games that follow a similar pattern as well.
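For illustration, here is a simplified sketch of the selection model - a uniform pick backed by crypto/rand, plus the die-usage split above. The function names are mine, and the re-roll policy (always keep the latest roll) is a simplifying assumption, so treat this as a sketch rather than the actual internal/game code:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// randInt returns a uniform random int in [0, n) backed by crypto/rand.
func randInt(n int) int {
	v, err := rand.Int(rand.Reader, big.NewInt(int64(n)))
	if err != nil {
		panic(err)
	}
	return int(v.Int64())
}

// dieCount models how many re-roll dice a player uses in one game:
// ~16% none, ~68% one, ~16% two (the assumed split from above).
func dieCount() int {
	switch r := randInt(100); {
	case r < 16:
		return 0
	case r < 84:
		return 1
	default:
		return 2
	}
}

// simulatePick returns the champion index a player ends up with in a
// fair system: a uniform initial pick, then dieCount() uniform
// re-rolls, always keeping the latest roll (a simplification - a real
// player re-rolls based on how much they like the current champion).
func simulatePick(numChampions int) int {
	pick := randInt(numChampions)
	for i := 0; i < dieCount(); i++ {
		pick = randInt(numChampions)
	}
	return pick
}

func main() {
	// With 136 champions, print a handful of simulated picks.
	for i := 0; i < 5; i++ {
		fmt.Println(simulatePick(136))
	}
}
```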
I ran 100,000 such simulations to use as a comparison baseline.
Tracking Pick Count Distribution
As mentioned, it was common to roll the same champion over and over again. So I wanted to create a histogram to visualize that. However, instead of visualizing how many times a particular champion is picked, I would visualize how many champions end up being picked a particular number of times (e.g. how many champions are picked once in total, twice in total, etc).
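Computing that inverted histogram from a ledger takes just two passes over a couple of maps. Here is a minimal sketch with made-up champion data (the actual chart rendering via go-echarts is omitted):

```go
package main

import "fmt"

// pickCountDistribution maps "times picked" -> "number of champions
// that were picked exactly that many times".
func pickCountDistribution(ledger []string) map[int]int {
	// First pass: how many times was each champion picked?
	perChampion := map[string]int{}
	for _, champ := range ledger {
		perChampion[champ]++
	}
	// Second pass: how many champions share each pick count?
	dist := map[int]int{}
	for _, count := range perChampion {
		dist[count]++
	}
	return dist
}

func main() {
	ledger := []string{"Jax", "Ahri", "Jax", "Twitch", "Ahri", "Jax"}
	fmt.Println(pickCountDistribution(ledger)) // map[1:1 2:1 3:1]
}
```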
Using the recorded ledger, the distribution looks as follows:
As can be seen, most champions occurred one, two or three times. There was one champion (Jax) that I had picked eight times and one (Twitch) that I had picked seven times.
And here is how the distribution looks for the average of all 100,000 simulated runs.
It appears to follow a bell-shaped curve. Let's look at both of them side by side and see how they compare.
While there are some differences, the distribution of my ledger is quite close to the ideal distribution, so there does not seem to be any manipulation here.
Unfortunately, the ledger does not include an outlier like that 5-6-picks-within-10-12-games case I mentioned (or others that had occurred previously). So with the data I have, there is no proof of any kind of tampering.
I had been seeing a similar result while collecting the ledger and was almost ready to give up halfway through.
However, I kept having that nagging feeling that something was off. Even if the distribution was correct, it always felt like I was just not getting good champs. This is when I got a better idea.
Rating the Champions
I decided to rate the champions in terms of how much I'd like to play them and then compare how much better or worse the simulated games are compared to my ledger.
First things first, I sorted all the champions into four categories.
| Rating | Score |
|---|---|
| Undesired | -1 |
| Neutral | 0 |
| Acceptable | +1 |
| Desired | +2 |
For reference, once I counted how many of the champions fit into each rating category, I got the following histogram.
There are 83 champions that I don't enjoy playing, 14 that I enjoy playing and 4 that I can't wait to play. The remaining 35 are "meh".
Clearly, I am more likely to roll an unenjoyable champion than not - by roughly a factor of two. So you may think that this explains it - end of story. Well, not quite.
I wanted to see how this compares to the simulated games. So what I did was to calculate a score for each game sequence. I would add the ratings of all picks and divide the sum by their count in order to get a normalized score. I could do that for both my ledger and the simulated runs so that I could compare them.
Side note: The simulated runs have the same length (total games) as my ledger, so normalization is not really necessary but it produces more meaningful values.
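The scoring itself is just a mean over the pick ratings. A minimal sketch, with a hypothetical ratings map on the -1..+2 scale from the table above:

```go
package main

import "fmt"

// sequenceScore returns the normalized score of a run: the sum of the
// per-pick ratings divided by the number of games.
func sequenceScore(picks []string, rating map[string]int) float64 {
	sum := 0
	for _, p := range picks {
		sum += rating[p]
	}
	return float64(sum) / float64(len(picks))
}

func main() {
	// Hypothetical ratings - the real ones are private.
	rating := map[string]int{"Jax": -1, "Ahri": 2, "Twitch": 0}
	fmt.Println(sequenceScore([]string{"Jax", "Ahri", "Twitch", "Jax"}, rating)) // 0
}
```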
Once I had the scores for each run, I could compare them and figure out how many simulations are actually worse than my playthrough.
It turned out that this percentage changed as the ledger grew. So instead of a single number, I created a chart that tracks it along the way.
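The computation behind that chart can be sketched as follows: for every prefix length, compare the ledger's running score against each simulation's running score over the same prefix. The data in main is made up for illustration:

```go
package main

import "fmt"

// worseFraction returns, for every prefix length g of the ledger, the
// fraction of simulated runs whose average score over their first g
// games is strictly below the ledger's average over the same prefix.
// Since all runs have equal length, comparing averages is the same as
// comparing running sums.
func worseFraction(ledger []float64, sims [][]float64) []float64 {
	out := make([]float64, len(ledger))
	ledgerSum := 0.0
	simSums := make([]float64, len(sims))
	for g, score := range ledger {
		ledgerSum += score
		worse := 0
		for i, sim := range sims {
			simSums[i] += sim[g]
			if simSums[i] < ledgerSum {
				worse++
			}
		}
		out[g] = float64(worse) / float64(len(sims))
	}
	return out
}

func main() {
	// Made-up per-game scores for one ledger and two simulated runs.
	ledger := []float64{-1, 0, 2}
	sims := [][]float64{{0, 0, 0}, {-1, -1, -1}}
	fmt.Println(worseFraction(ledger, sims)) // [0 0.5 1]
}
```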
In an ideal world, the squiggly line would dance around the 50% mark. But as you can see, it is significantly below.
So what does that mean?
If we look at the value at game 67, it is 1.5%. This means that if we were to simulate a random sequence of 67 games, 98.5% of the time it would be more enjoyable than what I had experienced by game 67.
As the ledger grew to 163 games, the percentage grew to ~20%, which is much better, though still meaning that 80% of the time a simulated 163-game sequence is more enjoyable.
But in the end, the line is still well below 50%. So in a sense, my feeling that something was off wasn't completely unfounded.
Summary
After having tracked my champion picks for 163 games in a row, I managed to compare the experience to a large set of simulated scenarios. The end result is that I am well below what is expected on average.
Does that mean the game is rigged? Not really. As with anything random related - it could be due to chance. The only way to verify that would be to continue tracking this for much longer, which I don't plan on doing, or have a large number of players do the same, though I don't see that happening anytime soon.
Furthermore, as explained above, I have made some assumptions (in terms of die usage behavior) and have skipped wildcard dice. This might have some impact on the results. Not to mention that it's possible I got the math wrong or the simulation has bugs.
Lastly, I don't pretend to hold the moral high ground here. While I have played a lot of decent games, where I've given it my all, I have also trolled a fair number of them. In fact, just to collect these 163 games, I have been speed-running some of them (especially when I was at the 1.5% mark, with only this "research" to keep me motivated).
All in all, this has been an interesting exercise, though an extremely time-consuming one, so I have no plans on continuing it further. At least for me, it makes it clear that investing much time and money into this game isn't really worth it.
Have you ever tracked your own in-game data? I would be curious to hear whether others have had similar experiences.