Part 5 | CI strategies, how Embark branches
Arvid Burström, Technical Animation Director and Patrik Åkerberg, Tech lead Tools & CI
In this series of posts, I will outline how our CI system helps us ship high-quality content at a fast pace. To borrow the terminology from my colleague Arvid’s blog posts: this article series would be overkill if your team is small enough that everyone is within shouting distance in the office. It does apply if you have more than, say, 15 devs, and it scales well into the hundreds of developers (like Embark).
Where Arvid left off, our development team is growing and we need to find some way to share and coordinate work between a lot of people. A key to this is that people work on the same branch as much as possible and that something preserves the quality of that branch. This is where powerful CI systems come in.
This might get a bit technical at times, but I hope to make it understandable even for readers who don’t work on this stuff day-to-day.
Client-side vs Backend-side CI
Before starting off, let’s go through the basics of CI. Like all multiplayer games, our games consist of a client that the user downloads, and a server which runs in some data center. There’s also a large number of additional backends that handle matchmaking, user inventory, events, and so on. Server-side CI and client-side CI are different, mainly because of the cost of deployment.
Backends are generally updated on every code change (this is called continuous deployment or CD) because you can relatively easily shift traffic onto new instances, and it’s easy to roll back.
Clients are expensive to deploy because customers will have to download them. Reverting or hot-fixing is expensive for the same reason. There’s a balance here: we want our players to download as little as possible but also get fresh and high-quality content regularly.
Game servers are a bit in between the client and the backend. They are much easier to deploy than a client, but we can’t kill ongoing games so there’s a bit more delay than a normal request-based backend.
We’re going to talk about clients and game servers in this article series.
CI Strategies
There’s no one way to run CI, and most studios do it slightly differently. In the image above we show the main branch, sometimes also called trunk, where developers check in their changes. Here CL is shorthand for “changelist”, i.e. a change in the game content or engine code. Some common strategies are:
Absence of CI: Devs just sync and compile the game locally and hope nobody broke it recently.
Post-submit only: Devs test locally and check in, and somebody keeps an eye on an automated build.
Pre-submit + Post-submit: Changes don’t land on main until they pass at least some of the build steps (like compile + run tests).
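To make the pre-submit idea concrete, here is a minimal sketch of a gate that only lets a changelist land on main when every check passes. All names here (Changelist, compile_step, run_tests_step, presubmit) are hypothetical and chosen for illustration; the boolean fields stand in for real compile and test steps and do not describe how any particular CI system is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Changelist:
    number: int
    description: str
    compiles: bool = True      # stand-in for the result of a real compile step
    tests_pass: bool = True    # stand-in for the result of a real test run


# A check takes a changelist and returns True when it passes.
Check = Callable[[Changelist], bool]


def compile_step(cl: Changelist) -> bool:
    return cl.compiles


def run_tests_step(cl: Changelist) -> bool:
    return cl.tests_pass


def presubmit(cl: Changelist, checks: List[Check], main: List[int]) -> bool:
    """Land the changelist on main only if every check passes."""
    if all(check(cl) for check in checks):
        main.append(cl.number)
        return True
    return False


main_branch: List[int] = []
checks = [compile_step, run_tests_step]
results = [
    presubmit(Changelist(1, "add backpack"), checks, main_branch),
    presubmit(Changelist(2, "tune weapons"), checks, main_branch),
    presubmit(Changelist(3, "broken refactor", compiles=False), checks, main_branch),
]
print(results, main_branch)
```

In this toy run the third changelist fails its compile check and never reaches main, so anyone syncing main only sees changelists 1 and 2.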
Some studios mitigate the lack of presubmit with branching strategies, i.e. main is often broken but there’s a stable branch that gets updated from main every now and then after QA tests the build. This protects the users on the stable branch from e.g. the broken CL3, but the downside is that it can take weeks before they see the latest work on main.
See the figure below:
Embark’s Strategy
We do the pre-submit + post-submit strategy without branching. We believe this is the fastest way to let the rubber hit the road and get changes out to our players. All our games use the same main branch and game engine, everybody works on main, and we try to keep the quality high so we can always branch out and stabilize a client in a couple of days. We keep quality high like this:
We use Unreal with Angelscript support, and we built a test framework for Angelscript, so much of our game logic is unit testable. This would have been way harder if it was all blueprints. The framework also allows integration tests where we load a bespoke level for the test and play out some game scenarios, like “weapons fire bullets and they damage player characters”, or “if an explosive barrel takes enough damage it will (eventually) explode and deal damage to actors nearby”.
We have a strong suite of asset validators that enforce asset conventions and invariants (“backpacks must have > 0 slots”). Asset validators are surprisingly useful to gradually tighten up correctness and prevent most asset errors before they even land.
Our CI is optimized well enough to run almost everything in presubmit, so obviously bad changes are very unlikely to land on main. We have hundreds of Windows and Linux machines in the system and the presubmit is highly parallelized, which means we can do a lot in short wall-clock time.
We run multiple playtests per week for each game with studio employees.
This strategy helps with the coordination problems outlined in Arvid’s article Adding more Developers: CI pumps out editors and builds for minimal engineer/artist friction, while the tests and validators ensure devs are less likely to be hampered by editor or game crashes/bugs.
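The asset validators mentioned above can be pictured as small functions that inspect one asset and report a violation. The following is a minimal sketch under assumptions: the Asset type, the asset paths, and the validate helper are hypothetical, and only the rule itself ("backpacks must have > 0 slots") comes from the text. A real validator would run inside the engine's own validation tooling rather than on plain Python objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Asset:
    path: str                    # hypothetical asset path
    kind: str                    # e.g. "backpack" or "weapon"
    properties: Dict[str, int]


# A validator returns an error message, or None when the asset is fine.
Validator = Callable[[Asset], Optional[str]]


def backpack_has_slots(asset: Asset) -> Optional[str]:
    """Enforce the invariant that backpacks must have more than zero slots."""
    if asset.kind != "backpack":
        return None  # the rule does not apply to other asset kinds
    if asset.properties.get("slots", 0) <= 0:
        return f"{asset.path}: backpacks must have > 0 slots"
    return None


def validate(assets: List[Asset], validators: List[Validator]) -> List[str]:
    """Run every validator on every asset and collect the error messages."""
    errors = []
    for asset in assets:
        for validator in validators:
            message = validator(asset)
            if message is not None:
                errors.append(message)
    return errors


assets = [
    Asset("/Game/Items/BP_SmallBackpack", "backpack", {"slots": 4}),
    Asset("/Game/Items/BP_BrokenBackpack", "backpack", {"slots": 0}),
    Asset("/Game/Weapons/BP_Rifle", "weapon", {}),
]
errors = validate(assets, [backpack_has_slots])
print(errors)
```

Because each rule is a separate function, new invariants can be added one at a time, which matches the idea of gradually tightening correctness.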
Stabilization Branches
Paradoxically you can have too high quality on main. It sounds attractive that you can just build any CL and ship it to players, but developers would have to test for days locally before submitting, and the presubmit would take hours to run. There is a tension between innovation and quality. For us it’s enough if we can get a good client out of a couple days’ worth of playtesting and bug fixing.
We were once in a rough spot where we were unable to even playtest ARC Raiders for a period of two weeks. You could not get through even a single round. That’s an example of too low trunk quality. It might be acceptable early in development but even then it’s not good. You need to playtest regularly or you have no idea what the game actually is. Unit and integration tests can catch a lot of simple errors but are no replacement for real playtests - you need both!
We use throw-away stabilization branches:
Pick some recent CL on main that looks promising
Playtest with the game team (and the occasional CI engineer 🙂)
Fix bugs on main
Copy the bug fix to the branch (cherry-picking)
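The steps above can be modeled with a toy version-control sketch, where a branch is just an ordered list of changelist numbers. The Branch type, the helper functions, the branch name, and the CL numbers are all hypothetical; a real setup would use the version control system's own branch and cherry-pick commands.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Branch:
    name: str
    changelists: List[int] = field(default_factory=list)


def create_stabilization_branch(main: Branch, at_cl: int, name: str) -> Branch:
    """Copy main's history up to and including the chosen changelist."""
    if at_cl not in main.changelists:
        raise ValueError(f"CL {at_cl} is not on {main.name}")
    cut = main.changelists.index(at_cl) + 1
    return Branch(name, list(main.changelists[:cut]))


def cherry_pick(source: Branch, target: Branch, cl: int) -> None:
    """Copy a single changelist to the target, leaving other changes behind."""
    if cl not in source.changelists:
        raise ValueError(f"CL {cl} is not on {source.name}")
    if cl not in target.changelists:
        target.changelists.append(cl)


main = Branch("main", [100, 101, 102])

# Pick a promising recent CL and branch from it.
release = create_stabilization_branch(main, at_cl=101, name="release-candidate")

# Work continues on main: 103 is new feature work, 104 fixes a playtest bug.
main.changelists += [103, 104]

# Only the bug fix is copied over to the stabilization branch.
cherry_pick(main, release, 104)
print(release.changelists)
```

The fix is made on main first and then copied, so main never misses a fix that shipped, and the branch can be thrown away after the release.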
After creating the stabilization branch we can start working on the next release on main. We can mostly avoid code freezes because we know our quality is high enough. If we have a big feature that develops over multiple releases, we can hide it behind a feature flag until it’s ready to be turned on in internal playtests and later shipped.
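As a rough illustration of hiding an unfinished feature behind a flag, the sketch below keeps the feature's code on main but only exposes it when the flag is on. The FeatureFlags class, the "night_raid" flag, and the game mode names are invented for this example and are not taken from any real game or engine API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FeatureFlags:
    enabled: Set[str] = field(default_factory=set)

    def is_enabled(self, name: str) -> bool:
        return name in self.enabled


def available_game_modes(flags: FeatureFlags) -> List[str]:
    """Return the modes a build exposes; unfinished ones stay hidden."""
    modes = ["standard"]
    if flags.is_enabled("night_raid"):  # hypothetical in-progress feature
        modes.append("night_raid")
    return modes


# The same code is on main for everyone; only the configuration differs.
shipping = FeatureFlags()
internal_playtest = FeatureFlags({"night_raid"})

print(available_game_modes(shipping))
print(available_game_modes(internal_playtest))
```

The shipping configuration never sees the unfinished mode, while internal playtests can turn it on without a separate branch.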
You can contrast the feature flags approach with Feature Branches as described by Arvid in Branching. Branches can be useful for throw-away prototypes (and prototyping is important in games!), but you get a number of advantages with having everyone on main: you avoid merges (which are notoriously difficult to get right), everyone is looking at the same content, and you can focus your CI resources on main.
Now you have an overview of our CI strategy. In the next installment, we’ll go into more detail about what it’s like to make game builds on Unreal!