
Automated Testing

Sunday, October 8th, 2006

Having been responsible for setting up our distributed build system and automated test system at Nihilistic, I have some personal interest in test systems for game code. Before working in the game industry I spent some time as a corporate software programmer, where automated testing was used extensively. It seems like a dry subject, but I think automated testing is a useful addition to the test process at any company developing a large software project. Games in particular pose some fairly large testing challenges because of the interactive and often unpredictable nature of gameplay. I’ll discuss how the test system works at Nihilistic later in this post, but first let me enumerate the different types of testing that we do.

The first type of testing I’d like to discuss is so-called “smoke testing”. The idea of smoke testing is to make a pass over all the general features of a software product to make sure a build has no serious defects. In other words, it’s a consistency check. For example, it can replace a tester who would otherwise have to load every level in the game to make sure they all still launch. At Nihilistic, we run this test nightly on the target hardware for our game (XB360, PS3). Any level that fails to load is reported in the nightly test report, so every morning we can sit down, see which levels are crashing, and go fix them immediately without waiting for the testers to examine every level on every platform.
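A nightly smoke test is essentially a loop over every level that records which ones fail to load. The sketch below is a minimal Python illustration under stated assumptions: `run_smoke_test` and the `load_level` callback are hypothetical names, and a failed load is modeled as a raised exception. In a real setup the callback would launch the game on the devkit and treat a crash or timeout as a failure.

```python
from typing import Callable, Dict, List


def run_smoke_test(levels: List[str], load_level: Callable[[str], None]) -> Dict[str, str]:
    """Try to load every level; return {level: error description} for each failure.

    `load_level` is a hypothetical callback that raises if the level fails to load.
    """
    failures: Dict[str, str] = {}
    for level in levels:
        try:
            load_level(level)
        except Exception as exc:  # any load error or crash counts as a failure
            failures[level] = f"{type(exc).__name__}: {exc}"
    return failures
```

Collecting failures into a dictionary rather than stopping at the first crash means one broken level doesn’t hide problems in the levels after it, which is what makes the morning report useful.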

The next type of testing we do is automatic benchmarking of performance and memory. During the nightly test process, we also query our memory manager and performance tracker for specific information about what is happening in a level after it loads, and we collect this data for every level during the smoke test pass. Since devkits often have more memory than production machines, a level can run fine on a devkit while still being over budget for retail hardware; knowing which areas are over their memory budgets tells us how much memory we need to trim and where. We also report average FPS over a few hundred frames, the number of objects rendering, the number of lights affecting the scene, the polygon count, and so on. All of these metrics are useful to see at a glance, for example to know whether the artists are killing a level’s performance by using too many spotlights. Our test system spits all of this data out into an Excel spreadsheet, with over-budget items highlighted, and e-mails it to everyone interested in the data.
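The budget check can be sketched as a comparison of each level’s stats against a table of limits. This minimal Python example uses hypothetical names (`LevelMetrics`, `average_fps`, `over_budget`) and made-up budget values; it is not the Nihilistic tool. Average FPS is computed as frames divided by total elapsed time, because averaging per-frame FPS values would overweight the fast frames and overstate performance.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LevelMetrics:
    level: str
    frame_times_ms: List[float]  # per-frame times sampled over a few hundred frames
    stats: Dict[str, float]      # e.g. {"memory_mb": ..., "lights": ..., "polygons": ...}


def average_fps(frame_times_ms: List[float]) -> float:
    """Frames rendered divided by total elapsed seconds."""
    if not frame_times_ms:
        raise ValueError("no frames sampled")
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)


def over_budget(metrics: LevelMetrics, budgets: Dict[str, float]) -> Dict[str, float]:
    """Map each stat that exceeds its budget to how far over budget it is."""
    return {
        name: value - budgets[name]
        for name, value in metrics.stats.items()
        if name in budgets and value > budgets[name]
    }


# Hypothetical budgets, for illustration only.
BUDGETS = {"memory_mb": 448.0, "lights": 8, "polygons": 500_000}
```

In a nightly report, each non-empty `over_budget` result could become a highlighted row in the spreadsheet.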

The problem I’m having with performance testing is that it can be difficult to get a good sample set of data: to get full data on a level, the player would need to navigate through the entire level. Since we’re making a third-person action game with long, winding levels, this is especially important. Right now we don’t have a good way of doing that. I’ve toyed with the idea of recording a real player’s input as they navigate a level, but because the AI makes random decisions that differ from one load to the next, playing that input back may not reliably get the player through the level. We could also teleport the player through the level from checkpoint to checkpoint, taking metrics at each point, but that wouldn’t be representative of a real playthrough since it wouldn’t trigger all of the script code, etc. Perhaps recording both the player and AI input would allow us to play it back and test the whole level?
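One way to make the “record both player and AI input” idea concrete is to log every random AI decision during a live session and feed those decisions back, together with the player’s input, on playback. The Python sketch below is a toy model built on that assumption: `Simulation`, `Recording`, and the AI action names are hypothetical, and a real engine would also have to record, or make deterministic, every other source of variation (physics, frame timing, streaming order) for playback to follow the same path.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

AI_ACTIONS = ["flank", "take_cover", "charge"]  # hypothetical AI choices


@dataclass
class Recording:
    player_inputs: List[str] = field(default_factory=list)  # one entry per frame
    ai_decisions: List[str] = field(default_factory=list)   # every random AI choice, in order


class Simulation:
    """Toy stand-in for the game loop (hypothetical): one player input and one AI decision per frame."""

    def __init__(self, recording: Recording, playback: bool):
        self.rec = recording
        self.playback = playback
        self.rng = random.Random()  # unseeded, so separate live runs differ
        self.next_ai = 0
        self.log: List[str] = []

    def ai_decide(self) -> str:
        if self.playback:
            # Reuse the decision the AI made during the recorded session.
            choice = self.rec.ai_decisions[self.next_ai]
            self.next_ai += 1
        else:
            # Live play: roll a random decision and remember it.
            choice = self.rng.choice(AI_ACTIONS)
            self.rec.ai_decisions.append(choice)
        return choice

    def step(self, player_input: str) -> None:
        self.log.append(f"{player_input}/{self.ai_decide()}")


def record_session(live_inputs: List[str]) -> Tuple[Recording, List[str]]:
    """Run a live session, capturing player input and AI decisions."""
    rec = Recording()
    sim = Simulation(rec, playback=False)
    for player_input in live_inputs:
        rec.player_inputs.append(player_input)
        sim.step(player_input)
    return rec, sim.log


def play_back(rec: Recording) -> List[str]:
    """Replay a recording; the frame log matches the live session."""
    sim = Simulation(rec, playback=True)
    for player_input in rec.player_inputs:
        sim.step(player_input)
    return sim.log
```

Recording the decisions themselves, rather than only the player’s controller input, is what keeps the AI from diverging on replay.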

The last type of testing, which is definitely useful but rarely used, is function-level unit testing. Whether it’s due to lack of time or perceived value, this type of unit testing doesn’t seem to be used much in the game industry. In my opinion, its usefulness really shows when someone modifies code that was written, say, a year ago. If a suite of tests exists for that code, it can be run against the modified source to determine whether the changes introduced any errors. The biggest problem with unit testing game code is that the results of a function can be very difficult to verify, especially when multiple systems start working together. For example, how would you test that a player control system is working correctly when correctness is primarily a matter of how it feels to a human?
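For contrast with the hard-to-verify cases, here is what a function-level unit test might look like for a small, easily verified piece of gameplay code. The function `apply_damage` and its rules are hypothetical, and the tests use Python’s standard `unittest` module; the point is only that each test pins down one expected behavior, so a later modification that breaks it fails immediately.

```python
import unittest


def apply_damage(health: int, damage: int, armor: float) -> int:
    """Hypothetical helper: armor absorbs a fraction of damage; health never drops below 0."""
    if not 0.0 <= armor <= 1.0:
        raise ValueError("armor must be in [0, 1]")
    taken = round(damage * (1.0 - armor))
    return max(0, health - taken)


class ApplyDamageTests(unittest.TestCase):
    def test_no_armor_takes_full_damage(self):
        self.assertEqual(apply_damage(100, 30, 0.0), 70)

    def test_half_armor_halves_damage(self):
        self.assertEqual(apply_damage(100, 30, 0.5), 85)

    def test_health_clamped_at_zero(self):
        self.assertEqual(apply_damage(10, 50, 0.0), 0)

    def test_invalid_armor_rejected(self):
        with self.assertRaises(ValueError):
            apply_damage(100, 30, 1.5)
```

Saved as a module, the suite can be run with `python -m unittest <module_name>`, which makes it easy to hook into the same nightly process as the smoke test.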

Now that I’ve listed some types of testing, I’d like to share a bit about how our test system works at Nihilistic. Anyone can request builds and tests over the network from their PC, and the tests also run automatically every night starting at midnight. Requests flow client -> master server -> build server: there is a single master server and any number of build servers to fulfill requests. All of the client/server request code is written in C# because of its ease of use. The actual build scripts, which launch the compiler and the testing application, are written in ANT XML. Finally, the testing application is also a C# app; it launches the game into each level in turn. Once the game is running, the testing application connects to it over TCP/IP (almost all of our tools can connect to the game while it’s running on the target platform; more on that in another post) and requests data back from the game. Once data for all the levels has been collected, the testing application dumps out a nicely formatted Excel spreadsheet and e-mails it out.
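The client -> master server -> build server flow can be sketched as a master that queues incoming requests and hands each one to the next idle build server. This is a minimal single-process Python model, not the actual C# code; `MasterServer`, `Request`, the server names, and the job strings are hypothetical, and the networking layer is left out entirely.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Request:
    client: str  # PC that asked for the work
    job: str     # hypothetical job label, e.g. "build:xbox360" or "smoke_test:ps3"


class MasterServer:
    """Single master: queues client requests and hands each to an idle build server."""

    def __init__(self, build_servers: List[str]):
        self.idle: Deque[str] = deque(build_servers)
        self.pending: Deque[Request] = deque()
        self.running: Dict[str, Request] = {}

    def submit(self, request: Request) -> None:
        """Called when a client sends a request."""
        self.pending.append(request)
        self._dispatch()

    def job_finished(self, server: str) -> Request:
        """Called when a build server reports completion; frees it for the next job."""
        done = self.running.pop(server)
        self.idle.append(server)
        self._dispatch()
        return done

    def _dispatch(self) -> None:
        # Pair pending requests with idle servers in FIFO order.
        while self.pending and self.idle:
            server = self.idle.popleft()
            self.running[server] = self.pending.popleft()
```

Because the master owns the queue, adding capacity is just a matter of registering another build server; clients never need to know how many there are.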

This system saves us a lot of time, especially near the end of a project when we are more focused on stability and performance. If you’re not currently using an automated testing system, I’d highly recommend looking into developing one; the time it takes to implement is nowhere near the time it saves.