Quantizing and Sound Playback Latency

Sound playback latency on PCs sucks. Or I am doing something completely wrong.

For a while I have been working on and off on the audio tech for a game I am making in my free time. The game part hasn’t progressed much because I keep getting hung up on the minutiae of sound programming.

The problem is that I want millisecond-accurate sound quantizing – meaning that I want to play a sound back at exactly a specific time. The purpose of this is to string a large set of separate individual sounds together to make music. For example, playing a kick drum on the first and third beat of a bar, a snare on two and four, and a hi-hat on every eighth note.
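To make the target concrete, here is a minimal sketch of that example pattern expressed as exact sample positions. The tempo, sample rate, and function names are all assumptions chosen for illustration; nothing here comes from my actual game code.

```python
# Hypothetical sketch: one bar of the kick/snare/hi-hat pattern as sample offsets.
SAMPLE_RATE = 44100    # samples per second (assumed output rate)
BPM = 120              # assumed tempo
BEATS_PER_BAR = 4      # assumed 4/4 time


def beat_to_sample(beat):
    """Return the sample index at which a 0-based (possibly fractional) beat starts."""
    seconds_per_beat = 60.0 / BPM
    return round(beat * seconds_per_beat * SAMPLE_RATE)


def one_bar_pattern():
    """Build a sorted list of (sample_offset, sound_name) events for one bar."""
    events = []
    for beat in (0, 2):                       # kick on beats one and three
        events.append((beat_to_sample(beat), "kick"))
    for beat in (1, 3):                       # snare on beats two and four
        events.append((beat_to_sample(beat), "snare"))
    for eighth in range(BEATS_PER_BAR * 2):   # hi-hat on every eighth note
        events.append((beat_to_sample(eighth / 2), "hihat"))
    return sorted(events)
```

With these assumed values a beat is 22050 samples long, so an error of only a few hundred samples (several milliseconds) is already a noticeable fraction of an eighth note.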

What you end up with is kind of a mess due to variable sound latency in the API, OS, and sound driver. When my program says “play this right now” it’s impossible to know when the sound will actually be audible at the speakers. It kinda sounds right but there is audible error. The error becomes more and more obvious when trying to align two sounds on the same beat. I don’t mean playing two sounds on the same frame in code – that gives good results. What I mean is that if I have a beat loop playing for quite some time and I’m tracking it on my internal timeline then trying to play other sounds that match to that loop is a challenge due to variable latency.

Here’s a visual illustration:

[Image: Basic timeline with eighth notes as the smallest division.]

[Image: Times at which I play back sounds algorithmically.]

[Image: What you actually hear.]

Notice how in the final image the distance between the timeline position and the moment you hear the sound is different every time a sound is played. You can’t accurately compensate for the error in reverse, i.e. by playing the sound slightly before you actually want it to be heard, because the latency is always changing. (Framerate is one component of this latency and can be compensated for if your framerate is generally stable, but it is not the only component.)

I have written this piece of software twice, once with C#/XNA and once with C++/DirectSound. XNA would be ideal, but it adds another 10 milliseconds or so of latency on top of DirectSound. Regardless, neither works to my satisfaction.

In normal game scenarios this variable sound latency, which averages around 50 milliseconds in a rough estimate on my questionable hardware, is unnoticeable. This is simply because the sounds are independent events that do not need to be timed and aligned with other events in the game’s sound environment.

I’m thinking this kind of accurate system can only work where the software is filling out a sound buffer ahead of the position the hardware is currently playing back. A dynamically generated sound stream. Unfortunately that rules out XNA, which offers no low-level sound buffer access through its API. I’m not sure how far ahead the software will need to write into the sound buffer to stay in front of the playback head. If I can guarantee a stable framerate, theoretically it would be just 1-2 frames. To account for framerate spikes or OS unpredictability, I’m sure writing further ahead is necessary. This also opens the door to having to mix all the individual sounds myself into a single sound buffer, which sounds like a nightmare.
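As a rough sketch of what that would involve, the following mixes scheduled sounds into one block of the output stream at sample-accurate positions. It is only an illustration under assumptions: the 60 fps framerate, the 2-frame lookahead, the float sample format, and the names `lookahead_samples` and `mix_block` are all hypothetical, and the step of actually writing the block into a DirectSound buffer is left out entirely.

```python
# Hypothetical sketch: fill a block of the output stream ahead of the play head.
SAMPLE_RATE = 44100      # samples per second (assumed)
FPS = 60                 # assumed stable framerate
LOOKAHEAD_FRAMES = 2     # assumed safety margin, in game frames


def lookahead_samples(sample_rate=SAMPLE_RATE, fps=FPS, frames=LOOKAHEAD_FRAMES):
    """How many samples ahead of the play head the mixer should stay."""
    return frames * sample_rate // fps


def mix_block(block_start, block_len, events, sounds):
    """Mix every scheduled sound that overlaps the block into a new list.

    events: list of (start_sample, sound_name) on the global timeline
    sounds: dict mapping sound_name to a list of float samples in [-1, 1]
    """
    out = [0.0] * block_len
    block_end = block_start + block_len
    for start, name in events:
        data = sounds[name]
        lo = max(start, block_start)            # first overlapping sample
        hi = min(start + len(data), block_end)  # one past the last overlap
        for t in range(lo, hi):
            out[t - block_start] += data[t - start]
    # Clamp so that overlapping sounds cannot exceed the valid range.
    return [max(-1.0, min(1.0, s)) for s in out]
```

Because every sound is placed by its sample index rather than by a "play now" call, two sounds scheduled on the same beat land on the same sample no matter when the mixing code happens to run, as long as it stays ahead of the play head. At the assumed 60 fps, a 2-frame lookahead is 1470 samples, or about 33 ms.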

The whole point of my game idea is that I want realtime musical response to what the player is doing in the game. Not like Guitar Hero or a dynamic music system that fades between a few preset tracks. Rather, I want to capture the feeling of the player creating the music, at a micro per-instrument level, through their play in a fast-paced action game. Changing the music changes the game state, play field, and visuals. Therefore I need to be able to change the music quickly for realtime response, and also to ensure that my changes are on time and harmonize with the other sounds.

I want to capture the feeling that the player is both playing this game and acting as a creator within it, directing the music and visuals as they go. Hopefully I can find a solution that will let me get past this and finally delve further into the game and sound design, which is what I really want to be doing.

3 Responses to “Quantizing and Sound Playback Latency”

  1. Kjell says:

    One of the simpler "tricks" you can apply is to set the starting position of the to-be-played sample to the number of milliseconds you've gone past your cue point. Depending on the sample (excluding extremely short samples and sounds that are prolific in the first couple of ms) this might not be noticeable at all, given that you're running at a somewhat stable and high-ish framerate.

  2. Benjamin says:

    Your description sounds a lot like Rez.

    I remember there was a direct correlation between the frequency of button presses, your relative success in the game and the response of both the audio track and the visuals. These all worked together quite nicely and I don’t remember noticing any latency.

    I can’t speak about PCs because they are unreliable and especially unsuitable for real-time applications, medical equipment or nuclear power stations (have you read the NT 4.0 EULA recently?), but at a recent Sony seminar there was mention of a magic number of ms on consoles within which you must update your sound playback in order to avoid noticeable latency, unwanted clicks, and other audio artifacts.

    That being said, I think you’re on the right track thinking about a streaming mechanism. I’m not very familiar with any current sound APIs, but from a naive standpoint it sounds like it would help to have some sort of synchronization buffer to which all tracks were synced. What I mean by this is an event buffer that has slots at fixed time intervals. Whenever you inserted a new sound or multiple sounds into a slot you could be guaranteed that they would fire at the same time. Also, you would want each track to be divided into the same intervals so that everything syncs up nicely. I haven’t thought this through fully, not having really done much audio programming, but it’s an interesting problem.

  3. Mark Cooke says:

    @Kjell – that totally makes sense, thanks for the tip.

    @Benjamin – Rez is definitely an inspiration. I love the atmosphere and the way the music and gameplay are integrated. I had the opportunity to meet with Mizuguchi-san (the designer), which only furthered my desire to give this a shot.

    I’m starting to veer into a different direction though. I was using lots of short loops that I made and then swapping between loops as the user does things in my game. That was working OK but I started experimenting with using individual instrument samples – like one kick drum or snare drum.

    I’m not sure how workable this will be – I have sound system performance concerns – but I’m going to see if I can generate much more of the music mathematically at run time using smaller bits of sound.
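
Kjell's suggestion in the first response (start a late sound partway in, by however far past the cue point you already are) can be sketched roughly as follows. This is only an illustration of the arithmetic: the function names and the 44.1 kHz sample rate are assumptions, and a real implementation would set the play position on the sound API's buffer rather than slice a list.

```python
# Hypothetical sketch of the "skip ahead by the amount you are late" trick.
def start_offset_samples(cue_ms, now_ms, sample_rate=44100):
    """Number of samples to skip when a sound is triggered after its cue point."""
    late_ms = max(0.0, now_ms - cue_ms)   # never skip if we are early or on time
    return round(late_ms * sample_rate / 1000)


def late_start(sound, cue_ms, now_ms, sample_rate=44100):
    """Return the part of the sound that should still be heard from now on."""
    return sound[start_offset_samples(cue_ms, now_ms, sample_rate):]
```

For example, a sound triggered 10 ms after its cue point would start 441 samples in. As Kjell notes, this works best for sounds that do not carry most of their character in the first few milliseconds.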
