
Dialog editing is an entire art in itself that I believe everyone is constantly learning and adapting for every project. Streamlining the editing process can make the whole pipeline run smoother so that the recorded dialog is heard in your project as quickly as possible. Editing dialog generally has two phases: cleanup, and then enhancement.

The first thing I do before editing the dialog is set up and organize my project. The setup involves arranging the microphones onto two separate tracks (with motion capture or studio recording sessions, there is generally a primary microphone and a secondary safety microphone in case the primary clips or picks up an unwanted noise), splitting the recording by character onto separate tracks, creating empty "dummy" tracks beneath the main edit tracks, and creating a grouped track that has the raw audio duplicated. This setup allows any non-destructive edits I do (cuts, trimming items, etc.) to be mirrored across the raw audio and the main track. At some point in the process, I usually have to do destructive editing to the audio. A setup like this lets me quickly reference or revert to the raw audio and bring a segment back to the main edit track without having to recreate any non-destructive edits or search for the raw audio files. At this point, it can also be helpful to create your own reusable piece of room tone by editing together segments of clean room tone from the recording - 15-30 seconds should be plenty.

Once the initial setup is finished, I like to start with two basic filters: a high pass and a low pass. These get set depending on the voice and the overall style the dialog should be in. The high pass filter cleans up any low rumble, noise, or anything below the fundamental of the spoken dialog so that no unwanted low frequencies cause potential problems. The low pass filter I will typically set at different spots depending on how "airy" the dialog should be: as low as 12 kHz if I want a more focused-sounding line, or 17 kHz or higher to preserve some of the high air so the dialog sounds like it is coming from higher up. The filters will generally look similar to the ones below.
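
For intuition, the effect of these two filters can be sketched in a few lines of Python. This is a minimal one-pole filter pair for illustration only - not how any EQ plugin actually implements its slopes - and the 48 kHz sample rate and function names are my own assumptions:

```python
import math

def low_pass(samples, cutoff_hz, sample_rate=48000):
    # One-pole low-pass: each output leans toward the previous output,
    # smoothing away energy above the cutoff.
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def high_pass(samples, cutoff_hz, sample_rate=48000):
    # High-pass = the input minus its own low-passed copy, which is
    # what strips rumble below the dialog's fundamental.
    lp = low_pass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, lp)]
```

A real dialog chain would use steeper slopes (often 12-24 dB/octave), but the behavior is the same: the high pass removes what sits below the voice, and the low pass decides how much "air" survives.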

After the filters, I begin the first phase of the edit - the non-destructive portion. It involves going through the audio and cutting out unwanted noises, removing long gaps between lines, assembling alt takes to the dummy tracks (also called stacking), and trimming the start and ends of the lines to start exactly when the line (or breath) starts. In Game Audio, the dialog is usually implemented as separate lines (except for cinematics, usually) and is triggered programmatically. Keeping the edits tight allows for lines to trigger exactly when you want them to without any unwanted spaces before the line begins.

There will generally be a take sheet from the recording session that notes which take to use for each line. If the main take has an unwanted characteristic on a single word, you can often edit in that word using one of the alt takes that are conveniently sitting on the dummy tracks below the main edit track.

While doing this initial edit, there can be a handful of issues that need to be addressed. Some of these include distortion, unwanted noises between words, or even a word that is not intelligible enough within the take to use. The best approach is often to sneak in the word from one of the alt takes (it will usually sound more natural). Other approaches are editing in the backup microphone for that word or line (be sure to level- and tone-match whenever you do this to prevent any jarring changes), replacing the noises between words with the clean room tone created earlier, or, as a last resort, using a destructive process to fix that word.

When switching between takes or to the backup microphone, the listener is more likely to hear an abrupt start than an abrupt stop. With this in mind, using a long fade-in at the start of the incoming item and a shorter fade-out on the item you're blending out of allows for a natural-sounding edit that will usually be transparent.
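
As a rough sketch of that asymmetric crossfade (the sample-based fade lengths and the linear fade shapes here are illustrative assumptions, not a DAW's actual curves):

```python
def blend_items(outgoing, incoming, fade_out_len, fade_in_len):
    # Overlap two clips: the outgoing clip fades out quickly while the
    # incoming clip fades in over a longer span, hiding its abrupt start.
    overlap = max(fade_out_len, fade_in_len)
    out = list(outgoing[:-overlap])
    for i in range(overlap):
        g_out = max(0.0, 1.0 - i / fade_out_len)  # 1 -> 0, then stays at 0
        g_in = min(1.0, i / fade_in_len)          # 0 -> 1 over the longer fade
        out.append(g_out * outgoing[len(outgoing) - overlap + i]
                   + g_in * incoming[i])
    out.extend(incoming[overlap:])
    return out
```

Making `fade_in_len` longer than `fade_out_len` is what hides the incoming item's onset, since the ear forgives a quick stop far more readily than a quick start.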

Once the non-destructive portion is complete, the next step is the cleanup phase - which tends to be more destructive. There are a handful of tools that can be used for this; I use iZotope RX, but any option that gets the same result will work. Once the line is in RX, I'll start by cleaning up any unwanted mouth noises. There are times when mouth noises are part of the performance, so it is best to use your judgment on how many to remove to best fit it. For the cleanup itself, I will generally listen through line by line, highlight a noise like the one below, and use Mouth De-click to remove the click. When timelines are tight, I will sometimes dial in a tame setting that catches the largest clicks and run all the dialog through it. RX comes with many modules that can solve various issues; it is best to use your instinct and ear on a case-by-case basis.

Another thing to listen for when editing mouth noises is nose knocks and nose whistles. To remove these, I find using spectral recovery works best. Austin Mullen also has a fantastic mini-video series highlighting how to remove these, and it is worth a watch.

People often want to know about noise reduction at this point; in my experience, it should be reserved for when the audio was recorded in less-than-ideal conditions. If you do have to use noise reduction, spectral repair with the voice and tonal sensitivity delinked tends to work best in my experience. Too much reduction can suck the life out of a performance and leave it in an unnatural void of silence, so listen closely for any unnatural artifacts (especially in the higher frequencies).

Once the dialog is cleaned up in RX, it transitions from the cleanup phase to the enhancement phase. The goal here is to control the dialog a bit more, bring out more of the performance, and make it sound as good as possible no matter where it is heard. The first thing I do is apply item/clip gain to turn the sibilance down a little so a later de-esser does not have to work as hard. There is a handy Reaper script that can do this quickly: Script: gen_Envelope-based Deesser.eel. While turning down the sibilance, I will also turn the quiet and loud bits of dialog up or down so a compressor does not have to work as hard. This is also the moment where, if you want to make a line sound bigger or softer, changing the volume of those lines can have a dramatic impact on the emotional delivery. For example, having the line slowly come up in volume as it's being performed (by only a few dB) can make it come across as more powerful, while the opposite can make the delivery sound weaker. After the item automation, I'll set a gentle compressor to "glue" the dialog together and control the peaks. I often use different attack and release settings for different performances to change how punchy or aggressive the line comes across: a slower attack will make the line sound more "in your face," whereas a faster attack will make it sound smoother. The goal is around 2-3 dB of reduction on average so that it stays transparent - unless you want to hear the compression effect.
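
The attack/release behavior described above can be sketched as a bare-bones feed-forward compressor. This is a toy model, not the plugin I use; the threshold, ratio, and time constants are assumed values for illustration:

```python
import math

def db_to_lin(db):
    return 10 ** (db / 20.0)

def lin_to_db(x):
    return 20.0 * math.log10(max(x, 1e-9))

def compress(samples, threshold_db=-18.0, ratio=3.0,
             attack_ms=10.0, release_ms=100.0, sample_rate=48000):
    # Feed-forward compressor: smooth the rectified level with separate
    # attack/release coefficients, then reduce gain above the threshold.
    atk = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    rel = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        # A rising signal is tracked at the attack speed, a falling
        # one at the release speed - this is the "punchy vs smooth" knob.
        coeff = atk if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        over = max(0.0, lin_to_db(env) - threshold_db)
        gain_db = -over * (1.0 - 1.0 / ratio)   # gain reduction in dB
        out.append(x * db_to_lin(gain_db))
    return out
```

Lengthening `attack_ms` lets more of each syllable's onset through before the gain clamps down, which is the mechanism behind the "slower attack sounds more in your face" observation.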

Since compression tends to bring out the subtle tonal details of a line, I like to do a final EQ pass for any extra "corrections" needed after the compression. Slower attacks tend to make the voice brighter since the compressor is not latching onto the higher frequencies as much. This is also where I will use some dynamic EQ to control the lows and low mids when a performer moves up and down their register a lot. It allows control over the amount of bass in their voice, the amount of proximity effect from the microphone if they were moving around a lot in front of it, and any "boxiness" in the recording. I'll often also turn somewhere between 2-5 kHz up or down depending on whether the line needs to sound more aggressive or softer to fit the context. Turning ~1 kHz up or down can change how focused the line sounds as well.

Once the edit for that line is complete, I will create a render region with the correct filename and then move on to the next line.

All in all, this changes from project to project, performer to performer, and even recording to recording. It's best to use your judgment on what each line needs to bring out the best of each performance. I hope all of this is helpful or inspires you to try something new - happy editing!

I have been listening to the Soundbytes podcast lately and was incredibly inspired during episode 51 by something Barney talked about when it comes to learning a new skill. To summarize: try doing the thing you want to learn in small chunks of time, but try doing it differently every time. In my case, I have been wanting to get better at synthesizing laser sounds, and to add to the challenge, I only used one plugin to do it. Thankfully, I have also been wanting to learn Phase Plant, so it worked out.

Over the course of two or so weeks, every morning when I got to my desk, I challenged myself to create a laser in Phase Plant by the time I finished my coffee (give or take 10 minutes). These short sprints let me accept that it wouldn't be perfect and try a new idea each time without getting too caught up in perfecting the sound. Surprisingly, some of these have already made their way onto a project, so I suppose it worked out.

Here is a link to listen to some of the sonic explorations I came up with:

I am now working on getting better at designing general synth textures and impacts so I have more unique source material to use when designing sounds.

I hope this inspires you to explore new ways to create sounds!

In the work I have been doing for Super Auto Pets and Joon, a common challenge in the sound design for each project is creating sounds that are exciting to listen to but have a calm feeling to them. While the two games are vastly different in their design, they share this audio philosophy, and I wanted to share how I have been able to work through the challenge. The way it's done will differ depending on what you are designing, but trying out some of these ideas may just do the trick.


This one took a while for me to understand. To briefly explain the audio design for Super Auto Pets: the game is what the team calls a "Chill Auto-Battler" - the player assembles a team of pets and then they battle it out. These pets have unique calls whenever they are picked, and their sounds can range from the abstract to the more threatening. The challenging part is designing sounds for a creature that is calm, but also unique.

A common way I found to make a creature sound calm and less aggressive is to use a transient designer like the free Kilohearts Transient Shaper. Using a slower setting like the one below, it's fairly simple to take a more aggressive-sounding monster and make it a little more soothing for the listener.
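
To make the idea concrete, here is a toy version of what a transient designer's "softening" move does: detect attacks by comparing a fast and a slow envelope follower, then duck the gain where they diverge. The time constants and the gain law are my own illustrative assumptions, not Kilohearts' algorithm:

```python
import math

def transient_soften(samples, amount=0.5, fast_ms=1.0, slow_ms=30.0,
                     sample_rate=48000):
    # A fast envelope jumps on an attack before the slow one catches up;
    # the gap between them marks the transient, and we turn it down.
    def coeff(ms):
        return math.exp(-1.0 / (ms * 0.001 * sample_rate))
    fast_c, slow_c = coeff(fast_ms), coeff(slow_ms)
    fast = slow = 0.0
    out = []
    for x in samples:
        level = abs(x)
        fast = fast_c * fast + (1.0 - fast_c) * level
        slow = slow_c * slow + (1.0 - slow_c) * level
        transient = max(0.0, fast - slow)     # positive only during attacks
        gain = 1.0 / (1.0 + amount * transient * 10.0)
        out.append(x * gain)
    return out
```

Raising `amount` pushes the creature call further toward "soothing" by shaving more off each attack, which is roughly what dragging a transient designer's attack control into negative territory does.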

Frequency Control

To briefly explain Joon, it's a task list game for kids where tasks they complete in real life allow them to progress in the game and take care of their pets (otherwise known as Doters). It's a really cool way to help keep a child focused while allowing them to have fun and play the game.

Because of the nature of the project, I had to be incredibly careful about frequency control: anything too harsh, or with too much body, could cause a child to lose interest. There are three main ways I worked through this, which are all variations on the same idea working in different ways: EQ, dynamic EQ, and multiband compression.

I used EQ to roll off the ultra-high end and most of the low end so the sounds would not distort when played back on mobile devices - this was one of those things I ended up leaving on the main out, and I designed everything to it. Alongside this was dynamic EQ, done using the plugin Soothe. Soothe is essentially a hypersensitive dynamic EQ that targets resonant frequencies, which allowed me to easily home in on certain frequency bands and tame any harshness within the sound I was designing. Finally, I used multiband compression as a general "even it out" tool so that no frequency range poked out and everything felt very similar.
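
The dynamic EQ idea - only turning a band down when it actually gets loud - can be sketched like this. This is not how Soothe works internally; the band edges, threshold, and envelope time are assumptions chosen for illustration:

```python
import math

def one_pole_lp(samples, cutoff_hz, sr):
    # Simple one-pole low-pass used to carve out a frequency band.
    a = math.exp(-2.0 * math.pi * cutoff_hz / sr)
    y, out = 0.0, []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def dynamic_band_cut(samples, lo_hz=2000.0, hi_hz=5000.0,
                     threshold=0.1, max_cut=0.5, sr=48000):
    # Isolate a band (difference of two low-passes), follow its envelope,
    # and duck only that band when it gets loud - the basic move of a
    # dynamic EQ / resonance tamer.
    band = [h - l for h, l in zip(one_pole_lp(samples, hi_hz, sr),
                                  one_pole_lp(samples, lo_hz, sr))]
    env_c = math.exp(-1.0 / (0.005 * sr))   # ~5 ms envelope follower
    env, out = 0.0, []
    for x, b in zip(samples, band):
        env = env_c * env + (1.0 - env_c) * abs(b)
        over = max(0.0, env - threshold)
        # Cut deepens with level but never exceeds max_cut.
        gain = max(1.0 - max_cut, 1.0 / (1.0 + over / threshold))
        out.append((x - b) + gain * b)
    return out
```

A static EQ would apply the cut all the time; here the band only ducks when its envelope crosses the threshold, which is why this approach can tame harshness without dulling quiet passages.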

Below are some example settings that I used across the project.

I hope this inspires you or helps!
