There is a technique used to shorten radio commercials that I have a physical reaction to. Well, to be precise, it is the result that causes my reaction, not the technique.

Radio commercials typically have silence and dead space (including breaths between phrases) removed from the audio track. If you pay close enough attention to most commercials, you will notice that the voice actor talks for an inordinate amount of time without breathing. This gives me the sensation of being underwater and needing to take a breath: I want to hear the person breathe so much that my mind produces the sensation that I am also unable to breathe.

Naturally, because of that sensation I want to understand a bit about this process. In particular, is there something in the approach that can be adjusted to dampen the unnatural experience of a human not taking a breath for an extended period?

Experimentation

The approach I settled on was to do two passes over an audio file: the first would identify the portions of the track that had gaps of 'silence'; the second pass would cut those sections from the audio. In this case silence is defined using a combination of amplitude and duration.
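As a rough illustration of the two-pass idea, here is a minimal sketch in Python. It is not the tooling used for this post: the function names, the list-of-floats representation of the samples, and the parameters `threshold` and `min_len` (duration expressed as a sample count) are all assumptions made for the example.

```python
def find_silent_spans(samples, threshold, min_len):
    """First pass: return (start, end) index pairs for runs where
    abs(sample) < threshold holds for at least min_len consecutive samples.
    The end index is exclusive."""
    spans = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            # a loud sample ends the run; keep it only if it was long enough
            if run_start is not None and i - run_start >= min_len:
                spans.append((run_start, i))
            run_start = None
    # handle a quiet run that extends to the end of the track
    if run_start is not None and len(samples) - run_start >= min_len:
        spans.append((run_start, len(samples)))
    return spans


def cut_spans(samples, spans):
    """Second pass: return a new sample list with the given spans removed.
    Assumes spans are sorted and do not overlap."""
    kept = []
    cursor = 0
    for start, end in spans:
        kept.extend(samples[cursor:start])
        cursor = end
    kept.extend(samples[cursor:])
    return kept


# Hypothetical data: two loud samples, a quiet gap, then more sound.
track = [0.5, 0.6, 0.0, 0.01, 0.0, 0.0, 0.7, 0.0, 0.8]
spans = find_silent_spans(track, threshold=0.05, min_len=3)
shortened = cut_spans(track, spans)
```

In this sketch the single quiet sample near the end is kept because it is shorter than `min_len`, which is the role the duration plays alongside amplitude.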

I experimented with using a rolling average of the amplitude but that led to very choppy results and ended up being much harder to control than simply examining the instantaneous value at each sample.
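For reference, a rolling average of the amplitude might be computed along these lines. This is only a sketch of one possible variant (a trailing window over absolute sample values); the function name and the `window` parameter are hypothetical, and the version I experimented with may have differed in its details.

```python
def rolling_mean_abs(samples, window):
    """Return, for each sample, the mean of abs(sample) over the trailing
    window ending at that sample. Early positions use a shorter window."""
    means = []
    running_sum = 0.0
    for i, s in enumerate(samples):
        running_sum += abs(s)
        if i >= window:
            # drop the sample that just left the window
            running_sum -= abs(samples[i - window])
        count = min(i + 1, window)
        means.append(running_sum / count)
    return means


# Hypothetical data: the averaged value depends on neighbouring samples,
# whereas the instantaneous check looks at each sample on its own.
smoothed = rolling_mean_abs([1.0, 0.0, 0.0, 1.0], window=2)
```

The extra `window` parameter interacts with both the amplitude threshold and the duration, which is one more knob to tune compared with checking each sample directly.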

After processing, the sections were partitioned similar to the following image, where anything under the blue envelope is captured in the final version of the track.

Results

Being too aggressive on what constitutes silence (i.e. setting the amplitude threshold too high) results in an almost distorted track.

Original

Distorted

Adjusting the amplitude threshold along with the duration yields a better result, but the sound is still a bit choppy. In particular, words that end with an 's' sound tend to get clipped too early. In addition to the amplitude and duration, I ended up adding a buffer on either side of each silent section that would still be included in the clipped audio. This struck a balance between the overall amount of silence removed and the continuity of the final track.
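The buffer can be thought of as shrinking each detected silent section before it is cut. A minimal sketch, assuming silent sections are represented as (start, end) sample-index pairs with an exclusive end; the name `shrink_spans` and the `pad` parameter are hypothetical:

```python
def shrink_spans(spans, pad):
    """Shrink each silent span by pad samples on both sides so that the
    audio next to the silence (for example a trailing 's') is kept.
    Spans that become empty after shrinking are dropped entirely."""
    shrunk = []
    for start, end in spans:
        new_start, new_end = start + pad, end - pad
        if new_start < new_end:
            shrunk.append((new_start, new_end))
    return shrunk


# Hypothetical spans: the long gap is trimmed, the short one is left alone.
padded = shrink_spans([(100, 200), (300, 310)], pad=10)
```

A larger `pad` keeps more of the audio around each gap at the cost of removing less silence overall, which is the trade-off described above.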

Balanced

Removing extended pauses is somewhat easier because there is literally no sound. Breaths are different: there is sound in the track, but we do not want it to register as loud enough to be included in the final track.

This is the opposite of the situation with words ending in an 's' sound. In that case we wanted the sound included and it was being clipped; in this case we don't want the sound. It turns out that, if the breaths are quiet enough, amplitude is sufficient to filter them out. Louder breaths would certainly need an alternate approach.

Breath Example (original)

Breath Removed

I've not yet determined if any of this resolves the situation I described at the beginning of the post. I anticipate I would need much longer samples of audio to test with. In either case, the tooling I built is flexible enough to adjust for that situation and is a working prototype for answering that question.

Note

While this was a fun experiment to better understand the techniques involved, I would suggest that anyone looking to do this on a more professional level use one of the existing tools that already support silence removal out of the box. Audacity, Audition, and Studio One all have support for silence detection. The features for automating tasks based on the detected silent sections vary across the tools (ranging from manual editing to options for fade-in and fade-out transitions).

Wednesday, June 25, 2025

Radio Silence
