
Mastering Drums with the Web Audio API

No Samples Required


Ever wondered how browser-based games and music apps make sounds without loading a single audio file?

No MP3s. No WAVs. Just JavaScript.

I learnt this while building a small game, which you can play here: https://beat-bird.vercel.app/

The Web Audio API lets you synthesize sound from scratch using oscillators, noise, filters, and envelopes. In this article, I’ll show how I built a complete drum kit for a rhythm game using nothing but code—and how you can do the same.

If you’ve never worked with audio before, the Web Audio API can feel mysterious.

Oscillators. Filters. Noise. Frequencies.

It sounds like music theory—but it isn’t.

This post is about understanding what sound actually is in code, and how a few simple building blocks let us create convincing drum and game sounds entirely in JavaScript.

No audio files. No libraries. Just numbers.


What Sound Really Is (In Code Terms)

Let’s remove the mystery first.

Sound is just air moving back and forth.
Your speakers do that by receiving a stream of numbers.

  • Big number → speaker moves out

  • Small number → speaker moves in

If you change those numbers fast enough (about 44,100 times per second), your brain hears sound.

The Web Audio API is just a system for generating and modifying those numbers.
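To make that concrete, here is what one second of a pure tone looks like as plain numbers. This is a sketch in ordinary TypeScript with no audio API involved, just the stream of values an oscillator would produce:

```typescript
// One second of a 440 Hz sine tone as raw samples: one number per speaker
// position, 44,100 of them per second.
const sampleRate = 44100
const frequency = 440

const samples = new Float32Array(sampleRate)
for (let i = 0; i < samples.length; i++) {
  const t = i / sampleRate // time of this sample, in seconds
  samples[i] = Math.sin(2 * Math.PI * frequency * t)
}

// Every sample stays inside the speaker's valid range of -1 to 1
let peak = 0
for (const s of samples) peak = Math.max(peak, Math.abs(s))
```

Feed those 44,100 numbers to a speaker in one second and you hear the note A. That is the whole trick.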

AudioContext: A Factory for Sound

An AudioContext is not a “player”.

It’s a sound factory.

const audioContext = new AudioContext()

Inside it, you create nodes that:

  • generate numbers

  • modify numbers

  • send numbers to your speakers

These nodes are connected together into an audio graph.

Think of it like a data pipeline, but for sound.

Audio always flows left to right:

Source → processing → destination
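As a loose analogy (not the real API, whose nodes run on a separate audio thread), you can picture each node as a function over the stream of numbers:

```typescript
// A toy model of the audio graph: every node transforms a stream of numbers.
type AudioNodeFn = (samples: number[]) => number[]

const source = (): number[] => [0.5, -0.5, 0.5, -0.5]                  // generates numbers
const gain = (amount: number): AudioNodeFn => (s) => s.map((x) => x * amount) // modifies them
const destination: AudioNodeFn = (s) => s                              // hands them to the speaker

// Audio flows left to right: source → processing → destination
const output = destination(gain(0.5)(source()))
```

In the real API, `connect()` is what wires these stages together.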

Oscillators: Repeating Patterns

An oscillator generates a repeating pattern of numbers.

That pattern is called a waveform.

const osc = audioContext.createOscillator()
osc.type = "sine"
osc.frequency.value = 440

This does not mean “play a note”.

It means:

“Repeat this shape 440 times per second.”

Different shapes → different feelings.

An oscillator produces a continuous tone. What that tone sounds like depends on its waveform.

Waveform | Character             | Common Uses
sine     | Pure and smooth       | Sub bass, low-end, flutes
square   | Hollow, retro         | Chiptunes, leads
sawtooth | Bright and aggressive | Brass, strings, effects
triangle | Soft but textured     | Percussion, wooden sounds
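Each of those shapes can be written as a simple function of phase, where 0 to 1 is one full cycle. This is a rough sketch of the shapes `osc.type` selects between:

```typescript
// The four basic waveforms as functions of phase (0 to 1 is one full cycle).
// These are idealized shapes; real oscillators band-limit them to avoid aliasing.
const sine = (p: number): number => Math.sin(2 * Math.PI * p)
const square = (p: number): number => (p < 0.5 ? 1 : -1)
const sawtooth = (p: number): number => 2 * p - 1
const triangle = (p: number): number => 4 * Math.abs(p - 0.5) - 1
```

An oscillator at 440 Hz simply sweeps the phase from 0 to 1, 440 times per second, through one of these functions.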

The Most Important Concept: Envelopes

An envelope is how a value changes over time.

In sound design, envelopes usually control:

  • volume

  • pitch

gain.gain.setValueAtTime(1, t)
gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)

Drums don’t stop instantly.
They lose energy.

These two lines describe how energy fades over time.
That shape is what turns a tone into a drum.


gain.gain Is Just a Multiplier

A GainNode doesn’t know anything about music.
It simply multiplies numbers.

output = input * gain

So when you see:

gain.gain.setValueAtTime(1, t)

It means:

At time t, let the sound pass through unchanged.

  • 1 → full strength

  • 0.5 → half strength

  • 0 → silence

This isn’t decibels or anything fancy—just a number.

The Fade-Out Is the Sound: exponentialRampToValueAtTime

Our ears hear logarithmically, not linearly.

Exponential ramps feel natural. Linear ramps sound artificial.

gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)

This line says:

Over the next 150ms, smoothly reduce the volume to almost zero.

Why not 0?

Because exponential ramps can’t reach zero—and 0.01 is already inaudible.
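The curve itself is defined by the Web Audio spec, and you can compute it directly. Halfway through the ramp the volume is already at 0.1, which is what makes the decay feel so fast:

```typescript
// The exponential ramp as the Web Audio spec defines it:
// v(t) = v0 * (v1 / v0) ^ ((t - t0) / (t1 - t0))
const expRamp = (v0: number, v1: number, t0: number, t1: number, t: number): number =>
  v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0))

// Our envelope: 1 → 0.01 over 150 ms
const atStart = expRamp(1, 0.01, 0, 0.15, 0)      // 1.0
const halfway = expRamp(1, 0.01, 0, 0.15, 0.075)  // ≈ 0.1: most of the energy is gone already
const atEnd = expRamp(1, 0.01, 0, 0.15, 0.15)     // ≈ 0.01
```

The formula also shows why the target can't be exactly zero: the curve is built from the ratio of the two values, and a ratio involving zero breaks it.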

Conceptually, the envelope creates:

Volume
1.0 |\
    | \
    |  \
    |   \
0.0 |____\____ Time
       150ms

  • Instant attack (the hit)

  • Fast exponential decay (energy loss)

That shape is the drum.

An exponential fade sounds like energy dissipating.

That’s how:

  • drum heads stop vibrating

  • strings lose motion

  • physical objects settle

This is why almost every percussive sound uses exponential decay.


Time Works Differently in Audio

The Audio Engine Has Its Own Clock

The Web Audio API runs on a dedicated, high-precision clock that is completely independent from the browser’s event loop.

You access that clock through:

const t = audioContext.currentTime

This value:

  • is measured in seconds

  • increases continuously

  • is extremely stable

  • does not pause when JavaScript is busy

Scheduling Slightly in the Future

You’ll almost always see:

const t = audioContext.currentTime + 0.05

That extra 0.05 seconds (50ms) does two things:

  1. Gives the audio engine time to schedule everything cleanly

  2. Prevents clicks and timing jitter

You’re telling the engine:

“I’m planning ahead. Please play this precisely.”
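Here is a sketch of what that planning looks like for a rhythm game. The tempo and the clock reading below are made-up numbers standing in for a real audioContext.currentTime:

```typescript
// Computing engine-clock times for four beats at 120 BPM, starting slightly
// in the future. In real code, currentTime would come from audioContext.currentTime.
const currentTime = 3.2         // pretend reading of the audio clock, in seconds
const lookahead = 0.05          // 50 ms safety margin
const secondsPerBeat = 60 / 120 // 120 BPM → 0.5 s per beat

const beatTimes = [0, 1, 2, 3].map(
  (beat) => currentTime + lookahead + beat * secondsPerBeat
)
// each drum hit would then be start()ed at one of these exact times
```

Because the times are computed up front and handed to the engine, a busy JavaScript thread can't make the rhythm drift.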


Frequency: How Pitch Is Created

When we say:

“That person has a high-pitched voice”

We’re not describing loudness.
We’re describing how fast something is vibrating.

A high-pitched voice means:

The vocal cords are opening and closing very quickly.

That motion pushes air in and out faster, which produces higher frequencies.


Voices and Oscillators Are Doing the Same Thing

Your vocal cords work a lot like an oscillator:

  • They vibrate

  • They repeat a pattern

  • That pattern pushes air

A deep voice vibrates slowly.
A high voice vibrates quickly.

The Web Audio API just replaces vocal cords with math.

osc.frequency.value = 100

→ slow vibration → deep sound

osc.frequency.value = 1000

→ fast vibration → high sound

Your brain hears speed as pitch.

Why This Still Works Without Music Theory

You don’t need to know notes or scales.

Your brain evolved to interpret:

  • slow vibrations as “big”

  • fast vibrations as “small or sharp”

That’s why:

  • big animals sound deep

  • small animals sound high

  • tense situations feel higher-pitched

The Web Audio API just gives you direct access to that perception.

One Sentence to Remember

Pitch is how fast something vibrates.
Frequency is how we describe that speed.

Noise: Where Order Becomes Chaos

Up until now, every sound we’ve created has been predictable.

Oscillators repeat a clean pattern:

  • same shape

  • same speed

  • same result every time

But real-world sounds—especially percussion—aren’t like that.

They’re messy.

That’s where noise comes in.

const noise = ctx.createBufferSource();

This line does not create noise by itself.

It creates a player—something that can play a chunk of audio data.

The “noise” comes from what we put into it.


Noise Is Just Random Numbers

We fill a buffer with random values like this:

data[i] = Math.random() * 2 - 1;

That means:

At every audio sample, move the speaker to a random position.

No pattern.
No repetition.
Just randomness.

When played fast enough, that randomness becomes static.
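A sketch of filling that buffer in plain TypeScript, one second's worth of randomness at 44,100 samples:

```typescript
// One second of white noise: every sample is an independent random speaker
// position between -1 and 1. No pattern, no repetition.
const sampleRate = 44100
const noiseData = new Float32Array(sampleRate)
for (let i = 0; i < noiseData.length; i++) {
  noiseData[i] = Math.random() * 2 - 1
}

// Every value stays inside the valid output range
const inRange = noiseData.every((s) => s >= -1 && s <= 1)
```

In the real drum kit, this array becomes the `getChannelData(0)` contents of an AudioBuffer.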


Why We Use a Buffer Instead of an Oscillator

An oscillator produces order:

  • repeatable

  • smooth

  • stable

Noise is the opposite:

  • unpredictable

  • rough

  • chaotic

You can’t generate that with an oscillator.

So instead, we:

  1. Create random values

  2. Store them in a buffer

  3. Play that buffer

noise.buffer = noiseBuffer;
noise.start(when);

What createBufferSource() Really Means

A BufferSource is best thought of as:

“Play this array of numbers as sound.”

It doesn’t generate sound.
It doesn’t modify sound.
It just replays data.


Why Noise Is Essential for Percussion

Many real-world sounds are not tonal:

  • snares

  • hi-hats

  • shakers

  • explosions

  • wind

  • crashes

These sounds don’t vibrate in a stable way.

They’re friction, collisions, and chaos.

Noise gives us that chaos.


Noise by Itself Is Too Much

Raw noise contains:

  • low rumble

  • mid clutter

  • sharp highs

That’s why noise alone rarely sounds good.

We shape it with:

  • filters (to remove unwanted frequencies)

  • envelopes (to make it short-lived)

Noise is the raw material.
Filters and envelopes turn it into something usable.


The Important Mental Model

  • Oscillators = predictable motion

  • Noise = random motion

  • Filters = selective removal

  • Envelopes = energy over time

Once you see noise as intentional randomness, it stops feeling strange—and starts feeling necessary.


Filters: Removing Information on Purpose

A filter does exactly one thing:

It removes parts of the sound.


What a Filter Actually Does

When you write:

const filter = ctx.createBiquadFilter()
filter.type = "highpass"
filter.frequency.value = 1000

You’re saying:

Remove everything below 1000Hz.
Keep only the fast, sharp movement.

A high-pass filter:

  • lets high frequencies through

  • removes low frequencies

A low-pass filter does the opposite.

No sound is added.
Only information is removed.
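To see "removal" in action, here is a deliberately simplified one-pole high-pass filter in plain TypeScript. The real BiquadFilterNode is a second-order filter with proper frequency and Q controls, but the principle is the same: the filter reacts to change, so slow movement dies out and fast movement survives.

```typescript
// A minimal one-pole high-pass filter (a teaching sketch, not BiquadFilterNode).
// y[n] = alpha * (y[n-1] + x[n] - x[n-1]): the output tracks *changes* in the input.
const highpass = (input: number[], alpha: number): number[] => {
  const out: number[] = []
  let prevIn = 0
  let prevOut = 0
  for (const x of input) {
    const y = alpha * (prevOut + x - prevIn)
    out.push(y)
    prevIn = x
    prevOut = y
  }
  return out
}

// A constant signal is the slowest possible movement: it gets removed
const dcOut = highpass([1, 1, 1, 1, 1, 1, 1, 1], 0.5)

// A rapidly alternating signal is fast movement: it passes through
const fastOut = highpass([1, -1, 1, -1, 1, -1, 1, -1], 0.5)
```

No new sound was created in either case; the filter only decided which movement to keep.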

Filters Shape Chaos into Something Recognizable

This is the real “aha” moment.

  • Oscillators create order

  • Noise creates chaos

  • Filters carve that chaos into a shape

  • Envelopes give it life

A snare works not because it’s complex,
but because it’s controlled randomness.


Frequency + Filters = Character

Two sounds can have:

  • the same envelope

  • the same timing

But feel completely different because:

  • one has more high frequencies

  • the other has more low frequencies

Filters decide where the energy lives.


A Simple Rule of Thumb

If a sound feels:

  • muddy → remove low frequencies

  • too sharp → remove high frequencies

  • too plain → let more frequencies through


Building a Kick Drum (Pitch Is Energy)

A kick drum’s “thump” comes from a rapid drop in pitch.

It sounds deep, but it starts high.

We simulate that with a frequency envelope:

const playKick = (): void => {
  // Schedule slightly ahead of the engine clock (see "Time Works Differently in Audio")
  const t = ctx.currentTime + 0.05
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()

  osc.connect(gain)
  gain.connect(ctx.destination)

  // Pitch envelope: the thump is a fast 150 Hz → 50 Hz sweep
  osc.frequency.setValueAtTime(150, t)
  osc.frequency.exponentialRampToValueAtTime(50, t + 0.1)

  // Volume envelope: instant attack, fast exponential decay
  gain.gain.setValueAtTime(0.8, t)
  gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)

  osc.start(t)
  osc.stop(t + 0.15)
}

High → low in a fraction of a second.

Your brain hears that as impact.
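You can compute the sweep those two frequency lines describe. The midpoint below is just the exponential-ramp formula evaluated by hand:

```typescript
// The kick's pitch envelope: 150 Hz → 50 Hz over 100 ms, as an exponential ramp.
const kickPitch = (t: number): number => 150 * Math.pow(50 / 150, t / 0.1)

const atHit = kickPitch(0)      // 150 Hz: the sharp attack
const midway = kickPitch(0.05)  // ≈ 86.6 Hz: the geometric midpoint, heard as the halfway pitch
const tail = kickPitch(0.1)     // 50 Hz: the deep body your ear remembers
```

Because pitch is perceived logarithmically, that geometric midpoint is exactly what makes the sweep feel smooth rather than front-loaded.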

Snare Drum = Tone + Noise

A snare isn’t just a drum head. It’s also metal wires rattling underneath.

We synthesize those two components separately.

White Noise (the “crack”)

const bufferSize = ctx.sampleRate
const noiseBuffer = ctx.createBuffer(1, bufferSize, ctx.sampleRate)
const data = noiseBuffer.getChannelData(0)

for (let i = 0; i < bufferSize; i++) {
  data[i] = Math.random() * 2 - 1
}

Then we filter and envelope it:

const playSnare = (when: number): void => {
  const noise = ctx.createBufferSource()
  const filter = ctx.createBiquadFilter()
  const gain = ctx.createGain()

  noise.buffer = noiseBuffer
  filter.type = "highpass"
  filter.frequency.value = 1000

  noise.connect(filter)
  filter.connect(gain)
  gain.connect(ctx.destination)

  gain.gain.setValueAtTime(0.2, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.1)

  noise.start(when)
  noise.stop(when + 0.1)

  // playTone is a small helper (not shown here) that plays a short
  // triangle-wave blip: the tonal "body" under the noise
  playTone(180, 0.08, "triangle", when)
}

The result: a tight, punchy snare with zero samples.

Error Sounds and Game Feedback

Sound design is also UX.

A descending pitch universally signals failure.

const playMiss = (when: number): void => {
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()

  osc.type = "sawtooth"
  osc.frequency.setValueAtTime(300, when)
  osc.frequency.exponentialRampToValueAtTime(100, when + 0.15)

  // The oscillator must be connected to the destination, or it plays silently
  gain.gain.setValueAtTime(0.3, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.15)

  osc.connect(gain)
  gain.connect(ctx.destination)

  osc.start(when)
  osc.stop(when + 0.15)
}

No explanation needed. Your brain just gets it.

More Sounds Using the Same Building Blocks

Hi-Hat (Closed)

const playHiHat = (when: number): void => {
  const noise = ctx.createBufferSource()
  const filter = ctx.createBiquadFilter()
  const gain = ctx.createGain()

  noise.buffer = noiseBuffer
  filter.type = "highpass"
  filter.frequency.value = 7000

  noise.connect(filter)
  filter.connect(gain)
  gain.connect(ctx.destination)

  gain.gain.setValueAtTime(0.3, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.05)

  noise.start(when)
  noise.stop(when + 0.05)
}

Tom Drums

const playTom = (when: number, pitch: "high" | "mid" | "low"): void => {
  const map = { high: 200, mid: 150, low: 100 }
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()

  osc.frequency.setValueAtTime(map[pitch] * 1.5, when)
  osc.frequency.exponentialRampToValueAtTime(map[pitch], when + 0.1)

  gain.gain.setValueAtTime(0.6, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.3)

  osc.connect(gain)
  gain.connect(ctx.destination)

  osc.start(when)
  osc.stop(when + 0.3)
}

The iOS Safari Audio “Unlock” Trap

One thing that will bite you on mobile Safari:

Audio must be unlocked by a user gesture

Even calling resume() isn’t always enough.

const unlockAudio = (): void => {
  if (ctx.state === "suspended") {
    ctx.resume()
  }

  const buffer = ctx.createBuffer(1, 1, 22050)
  const source = ctx.createBufferSource()
  source.buffer = buffer
  source.connect(ctx.destination)
  source.start()
}

document.body.addEventListener("touchstart", unlockAudio)
document.body.addEventListener("touchend", unlockAudio)

This silent buffer trick is the difference between “works on desktop” and “works everywhere”.

Final Thoughts

By now, every piece should fit into a single mental model.

  • Oscillators create order
    A predictable, repeating motion.

  • Noise creates chaos
    Random movement with no pattern.

  • Raw chaos is unusable
    It’s too wide, too messy, too much information.

  • Filters remove what you don’t want
    Carving chaos into something recognizable.

  • Envelopes give everything life
    Energy appears, fades, and disappears.

  • Time simply tells the system when all of this happens.

With all of these you can build:

  • Full drum kits

  • UI sounds

  • Game effects

  • Musical arpeggios

No assets to load. No bandwidth wasted. Infinite variation.

If you enjoy understanding how things work instead of just importing libraries, the Web Audio API is deeply rewarding.

Sound design is just programming—your ears are the debugger.