
Mastering Drums with the Web Audio API

No Samples Required


Ever wondered how browser-based games and music apps make sounds without loading a single audio file?

No MP3s. No WAVs. Just JavaScript.

I learnt this while building a small game, which you can play here: https://beat-bird.vercel.app/

The Web Audio API lets you synthesize sound from scratch using oscillators, noise, filters, and envelopes. In this article, I’ll show how I built a complete drum kit for a rhythm game using nothing but code—and how you can do the same.

If you’ve never worked with audio before, the Web Audio API can feel mysterious.

Oscillators. Filters. Noise. Frequencies.

It sounds like music theory—but it isn’t.

This post is about understanding what sound actually is in code, and how a few simple building blocks let us create convincing drum and game sounds entirely in JavaScript.

No audio files. No libraries. Just numbers.


What Sound Really Is (In Code Terms)

Let’s remove the mystery first.

Sound is just air moving back and forth.
Your speakers do that by receiving a stream of numbers.

  • Big number → speaker moves out

  • Small number → speaker moves in

If you change those numbers fast enough (about 44,100 times per second), your brain hears sound.

The Web Audio API is just a system for generating and modifying those numbers.
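To make that concrete, here is what one second of a pure tone looks like as plain numbers. This is a sketch in ordinary TypeScript with no audio API involved, just the stream of values an oscillator would produce:

```typescript
// One second of a 440 Hz sine tone as raw samples: one number per speaker
// position, 44,100 of them per second.
const sampleRate = 44100
const frequency = 440

const samples = new Float32Array(sampleRate)
for (let i = 0; i < samples.length; i++) {
  const t = i / sampleRate // time of this sample, in seconds
  samples[i] = Math.sin(2 * Math.PI * frequency * t)
}

// Every sample stays inside the speaker's valid range of -1 to 1
let peak = 0
for (const s of samples) peak = Math.max(peak, Math.abs(s))
```

Feed those 44,100 numbers to a speaker in one second and you hear the note A. That is the whole trick.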

AudioContext: A Factory for Sound

An AudioContext is not a “player”.

It’s a sound factory.

const audioContext = new AudioContext()

Inside it, you create nodes that:

  • generate numbers

  • modify numbers

  • send numbers to your speakers

These nodes are connected together into an audio graph.

Think of it like a data pipeline, but for sound.

Audio always flows left to right:

Source → processing → destination
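As a loose analogy (not the real API, whose nodes run on a separate audio thread), you can picture each node as a function over the stream of numbers:

```typescript
// A toy model of the audio graph: every node transforms a stream of numbers.
type AudioNodeFn = (samples: number[]) => number[]

const source = (): number[] => [0.5, -0.5, 0.5, -0.5]                  // generates numbers
const gain = (amount: number): AudioNodeFn => (s) => s.map((x) => x * amount) // modifies them
const destination: AudioNodeFn = (s) => s                              // hands them to the speaker

// Audio flows left to right: source → processing → destination
const output = destination(gain(0.5)(source()))
```

In the real API, `connect()` is what wires these stages together.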

Oscillators: Repeating Patterns

An oscillator generates a repeating pattern of numbers.

That pattern is called a waveform.

const osc = audioContext.createOscillator()
osc.type = "sine"
osc.frequency.value = 440

This does not mean “play a note”.

It means:

“Repeat this shape 440 times per second.”

Different shapes → different feelings.

An oscillator produces a continuous tone. What that tone sounds like depends on its waveform.

Waveform | Character             | Common Uses
sine     | Pure and smooth       | Sub bass, low-end, flutes
square   | Hollow, retro         | Chiptunes, leads
sawtooth | Bright and aggressive | Brass, strings, effects
triangle | Soft but textured     | Percussion, wooden sounds
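Each of those shapes can be written as a simple function of phase, where 0 to 1 is one full cycle. This is a rough sketch of the shapes `osc.type` selects between:

```typescript
// The four basic waveforms as functions of phase (0 to 1 is one full cycle).
// These are idealized shapes; real oscillators band-limit them to avoid aliasing.
const sine = (p: number): number => Math.sin(2 * Math.PI * p)
const square = (p: number): number => (p < 0.5 ? 1 : -1)
const sawtooth = (p: number): number => 2 * p - 1
const triangle = (p: number): number => 4 * Math.abs(p - 0.5) - 1
```

An oscillator at 440 Hz simply sweeps the phase from 0 to 1, 440 times per second, through one of these functions.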

The Most Important Concept: Envelopes

An envelope is how a value changes over time.

In sound design, envelopes usually control:

  • volume

  • pitch

gain.gain.setValueAtTime(1, t)
gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)

Drums don’t stop instantly.
They lose energy.

These two lines describe how energy fades over time.
That shape is what turns a tone into a drum.


gain.gain Is Just a Multiplier

A GainNode doesn’t know anything about music.
It simply multiplies numbers.

output = input * gain

So when you see:

gain.gain.setValueAtTime(1, t)

It means:

At time t, let the sound pass through unchanged.

  • 1 → full strength

  • 0.5 → half strength

  • 0 → silence

This isn’t decibels or anything fancy—just a number.

The Fade-Out Is the Sound: exponentialRampToValueAtTime

Our ears hear logarithmically, not linearly.

Exponential ramps feel natural. Linear ramps sound artificial.

gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)

This line says:

Over the next 150ms, smoothly reduce the volume to almost zero.

Why not 0?

Because exponential ramps can’t reach zero—and 0.01 is already inaudible.
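The curve itself is defined by the Web Audio spec, and you can compute it directly. Halfway through the ramp the volume is already at 0.1, which is what makes the decay feel so fast:

```typescript
// The exponential ramp as the Web Audio spec defines it:
// v(t) = v0 * (v1 / v0) ^ ((t - t0) / (t1 - t0))
const expRamp = (v0: number, v1: number, t0: number, t1: number, t: number): number =>
  v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0))

// Our envelope: 1 → 0.01 over 150 ms
const atStart = expRamp(1, 0.01, 0, 0.15, 0)      // 1.0
const halfway = expRamp(1, 0.01, 0, 0.15, 0.075)  // ≈ 0.1: most of the energy is gone already
const atEnd = expRamp(1, 0.01, 0, 0.15, 0.15)     // ≈ 0.01
```

The formula also shows why the target can't be exactly zero: the curve is built from the ratio of the two values, and a ratio involving zero breaks it.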

Conceptually, the envelope creates:

Volume
1.0 |\
    | \
    |  \
    |   \
0.0 |____\____ Time
       150ms

  • Instant attack (the hit)

  • Fast exponential decay (energy loss)

That shape is the drum.

An exponential fade sounds like energy dissipating.

That’s how:

  • drum heads stop vibrating

  • strings lose motion

  • physical objects settle

This is why almost every percussive sound uses exponential decay.


Time Works Differently in Audio

The Audio Engine Has Its Own Clock

The Web Audio API runs on a dedicated, high-precision clock that is completely independent from the browser’s event loop.

You access that clock through:

const t = audioContext.currentTime

This value:

  • is measured in seconds

  • increases continuously

  • is extremely stable

  • does not pause when JavaScript is busy

Scheduling Slightly in the Future

You’ll almost always see:

const t = audioContext.currentTime + 0.05

That extra 0.05 seconds (50ms) does two things:

  1. Gives the audio engine time to schedule everything cleanly

  2. Prevents clicks and timing jitter

You’re telling the engine:

“I’m planning ahead. Please play this precisely.”
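Here is a sketch of what that planning looks like for a rhythm game. The tempo and the clock reading below are made-up numbers standing in for a real audioContext.currentTime:

```typescript
// Computing engine-clock times for four beats at 120 BPM, starting slightly
// in the future. In real code, currentTime would come from audioContext.currentTime.
const currentTime = 3.2         // pretend reading of the audio clock, in seconds
const lookahead = 0.05          // 50 ms safety margin
const secondsPerBeat = 60 / 120 // 120 BPM → 0.5 s per beat

const beatTimes = [0, 1, 2, 3].map(
  (beat) => currentTime + lookahead + beat * secondsPerBeat
)
// each drum hit would then be start()ed at one of these exact times
```

Because the times are computed up front and handed to the engine, a busy JavaScript thread can't make the rhythm drift.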


Frequency: How Pitch Is Created

When we say:

“That person has a high-pitched voice”

We’re not describing loudness.
We’re describing how fast something is vibrating.

A high-pitched voice means:

The vocal cords are opening and closing very quickly.

That motion pushes air in and out faster, which produces higher frequencies.


Voices and Oscillators Are Doing the Same Thing

Your vocal cords work a lot like an oscillator:

  • They vibrate

  • They repeat a pattern

  • That pattern pushes air

A deep voice vibrates slowly.
A high voice vibrates quickly.

The Web Audio API just replaces vocal cords with math.

osc.frequency.value = 100

→ slow vibration → deep sound

osc.frequency.value = 1000

→ fast vibration → high sound

Your brain hears speed as pitch.

Why This Still Works Without Music Theory

You don’t need to know notes or scales.

Your brain evolved to interpret:

  • slow vibrations as “big”

  • fast vibrations as “small or sharp”

That’s why:

  • big animals sound deep

  • small animals sound high

  • tense situations feel higher-pitched

The Web Audio API just gives you direct access to that perception.

One Sentence to Remember

Pitch is how fast something vibrates.
Frequency is how we describe that speed.

Noise: Where Order Becomes Chaos

Up until now, every sound we’ve created has been predictable.

Oscillators repeat a clean pattern:

  • same shape

  • same speed

  • same result every time

But real-world sounds—especially percussion—aren’t like that.

They’re messy.

That’s where noise comes in.

const noise = ctx.createBufferSource();

This line does not create noise by itself.

It creates a player—something that can play a chunk of audio data.

The “noise” comes from what we put into it.


Noise Is Just Random Numbers

We fill a buffer with random values like this:

data[i] = Math.random() * 2 - 1;

That means:

At every audio sample, move the speaker to a random position.

No pattern.
No repetition.
Just randomness.

When played fast enough, that randomness becomes static.
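A sketch of filling that buffer in plain TypeScript, one second's worth of randomness at 44,100 samples:

```typescript
// One second of white noise: every sample is an independent random speaker
// position between -1 and 1. No pattern, no repetition.
const sampleRate = 44100
const noiseData = new Float32Array(sampleRate)
for (let i = 0; i < noiseData.length; i++) {
  noiseData[i] = Math.random() * 2 - 1
}

// Every value stays inside the valid output range
const inRange = noiseData.every((s) => s >= -1 && s <= 1)
```

In the real drum kit, this array becomes the `getChannelData(0)` contents of an AudioBuffer.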


Why We Use a Buffer Instead of an Oscillator

An oscillator produces order:

  • repeatable

  • smooth

  • stable

Noise is the opposite:

  • unpredictable

  • rough

  • chaotic

You can’t generate that with an oscillator.

So instead, we:

  1. Create random values

  2. Store them in a buffer

  3. Play that buffer

noise.buffer = noiseBuffer;
noise.start(when);

What createBufferSource() Really Means

A BufferSource is best thought of as:

“Play this array of numbers as sound.”

It doesn’t generate sound.
It doesn’t modify sound.
It just replays data.


Why Noise Is Essential for Percussion

Many real-world sounds are not tonal:

  • snares

  • hi-hats

  • shakers

  • explosions

  • wind

  • crashes

These sounds don’t vibrate in a stable way.

They’re friction, collisions, and chaos.

Noise gives us that chaos.


Noise by Itself Is Too Much

Raw noise contains:

  • low rumble

  • mid clutter

  • sharp highs

That’s why noise alone rarely sounds good.

We shape it with:

  • filters (to remove unwanted frequencies)

  • envelopes (to make it short-lived)

Noise is the raw material.
Filters and envelopes turn it into something usable.


The Important Mental Model

  • Oscillators = predictable motion

  • Noise = random motion

  • Filters = selective removal

  • Envelopes = energy over time

Once you see noise as intentional randomness, it stops feeling strange—and starts feeling necessary.


Filters: Removing Information on Purpose

A filter does exactly one thing:

It removes parts of the sound.


What a Filter Actually Does

When you write:

const filter = ctx.createBiquadFilter()
filter.type = "highpass"
filter.frequency.value = 1000

You’re saying:

Remove everything below 1000Hz.
Keep only the fast, sharp movement.

A high-pass filter:

  • lets high frequencies through

  • removes low frequencies

A low-pass filter does the opposite.

No sound is added.
Only information is removed.
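To see "removal" in action, here is a deliberately simplified one-pole high-pass filter in plain TypeScript. The real BiquadFilterNode is a second-order filter with proper frequency and Q controls, but the principle is the same: the filter reacts to change, so slow movement dies out and fast movement survives.

```typescript
// A minimal one-pole high-pass filter (a teaching sketch, not BiquadFilterNode).
// y[n] = alpha * (y[n-1] + x[n] - x[n-1]): the output tracks *changes* in the input.
const highpass = (input: number[], alpha: number): number[] => {
  const out: number[] = []
  let prevIn = 0
  let prevOut = 0
  for (const x of input) {
    const y = alpha * (prevOut + x - prevIn)
    out.push(y)
    prevIn = x
    prevOut = y
  }
  return out
}

// A constant signal is the slowest possible movement: it gets removed
const dcOut = highpass([1, 1, 1, 1, 1, 1, 1, 1], 0.5)

// A rapidly alternating signal is fast movement: it passes through
const fastOut = highpass([1, -1, 1, -1, 1, -1, 1, -1], 0.5)
```

No new sound was created in either case; the filter only decided which movement to keep.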

Filters Shape Chaos into Something Recognizable

This is the real “aha” moment.

  • Oscillators create order

  • Noise creates chaos

  • Filters carve that chaos into a shape

  • Envelopes give it life

A snare works not because it’s complex,
but because it’s controlled randomness.


Frequency + Filters = Character

Two sounds can have:

  • the same envelope

  • the same timing

But feel completely different because:

  • one has more high frequencies

  • the other has more low frequencies

Filters decide where the energy lives.


A Simple Rule of Thumb

If a sound feels:

  • muddy → remove low frequencies

  • too sharp → remove high frequencies

  • too plain → let more frequencies through


Building a Kick Drum (Pitch Is Energy)

A kick drum’s “thump” comes from a rapid drop in pitch.

It sounds deep, but it starts high.

We simulate that with a frequency envelope:

const playKick = (): void => {
  // Schedule slightly ahead of the engine clock (see "Time Works Differently in Audio")
  const t = ctx.currentTime + 0.05
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()

  osc.connect(gain)
  gain.connect(ctx.destination)

  // Pitch envelope: the thump is a fast 150 Hz → 50 Hz sweep
  osc.frequency.setValueAtTime(150, t)
  osc.frequency.exponentialRampToValueAtTime(50, t + 0.1)

  // Volume envelope: instant attack, fast exponential decay
  gain.gain.setValueAtTime(0.8, t)
  gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)

  osc.start(t)
  osc.stop(t + 0.15)
}

High → low in a fraction of a second.

Your brain hears that as impact.
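You can compute the sweep those two frequency lines describe. The midpoint below is just the exponential-ramp formula evaluated by hand:

```typescript
// The kick's pitch envelope: 150 Hz → 50 Hz over 100 ms, as an exponential ramp.
const kickPitch = (t: number): number => 150 * Math.pow(50 / 150, t / 0.1)

const atHit = kickPitch(0)      // 150 Hz: the sharp attack
const midway = kickPitch(0.05)  // ≈ 86.6 Hz: the geometric midpoint, heard as the halfway pitch
const tail = kickPitch(0.1)     // 50 Hz: the deep body your ear remembers
```

Because pitch is perceived logarithmically, that geometric midpoint is exactly what makes the sweep feel smooth rather than front-loaded.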

Snare Drum = Tone + Noise

A snare isn’t just a drum head. It’s also metal wires rattling underneath.

We synthesize those two components separately.

White Noise (the “crack”)

const bufferSize = ctx.sampleRate
const noiseBuffer = ctx.createBuffer(1, bufferSize, ctx.sampleRate)
const data = noiseBuffer.getChannelData(0)

for (let i = 0; i < bufferSize; i++) {
  data[i] = Math.random() * 2 - 1
}

Then we filter and envelope it:

const playSnare = (when: number): void => {
  const noise = ctx.createBufferSource()
  const filter = ctx.createBiquadFilter()
  const gain = ctx.createGain()

  noise.buffer = noiseBuffer
  filter.type = "highpass"
  filter.frequency.value = 1000

  noise.connect(filter)
  filter.connect(gain)
  gain.connect(ctx.destination)

  gain.gain.setValueAtTime(0.2, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.1)

  noise.start(when)
  noise.stop(when + 0.1)

  // playTone is a small helper (not shown here) that plays a short
  // triangle-wave blip: the tonal "body" under the noise
  playTone(180, 0.08, "triangle", when)
}

The result: a tight, punchy snare with zero samples.

Error Sounds and Game Feedback

Sound design is also UX.

A descending pitch universally signals failure.

const playMiss = (when: number): void => {
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()

  osc.type = "sawtooth"
  osc.frequency.setValueAtTime(300, when)
  osc.frequency.exponentialRampToValueAtTime(100, when + 0.15)

  // The oscillator must be connected to the destination, or it plays silently
  gain.gain.setValueAtTime(0.3, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.15)

  osc.connect(gain)
  gain.connect(ctx.destination)

  osc.start(when)
  osc.stop(when + 0.15)
}

No explanation needed. Your brain just gets it.

More Sounds Using the Same Building Blocks

Hi-Hat (Closed)

const playHiHat = (when: number): void => {
  const noise = ctx.createBufferSource()
  const filter = ctx.createBiquadFilter()
  const gain = ctx.createGain()

  noise.buffer = noiseBuffer
  filter.type = "highpass"
  filter.frequency.value = 7000

  noise.connect(filter)
  filter.connect(gain)
  gain.connect(ctx.destination)

  gain.gain.setValueAtTime(0.3, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.05)

  noise.start(when)
  noise.stop(when + 0.05)
}

Tom Drums

const playTom = (when: number, pitch: "high" | "mid" | "low"): void => {
  const map = { high: 200, mid: 150, low: 100 }
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()

  osc.frequency.setValueAtTime(map[pitch] * 1.5, when)
  osc.frequency.exponentialRampToValueAtTime(map[pitch], when + 0.1)

  gain.gain.setValueAtTime(0.6, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.3)

  osc.connect(gain)
  gain.connect(ctx.destination)

  osc.start(when)
  osc.stop(when + 0.3)
}

The iOS Safari Audio “Unlock” Trap

One thing that will bite you on mobile Safari:

Audio must be unlocked by a user gesture

Even calling resume() isn’t always enough.

const unlockAudio = (): void => {
  if (ctx.state === "suspended") {
    ctx.resume()
  }

  const buffer = ctx.createBuffer(1, 1, 22050)
  const source = ctx.createBufferSource()
  source.buffer = buffer
  source.connect(ctx.destination)
  source.start()
}

document.body.addEventListener("touchstart", unlockAudio)
document.body.addEventListener("touchend", unlockAudio)

This silent buffer trick is the difference between “works on desktop” and “works everywhere”.

Final Thoughts

By now, every piece should fit into a single mental model.

  • Oscillators create order
    A predictable, repeating motion.

  • Noise creates chaos
    Random movement with no pattern.

  • Raw chaos is unusable
    It’s too wide, too messy, too much information.

  • Filters remove what you don’t want
    Carving chaos into something recognizable.

  • Envelopes give everything life
    Energy appears, fades, and disappears.

  • Time simply tells the system when all of this happens.

With all of these you can build:

  • Full drum kits

  • UI sounds

  • Game effects

  • Musical arpeggios

No assets to load. No bandwidth wasted. Infinite variation.

If you enjoy understanding how things work instead of just importing libraries, the Web Audio API is deeply rewarding.

Sound design is just programming—your ears are the debugger.