Mastering Drums with the Web Audio API
No Samples Required

Ever wondered how browser-based games and music apps make sounds without loading a single audio file?
No MP3s. No WAVs. Just JavaScript.
I learnt this while building a small game, which you can play here: https://beat-bird.vercel.app/
The Web Audio API lets you synthesize sound from scratch using oscillators, noise, filters, and envelopes. In this article, I’ll show how I built a complete drum kit for a rhythm game using nothing but code—and how you can do the same.
If you’ve never worked with audio before, the Web Audio API can feel mysterious.
Oscillators. Filters. Noise. Frequencies.
It sounds like music theory—but it isn’t.
This post is about understanding what sound actually is in code, and how a few simple building blocks let us create convincing drum and game sounds entirely in JavaScript.
No audio files. No libraries. Just numbers.
What Sound Really Is (In Code Terms)
Let’s remove the mystery first.
Sound is just air moving back and forth.
Your speakers do that by receiving a stream of numbers.
Big number → speaker moves out
Small number → speaker moves in
If you change those numbers fast enough (about 44,100 times per second), your brain hears sound.
The Web Audio API is just a system for generating and modifying those numbers.
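To make that concrete, here is a sketch in plain TypeScript (no Web Audio yet) that computes one second of the numbers a speaker would receive for a 440 Hz sine tone. The sample rate and frequency are illustrative values, not anything the API forces on you:

```typescript
// One second of raw audio samples for a 440 Hz sine tone.
// sampleRate and freq are illustrative values, not API constants.
const sampleRate = 44100
const freq = 440

const samples: number[] = []
for (let i = 0; i < sampleRate; i++) {
  // Each sample is a speaker position between -1 and 1
  // at time i / sampleRate seconds.
  samples.push(Math.sin(2 * Math.PI * freq * (i / sampleRate)))
}

console.log(samples.length) // 44100 numbers = one second of sound
```

That array is all a speaker ever sees. Everything in the rest of this article is just different ways of producing and reshaping it.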
AudioContext: A Factory for Sound
An AudioContext is not a “player”.
It’s a sound factory.
const ctx = new AudioContext()
Inside it, you create nodes that:
generate numbers
modify numbers
send numbers to your speakers
These nodes are connected together into an audio graph.
Think of it like a data pipeline, but for sound.
Audio always flows left to right:
Source → processing → destination
Oscillators: Repeating Patterns
An oscillator generates a repeating pattern of numbers.
That pattern is called a waveform.
const osc = ctx.createOscillator()
osc.type = "sine"
osc.frequency.value = 440
This does not mean “play a note”.
It means:
“Repeat this shape 440 times per second.”
Different shapes → different feelings.
An oscillator produces a continuous tone. What that tone sounds like depends on its waveform.
| Waveform | Character | Common Uses |
| --- | --- | --- |
| sine | Pure and smooth | Sub bass, low end, flutes |
| square | Hollow, retro | Chiptunes, leads |
| sawtooth | Bright and aggressive | Brass, strings, effects |
| triangle | Soft but textured | Percussion, wooden sounds |
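Each of these shapes can be written as a tiny function of phase: your position within one cycle, from 0 to 1. These are the textbook shapes, a sketch of the idea rather than the OscillatorNode's exact internals:

```typescript
// Each waveform as a function of phase p in [0, 1) within one cycle.
// Textbook shapes, not the OscillatorNode's exact internal definitions.
const sine = (p: number): number => Math.sin(2 * Math.PI * p)
const square = (p: number): number => (p < 0.5 ? 1 : -1)
const sawtooth = (p: number): number => 2 * p - 1
const triangle = (p: number): number => 1 - 4 * Math.abs(p - 0.5)

// A square wave spends half the cycle fully out, half fully in:
square(0.25) // → 1
square(0.75) // → -1
```

An oscillator at 440 Hz is just one of these functions evaluated with the phase wrapping around 440 times per second.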

The Most Important Concept: Envelopes
An envelope is how a value changes over time.
In sound design, envelopes usually control:
volume
pitch
A volume envelope, for example, looks like this:
gain.gain.setValueAtTime(1, t)
gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)
Drums don’t stop instantly.
They lose energy.
These two lines describe how energy fades over time.
That shape is what turns a tone into a drum.
gain.gain Is Just a Multiplier
A GainNode doesn’t know anything about music.
It simply multiplies numbers.
output = input * gain
So when you see:
gain.gain.setValueAtTime(1, t)
It means:
At time
t, let the sound pass through unchanged.
1 → full strength
0.5 → half strength
0 → silence
This isn’t decibels or anything fancy—just a number.
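In plain code, that multiplication looks like this. A conceptual sketch of what a GainNode does, not the node's actual implementation:

```typescript
// What a GainNode does, conceptually: multiply every sample.
const applyGain = (input: number[], gain: number): number[] =>
  input.map((sample) => sample * gain)

applyGain([0.5, -0.5, 1], 0.5) // → [0.25, -0.25, 0.5]
```

That is the entire job: scale every number that passes through.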
The Fade-Out Is the Sound: exponentialRampToValueAtTime
Our ears hear logarithmically, not linearly.
Exponential ramps feel natural. Linear ramps sound artificial.
gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)
This line says:
Over the next 150ms, smoothly reduce the volume to almost zero.
Why not 0?
Because exponential ramps can’t reach zero—and 0.01 is already inaudible.
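Under the hood, the ramp follows the exponential interpolation described in the Web Audio spec: v(t) = v0 · (v1/v0)^((t − t0)/(t1 − t0)). A sketch of that formula in plain code:

```typescript
// The exponential ramp from the Web Audio spec:
// v(t) = v0 * (v1 / v0) ^ ((t - t0) / (t1 - t0))
const expRamp = (v0: number, v1: number, t0: number, t1: number, t: number): number =>
  v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0))

// Fading from 1 to 0.01 over 150ms:
expRamp(1, 0.01, 0, 0.15, 0.075) // halfway through, volume is already ~0.1
```

This also shows why the target can't be 0: the ratio v1/v0 would collapse the whole curve, so the ramp is only defined for non-zero targets.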
Conceptually, the envelope creates:
Volume
1.0 |\
    | \
    |  \
    |   \
0.0 |____\____ Time
         150ms
Instant attack (the hit)
Fast exponential decay (energy loss)
That shape is the drum.
An exponential fade sounds like energy dissipating.
That’s how:
drum heads stop vibrating
strings lose motion
physical objects settle
This is why almost every percussive sound uses exponential decay.
Time Works Differently in Audio
The Audio Engine Has Its Own Clock
The Web Audio API runs on a dedicated, high-precision clock that is completely independent from the browser’s event loop.
You access that clock through:
const t = ctx.currentTime
This value:
is measured in seconds
increases continuously
is extremely stable
does not pause when JavaScript is busy
Scheduling Slightly in the Future
You’ll almost always see:
const t = ctx.currentTime + 0.05
That extra 0.05 seconds (50ms) does two things:
Gives the audio engine time to schedule everything cleanly
Prevents clicks and timing jitter
You’re telling the engine:
“I’m planning ahead. Please play this precisely.”
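For a rhythm game, this lookahead pattern extends naturally to scheduling a whole beat grid. A sketch: `scheduleBeats` and `LOOKAHEAD` are hypothetical names, and the 50ms value matches the snippet above:

```typescript
// Hypothetical helper: compute precise start times for a beat grid,
// always planning slightly ahead of the audio clock.
const LOOKAHEAD = 0.05 // seconds

const scheduleBeats = (currentTime: number, count: number, interval: number): number[] => {
  const times: number[] = []
  for (let i = 0; i < count; i++) {
    times.push(currentTime + LOOKAHEAD + i * interval)
  }
  return times
}

// e.g. four beats at 120 BPM (0.5s apart), starting from ctx.currentTime:
// scheduleBeats(ctx.currentTime, 4, 0.5)
```

Each returned time would be passed straight to start() and the envelope methods, so the audio clock, not the event loop, decides exactly when things happen.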
Frequency: How Pitch Is Created
When we say:
“That person has a high-pitched voice”
We’re not describing loudness.
We’re describing how fast something is vibrating.
A high-pitched voice means:
The vocal cords are opening and closing very quickly.
That motion pushes air in and out faster, which produces higher frequencies.
Voices and Oscillators Are Doing the Same Thing
Your vocal cords work a lot like an oscillator:
They vibrate
They repeat a pattern
That pattern pushes air
A deep voice vibrates slowly.
A high voice vibrates quickly.
The Web Audio API just replaces vocal cords with math.
osc.frequency.value = 100
→ slow vibration → deep sound
osc.frequency.value = 1000
→ fast vibration → high sound
Your brain hears speed as pitch.
Why This Still Works Without Music Theory
You don’t need to know notes or scales.
Your brain evolved to interpret:
slow vibrations as “big”
fast vibrations as “small or sharp”
That’s why:
big animals sound deep
small animals sound high
tense situations feel higher-pitched
The Web Audio API just gives you direct access to that perception.
One Sentence to Remember
Pitch is how fast something vibrates.
Frequency is how we describe that speed.
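The sample rate ties the two ideas together: frequency is how many cycles fit into each second, and the sample rate is how many numbers describe each second. A quick sketch, assuming the common 44,100 Hz rate:

```typescript
// How many audio samples describe one cycle of a given frequency?
const samplesPerCycle = (freq: number, sampleRate = 44100): number =>
  sampleRate / freq

samplesPerCycle(100)  // 441 samples per cycle: a slow, deep vibration
samplesPerCycle(1000) // 44.1 samples per cycle: a fast, high one
```

Fewer samples per cycle means the waveform wiggles faster, and your brain hears that speed as higher pitch.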
Noise: Where Order Becomes Chaos
Up until now, every sound we’ve created has been predictable.
Oscillators repeat a clean pattern:
same shape
same speed
same result every time
But real-world sounds—especially percussion—aren’t like that.
They’re messy.
That’s where noise comes in.
const noise = ctx.createBufferSource();
This line does not create noise by itself.
It creates a player—something that can play a chunk of audio data.
The “noise” comes from what we put into it.
Noise Is Just Random Numbers
We create that data by filling a buffer with random values:
data[i] = Math.random() * 2 - 1;
That means:
At every audio sample, move the speaker to a random position.
No pattern.
No repetition.
Just randomness.
When played fast enough, that randomness becomes static.
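Here is that buffer-filling loop on its own, in plain code. The one-second length is just an illustrative choice:

```typescript
// One second of white noise: every sample is a random speaker position.
const sampleRate = 44100
const data = new Float32Array(sampleRate)
for (let i = 0; i < data.length; i++) {
  data[i] = Math.random() * 2 - 1 // uniform in [-1, 1)
}
```

Played back at the audio rate, 44,100 random positions per second is simply static: the sound of no pattern at all.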
Why We Use a Buffer Instead of an Oscillator
An oscillator produces order:
repeatable
smooth
stable
Noise is the opposite:
unpredictable
rough
chaotic
You can’t generate that with an oscillator.
So instead, we:
Create random values
Store them in a buffer
Play that buffer
noise.buffer = noiseBuffer;
noise.start(when);
What createBufferSource() Really Means
A BufferSource is best thought of as:
“Play this array of numbers as sound.”
It doesn’t generate sound.
It doesn’t modify sound.
It just replays data.
Why Noise Is Essential for Percussion
Many real-world sounds are not tonal:
snares
hi-hats
shakers
explosions
wind
crashes
These sounds don’t vibrate in a stable way.
They’re friction, collisions, and chaos.
Noise gives us that chaos.
Noise by Itself Is Too Much
Raw noise contains:
low rumble
mid clutter
sharp highs
That’s why noise alone rarely sounds good.
We shape it with:
filters (to remove unwanted frequencies)
envelopes (to make it short-lived)
Noise is the raw material.
Filters and envelopes turn it into something usable.
The Important Mental Model
Oscillators = predictable motion
Noise = random motion
Filters = selective removal
Envelopes = energy over time
Once you see noise as intentional randomness, it stops feeling strange—and starts feeling necessary.
Filters: Removing Information on Purpose
A filter does exactly one thing:
It removes parts of the sound.
What a Filter Actually Does
When you write:
const filter = ctx.createBiquadFilter()
filter.type = "highpass"
filter.frequency.value = 1000
You’re saying:
Remove everything below 1000Hz.
Keep only the fast, sharp movement.
A high-pass filter:
lets high frequencies through
removes low frequencies
A low-pass filter does the opposite.
No sound is added.
Only information is removed.
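You can see "removal" in action without the real BiquadFilter, whose internals are more involved. Below is a toy one-pole lowpass, an assumption-level sketch rather than what createBiquadFilter actually does. It smooths fast changes, which is exactly what removing high frequencies means; a highpass is the mirror image (the input minus the smoothed version):

```typescript
// Toy one-pole lowpass: each output leans toward the previous one,
// so fast (high-frequency) wiggles get averaged away.
// alpha near 1 lets more through; alpha near 0 smooths harder.
const lowpass = (input: number[], alpha: number): number[] => {
  const output: number[] = []
  let prev = 0
  for (const x of input) {
    prev = prev + alpha * (x - prev)
    output.push(prev)
  }
  return output
}

// The fastest possible wiggle (+1, -1, +1, ...) comes out much quieter,
const buzz = Array.from({ length: 100 }, (_, i) => (i % 2 === 0 ? 1 : -1))
const smoothed = lowpass(buzz, 0.2)
// while a steady value (the lowest possible "frequency") passes almost untouched.
const steady = lowpass(Array(100).fill(1), 0.2)
```

Same multiply-and-accumulate loop, but it quietly decides which kinds of motion survive. That is all filtering is.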
Filters Shape Chaos into Something Recognizable
This is the real “aha” moment.
Oscillators create order
Noise creates chaos
Filters carve that chaos into a shape
Envelopes give it life
A snare works not because it’s complex,
but because it’s controlled randomness.
Frequency + Filters = Character
Two sounds can have:
the same envelope
the same timing
But feel completely different because:
one has more high frequencies
the other has more low frequencies
Filters decide where the energy lives.
A Simple Rule of Thumb
If a sound feels:
muddy → remove low frequencies
too sharp → remove high frequencies
too plain → let more frequencies through
Building a Kick Drum (Pitch Is Energy)
A kick drum sounds deep, but its “thump” actually comes from a rapid drop in pitch: the tone starts high and falls fast.
We simulate that with a frequency envelope:
const playKick = () => {
  // Schedule slightly ahead of the audio clock, as before
  const t = ctx.currentTime + 0.05
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()
  osc.connect(gain)
  gain.connect(ctx.destination)
  osc.frequency.setValueAtTime(150, t)
  osc.frequency.exponentialRampToValueAtTime(50, t + 0.1)
  gain.gain.setValueAtTime(0.8, t)
  gain.gain.exponentialRampToValueAtTime(0.01, t + 0.15)
  osc.start(t)
  osc.stop(t + 0.15)
}
High → low in a fraction of a second.
Your brain hears that as impact.
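You can see that sweep numerically by sampling the exponential interpolation the Web Audio spec uses for these ramps, with the kick's 150 → 50 Hz values. `kickPitch` is just an illustrative helper:

```typescript
// The kick's frequency sweep, sampled with the spec's exponential ramp:
// v(t) = v0 * (v1 / v0) ^ ((t - t0) / (t1 - t0)), here over 0.1 seconds.
const kickPitch = (t: number): number =>
  150 * Math.pow(50 / 150, t / 0.1)

kickPitch(0)    // 150 Hz: the initial click of the hit
kickPitch(0.05) // ~86.6 Hz: already falling fast
kickPitch(0.1)  // 50 Hz: the deep tail you feel in your chest
```

Most of the drop happens in the first few milliseconds, which is exactly what makes it read as an impact rather than a siren.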
Snare Drum = Tone + Noise
A snare isn’t just a drum head. It’s also metal wires rattling underneath.
We synthesize those two components separately.
White Noise (the “crack”)
const bufferSize = ctx.sampleRate
const noiseBuffer = ctx.createBuffer(1, bufferSize, ctx.sampleRate)
const data = noiseBuffer.getChannelData(0)
for (let i = 0; i < bufferSize; i++) {
  data[i] = Math.random() * 2 - 1
}
Then we filter and envelope it:
// playTone is a minimal helper (assumed here): a short oscillator
// tone with a fast fade-out, giving the snare its tonal "body"
const playTone = (freq: number, duration: number, type: OscillatorType, when: number): void => {
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()
  osc.type = type
  osc.frequency.setValueAtTime(freq, when)
  gain.gain.setValueAtTime(0.5, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + duration)
  osc.connect(gain)
  gain.connect(ctx.destination)
  osc.start(when)
  osc.stop(when + duration)
}

const playSnare = (when: number): void => {
  const noise = ctx.createBufferSource()
  const filter = ctx.createBiquadFilter()
  const gain = ctx.createGain()
  noise.buffer = noiseBuffer
  filter.type = "highpass"
  filter.frequency.value = 1000
  noise.connect(filter)
  filter.connect(gain)
  gain.connect(ctx.destination)
  gain.gain.setValueAtTime(0.2, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.1)
  noise.start(when)
  noise.stop(when + 0.1)
  playTone(180, 0.08, "triangle", when)
}
The result: a tight, punchy snare with zero samples.
Error Sounds and Game Feedback
Sound design is also UX.
A descending pitch universally signals failure.
const playMiss = (when: number): void => {
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()
  osc.type = "sawtooth"
  osc.frequency.setValueAtTime(300, when)
  osc.frequency.exponentialRampToValueAtTime(100, when + 0.15)
  gain.gain.setValueAtTime(0.3, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.15)
  osc.connect(gain)
  gain.connect(ctx.destination)
  osc.start(when)
  osc.stop(when + 0.15)
}
No explanation needed. Your brain just gets it.
More Sounds Using the Same Building Blocks
Hi-Hat (Closed)
const playHiHat = (when: number): void => {
  const noise = ctx.createBufferSource()
  const filter = ctx.createBiquadFilter()
  const gain = ctx.createGain()
  noise.buffer = noiseBuffer
  filter.type = "highpass"
  filter.frequency.value = 7000
  noise.connect(filter)
  filter.connect(gain)
  gain.connect(ctx.destination)
  gain.gain.setValueAtTime(0.3, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.05)
  noise.start(when)
  noise.stop(when + 0.05)
}
Tom Drums
const playTom = (when: number, pitch: "high" | "mid" | "low"): void => {
  const map = { high: 200, mid: 150, low: 100 }
  const osc = ctx.createOscillator()
  const gain = ctx.createGain()
  osc.frequency.setValueAtTime(map[pitch] * 1.5, when)
  osc.frequency.exponentialRampToValueAtTime(map[pitch], when + 0.1)
  gain.gain.setValueAtTime(0.6, when)
  gain.gain.exponentialRampToValueAtTime(0.01, when + 0.3)
  osc.connect(gain)
  gain.connect(ctx.destination)
  osc.start(when)
  osc.stop(when + 0.3)
}
The iOS Safari Audio “Unlock” Trap
One thing that will bite you on mobile Safari:
Audio must be unlocked by a user gesture
Even calling resume() isn’t always enough.
const unlockAudio = (): void => {
  if (ctx.state === "suspended") {
    ctx.resume()
  }
  const buffer = ctx.createBuffer(1, 1, 22050)
  const source = ctx.createBufferSource()
  source.buffer = buffer
  source.connect(ctx.destination)
  source.start()
}

document.body.addEventListener("touchstart", unlockAudio)
document.body.addEventListener("touchend", unlockAudio)
This silent buffer trick is the difference between “works on desktop” and “works everywhere”.
Final Thoughts
By now, every piece should fit into a single mental model.
Oscillators create order: a predictable, repeating motion.
Noise creates chaos: random movement with no pattern.
Raw chaos is unusable: too wide, too messy, too much information.
Filters remove what you don’t want: carving chaos into something recognizable.
Envelopes give everything life: energy appears, fades, and disappears.
Time simply tells the system when all of this happens.
With all of these you can build:
Full drum kits
UI sounds
Game effects
Musical arpeggios
No assets to load. No bandwidth wasted. Infinite variation.
If you enjoy understanding how things work instead of just importing libraries, the Web Audio API is deeply rewarding.
Sound design is just programming—your ears are the debugger.