# Turn-taking & barge-in

swaram handles turn-taking for you: it notices when the user stops speaking and
replies on its own. This automatic turn-taking works the same way in both the
Simple and Premium modes.

## Automatic turns

Stream the user's audio with `input_audio_buffer.append`. As the user speaks and
then pauses, you'll receive these events in order:

1. `input_audio_buffer.speech_started` when they begin,
2. `input_audio_buffer.speech_stopped` when they stop,
3. a `response.created`, then the audio and transcript deltas, then `response.done`.
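To make that order concrete, here is a minimal TypeScript sketch that folds the events listed above into a simple turn phase. It assumes each server event arrives as a parsed JSON object with a `type` field. `TurnPhase`, its values, and `nextPhase` are hypothetical helpers, not part of swaram. Events the sketch doesn't recognize, such as the audio and transcript deltas, leave the phase unchanged.

```typescript
// Hypothetical phases of one exchange; swaram itself does not define these.
type TurnPhase = "idle" | "user_speaking" | "awaiting_reply" | "replying";

// Assumed shape of an incoming server event: parsed JSON with a `type` field.
interface ServerEvent {
  type: string;
}

// Returns the phase after `event`. Any other event (e.g. audio or transcript
// deltas) leaves the phase as it was.
function nextPhase(phase: TurnPhase, event: ServerEvent): TurnPhase {
  switch (event.type) {
    case "input_audio_buffer.speech_started":
      return "user_speaking"; // also covers the user cutting in mid-reply
    case "input_audio_buffer.speech_stopped":
      return "awaiting_reply";
    case "response.created":
      return "replying";
    case "response.done":
      // If a reply cancelled by barge-in is also reported as done, don't let
      // it hide the fact that the user is still talking.
      return phase === "replying" ? "idle" : phase;
    default:
      return phase;
  }
}

// Usage: replay the sequence from the list above.
const events: ServerEvent[] = [
  { type: "input_audio_buffer.speech_started" },
  { type: "input_audio_buffer.speech_stopped" },
  { type: "response.created" },
  { type: "response.done" },
];
let phase: TurnPhase = "idle";
for (const e of events) {
  phase = nextPhase(phase, e);
  console.log(e.type, "->", phase);
}
```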

You don't need to tell swaram when a turn ends. In Simple mode, the
`input_audio_buffer.commit` and `response.create` events are **optional** nudges
for finer control; Premium always uses automatic turn detection and ignores them.
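As a rough illustration of which client messages apply in each mode, the sketch below builds them as plain objects ready for `JSON.stringify`. The mode labels and helper names are hypothetical, and the `audio` field carrying base64-encoded audio is an assumption borrowed from OpenAI-style realtime APIs; check [Audio](audio.html) for swaram's actual format. Sending `commit` before `response.create` mirrors how such APIs usually end a turn by hand, so treat that order as an assumption too.

```typescript
// Hypothetical labels for the two modes in this sketch.
type Mode = "simple" | "premium";

// Client messages are assumed to be JSON objects with a `type` field.
interface ClientMessage {
  type: string;
  [key: string]: unknown;
}

// Wraps one chunk of already-encoded audio. ASSUMPTION: the payload goes in an
// `audio` field as base64, as in OpenAI-style realtime APIs.
function appendMessage(base64Audio: string): ClientMessage {
  return { type: "input_audio_buffer.append", audio: base64Audio };
}

// Optional end-of-turn nudges. Only Simple mode honours them; Premium ignores
// them, so this returns nothing for Premium.
function endOfTurnNudges(mode: Mode): ClientMessage[] {
  if (mode === "premium") {
    return [];
  }
  return [{ type: "input_audio_buffer.commit" }, { type: "response.create" }];
}

// Usage: the messages you would pass, serialized, to your socket's send().
const outgoing: ClientMessage[] = [appendMessage("AAAA"), ...endOfTurnNudges("simple")];
console.log(outgoing.map((m) => JSON.stringify(m)));
```

In Premium mode you would only ever send `append` messages, leaving automatic turn detection to do the rest.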

## Barge-in (interrupting)

If the user starts speaking while swaram is talking, swaram **stops right away**
and listens. The in-flight reply is cancelled and its audio stops at once.

For the smoothest feel, have your app **stop its own playback** the moment the
user starts speaking; audio your app has already received will otherwise keep
playing after swaram stops. You'll know the user has started from
`input_audio_buffer.speech_started` (or your own client-side voice detection).
You can also send `response.cancel` to interrupt explicitly.

> **Tip.** Real-time speech feels best when both sides stop instantly. Drop any
> queued audio you haven't played yet as soon as the user cuts in.
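One way to act on both the barge-in advice and the tip, sketched in TypeScript under a few assumptions: your client decodes reply audio into `Float32Array` chunks and queues them before playback, and you supply a `send` function that writes a JSON message to the socket. `BargeInHandler` and its methods are hypothetical. Sending `response.cancel` is off by default, since swaram already cancels the reply on barge-in; switch it on if you want the explicit interrupt.

```typescript
// Outgoing client message; only `type` matters here.
type OutMsg = { type: string };

// Hypothetical helper: queues reply audio for playback and flushes it on barge-in.
class BargeInHandler {
  private readonly send: (msg: OutMsg) => void;
  private readonly explicitCancel: boolean;
  private pending: Float32Array[] = [];
  private replying = false;

  constructor(send: (msg: OutMsg) => void, explicitCancel = false) {
    this.send = send;
    this.explicitCancel = explicitCancel;
  }

  // Call with each decoded chunk of reply audio, however your client receives it.
  // Chunks arriving outside an active reply (e.g. stragglers after a barge-in)
  // are dropped instead of queued.
  onAudioChunk(chunk: Float32Array): void {
    if (this.replying) this.pending.push(chunk);
  }

  // Your playback loop pulls the next unplayed chunk from here.
  nextChunk(): Float32Array | undefined {
    return this.pending.shift();
  }

  get queued(): number {
    return this.pending.length;
  }

  onServerEvent(event: { type: string }): void {
    switch (event.type) {
      case "response.created":
        this.replying = true;
        break;
      case "response.done":
        this.replying = false;
        break;
      case "input_audio_buffer.speech_started":
        // The user cut in: drop every chunk that hasn't been played yet.
        this.pending = [];
        if (this.replying && this.explicitCancel) {
          this.send({ type: "response.cancel" });
        }
        this.replying = false;
        break;
    }
  }
}

// Usage: a reply starts, two chunks are queued, then the user barges in.
const sent: OutMsg[] = [];
const handler = new BargeInHandler((msg) => sent.push(msg), true);
handler.onServerEvent({ type: "response.created" });
handler.onAudioChunk(new Float32Array(480));
handler.onAudioChunk(new Float32Array(480));
handler.onServerEvent({ type: "input_audio_buffer.speech_started" });
console.log(handler.queued, sent.map((m) => m.type)); // 0 [ 'response.cancel' ]
```

Clearing the queue only covers audio not yet handed to the output device. A chunk that is already playing needs stopping too, for example by calling `stop()` on its `AudioBufferSourceNode` if you play through Web Audio.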

## Send only the user's voice

swaram runs turn detection on whatever audio you send, so you don't mark turn
boundaries yourself. But it can only judge what it receives. Do a little work on
your side so that it hears the user and nothing else:

- **Gate the microphone.** Use **push-to-talk** (capture only while a button is
  held or toggled on) or your own lightweight **voice activity detection**, so the
  model's playback and background noise aren't streamed back as if the user were
  speaking.
- **Cancel echo at capture.** Turn on the browser's echo cancellation and noise
  suppression (see [Audio](audio.html)). On open speakers, this is what stops the
  model from interrupting itself; **headphones** avoid the problem entirely.
- **Stop your playback the instant the user speaks** (see Barge-in above) for the
  snappiest interruption.
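The first bullet can be sketched as a gate that decides, frame by frame, whether to stream a captured frame. It supports push-to-talk and a deliberately crude energy-based detector (frame RMS level against a threshold) standing in for a proper VAD. `MicGate`, the 0.02 threshold, and the 15-frame hangover are illustrative assumptions to tune for your microphone and frame size. The hangover keeps a short tail of quiet audio flowing after the level drops, on the assumption that swaram's turn detection benefits from hearing the pause that ends a turn. An energy gate can't tell the user's voice from the model's playback, so it complements echo cancellation rather than replacing it.

```typescript
// Hypothetical gate modes for this sketch.
type GateMode = "push_to_talk" | "vad";

// Root-mean-square level of a frame of float samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

// Decides, per captured frame, whether to stream it with input_audio_buffer.append.
class MicGate {
  private readonly mode: GateMode;
  private readonly threshold: number;
  private readonly hangoverFrames: number;
  private talkHeld = false;
  private hangoverLeft = 0;

  constructor(mode: GateMode, threshold = 0.02, hangoverFrames = 15) {
    this.mode = mode;
    this.threshold = threshold;
    this.hangoverFrames = hangoverFrames;
  }

  // Wire this to your push-to-talk button (held/released, or toggled).
  setTalkHeld(held: boolean): void {
    this.talkHeld = held;
  }

  shouldSend(frame: Float32Array): boolean {
    if (this.mode === "push_to_talk") return this.talkHeld;
    if (rms(frame) >= this.threshold) {
      // Loud enough to count as speech: send it and re-arm the hangover.
      this.hangoverLeft = this.hangoverFrames;
      return true;
    }
    if (this.hangoverLeft > 0) {
      // Quiet, but keep sending briefly so the pause after speech gets through.
      this.hangoverLeft -= 1;
      return true;
    }
    return false;
  }
}

// Usage: one loud frame followed by silence, with a 2-frame hangover.
const gate = new MicGate("vad", 0.02, 2);
const loud = new Float32Array(320).fill(0.1);
const silent = new Float32Array(320);
console.log([loud, silent, silent, silent].map((f) => gate.shouldSend(f)));
// [ true, true, true, false ]
```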
