Quickstart
Connect, configure a session, stream the user's voice, and play the Malayalam audio that streams back. Here's the whole loop.
1. Get an API key
Create an account on app.swaram.live and create a key.
The key looks like swaram_… and is shown only once, so copy it right away and keep it on your server.
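One way to keep the key on the server is to read it from the process environment at startup instead of hard-coding it. The sketch below assumes a hypothetical environment variable named SWARAM_API_KEY (swaram does not define this name) and builds the Authorization header used in the next step.

```python
import os


def auth_headers(env_var: str = "SWARAM_API_KEY") -> dict[str, str]:
    """Build the Authorization header from a key stored in the environment.

    SWARAM_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "")
    # Keys are described as starting with "swaram_"; fail early otherwise.
    if not key.startswith("swaram_"):
        raise RuntimeError(f"{env_var} is missing or does not look like a swaram key")
    return {"Authorization": f"Bearer {key}"}
```

The returned dict can be passed as the headers argument when opening the WebSocket.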
2. Connect and talk
Open a WebSocket, send your settings as the first message, then stream audio. Audio is 16-bit PCM, 24 kHz, mono, base64 in both directions.
# Python (server-side), using the websockets package
import asyncio, base64, json, websockets

API_KEY = "swaram_your_key_here"
URL = "wss://api.swaram.live/v1/realtime?model=mal-realtime-simple"

# Placeholders so the sample runs: swap in real microphone capture and playback.
pcm16_chunk = b"\x00\x00" * 480  # 20 ms of silence (PCM16, 24 kHz, mono)

def play(pcm: bytes) -> None:
    pass  # hand the PCM16 bytes to your audio output

async def main():
    headers = {"Authorization": f"Bearer {API_KEY}"}
    # additional_headers needs a recent websockets release; older ones call it extra_headers
    async with websockets.connect(URL, additional_headers=headers) as ws:
        # 1) configure the session up front (before streaming audio)
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": "You are a friendly Malayalam assistant.",
                "voice": "mal-female",
            },
        }))
        # 2) stream the user's microphone as base64 PCM16 @ 24 kHz, in chunks
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm16_chunk).decode(),
        }))
        # 3) read events; play the audio you get back
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.output_audio.delta":
                play(base64.b64decode(event["delta"]))  # PCM16 @ 24 kHz
            elif event["type"] == "response.output_audio_transcript.delta":
                print(event["delta"], end="", flush=True)

asyncio.run(main())

// Node.js (server-side), using the ws package
import WebSocket from "ws";

const API_KEY = "swaram_your_key_here";
const URL = "wss://api.swaram.live/v1/realtime?model=mal-realtime-simple";

// Placeholders so the sample runs: swap in real microphone capture and playback.
const pcm16Chunk = Buffer.alloc(960); // 20 ms of silence (PCM16, 24 kHz, mono)
const play = (pcm) => {}; // hand the PCM16 bytes to your audio output

const ws = new WebSocket(URL, { headers: { Authorization: `Bearer ${API_KEY}` } });

ws.on("open", () => {
  // 1) configure the session up front
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      instructions: "You are a friendly Malayalam assistant.",
      voice: "mal-female",
    },
  }));
  // 2) stream the user's microphone as base64 PCM16 @ 24 kHz, in chunks
  ws.send(JSON.stringify({
    type: "input_audio_buffer.append",
    audio: pcm16Chunk.toString("base64"),
  }));
});

// 3) read events; play the audio you get back
ws.on("message", (data) => {
  const event = JSON.parse(data);
  if (event.type === "response.output_audio.delta") {
    play(Buffer.from(event.delta, "base64")); // PCM16 @ 24 kHz
  } else if (event.type === "response.output_audio_transcript.delta") {
    process.stdout.write(event.delta);
  }
});

// Browsers can't set an Authorization header on a WebSocket, so your backend
// mints a short-lived token (see Authentication) and the page connects with it.
const token = await fetch("/your-backend/realtime-token").then((r) => r.text());
const ws = new WebSocket(
  "wss://api.swaram.live/v1/realtime?model=mal-realtime-simple",
  ["realtime", "openai-insecure-api-key." + token] // pass the token as a subprotocol
);

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      instructions: "You are a friendly Malayalam assistant.",
      voice: "mal-female",
    },
  }));
  // capture the mic (getUserMedia with echoCancellation: true so the model
  // doesn't hear itself), downsample to 24 kHz PCM16 with an AudioWorklet, and
  // send chunks as: { type: "input_audio_buffer.append", audio: <base64> }
};

ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  if (event.type === "response.output_audio.delta") {
    playPcm16(event.delta); // base64 → PCM16 @ 24 kHz
  }
};

That's the whole loop: configure once, stream audio, play what comes back.
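The samples above send a single chunk; a real client sends many. As a rough sketch of the "stream audio" step, the helper below splits a buffer of 16-bit, 24 kHz, mono PCM into fixed-length chunks and wraps each one in an input_audio_buffer.append message. The helper name append_events and the 20 ms chunk length are assumptions made for illustration, not documented requirements.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE = 24_000   # Hz, per the audio format used by the API
BYTES_PER_SAMPLE = 2   # 16-bit PCM, mono


def append_events(pcm16: bytes, chunk_ms: int = 20) -> Iterator[str]:
    """Yield one JSON input_audio_buffer.append message per chunk of audio.

    chunk_ms is an assumption of this sketch, not a documented requirement.
    """
    # 24,000 samples/s * 2 bytes * 20 ms / 1000 = 960 bytes per chunk
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm16), chunk_bytes):
        chunk = pcm16[start:start + chunk_bytes]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
```

Inside the connected session, each message could then be sent with `await ws.send(message)` as audio arrives from the microphone.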
What happens on connect
The server replies with session.created, then session.updated after your
config. As the user speaks it emits input_audio_buffer.speech_started and
input_audio_buffer.speech_stopped, then response.created, a stream of
response.output_audio.delta (the audio) and
response.output_audio_transcript.delta (the words), and finally response.done.
See the Events reference for the full list.
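The event sequence described here can be folded into a small piece of per-response state. The sketch below is one possible shape, using only the event types and the delta field named in this quickstart; the Turn class and apply_event function are hypothetical names, and any other fields the events may carry are ignored.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Accumulated output for one response (hypothetical helper type)."""
    audio: bytearray = field(default_factory=bytearray)  # PCM16 @ 24 kHz
    transcript: str = ""
    done: bool = False


def apply_event(turn: Turn, event: dict) -> Turn:
    """Update the turn in place from one decoded server event."""
    kind = event.get("type")
    if kind == "response.created":
        # A new response starts: reset whatever was collected before.
        turn.audio.clear()
        turn.transcript = ""
        turn.done = False
    elif kind == "response.output_audio.delta":
        turn.audio.extend(base64.b64decode(event["delta"]))
    elif kind == "response.output_audio_transcript.delta":
        turn.transcript += event["delta"]
    elif kind == "response.done":
        turn.done = True
    # Other events (session.created, speech_started, ...) are left untouched.
    return turn
```

In the receive loop, each message would be passed through `json.loads` and then `apply_event`; once `turn.done` is true, the collected audio and transcript belong to one complete response.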
Next steps
- Authentication — server-side keys and browser tokens.
- Sessions & context — instructions, voices, and the config contract.
- Function calling — give swaram tools to act on.
- Transcripts — read the text of what the user said and what swaram said back.