The integration is genuinely small. There's no GPU to provision, no model to load, no moderation pipeline to build — those are the parts that normally eat your week, and they're already handled. What you write is two HTTP calls.
Everything below uses your API key (xav_live_…) as a bearer token against https://api.xavira.ai. Grab one from the dashboard — new accounts get 25 free credits, no card.
Before you start: what you need
Nothing exotic. An account (25 free credits, no card), an API key from the dashboard, and the ability to make an HTTP request from your backend. There's no SDK to install, no GPU to provision, no model to download. If you can curl, you can integrate. Keep the key server-side — it's a bearer token that spends real balance, so it never belongs in client code or a mobile app. Use a separate key per environment so staging can't drain production.
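The environment advice above can be sketched as a small loader. This assumes hypothetical variable names `XAVIRA_API_KEY_STAGING` and `XAVIRA_API_KEY_PRODUCTION`; the guide itself only uses `XAVIRA_API_KEY`. The loader refuses to start without a key rather than sending an unauthenticated request.

```javascript
// Hypothetical naming scheme: one env var per environment, read server-side only.
function apiKeyFor(environment, env = process.env) {
  const name = `XAVIRA_API_KEY_${environment.toUpperCase()}`;
  const key = env[name];
  if (!key) {
    throw new Error(`Missing ${name}; refusing to start without a server-side key`);
  }
  return key;
}

// Bearer auth plus JSON content type, as in every curl example in this guide.
function authHeaders(apiKey) {
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}
```

Because staging and production read different variables, a staging bug can only spend the staging key's balance.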
One concept to hold in your head before the code: the API has exactly two nouns. A character is a persistent person you create once. A generation is an image or video you produce against that character. Everything else — models, GPUs, face caching, moderation, storage — sits underneath those two nouns and isn't something you manage.
Pick a model first
When you create a character you choose a model_id, and that choice sets the aesthetic for everything generated against it. The two you'll reach for:
- realistic-sharp: photoreal output with face conditioning. This is the default for "looks like a real person" use cases, and it's what the face-cache consistency is built around.
- anime: illustrated/anime aesthetic. Note that anime skips face-embedding conditioning (face detection doesn't work on drawn faces), so consistency there comes from the prompt and traits rather than a cached embedding.
You don't switch models per request — you pick one per character. If you need a person in two very different styles, make two characters. This keeps each generation fast and predictable, and it's why the model lives on the character, not on the call.
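Because the model is fixed when the character is created, a typo in `model_id` becomes a permanent choice for that character. The sketch below catches that on the client before the request is sent. It only knows the two IDs named in this guide; the docs may list more. `characterBody` and `twoStyles` are hypothetical helper names. Client-side validation is a design choice here, not an API requirement.

```javascript
// The two model IDs this guide names; check the docs for the full list.
const MODELS = new Set(["realistic-sharp", "anime"]);

// Build a POST /v1/characters body, rejecting unknown models up front.
function characterBody(name, modelId, traits = {}) {
  if (!MODELS.has(modelId)) throw new Error(`unknown model_id: ${modelId}`);
  return { model_id: modelId, name, traits };
}

// One person in two styles means two characters sharing the same traits.
function twoStyles(name, traits) {
  return [
    characterBody(`${name} (photo)`, "realistic-sharp", traits),
    characterBody(`${name} (anime)`, "anime", traits),
  ];
}
```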
How prompts work here (so short prompts are fine)
A thing that surprises people: you don't need to write a 200-token mega-prompt. By default every /v1/images:generate call runs your prompt through a server-side enhancement layer before it reaches the model, and that layer does the heavy lifting:
- Pose matching. Short labels like "blowjob pov" or "reverse cowgirl" are matched against a curated pose library and expanded into a full prompt with camera angles, weighted tokens, and anatomy anchors. Without it, a terse prompt gives the model too little signal and you get a default portrait.
- Identity anchor. The character's stored traits (age, ethnicity, hair, etc.) are formatted into natural language and prepended, which is a big part of what keeps a character consistent.
- Quality prefix & negatives. A fixed "RAW photo, DSLR…" prefix counteracts the model's stock-photo bias, and a house negative-prompt covers common artifacts.
If you'd rather send a fully-formed prompt and skip all of that, set raw_prompt: true and you're in full control. Either way, the response includes the final enhanced_prompt so there's no black box — you can see exactly what hit the model. The practical upshot for your integration: let your users type short, natural requests and lean on enhancement, rather than building your own prompt-engineering layer.
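The `raw_prompt` switch is a single field on the request body. The sketch below assumes only the fields shown in this guide's examples (`character_id`, `prompt`, `resolution`, `raw_prompt`). It also keeps the user's text next to the returned `enhanced_prompt`, so you can debug what actually reached the model. The helper names are hypothetical.

```javascript
// Build a POST /v1/images:generate body. Enhancement is on unless rawPrompt is set.
function generateBody(characterId, prompt, { rawPrompt = false, resolution } = {}) {
  const body = { character_id: characterId, prompt };
  if (resolution) body.resolution = resolution; // e.g. "896x1152"
  if (rawPrompt) body.raw_prompt = true;        // skip server-side enhancement
  return body;
}

// Record what the user typed alongside the final prompt the server used.
function promptTrace(body, response) {
  return {
    userPrompt: body.prompt,
    raw: body.raw_prompt === true,
    finalPrompt: response.enhanced_prompt,
  };
}
```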
Step 1 — Create a character (once)
A character is the persistent primitive. You create it once; its identity (and a cached face embedding) is reused across every future generation, which is what keeps the same person looking like themselves.
curl -X POST https://api.xavira.ai/v1/characters \
-H "Authorization: Bearer $XAVIRA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "realistic-sharp",
"name": "Mia",
"traits": {
"age": 26,
"ethnicity": "Hispanic",
"hairLength": "long",
"hairColor": "brown",
"breastSize": "medium"
}
}'
You get back a character_id. Store it — that's all you need to generate against this person forever.
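The curl call above translates directly to Node 18+, which has a global `fetch`. This sketch assumes that runtime. It takes the fetch function as a parameter (a testing convenience, not part of the API) and surfaces failures by `error.code`, following the guide's error shape.

```javascript
// POST /v1/characters and return the character_id to persist.
async function createCharacter(apiKey, body, fetchImpl = fetch) {
  const res = await fetchImpl("https://api.xavira.ai/v1/characters", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  // Tolerate a non-JSON error body; the error shape is { error: { code } }.
  const json = await res.json().catch(() => ({}));
  if (!res.ok) throw new Error(json.error?.code ?? `http_${res.status}`);
  return json.character_id; // the handle for every future generation
}
```

In practice you would store the returned ID next to your own user or persona record and never create the same character twice.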
Step 2 — Generate an image
Now the actual generation. Short, label-style prompts work because the API expands them server-side (pose matching, identity anchor, quality prefix) before they hit the model — so "on her knees, pov" produces what you'd expect instead of a default portrait.
curl -X POST https://api.xavira.ai/v1/images:generate \
-H "Authorization: Bearer $XAVIRA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"character_id": "151fccbd-...",
"prompt": "on her knees, pov, bedroom",
"resolution": "896x1152"
}'
Image generations are fast enough to return synchronously. A 201 comes back with a permanent URL — the asset is already in object storage, not a temporary link that expires on you:
{
"generation_id": "b83bc79c-...",
"status": "completed",
"output_url": "https://pub-....r2.dev/.../image.png",
"enhanced_prompt": "RAW photo, DSLR, ... 26 year old Hispanic woman ...",
"cost_credits": 1
}
That's the feature. Render output_url in your app and you're done.
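Since `output_url` is permanent, the only thing left is deciding what to keep. The sketch below reads the fields from the example response above. The stored record shape and the `toStoredGeneration` name are hypothetical. It rejects anything that isn't a completed image, because image calls are synchronous.

```javascript
// Turn a synchronous /v1/images:generate response into the record worth storing.
function toStoredGeneration(characterId, json) {
  if (json.status !== "completed" || !json.output_url) {
    throw new Error(`unexpected generation state: ${json.status}`);
  }
  return {
    characterId,
    generationId: json.generation_id,
    url: json.output_url,     // permanent object-storage URL, safe to render directly
    credits: json.cost_credits,
  };
}
```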
What you got for free
- Moderation. Every output runs through a classifier before it's returned. A blocked prompt or image comes back as 422 moderation_blocked and is not charged.
- Permanent storage. Output is written to object storage on generation, so there's no ephemeral host that 404s next week.
- Face consistency. The cached embedding means generation #500 of Mia still looks like Mia, without you managing any of it.
Step 3 — Video, when you want it
Video is the same shape, pointed at a prior image generation. It's slower (~80s of GPU), so it's async — you get a generation_id and either poll it or receive a webhook:
# Poll
curl https://api.xavira.ai/v1/generations/$GEN_ID \
-H "Authorization: Bearer $XAVIRA_API_KEY"
{ "status": "completed", "output_url": "https://pub-....r2.dev/.../video.mp4", "cost_credits": 5 }
If a video FAILED, the credits are refunded automatically. Webhooks carry an HMAC signature — verify it before trusting the payload (the docs have the snippet).
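Webhook verification is the step most worth getting right, so here is a sketch using Node's built-in `crypto`. ASSUMPTION: the `X-Xavira-Signature` header carries the hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret. The docs have the authoritative scheme, including any prefix or timestamp. Verify against the raw bytes: parsing and re-serializing the JSON can change them.

```javascript
const crypto = require("node:crypto");

// ASSUMPTION: signature = hex(HMAC-SHA256(secret, rawBody)); confirm the scheme in the docs.
function verifyWebhook(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== "string") return false;
  const expected = Buffer.from(
    crypto.createHmac("sha256", secret).update(rawBody).digest("hex"),
  );
  const received = Buffer.from(signatureHeader.trim());
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

Use a constant-time comparison rather than `===`, so the check doesn't leak how many leading characters matched.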
Handling the unhappy paths
The two calls above are the demo. The difference between a demo and a production feature is how you handle the four things that will happen: a blocked prompt, an empty balance, a rate limit, and a network blip. Every error comes back in the same shape — match on error.code, never the message:
{ "error": { "code": "moderation_blocked", "message": "..." } }
The ones worth handling explicitly:
- 422 moderation_blocked: the prompt or output was flagged. You are not charged. Surface a clean "that prompt isn't allowed" to your user; don't retry it unchanged.
- 402 insufficient_credits: balance too low. The response includes your current balance and the required cost, so you can trigger a top-up flow instead of failing silently.
- 429 rate_limited: you've hit the per-key minute cap. Honour the Retry-After header (always under 60 seconds) rather than hammering.
- 500/502: transient internal or upstream failure. Not charged, safe to retry with backoff.
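One way to turn those cases into code is a single classifier the rest of your app branches on. The sketch matches on `error.code` and falls back to the HTTP status for 5xx, whose body may not be JSON. The action names are hypothetical. The backoff helper uses "full jitter" exponential backoff, a common pattern that this guide doesn't prescribe.

```javascript
// Map an API failure to an app-level action, keyed on error.code (never the message).
function classifyError(status, body) {
  switch (body?.error?.code) {
    case "moderation_blocked":   return { action: "show_not_allowed", retry: false };
    case "insufficient_credits": return { action: "start_top_up", retry: false };
    case "rate_limited":         return { action: "wait_retry_after", retry: true };
  }
  if (status >= 500) return { action: "backoff_retry", retry: true }; // not charged
  return { action: "fail", retry: false }; // unknown: log it, show users a clean state
}

// Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.floor(Math.random() * Math.min(capMs, baseMs * 2 ** attempt));
}
```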
Idempotency: retry without double-charging
If a request times out you don't know whether it landed. Retrying blindly risks two generations and two charges. Send an Idempotency-Key header and a retry is recognised as the same request instead of a new one:
curl -X POST https://api.xavira.ai/v1/images:generate \
-H "Authorization: Bearer $XAVIRA_API_KEY" \
-H "Idempotency-Key: 7c1f...your-uuid" \
-H "Content-Type: application/json" \
-d '{ "character_id": "151fccbd-...", "prompt": "on her knees, pov" }'
Reuse the same key only for the same logical request. Generate a fresh UUID per user action, persist it, and replay it on retry.
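"Persist it and replay it" can be as simple as a lookup keyed by the user action. In the sketch below, an in-memory `Map` stands in for durable storage. In production the key would live on the database row that records the action, so it survives restarts. The `actionId` format (for example `userId:requestId`) is hypothetical.

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-in for durable storage of one key per logical request.
const idempotencyKeys = new Map();

// The first call for an action mints a UUID; every retry of that action reuses it.
function idempotencyKeyFor(actionId) {
  if (!idempotencyKeys.has(actionId)) {
    idempotencyKeys.set(actionId, randomUUID());
  }
  return idempotencyKeys.get(actionId);
}
```

Send the result as the `Idempotency-Key` header on the first attempt and on every retry of the same action. A new user click gets a new `actionId`, and therefore a new key.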
Back off on rate limits
Every generation response carries the headers you need to stay under the cap gracefully:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1779350400
Image generation allows 60/min on the base tier (Scale doubles it, Volume quadruples it); video is 5/min because each clip is ~80 seconds of GPU. If you're fanning out a batch, watch X-RateLimit-Remaining and pace yourself rather than catching 429s.
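Pacing a batch from those headers is a few lines of arithmetic. The sketch spreads the remaining budget evenly over the time left in the window. ASSUMPTION: `X-RateLimit-Reset` is a Unix timestamp in seconds, as the example value suggests. It accepts anything with a `get(name)` method, such as a fetch `Headers` object.

```javascript
// Treat a missing header as unknown rather than as zero.
const toNumber = (v) => (v === null || v === undefined ? NaN : Number(v));

// Milliseconds to wait before the next request in a batch.
function pauseBeforeNextMs(headers, nowMs = Date.now()) {
  const remaining = toNumber(headers.get("X-RateLimit-Remaining"));
  const resetMs = toNumber(headers.get("X-RateLimit-Reset")) * 1000;
  if (Number.isNaN(remaining) || Number.isNaN(resetMs)) return 0; // no data: don't guess
  const untilReset = Math.max(0, resetMs - nowMs);
  if (remaining <= 0) return untilReset;     // window spent: wait for the reset
  return Math.floor(untilReset / remaining); // spread the remaining budget evenly
}
```

With 47 requests left and 30 seconds until reset, this waits about 638 ms between calls. That keeps a batch under the cap without ever tripping a 429.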
Polling vs webhooks for video
Images return synchronously, so there's nothing to wait on. Video is async — pick one of two patterns:
- Polling is simplest. Hit GET /v1/generations/:id every 5–10 seconds and budget ~3 minutes of wall-clock time (cold start + ~80s of generation). The poll endpoint is unlimited and lazily completes the job, so it's safe to call concurrently. Good for scripts and low volume.
- Webhooks scale better. Register an endpoint and we POST a signed payload on completion, with retries and exponential backoff. Your handler should return a 2xx within 5 seconds and verify the X-Xavira-Signature (HMAC-SHA256) before trusting the body. After repeated delivery failures the event is marked abandoned, but you can always fall back to the poll endpoint.
A robust setup uses both: webhooks as the primary path, a slow poll as a safety net for any delivery that didn't land. And remember — if a video FAILED, the credits are refunded automatically, so your accounting stays clean without special-casing.
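Here is the polling side as a bounded loop, which also works as the safety-net poll next to webhooks. It assumes Node 18+ `fetch`. It uses a 7-second interval and a 3-minute budget, from the ranges above. It compares status case-insensitively because this guide writes both "completed" and "FAILED". The injectable `fetchImpl` and `sleep` exist only to make the loop testable offline.

```javascript
// Poll GET /v1/generations/:id until the job settles or the budget runs out.
async function waitForGeneration(genId, apiKey, {
  intervalMs = 7000,
  budgetMs = 180000,
  fetchImpl = fetch,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  const attempts = Math.max(1, Math.ceil(budgetMs / intervalMs));
  for (let i = 0; i < attempts; i++) {
    const res = await fetchImpl(`https://api.xavira.ai/v1/generations/${genId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok && res.status < 500) throw new Error(`http_${res.status}`); // bad key or id
    const json = res.ok ? await res.json() : {}; // treat a 5xx as "still pending"
    const status = String(json.status ?? "").toLowerCase();
    if (status === "completed") return json;                       // output_url is ready
    if (status === "failed") throw new Error("generation_failed"); // credits auto-refunded
    if (i < attempts - 1) await sleep(intervalMs);
  }
  throw new Error("poll_timeout"); // keep the id; a webhook or later poll can still settle it
}
```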
Wrap it in your language
There's no SDK to learn — it's HTTP — but a thin wrapper keeps your app code clean. In Node:
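const { randomUUID } = require("node:crypto"); // Node 18+ also provides global fetch

// Pass the same idempotencyKey on every retry of one user action, so a replay
// is recognised as the same request instead of a second charge.
async function generate(characterId, prompt, idempotencyKey = randomUUID(), attempt = 0) {
  const res = await fetch("https://api.xavira.ai/v1/images:generate", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.XAVIRA_API_KEY}`,
      "Content-Type": "application/json",
      "Idempotency-Key": idempotencyKey,
    },
    body: JSON.stringify({ character_id: characterId, prompt }),
  });
  // Honour Retry-After (seconds) on 429, but give up after a few attempts.
  if (res.status === 429 && attempt < 3) {
    const wait = Number(res.headers.get("Retry-After")) || 5;
    await new Promise(r => setTimeout(r, wait * 1000));
    return generate(characterId, prompt, idempotencyKey, attempt + 1);
  }
  // Error bodies share one shape; tolerate a non-JSON body from a proxy.
  const json = await res.json().catch(() => ({}));
  if (!res.ok) throw new Error(json.error?.code ?? `http_${res.status}`);
  return json.output_url;
}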
The same shape in Python is a dozen lines with requests — point it at the same endpoint, read output_url off the JSON, branch on error.code.
What each call costs
So your unit economics aren't a mystery: an image is 1 credit, a short video is 5. Credits are prepaid — you top up a balance and it drains per generation, so there's no monthly seat and no invoice surprise at the end of the month. Two consequences worth designing around:
- Blocked and failed generations aren't charged. A 422 moderation_blocked costs nothing, and a video that comes back FAILED is auto-refunded. You only pay for output you actually receive, so you don't need to build your own reconciliation for those cases.
- Balance is a first-class state. Because service halts when the balance hits zero, treat 402 insufficient_credits as a normal branch, not an error, and wire it to a top-up prompt. For a B2B product, surfacing balance and a refill path beats silently failing a user's generation.
The prepaid model is deliberate: there's no post-paid bill to dispute and no metered invoice to reconcile. You always know your exposure, because it's whatever you've topped up and not yet spent.
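With fixed per-call prices, you can check a batch against your balance before fanning out. That turns a short balance into a top-up prompt up front instead of a 402 halfway through. The sketch uses the 1-credit and 5-credit prices quoted above (check the docs for current pricing). It treats the balance as an input from however your app tracks it. The result is an upper bound, since blocked and failed generations aren't charged.

```javascript
// Per-call prices quoted in this guide.
const CREDITS = { image: 1, video: 5 };

// Pre-flight a batch: total cost, whether the balance covers it, and any shortfall.
function planBatch(jobs, balance) {
  let required = 0;
  for (const job of jobs) {
    if (!Object.prototype.hasOwnProperty.call(CREDITS, job.kind)) {
      throw new Error(`unknown job kind: ${job.kind}`);
    }
    required += CREDITS[job.kind];
  }
  return { required, affordable: required <= balance, shortfall: Math.max(0, required - balance) };
}
```

For example, three images and two videos need 13 credits. Against a balance of 10, the shortfall of 3 is exactly what a top-up prompt should ask for.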
Test it in the playground first
Before you write any code, it's worth spending ten minutes in the playground. Create a character, run a few prompts, see how enhancement expands them, get a feel for which short labels map to which poses, and confirm the aesthetic you want from your chosen model. Everything you do there maps one-to-one to the API calls above (the playground is a thin UI over the same endpoints), so what you learn transfers directly. It's the fastest way to calibrate your prompts and expectations before committing to integration code, and it runs on the same 25 free credits.
A short production checklist
- Separate API keys for staging and production: independent rate-limit counters, independent blast radius.
- An Idempotency-Key on every generate call, persisted per user action.
- Explicit handling for 402 (top-up), 422 (user-facing "not allowed"), 429 (back off), 5xx (retry).
- Webhook signature verification before trusting any payload, plus a poll fallback.
- Store the character_id and generation_id you get back; they're your handles for everything later.
- Show users your own clean states; never surface a raw error code.
Frequently asked
Do I have to manage GPUs or models at all?
No. You pick a model_id when you create a character (realistic-sharp or anime); everything below that (GPUs, warm pools, model loading, the face cache) is handled. You send prompts, you get URLs.
How do I keep a character consistent across hundreds of images?
That's the entire point of the character primitive. For realistic-sharp characters, the face embedding is computed once and cached, so generation #500 still looks like the same person, with no extra work from you; anime characters get their consistency from the prompt and traits instead. Just keep generating against the same character_id.
What happens to my images — do they expire?
No. Output is written to permanent object storage on generation, so output_url keeps working. You can render it directly or copy it into your own bucket; it won't 404 next week.
Is moderation going to block normal adult content?
Moderation targets illegal and policy-violating content, not adult content per se — that's the product. You'll get a 422 on genuinely blocked prompts (uncharged), which you surface to the user as "not allowed."
That's the whole afternoon
Create a character, generate, render the URL. The infrastructure that normally turns this into a quarter-long project (warm GPUs, model wrangling, moderation, storage, payments that work for adult content) is the part you skipped.
Build it now
25 free credits, no card. Create your first character and ship the feature today.
Request and response shapes shown are illustrative; see the API docs for the authoritative, current spec, full parameter list and error codes.