
Synthesizes speech audio from text using the specified provider and voice.

POST /studio/speech

Records an OrcaRequest, estimates token usage/cost, persists the generated audio, and returns media metadata.

Processing steps:

  1. Validate inputs (text, provider).
  2. Resolve the provider adapter from configured speech groups.
  3. Create and persist an OrcaRequest in InProgress state.
  4. Estimate prompt/completion tokens and cost via provider-specific estimators; save to request.
  5. Invoke synthesis (SynthesizeSpeechAsync), receive base64 audio.
  6. Persist audio as OrcaAssetType.Speech and return its metadata.
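
The processing steps above can be sketched as a small in-memory simulation. This is only an illustration of the control flow, not the service's implementation: the adapter registry, the `handle_speech` function, the token estimator, the cost rate, and the shape of the returned metadata are all hypothetical stand-ins. Only the names `OrcaRequest`, `InProgress`, and the `Speech` asset type come from the description above.

```python
import base64
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class OrcaRequest:
    """Hypothetical stand-in for the persisted request record."""
    id: str
    state: str = "InProgress"
    prompt_tokens: int = 0
    cost: float = 0.0


@dataclass
class SpeechResult:
    """Hypothetical result: either an error message or asset metadata."""
    error: Optional[str] = None
    asset: Optional[dict] = None


def fake_synthesize(text: str, voice: str) -> str:
    """Placeholder adapter: returns base64 'audio' built from the inputs."""
    return base64.b64encode(f"{voice}:{text}".encode()).decode()


# In-memory stand-ins for configured speech groups and storage (assumptions).
ADAPTERS: Dict[str, Callable[[str, str], str]] = {"demo": fake_synthesize}
REQUESTS: Dict[str, OrcaRequest] = {}
ASSETS: Dict[str, bytes] = {}


def handle_speech(text: str, provider: str, voice: str) -> SpeechResult:
    # 1. Validate inputs.
    if not text or not provider:
        return SpeechResult(error="text and provider are required")

    # 2. Resolve the provider adapter.
    adapter = ADAPTERS.get(provider)
    if adapter is None:
        return SpeechResult(error=f"unknown provider: {provider}")

    # 3. Create and persist the request in the InProgress state.
    request = OrcaRequest(id=str(uuid.uuid4()))
    REQUESTS[request.id] = request

    # 4. Estimate tokens and cost (made-up heuristic and rate), save to request.
    request.prompt_tokens = max(1, len(text) // 4)
    request.cost = request.prompt_tokens * 0.00001

    # 5. Invoke synthesis and receive base64 audio.
    audio_b64 = adapter(text, voice)

    # 6. Persist the decoded audio as a Speech asset and return its metadata.
    audio = base64.b64decode(audio_b64)
    asset_id = str(uuid.uuid4())
    ASSETS[asset_id] = audio
    return SpeechResult(asset={
        "id": asset_id,
        "type": "Speech",
        "size_bytes": len(audio),
        "request_id": request.id,
    })
```

In this sketch, handled failures (invalid input, unknown provider) come back as a result with `error` set rather than as raised exceptions, which mirrors the endpoint's stated behaviour of reporting handled outcomes in the response body.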

The endpoint responds with HTTP 200 for all handled outcomes, so the status code alone does not indicate success; clients should inspect the Error field of the response.
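
Because success cannot be inferred from the status code, a client needs an explicit check on the response body. The following is a minimal sketch of such a check, assuming the response is a JSON object and that `Error` is a top-level key that is absent, null, or empty on success. The `SpeechError` class and `parse_speech_response` function are hypothetical names introduced for illustration.

```python
import json
from typing import Any, Dict


class SpeechError(Exception):
    """Raised when the response reports a failure (hypothetical helper)."""


def parse_speech_response(status: int, body: str) -> Dict[str, Any]:
    """Return the parsed payload, or raise if the call did not succeed."""
    # An unhandled server failure may still surface as a non-200 status.
    if status != 200:
        raise SpeechError(f"unexpected HTTP status {status}")

    payload = json.loads(body)

    # HTTP 200 alone is not success: a populated Error field means failure.
    error = payload.get("Error")
    if error:
        raise SpeechError(str(error))

    return payload
```

Pass the status code and raw body from whichever HTTP client is in use; the function returns the media metadata payload only when no error was reported.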

Request

Responses

OK