I found this project, which is just a couple of small Python scripts gluing various tools together: https://github.com/vndee/local-talking-llm
It’s pretty basic, but I couldn’t find anything more polished. I did a little “vibe coding” to use a faster Chatterbox fork, stream the output back so it starts “talking” before the LLM finishes, start recording on voice detection instead of the enter key, and allow interrupting the agent. But, like most vibe-coded stuff, it’s buggy. I was curious whether something better already exists before I commit to actually fixing the problems and pushing a fork.
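For anyone curious, the voice-detection part doesn’t need much: a small ring buffer on top of webrtcvad so you keep a bit of pre-speech audio, then stop after some trailing silence. A minimal sketch, assuming a 16 kHz mono mic via sounddevice and made-up thresholds:

```python
# Sketch of VAD-triggered recording (pip install webrtcvad sounddevice).
# Assumptions: 16 kHz mono input, 30 ms frames, arbitrary silence threshold.
import collections
import sounddevice as sd
import webrtcvad

RATE = 16000                          # webrtcvad supports 8/16/32/48 kHz
FRAME_MS = 30                         # frames must be 10, 20, or 30 ms
FRAME_LEN = RATE * FRAME_MS // 1000   # samples per frame

vad = webrtcvad.Vad(2)                # aggressiveness 0-3

def record_utterance(silence_frames=25):
    """Block until speech is detected, then record until ~750 ms of silence."""
    voiced = []
    ring = collections.deque(maxlen=10)   # keeps a little pre-speech audio
    silent = 0
    triggered = False
    with sd.RawInputStream(samplerate=RATE, channels=1, dtype="int16",
                           blocksize=FRAME_LEN) as stream:
        while True:
            frame, _ = stream.read(FRAME_LEN)
            frame = bytes(frame)
            speech = vad.is_speech(frame, RATE)
            if not triggered:
                ring.append(frame)
                if speech:                    # speech started: include ring buffer
                    triggered = True
                    voiced.extend(ring)
            else:
                voiced.append(frame)
                silent = 0 if speech else silent + 1
                if silent > silence_frames:   # enough trailing silence: done
                    return b"".join(voiced)   # raw 16-bit PCM, ready for STT
```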
Kobold.CPP has pretty good TTS model integration. I used the OuteTTS model when I played around with it, but there’s also API integration with commercial ones like Kokoro.
However, I’m not sure if it’s able to stream to a TTS model while the LLM is generating. When I tried it, it just waited until after the output to send it to the voice model. You may need to do some documentation reading to see if real-time streaming is possible if you go that route.
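That said, if the built-in TTS only fires after generation finishes, one workaround is to do the chunking client-side: stream tokens from Kobold’s OpenAI-compatible endpoint and hand each completed sentence to whatever TTS you run yourself. A rough sketch, where the localhost:5001 URL, the “local” model name, and speak() are assumptions/placeholders:

```python
# Sketch: stream LLM tokens and flush complete sentences to TTS early.
# Assumes KoboldCpp's OpenAI-compatible API on localhost:5001; speak() is a
# placeholder for your TTS engine (Kokoro, OuteTTS, Chatterbox, ...).
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5001/v1", api_key="none")

def speak(sentence: str):
    print("TTS:", sentence)   # placeholder: call your TTS engine here

buf = ""
stream = client.chat.completions.create(
    model="local",            # name is typically ignored by local servers
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    buf += chunk.choices[0].delta.content or ""
    # flush whenever we have a complete sentence, so speech starts early
    while (m := re.search(r"[.!?]\s", buf)):
        speak(buf[:m.end()].strip())
        buf = buf[m.end():]
if buf.strip():
    speak(buf.strip())        # flush whatever trails the last sentence break
```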
I’ve got a bonus question… Is there a good end-to-end voice conversation solution? I’d like to try something that directly processes audio and returns audio, rather than the whole VAD -> STT -> LLM -> TTS pipeline.
There are not many models that support any-to-any. Currently the best seems to be Qwen3-Omni, but the audio quality is not great and it isn’t supported by llama.cpp: https://github.com/ggml-org/llama.cpp/issues/16186
Thanks! If anyone has more (good) alternatives, or something like a curated list, I’d have a look at that as well… it’s always a bit complicated to stay up to date and go through the myriad of options myself.
OpenWebUI has TTS and STT.
Alpaca on Flathub seems OK.