• brucethemoose@lemmy.world
    4 hours ago

    You should be running hybrid (CPU + GPU) inference of GLM Air with a setup like that. Qwen 8B is kinda obsolete.

    I dunno what kind of speeds you absolutely need, but I bet you could get at least 12 tokens/s.