I have been thinking a lot about digital sovereignty lately and how quickly the internet is turning into a weird blend of surreal slop and centralized control. It feels like we are losing the ability to tell what is real because of how easy it is for trillionaire tech companies to flood our feeds with whatever they want.

Specifically, I am curious about what I call “kirkification”: the way these tools make it trivial to warp a person’s digital identity into a caricature. It starts with a joke or a face swap, but it ends with people losing control over how they are perceived online.

If we want to protect ourselves and our local communities from being manipulated by these black-box models, how do we actually do it?

I want to know if anyone here has tried moving away from the cloud toward sovereign compute. Is hosting our own communication and media solutions actually a viable way to starve these massive models of our data? Can a small town actually manage its own digital utility instead of just being a data farm for big tech?

Also, how do we even explain this to normal people who are not extremely online? How can we help neighbors or the elderly recognize when they are being nudged by an algorithm or seeing a digital caricature?

It seems like we should be aiming for a world of a million millionaires rather than a room full of trillionaires, but technical hurdles like ISP throttling and protocol issues make that bridge hard to build.

Has anyone here successfully implemented local-first solutions that reduced their reliance on big-tech AI? I am looking for ways to foster cognitive immunity and keep our data grounded in meatspace.

  • SuspciousCarrot78@lemmy.world · edited · 4 hours ago
    Ha ha! I actually finished it over the weekend. Now it’s on to the documentation… ICBF lol

    I just tried to get shit GPT to do it this morning, as it’s generally pretty ok for that. As always, it produces real “page turners”. Here is its idea of a “lay explainer”:

    Mixture of Assholes: Llama-swap + “MoA router”: making small local models act reliably (without pretending they’re bigger)

    This project is a harness for local inference: llama-swap is the model traffic-cop, and the router is the conductor that decides what kind of work you want done (straight answer, self-critique loop, style rewrite, vision/OCR), when, and with what context. Vodka acts as the memory layer and context re-roll.

    The goal isn’t to manufacture genius. It’s to make local models behave predictably under hardware constraints by:

    • making retrieval explicit (no “mystery memory”),
    • keeping “fancy modes” opt-in,
    • and making the seams inspectable when something goes wrong.

    The shape is simple:

    UI → Router (modes + RAG + memory plumbing) → llama-swap (model switching) → answer. ([GitHub][1])


    The “what”: one OpenAI-style endpoint that routes workflows, not just models

    At the front is an OpenAI-compatible POST /v1/chat/completions endpoint. From the client’s point of view, it’s “just chat completions” (optionally streaming). From the router’s point of view, each request can become a different workflow.

    It also accepts OpenAI-style multimodal message blocks (text + image_url), which matters for the vision/OCR paths.
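
    For a sense of what the client side looks like, here is a minimal request sketch. The endpoint path and the text + image_url block shape come from the description above; the host, port, model name, and image data are placeholders.

    ```python
    import requests

    # Hypothetical host/port for the router's OpenAI-compatible endpoint.
    ROUTER_URL = "http://localhost:8080/v1/chat/completions"

    # Plain text turn; a per-turn selector like "## mentats" (see below) just rides along in the content.
    text_request = {
        "model": "local-default",   # placeholder model name
        "stream": False,
        "messages": [
            {"role": "user", "content": "## mentats Summarise my notes on systemd timers."}
        ],
    }

    # Multimodal turn using OpenAI-style content blocks (text + image_url),
    # which is what the vision/OCR paths expect.
    vision_request = {
        "model": "local-default",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "## ocr What does this receipt say?"},
                    {"type": "image_url", "image_url": {"url": "data:image/png;base64,<...>"}},
                ],
            }
        ],
    }

    for payload in (text_request, vision_request):
        resp = requests.post(ROUTER_URL, json=payload, timeout=120)
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])
    ```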

    Under the hood, the router does three things:

    1. Decides the pipeline (Serious / Mentats / Fun / Vision / OCR)
    2. Builds an explicit FACTS block (RAG) if you’ve attached any KBs
    3. Calls llama-swap, which routes the request to the chosen local model backend behind an OpenAI-like interface ([GitHub][1])
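
    That dispatch is easy to picture. A rough sketch, assuming llama-swap is just another OpenAI-compatible HTTP backend; the port, model names, and function names here are illustrative, not the project’s actual identifiers:

    ```python
    import requests

    LLAMA_SWAP_URL = "http://localhost:9000/v1/chat/completions"    # assumed llama-swap address
    MODEL_FOR_MODE = {"serious": "local-7b", "mentats": "local-7b", "fun": "local-7b",
                      "vision": "local-vlm", "ocr": "local-vlm"}    # illustrative model names

    def pick_mode(text: str, session: dict) -> str:
        """1. Decide the pipeline: a per-turn ## selector wins, then sticky fun mode, then Serious."""
        if text.startswith("## "):
            return text[3:].split()[0].lower()
        return "fun" if session.get("sticky_fun") else "serious"

    def build_facts_block(query: str, session: dict, search) -> str:
        """2. Build an explicit FACTS block, but only from KBs attached in this session."""
        kbs = session.get("attached_kbs", [])
        if not kbs:
            return ""
        hits = search(query, kbs)                                   # e.g. a Qdrant lookup
        session["last_rag"] = {"query": query, "hits": len(hits)}   # surfaced by >>status
        return "FACTS:\n" + "\n".join(f"- {h}" for h in hits)

    def call_backend(mode: str, prompt: str) -> str:
        """3. Hand the assembled prompt to llama-swap, which swaps in the chosen local model."""
        resp = requests.post(LLAMA_SWAP_URL, json={
            "model": MODEL_FOR_MODE.get(mode, "local-7b"),
            "messages": [{"role": "user", "content": prompt}],
        }, timeout=300)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    ```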

    The “why”: small models fail less when you make the seams visible

    A lot of local “agent” setups fail in the same boring ways:

    • they silently change behaviour,
    • they smuggle half-remembered context,
    • they hallucinate continuity.

    This design makes those seams legible and user-controlled:

    • You pick the mode explicitly (no silent “auto-escalation”).
    • Retrieval is explicit and inspectable.
    • There’s a “peek” path that can show what the RAG facts block would look like without answering — which is unbelievably useful for debugging.

    The philosophy is basically: if the system is going to influence the answer, it should be inspectable, not mystical.


    The “what’s cool”: you’re routing workflows (Serious / Mentats / Fun / Vision)

    There are two layers of control:

    A) Session commands (>>…): change the router state

    These change how the router behaves across turns (things like sticky fun mode, which KBs are attached, and some retrieval observability):

    • >>status — show session state (sticky mode, attached KBs, last RAG query/hits)
    • >>fun / >>fun off — toggle sticky fun mode
    • >>attach <kb> / >>detach <kb|all> / >>list_kb — manage KBs per session
    • >>ingest <kb> / >>ingest_all — ingest markdown into Qdrant
    • >>peek <query> — preview the would-be facts block

    B) Per-turn selectors (##…): choose the pipeline for one message

    • ## mentats … — deep 3-pass “draft → critique → final”
    • ## fun — answer, then rewrite in a persona voice
    • ## vision … / ## ocr … — image paths
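
    To show how thin that control layer can be, here is a toy dispatcher for the two prefixes. The command names match the lists above; the session shape and helper names are invented for illustration, and a few commands (>>ingest, >>list_kb) are omitted:

    ```python
    def run_pipeline(mode: str, text: str, session: dict) -> str:
        ...  # entry point for the Serious / Mentats / Fun / Vision / OCR pipelines

    def build_facts_block(query: str, session: dict) -> str:
        ...  # RAG lookup over attached KBs (see the retrieval sketch further down)

    def handle_turn(text: str, session: dict):
        """>> commands mutate session state; ## selectors pick a pipeline for one message."""
        if text.startswith(">>"):
            cmd, *args = text[2:].split()
            if cmd == "status":
                return session                              # sticky mode, attached KBs, last RAG query/hits
            if cmd == "fun":
                session["sticky_fun"] = (args != ["off"])   # >>fun / >>fun off
                return f"fun mode: {session['sticky_fun']}"
            if cmd == "attach":
                session.setdefault("attached_kbs", []).append(args[0])
                return f"attached {args[0]}"
            if cmd == "detach":
                if args == ["all"]:
                    session["attached_kbs"] = []
                else:
                    session["attached_kbs"] = [k for k in session.get("attached_kbs", []) if k != args[0]]
                return "detached"
            if cmd == "peek":
                return build_facts_block(" ".join(args), session)   # preview the would-be FACTS block
            return f"unknown or omitted command: {cmd}"

        if text.startswith("## "):
            selector, _, rest = text[3:].partition(" ")
            return run_pipeline(selector.lower(), rest, session)    # mentats / fun / vision / ocr

        return run_pipeline("serious", text, session)
    ```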

    The three main pipelines (what they actually do)

    1) Serious: the default “boring, reliable” answer

    Serious is the default when you don’t ask for anything special. It can inject a FACTS block (RAG) and it receives a constraints block (which is currently a V1 placeholder). It also enforces a confidence/source line if it’s missing.
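
    One plausible reading of that enforcement is a plain post-check plus explicit prompt assembly; this is a guess at the shape, not the project’s code:

    ```python
    def build_serious_prompt(query: str, facts_block: str, transcript: str) -> str:
        """Query plus explicit blocks; the constraints block is still the V1 placeholder."""
        constraints_block = "CONSTRAINTS:\n(none yet)"   # empty placeholder per the roadmap
        parts = [p for p in (transcript, facts_block, constraints_block, f"QUERY:\n{query}") if p]
        return "\n\n".join(parts)

    def ensure_confidence_line(answer: str) -> str:
        """Append a confidence/source footer when the model forgets to emit one."""
        if not any(line.lower().startswith("confidence:") for line in answer.splitlines()):
            answer += "\n\nConfidence: medium | Sources: none cited"
        return answer
    ```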

    Docs vs implementation (minor note): the docs describe Serious as “query + blocks” oriented. The current implementation also has a compact context/transcript shaping step as part of prompt construction. Treat the code as the operational truth; the docs are describing the intended shape and may lag slightly in details as things settle.

    2) Mentats: explicit 3-pass “think → critique → final”

    This is the “make the model check itself” harness:

    1. Thinker drafts using QUERY + FACTS + constraints
    2. Critic checks for overreach / violations
    3. Thinker produces the final, carrying forward a “FACTS_USED / CONSTRAINTS_USED” discipline

    If the pipeline can’t complete cleanly (protocol errors), the router falls back to Serious.
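
    A minimal version of that loop, where chat() stands in for the backend call and fallback is the Serious pipeline; the prompts are paraphrased and the protocol check is simplified to a single marker:

    ```python
    def mentats(query: str, facts: str, constraints: str, chat, fallback) -> str:
        """Explicit draft → critique → final, with a fallback to Serious on protocol errors."""
        draft = chat(
            f"QUERY:\n{query}\n\n{facts}\n\n{constraints}\n\nDraft an answer using only the material above."
        )
        critique = chat(
            "Check this draft for overreach and constraint violations.\n\n"
            f"DRAFT:\n{draft}\n\n{facts}\n\n{constraints}"
        )
        final = chat(
            "Produce the final answer. End with FACTS_USED: and CONSTRAINTS_USED: lines.\n\n"
            f"DRAFT:\n{draft}\n\nCRITIQUE:\n{critique}"
        )
        # Protocol check: if the discipline lines are missing, fall back to the Serious pipeline.
        if "FACTS_USED" not in final:
            return fallback(query)
        return final
    ```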

    3) Fun: answer first, then do the performance

    Fun is deliberately a post-processing transform:

    • pass 1: generate the correct content (lower temperature)
    • pass 2: rewrite in a persona voice (higher temperature), explicitly instructed not to change the technical meaning

    This keeps “voice” from leaking into reasoning or memory. It’s: get it right first, then style it.
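
    As a sketch (the temperatures and the persona instruction are invented; only the two-pass shape comes from the description above):

    ```python
    def fun(query: str, chat) -> str:
        """Pass 1: get the content right at low temperature. Pass 2: restyle it without changing meaning."""
        answer = chat(query, temperature=0.3)
        return chat(
            "Rewrite the following answer in the persona voice. "
            "Do not change any technical claims, numbers, or commands.\n\n" + answer,
            temperature=0.9,
        )
    ```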


    RAG, but practical: Qdrant + opt-in KB (knowledge base) attach + “peek what you’re feeding me”

    KBs are opt-in per session

    Nothing is retrieved unless you attach KBs (>>attach linux, etc.). The FACTS block is built only from attached KBs and the router tracks last query/hit counts for debugging.
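
    With Qdrant, that per-session scoping is just a payload filter on the KB tag. A minimal lookup might look like this; the collection name, payload fields, and the embed() function are assumptions:

    ```python
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")    # assumed local Qdrant instance

    def search_attached_kbs(query: str, kbs: list[str], embed, limit: int = 5) -> list[str]:
        """Return FACTS candidates drawn only from the KBs attached in this session."""
        hits = client.search(
            collection_name="kb_chunks",                   # assumed collection name
            query_vector=embed(query),                     # embed() = whatever local embedding model you run
            query_filter=models.Filter(
                must=[models.FieldCondition(key="kb", match=models.MatchAny(any=kbs))]
            ),
            limit=limit,
        )
        return [hit.payload["text"] for hit in hits]
    ```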

    Ingestion: “KB folder → chunks → vectors in Qdrant”

    Ingestion walks markdown, chunks, embeds, and inserts into Qdrant tagged by KB. It’s simple and operational: turn a folder of docs into something you can retrieve from reliably.
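
    A hand-wavy version of that walk, with naive chunking and the same assumed collection as above (the real chunker and embedder are stand-ins):

    ```python
    import uuid
    from pathlib import Path

    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")    # assumed local Qdrant instance

    def ingest_kb(kb_name: str, folder: str, embed, chunk_size: int = 800) -> int:
        """Walk a folder of markdown, chunk it, embed it, and upsert into Qdrant tagged by KB."""
        points = []
        for md in Path(folder).rglob("*.md"):
            text = md.read_text(encoding="utf-8")
            chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]  # naive fixed-size chunks
            for chunk in chunks:
                points.append(models.PointStruct(
                    id=str(uuid.uuid4()),
                    vector=embed(chunk),
                    payload={"kb": kb_name, "source": md.name, "text": chunk},
                ))
        # Assumes the "kb_chunks" collection already exists with the right vector size.
        client.upsert(collection_name="kb_chunks", points=points)
        return len(points)
    ```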


    The KB refinery: SUMM → DISTILL → ingest

    This is one of the more interesting ideas: treat the KB as a product, not a dump.

    • SUMM produces a human-readable summary (strict: no fabrication, no silent renaming) from base text
    • DISTILL produces dense, retrieval-shaped atoms (embedding-friendly headings/bullets, minimal noise)
    • then ingest the distilled output

    The key point: DISTILL isn’t “a nicer summary.” It’s explicitly trying to produce retrieval-friendly material.
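
    As a two-call sketch (the prompts are paraphrased from the rules above, not the actual ones):

    ```python
    def refine_kb_doc(base_text: str, chat) -> tuple[str, str]:
        """SUMM then DISTILL; ingest the distilled output, keep the summary for humans."""
        summ = chat(
            "Summarise the following for a human reader. Do not fabricate anything and do not rename things.\n\n"
            + base_text
        )
        distill = chat(
            "Rewrite the following as dense, retrieval-friendly atoms: short headings, one fact per bullet, "
            "minimal noise.\n\n" + base_text
        )
        return summ, distill
    ```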


    Vodka: deterministic memory plumbing (not “AI memory vibes”)

    Vodka does two jobs:

    1. context reduction / stability: keep the effective context small and consistent
    2. explicit notes: store/retrieve nuggets on demand (!! store, ?? recall, plus cleanup commands), with a TTL so facts expire unless they are used

    It can also leave internal breadcrumb markers and later expand them when building a transcript/context — those IDs aren’t surfaced unless you deliberately show them.
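
    The explicit-notes half is essentially a key-value store with a TTL. A toy version, with the field names and the default TTL invented:

    ```python
    import time

    class NoteStore:
        """Deterministic notes: !! store writes, ?? recall reads, and unused facts expire."""

        def __init__(self, ttl_seconds: float = 7 * 24 * 3600):   # arbitrary one-week default
            self.ttl = ttl_seconds
            self.notes: dict[str, dict] = {}

        def store(self, key: str, text: str) -> None:             # backs the "!! store" command
            self.notes[key] = {"text": text, "last_used": time.time()}

        def recall(self, key: str) -> str | None:                 # backs the "?? recall" command
            note = self.notes.get(key)
            if note is None:
                return None
            note["last_used"] = time.time()                       # using a fact refreshes its TTL
            return note["text"]

        def sweep(self) -> None:                                  # cleanup command: drop expired notes
            cutoff = time.time() - self.ttl
            self.notes = {k: v for k, v in self.notes.items() if v["last_used"] >= cutoff}
    ```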


    Roadmap reality check: what’s left for V1.1

    • Constraints/GAG: placeholder in V1 (constraints block currently empty)
    • Coder role: present in config but not wired yet