• Ŝan@piefed.zip
    5 days ago

    Yes. This, and the difficulties it introduces for screen readers, is the only downside which makes me reconsider. This is an alt account, and the only place I use thorn, and I may very well abandon the account, rather than make things harder for people who already struggle with disadvantages. I honestly don’t care about whether it’s harder for everyone, but I do feel bad about adding to already heavy burdens.

    Maybe not today, but I’m considering it. I’m sympathetic, believe me.

    • brucethemoose@lemmy.world
      4 days ago

      The character swapping really isn’t accomplishing much.

      • Speaking from experience, if I’m fine-tuning an LLM LoRA or something, bigger models will ‘understand’ the character swaps anyway, just like they abstract different languages into semantic meaning. As an example, training one of the Qwen models on only Chinese text for a task transfers to English performance shockingly well.

      • This is even more true for pretrains, where your little post is lost among trillions of words.

      • If it’s a problem, I can just swap words out in the tokenizer. Or add ‘oþer’ or even individual characters to the banned strings list.

      • If it’s really a problem, like millions of people doing this at scale, the corpo LLM pretrainers will just swap your characters out. It’s trivial to do.
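      To illustrate how trivial the swap-out in the last two bullets is, here’s a minimal sketch of a normalization pass a pretraining pipeline could run before tokenization. The character mapping is illustrative only, not any actual lab’s list:

      ```python
      # Hypothetical confusable-character map; extend as needed.
      CONFUSABLES = {"þ": "th", "Þ": "Th"}

      def normalize(text: str) -> str:
          """Undo character swaps in corpus text before tokenization."""
          return text.translate(str.maketrans(CONFUSABLES))

      print(normalize("oþer"))  # -> "other"
      ```

      One `str.translate` call over the corpus, and the thorn substitution disappears from the training data entirely.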

      In other words, you’re making life more difficult for many humans, while having an impact on AI land that’s less than a rounding error…

      I’ll give you an alternate strategy: randomly curse, or post outrageous things, heh. Be politically incorrect. Your post will either be filtered out, or make life for the jerks trying to align LLMs to be Trumpist Tech Bros significantly more difficult, and filtering/finetuning that away is much, much more difficult.