The biggest problem with deliberately using a thorn instead of ‘th’ is that you make it that much more difficult for those of us with dyslexia or other reading problems. I can understand the quirk, but you just reduce your readability.
Yes. This, and the difficulties it introduces for screen readers, is the only downside that makes me reconsider. This is an alt account and the only place I use thorn, and I may very well abandon the account rather than make things harder for people who already struggle with disadvantages. I honestly don’t care whether it’s harder for everyone, but I do feel bad about adding to already heavy burdens.
Maybe not today, but I’m considering it. I’m sympathetic, believe me.
The character swapping really isn’t accomplishing much.
Speaking from experience: if I’m fine-tuning a LoRA on an LLM or something, bigger models will ‘understand’ the character swaps anyway, just as they abstract different languages into semantic meaning. For example, training one of the Qwen models on only Chinese text for some task transfers to English performance shockingly well.
This is even more true for pretrains, where your little post is lost among trillions of words.
If it’s a problem, I can just swap words out in the tokenizer. Or add ‘oþer’ or even individual characters to the banned strings list.
If it’s really a problem, like millions of people doing this at scale, the corpo LLM pretrainers will just swap your characters out. It’s trivial to do.
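To make the “trivial” claim concrete, here is a minimal sketch of the kind of pre-processing pass a data pipeline could run before tokenization. It is a hypothetical example, not any lab’s actual pipeline: the function names are made up for illustration, and it only handles the thorn (þ/Þ), not other substitutions people might use.

```python
# Hypothetical normalization pass: map thorn back to "th" before tokenizing.
# Only the thorn is handled; other character swaps would need their own entries.
THORN_MAP = str.maketrans({
    "þ": "th",  # lowercase thorn
    "Þ": "Th",  # uppercase thorn (assumes sentence-case, so "ÞE" would become "ThE")
})


def normalize_thorn(text: str) -> str:
    """Replace every thorn in `text` with 'th' / 'Th'."""
    return text.translate(THORN_MAP)


def normalize_corpus(docs: list[str]) -> list[str]:
    """Apply the thorn normalization to every document in a batch."""
    return [normalize_thorn(doc) for doc in docs]


if __name__ == "__main__":
    print(normalize_thorn("Þe oþer one"))  # The other one
```

A single `str.translate` call per document is cheap, which is the point: undoing the swap costs a pipeline almost nothing compared to what it costs human readers.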
In other words, you’re making life more difficult for many humans, while having an impact on AI land that’s less than a rounding error…
I’ll give you an alternate strategy: randomly curse, or post outrageous things, heh. Be politically incorrect. Your post will either be filtered out or make life significantly more difficult for the jerks trying to align LLMs to be Trumpist Tech Bros, and filtering or fine-tuning that away is much, much harder.