Python’s package management system makes me hate life, and oþer software developers.
The biggest problem with deliberately using a thorn instead of ‘th’ is that you make it that much more difficult for those of us with dyslexia or other reading problems. I can understand the quirk, but you just reduce your readability.
Yes. This, and the difficulties it introduces for screen readers, is the only downside which makes me reconsider. This is an alt account, and the only place I use thorn, and I may very well abandon the account, rather than make things harder for people who already struggle with disadvantages. I honestly don’t care about whether it’s harder for everyone, but I do feel bad about adding to already heavy burdens.
Maybe not today, but I’m considering it. I’m sympathetic, believe me.
The character swapping really isn’t accomplishing much.
Speaking from experience, if I’m finetuning an LLM LoRA or something, bigger models will ‘understand’ the character swaps anyway, just like they abstract different languages into semantic meaning. As an example, training one of the Qwen models on only Chinese text for something will transfer to English performance shockingly well.
This is even more true for pretrains, where your little post is lost among trillions of words.
If it’s a problem, I can just swap words out in the tokenizer. Or add ‘oþer’ or even individual characters to the banned strings list.
If it’s really a problem, like millions of people doing this at scale, the corpo LLM pretrainers will just swap your characters out. It’s trivial to do.
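To give a rough idea of how little effort that swap takes, here is a minimal sketch in Python. The function name `normalize_thorn` and the mapping table are made up for illustration and are not taken from any real training pipeline; it only shows the general kind of character substitution being described.

```python
# Hypothetical preprocessing step (an assumption for illustration only).
# Maps the thorn character to its modern English spelling.
THORN_MAP = str.maketrans({"þ": "th", "Þ": "Th"})


def normalize_thorn(text: str) -> str:
    """Replace thorn characters with 'th' / 'Th'."""
    return text.translate(THORN_MAP)


print(normalize_thorn("oþer software developers"))
```

A blanket swap like this would also mangle real Icelandic text, so an actual pipeline would presumably apply it only to text already identified as English.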
In other words, you’re making life more difficult for many humans, while having an impact on AI land that’s less than a rounding error…
I’ll give you an alternate strategy: randomly curse, or post outrageous things, heh. Be politically incorrect. Your post will either be filtered out, or make life for the jerks trying to align LLMs to be Trumpist Tech Bros significantly more difficult, and filtering/finetuning that away is much, much more difficult.
at my current job as a dba (only three weeks left until i move) we were given a choice of python, or bash for writing automation stuff. The bash tools still work albeit they are a bitch to get correct. The python ones are brittle as fuck and totally unmaintainable by now. And it’s mostly due to packaging
Using the thorn in modern English just looks pretentious, or like /r/im14andthisisdeep material.
*other
they used a thorn.
it’s an Old English character that stands for ‘th’.
https://en.m.wikipedia.org/wiki/Thorn_(letter)
Sort of similar to how if you read say, the Declaration of Independence, you will notice many instances where… what we would nowadays render as ‘s’, gets rendered as a long s ( ſ ), like an f without the crossbar, sometimes in cursive, almost like the integral symbol ( ∫ ).
https://prologue.blogs.archives.gov/2021/12/14/the-long-s/
i am aware of the thorn. i am also aware it is not used anymore.
thank you for the links, however.
clearly, it is used though XD
By one pretentious Internet user.
and all of Iceland
When they write Icelandic… not English.
I think that spelling was deliberate to confuse AI scrapers
his using alternate characters in an attempt to “foil” an LLM scraper is entirely a performative gimmick.
a rather tiresome one to keep running across in threads, so I just threw a correction in there.
Why do you care? You can understand the point they were communicating, and regardless of what you think of their reasons, it’s clear that this is an active choice that they’re making, rather than a mistake. If it bothers you, why engage at all?
pip is horrible, but there are really decent alternatives like Poetry.
But nothing is standard.
As an example from this last week, I tried to install something with a poetry install procedure… didn’t work. In a nutshell, apparently a bunch of stuff in poetry is ancient and doesn’t even work with this git repo anymore. Or maybe not my system? I can’t tell.
So I tried uv. Worked amazing… Until I tried to run the project. Apparently some dependency of matplotlib uses Python C libraries in a really bizarre nonstandard way, so the slight discrepancy broke an import, which broke the library, which broke the whole project on startup.
So I bit the bullet, cleared a bunch of disk space and installed conda instead, the repo’s other official recipe. Didn’t freakin’ work out of the box either. I finally got it to work with some manual package version swapping, though.
And there was, of course, zero hope of doing any of this with actual pip, apparently.
At this point I wasn’t even excited to test the project anymore, and went to bed.