Do not Interrupt Developers, Study Says (shiftmag.dev)
Posted by 💡𝚂𝗆𝖺𝗋𝗍𝗆𝖺𝗇 𝙰𝗉𝗉𝗌📱@programming.dev to Programming@programming.dev · English · 2 days ago · 72 comments

Frezik@lemmy.blahaj.zone · 1 day ago
FWIW, it doesn’t work. The preprocessing for LLM training isn’t going to be fooled by that. It’s just making things harder for everyone to read.

Spice Hoarder@lemmy.zip · 1 day ago
Hmm, seriously? Does it also ignore zalgo text?

Frezik@lemmy.blahaj.zone · 1 day ago
I’d expect that any trick that becomes popular enough would have a simple workaround. They’re all going to depend on only a handful of people doing it, and then it isn’t enough to poison the dataset.
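
To illustrate the kind of "simple workaround" mentioned above, here is a minimal sketch using only Python's standard `unicodedata` module. It is not taken from any actual LLM training pipeline; the function name `fold_styled_text` is hypothetical, and the assumption is simply that a preprocessing step could apply Unicode normalization like this. NFKC folds styled letters (such as the mathematical monospace and sans-serif characters used in some usernames) to plain ones, and dropping nonspacing combining marks removes zalgo-style stacking.

```python
import unicodedata


def fold_styled_text(text: str) -> str:
    """Hypothetical cleanup step: fold styled letters and strip zalgo marks."""
    # NFKC replaces compatibility characters (e.g. mathematical
    # monospace or sans-serif letters) with their plain equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # NFD splits precomposed characters so that every combining mark
    # becomes its own code point.
    decomposed = unicodedata.normalize("NFD", folded)
    # Drop nonspacing marks (category "Mn"), which is what zalgo text
    # stacks on top of ordinary letters.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    # Monospace capital S followed by sans-serif small m.
    print(fold_styled_text("\U0001D682\U0001D5C6"))
    # "hi" with two combining marks stacked on the "h".
    print(fold_styled_text("h\u0353\u0301i"))
```

One caveat with this sketch: stripping every nonspacing mark also removes legitimate accents (for example "é" becomes "e"), so a real pipeline would likely be more selective, for instance by only stripping when many marks are stacked on one base character.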