• oneser@lemmy.zip
      English
      arrow-up
      4
      ·
      5 days ago

      I think the x-axis labels are wrong. Cosine similarity is used in maths to compare the directions of two vectors: 1 means they point in the same direction, 0 means they are at 90° to each other, and -1 means they point in opposite directions.
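
To make those three cases concrete, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the example vectors are made up for illustration and have nothing to do with the plotted data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative 2-D vectors (assumptions, not taken from the diagram).
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])           # same direction
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 3.0])  # 90 degrees apart
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])     # opposite direction

print(same, perpendicular, opposite)
```

Up to floating-point rounding, the three values come out as 1, 0 and -1. Note that vector length does not matter, only direction.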

      • VoodooAardvark@lemmy.zip
        English
        arrow-up
        3
        ·
        5 days ago

        While it seems you’re on to something with the x-axis, I do not believe that was the question. My interpretation of the question, which I share, is: wtf am I looking at? Haha

      • starik@lemmy.zip
        English
        arrow-up
        0
        ·
        5 days ago

        So the x-axis should be the same as the y-axis in this case?

        How do you calculate the cosine of a slur? Is that the joke?

        • OhNoMoreLemmy@lemmy.ml
          English
          arrow-up
          3
          ·
          5 days ago

          No, you just use a standard technique like word2vec.

          Basically, words are considered similar (and embedded at nearby locations in a high-dimensional space) if they are likely to be used in the same contexts.

          And because slurs are used to indicate that you don’t like someone, they tend to occur in the same kind of context.

          So they’re all very similar. This is actual natural language processing being used, but it’s a shitpost and the graphics aren’t very clear.
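
A rough sketch of the "same context means similar vector" idea. This is not word2vec itself, which trains a small neural network. It is a much simpler stand-in that counts neighbouring words and compares the counts. The toy corpus and the helper names `context_counts` and `cosine` are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
sentences = [
    "i drink hot coffee every morning",
    "i drink hot tea every morning",
    "she drinks iced coffee at work",
    "she drinks iced tea at work",
    "the car needs new tires",
    "the car needs more fuel",
]

def context_counts(target, window=2):
    """Count the words that appear within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)  # a Counter returns 0 for missing words
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2)

coffee, tea, car = (context_counts(w) for w in ("coffee", "tea", "car"))
coffee_tea = cosine(coffee, tea)  # the two words share their contexts
coffee_car = cosine(coffee, car)  # the two words share no contexts
print(coffee_tea, coffee_car)
```

Here "coffee" and "tea" come out far more similar to each other than either is to "car", purely because of the words around them.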

  • FishFace@piefed.social
    English
    arrow-up
    10
    ·
    5 days ago

    The interpretation here depends on the idea of a word-vector. This is a component of language models which treat each individual word in a language as a vector in a pretty high-dimensional space (how high is up to the model author). The way this is usually described is that if you look at the word pairs “man - woman”, “boy - girl”, “king - queen” and so on, they should differ by a similar vector in word-vector-space, and that vector should correspond to the concept of “male” (or “female” depending on which way round you do it). If you have a word vector model, you should then be able to take the dot product of this gender concept-vector with a word like “actress” or “actor”, and see if it has learnt that “actress” is female and “actor” is kinda male but kinda gender neutral due to changing usage.
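
As a sketch of that idea, assuming hand-made 3-dimensional vectors (real models learn hundreds of dimensions, and none of these numbers come from an actual model):

```python
# Hypothetical toy embeddings. The axes are loosely (gender, royalty, acting),
# chosen by hand so the arithmetic is easy to follow.
vectors = {
    "man":     [1.0, 0.0, 0.0],
    "woman":   [-1.0, 0.0, 0.0],
    "king":    [1.0, 1.0, 0.0],
    "queen":   [-1.0, 1.0, 0.0],
    "actor":   [0.3, 0.0, 1.0],   # only weakly male, reflecting neutral usage
    "actress": [-1.0, 0.0, 1.0],
}

def subtract(a, b):
    """Component-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# The "male minus female" concept vector, computed from two different pairs.
gender = subtract(vectors["man"], vectors["woman"])
king_minus_queen = subtract(vectors["king"], vectors["queen"])

# Project other words onto the concept vector.
actor_score = dot(gender, vectors["actor"])
actress_score = dot(gender, vectors["actress"])

print(gender, king_minus_queen)
print(actor_score, actress_score)
```

In this toy setup both pairs give the same difference vector, "actress" scores strongly negative (female) and "actor" scores only mildly positive. A trained model would show the same pattern only approximately.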

    So what this diagram is showing is a measure of similarity between various word vectors. Those vectors are (the vector of) a slur minus a related word. The idea is to see if subtracting “Mexican” from “spic” leaves you with an underlying concept of “slur” that corresponds to these other vectors - just like with gender and man, woman; boy, girl, etc.

    The similarity matrix is actually pretty interesting IMO. There is pretty high similarity between all of the “racial slur - race” vectors, and much less between “cunt - woman” and “fag - homosexual” and the others. So it’s showing that there isn’t that good a concept of “slur” in general, in this word-vector model at any rate, but you could argue pretty strongly that a concept of “racial slur” does exist in that way.
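
For anyone wondering how a grid like that gets computed, here is a minimal sketch. It uses neutral stand-in word pairs and hand-made 3-dimensional vectors instead of the words in the post. All names and numbers are assumptions for illustration, not output from a real model.

```python
import math

# Hypothetical toy embeddings, made up so that three pairs differ mainly
# along one shared axis and the fourth pair differs along another.
vectors = {
    "man":    [1.0, 0.2, 0.0],  "woman":  [-1.0, 0.2, 0.0],
    "boy":    [1.0, -0.8, 0.1], "girl":   [-1.0, -0.9, 0.0],
    "king":   [0.9, 1.0, 0.0],  "queen":  [-1.0, 1.0, 0.1],
    "paris":  [0.0, 0.1, 1.0],  "france": [0.1, 0.0, -0.5],
}
pairs = [("man", "woman"), ("boy", "girl"), ("king", "queen"), ("paris", "france")]

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# One difference vector per pair, e.g. vec("man") - vec("woman").
diffs = [[x - y for x, y in zip(vectors[a], vectors[b])] for a, b in pairs]

# Pairwise cosine similarity between every two difference vectors.
matrix = [[cosine(d1, d2) for d2 in diffs] for d1 in diffs]

for (a, b), row in zip(pairs, matrix):
    print(f"{a} - {b}:".ljust(16), "  ".join(f"{v:+.2f}" for v in row))
```

The rows and columns of `matrix` are the same list of pairs, so the diagonal is 1 and the grid is symmetric. Pairs that share an underlying concept show up as a block of high values, and the unrelated pair sits near 0.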

  • verdare@piefed.blahaj.zone
    English
    arrow-up
    2
    ·
    5 days ago

    Aren’t the x-axis labels wrong? If I’m interpreting this correctly, the x- and y-axis labels should be the same. That might be partly why people are getting confused.

    • andros_rex@lemmy.worldOP
      English
      arrow-up
      1
      ·
      edit-2
      5 days ago

      Imagine modeling the components of a word mathematically. Each word has a value in some number of dimensions, like maybe how negative the word is, or how much it has to do with fruit or something.

      You’d be able to calculate a set of basis vectors to describe each dimension, basically unit vectors (eigenvectors, loosely speaking). You could have an eigen-name or eigen-compliment: a word such that other names or compliments could be expressed in units of it.

      I think 1984’s Newspeak shows some examples of what eigen-words could be. Stuff like “doublepluscold.”

      • Engywook@lemmy.zip
        English
        arrow-up
        1
        ·
        5 days ago

        Ok, I’m a physicist so math is fine. I just couldn’t get the diagram. Thanks!

  • TheLeadenSea@sh.itjust.works
    English
    arrow-up
    0
    ·
    edit-2
    5 days ago

    Is the first one ‘South Mexican’?

    Then we’ve got ‘Crumbly Chinese’

    ‘Fucking homosexuals’ (nice)

    ‘No African’ (rude)

    ‘Clever Woman’

    ‘Kill Jewish? Kippah Jewish? Kangaroo Jewish?’

    And obviously, the ‘Jolly Japanese’

      • MagicShel@lemmy.zip
        English
        arrow-up
        3
        ·
        5 days ago

        I haven’t heard anyone use “Jap” outside my grandfather’s generation — and he fought a fucking war against them so no surprise he had some big feels there. But he’s also been dead about 25 years and I’ve never heard the word since.

        But also I don’t hang out with racists, so what do I know.