So, this one’s likely pretty niche, but I’m hoping someone here might know the answer.

I got genotype data for myself from 23AndMe (don’t worry, I made them delete it before the acquisition) and AncestryDNA years ago, and more recently I’ve been looking into things like SNPs. I write code for a living, so with a little code and the raw data I can check which interesting SNPs I might have.

Something I’ve noticed recently is that for some SNPs, I’ve got alleles that aren’t listed as a possibility anywhere on the internet that I can find.

Just to take a random example, rs3746544, part of the SNAP25 gene. According to SNPedia, the available alleles are A and C with A being the major allele and C being the minor. So what is my genotype for that SNP?

[tootsweet@computer genome_raw_data]$ grep rs3746544 23andme_raw_data.txt ancestrydna_raw_data.txt
23andme_raw_data.txt:rs3746544    20      10287084        TT
ancestrydna_raw_data.txt:rs3746544   20      10287084        T       T
[tootsweet@computer genome_raw_data]$

TT? There’s zero mention of “T” being an allele that you can have for rs3746544.

rs3746544 is very much not the only example. Just a few more among many:

I’m hoping some of you folks know enough about genes to know what might be up with these examples. I’m sure it’s just simply something I don’t yet understand about genetics. Thanks in advance!

Edit: So I had a bit of a brain fart after writing this in a comment:

(Side note: oddly of the 23 “mismatch” examples I mentioned, my genotype doesn’t have a single allele in common with the documented possible alleles for the SNP. For example, I don’t have any AT’s where the documented alleles are AA, AC, and CC. My genes either match the documented alleles or have no alleles in common with the documented genotypes. Which seems even stranger.)

A’s match with T’s and C’s with G’s. I’m guessing when I get a “mismatch” like what I’m talking about, what 23andme or AncestryDNA is giving me is the complementary base pairs. So if I see a CT where the documented options are AA, AG, and GG, I should just consider my CT to be equivalent to an AG. (Because the T matches up with an A and the C matches up with a G.)
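The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything 23andme or AncestryDNA provide; the function names `complement` and `classify` are made up for this example, and it assumes genotypes are plain strings of A/C/G/T.

```python
# Watson-Crick complements: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(genotype: str) -> str:
    """Return the genotype as it would read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in genotype)


def classify(genotype: str, documented_alleles: set[str]) -> str:
    """Compare a reported genotype against a SNP's documented alleles."""
    if set(genotype) <= documented_alleles:
        return "match"
    if set(complement(genotype)) <= documented_alleles:
        return "match on opposite strand"
    return "no match"


print(classify("TT", {"A", "C"}))  # rs3746544 from the post
print(classify("CT", {"A", "G"}))  # the CT vs. AA/AG/GG case
print(classify("AT", {"A", "C"}))  # shares one allele only
```

One caveat with this approach: when the documented alleles are themselves complementary (A/T or C/G), a direct match and an opposite-strand match look identical, so the comparison alone can’t tell which strand was reported.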

So I guess that means that sometimes the equipment that 23andme and AncestryDNA use reads the other side of the DNA strand from the one that’s documented in the literature. (This only seems to happen in about 16.5% of cases or thereabouts, at least according to my napkin math. In most cases, what 23andme and AncestryDNA report in the raw data matches, and thus they must be measuring/reading/reporting the “same side” of the double helix that the literature talks about.)

At least that theory seems consistent with what I’m seeing. If anybody knows better, I definitely would appreciate any further input!

That said, it does seem kind of odd that any time 23andme reads the “other side” of the DNA molecule, so does AncestryDNA and vice versa. That is, there don’t seem to be any cases where they disagree on my genotype for a given SNP. At least I haven’t seen any examples of that so far. I might have to do some searching now.

Edit 2: I’ve done a little more googling based on the first edit above and found this page. It seems 23andme always goes off of the so-called “+ strand” of the “Genome Reference Consortium Human Build 37” human reference genome. So maybe the 23 examples I’ve found so far are cases where at least some of the literature (or at least SNPedia and EUPedia, if not “the literature”) is based more off of what the “Genome Reference Consortium Human Build 37” considers the “- strand”. So maybe “the literature” (and/or SNPedia/EUPedia) uses a different reference genome? All this is still just a theory, but I definitely know more than I did a few minutes ago.

Edit 3: Some folks are suggesting that 23AndMe and AncestryDNA may just not be accurate. As in, 23AndMe and AncestryDNA may have a very high error rate when reading my genetic data. If that was the case, I wouldn’t expect the inaccuracies to “match” between the two raw data files. So, to test that hypothesis out, I wrote a script to check my 23AndMe raw data against my AncestryDNA data to see how often they disagree. The script is quite slow, but at the moment it’s checked over 35,000 SNPs that are measured by both services and found 12 that disagree for an error rate of roughly 0.0343%. From another comment, I mentioned the instances I’ve found make up about 16.5% of the ones I’ve checked. So it doesn’t seem like that accounts for a very large percentage of these. I’m still leaning pretty heavily toward it just being the “other strand” theory. Thanks again for everyone’s input!
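For anyone wanting to try the same comparison, here is a rough Python sketch (not the script mentioned above). It assumes the column layout visible in the grep output earlier: rsid, chromosome, position, then either one combined genotype column (23andMe) or two allele columns (AncestryDNA). The names `parse_genotypes` and `compare` are invented for this example, and anything that isn’t a plain A/C/G/T call is skipped.

```python
from typing import Iterable


def parse_genotypes(lines: Iterable[str]) -> dict[str, str]:
    """Map rsid -> genotype, alleles sorted so "CT" and "TC" compare equal."""
    genotypes = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and '#' comment lines.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        # "TT" in one column, or "T" and "T" in two columns.
        alleles = "".join(fields[3:])
        # Skip header rows, no-calls, and anything else that is not A/C/G/T.
        if not set(alleles) <= set("ACGT"):
            continue
        genotypes[fields[0]] = "".join(sorted(alleles))
    return genotypes


def compare(a: dict[str, str], b: dict[str, str]) -> tuple[int, list[str]]:
    """Return (number of SNPs present in both, rsids whose genotypes differ)."""
    shared = a.keys() & b.keys()
    return len(shared), sorted(r for r in shared if a[r] != b[r])


# Sample rows modeled on the grep output in the post.
rows_23andme = ["# a comment line", "rs3746544\t20\t10287084\tTT"]
rows_ancestry = ["rsid\tchromosome\tposition\tallele1\tallele2",
                 "rs3746544\t20\t10287084\tT\tT"]

shared, differing = compare(parse_genotypes(rows_23andme),
                            parse_genotypes(rows_ancestry))
print(shared, differing)
```

With real data, the lists would be replaced by open file handles, e.g. `parse_genotypes(open("23andme_raw_data.txt"))`. Loading each file into a dictionary means each file is read once and every lookup is a hash lookup, so a comparison of this kind should not need to rescan a file per SNP.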

  • CookieOfFortune@lemmy.world
    3 days ago
    1. These companies don’t try to generate the best results, only adequate, non-diagnostic ones. They combine your sample with as many other people’s as they can and run it through a sequencer, tuning the process to maximize how many people they can get through at the accuracy they find acceptable.

    This means you basically have more errors. Do they tell you how many copies of each sequence they sequenced? If it’s fewer than about 10, I’d take that with a grain of salt.

    2. Complements might mean you’re looking at the wrong strand.

    3. Maybe it’s just not documented.

    4. Your body has tons of spontaneous mutations, so it could just be one of those. Most are harmless.

  • Tollana1234567@lemmy.today
    3 days ago

    Aren’t these DNA tests unreliable asf? Also, a layperson isn’t going to make use of the raw data anyway. It’s much more useful for a researcher or a geneticist.

  • Forester@pawb.social
    3 days ago

    Well, that’s definitely a mutation. Sounds like a double-strand mutation. I can’t help more than that.

  • j4yc33@piefed.social
    3 days ago

    All I know is that RS232 has options for Parity and various bitstream options.

  • rowinxavier@lemmy.world
    3 days ago

    If you have a variation in your genome, it may not manifest as a change in your proteome or phenotype. Many single-nucleotide variations don’t actually change the selected amino acid. For example, in English we have the spellings mom and mum, one American and one other, and both are understood and mean the same thing, whereas dog and dug mean very different things. In your specific case, the change you have likely makes no difference at all. It could be a chunk that is not read, a chunk which is snipped out, a chunk which has almost no impact on expression, and so on.

  • frongt@lemmy.zip
    3 days ago

    Does “possible” mean “normal values for x% of the population?” Like our chromosomes are only XX and XY, right? Except for the people with XXY and others.

    • TootSweet@lemmy.worldOP
      3 days ago

      Well, just taking rs3746544 as an example, the SNPedia page has this chart at the bottom of the right-hand column:

      [Image: chart showing the frequency of different genotypes for rs3746544 across several specific populations]

      Notice there are only three colors there. Just eyeballing it, it seems most people have either the AA or AC genotype, with CC making up a pretty small minority.

      But the fact that TT doesn’t register at all on that chart and otherwise just isn’t mentioned anywhere on the page at all makes me think I must be missing something. (Which wouldn’t surprise me. It is genetics, after all. There are definitely tons of rabbit holes to fall down in that area of study.)

      Someone else mentioned in a comment that it must be a mutation. (I assume they mean a mutation that happened recently enough, few enough steps up my ancestral lineage, that sources like SNPedia have just never seen it before.) And if I only had one example, and if it was only one allele and not on both strands, I might be inclined to agree. But I have… a lot of examples of that. Just doing some napkin math, I got a rate of about 16.5% “impossible” or undocumented genotypes. There’s no way that for 16.5% of well-known/well-studied SNPs I happen to have a completely undocumented double-strand mutation, right?

      (Just to explain my methodology for coming up with that 16.5% figure: I took all the SNPs listed on these two pages: one and two, and found all SNPs for which a) genotypes are documented on the page and b) I’ve got a genotype for that SNP in at least one of the data files I got from 23andme or AncestryDNA (or both). That got me 139 examples. Of those, 23 of my SNPs didn’t match any of the genotypes listed on the Eupedia pages, for a rate of 23/139 ≈ 0.1655, or about 16.5%.)
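As a rough illustration of the tally just described, here is a short Python sketch. The data is a toy stand-in: rs3746544 and its documented genotypes come from this thread, while `rs0000001` is a made-up placeholder, and the function names are invented for the example. It only does the direct comparison and makes no attempt to account for strand.

```python
def normalize(genotype: str) -> str:
    """Sort alleles so "CA" and "AC" count as the same genotype."""
    return "".join(sorted(genotype))


def undocumented(mine: dict[str, str],
                 documented: dict[str, set[str]]) -> list[str]:
    """Return rsids where my genotype is not among the documented genotypes."""
    checked = mine.keys() & documented.keys()
    return sorted(
        rsid for rsid in checked
        if normalize(mine[rsid]) not in {normalize(g) for g in documented[rsid]}
    )


# Toy data: rs0000001 is a hypothetical placeholder, not a real lookup.
mine = {"rs3746544": "TT", "rs0000001": "CA"}
documented = {
    "rs3746544": {"AA", "AC", "CC"},
    "rs0000001": {"AA", "AC", "CC"},
}

mismatches = undocumented(mine, documented)
print(mismatches)

# The rate quoted above: 23 mismatches out of 139 SNPs checked.
print(f"{23 / 139:.1%}")
```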

      (Side note: oddly of the 23 “mismatch” examples I mentioned, my genotype doesn’t have a single allele in common with the documented possible alleles for the SNP. For example, I don’t have any AT’s where the documented alleles are AA, AC, and CC. My genes either match the documented alleles or have no alleles in common with the documented genotypes. Which seems even stranger.)

      (Another side note: if 23andme and AncestryDNA didn’t agree on the genotype, I’d be inclined to think it was an error on one of their parts, but I haven’t found any specific SNPs where they disagree with each other yet.)

      So, to get back to your main question:

      Does “possible” mean “normal values for x% of the population?”

      By “impossible”, I mean I haven’t been able to find any documentation of anyone else having the same genotype as I have for that particular SNP. And that makes me feel like I’m almost definitely not understanding something.

      I kind of doubt that the genotypes I have for these mismatches are actually exceedingly rare or completely undocumented or anything. I think probably someone knows exactly why I’d be seeing the results I’m seeing (and it probably isn’t tons of obscure mutations or anything). So, honestly, “impossible” is probably bad wording.

      • frongt@lemmy.zip
        3 days ago

        I mean there’s how many people on earth now? More than 8 billion? That’s an enormous number of possible variations.