Wednesday, August 9, 2017

Playing with DNA Information Part 4

Well, all the preliminary work and background stuff is out of the way.

***From here on out, I will assume all CSV files and any other files you've created have been imported into their own tables in MS Access.


We can finally start playing around with the information. What to do?

How about we try to find out which of my Ancestry matches can be identified on Gedmatch? How would you go about finding out?

This is what I did:

In Access:
I first wanted to isolate the "A" kit numbers, so I ran a simple query on my Gedmatch match list. I added all fields to the query, then put Like "A*" in the criteria for the Kit Number field and >=7 in the criteria for the Shared cM field. This produced a list of all my Ancestry matches at Gedmatch that share at least 7 cM with me. Save this query under a unique name; we'll use it in a minute as the source for the next query.
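The same filter can also be scripted instead of built in the Access query designer. Below is a minimal sketch using Python's standard-library sqlite3 module. The table name gedmatch_matches, its column names, and the sample rows are hypothetical stand-ins for an imported Gedmatch match list.

In Access SQL the criteria would read roughly WHERE [Kit Number] Like "A*" AND [Shared cM] >= 7. SQLite's LIKE uses % as its wildcard instead of *.

```python
import sqlite3

# Hypothetical table standing in for the imported Gedmatch match list.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gedmatch_matches (kit_number TEXT, full_name TEXT, shared_cm REAL)"
)
conn.executemany(
    "INSERT INTO gedmatch_matches VALUES (?, ?, ?)",
    [
        ("A123456", "jdoe1950", 22.4),
        ("A654321", "smithfam", 6.1),  # below the 7 cM cutoff
        ("T987654", "kbrown", 35.0),   # not an "A" kit
    ],
)

# Keep only "A" kits sharing at least 7 cM (Access: Like "A*", SQLite: LIKE 'A%').
ancestry_kits = conn.execute(
    "SELECT kit_number, full_name, shared_cm FROM gedmatch_matches "
    "WHERE kit_number LIKE 'A%' AND shared_cm >= 7"
).fetchall()
print(ancestry_kits)
```

By default, SQLite's LIKE ignores case for ASCII letters, so a lowercase "a" kit number would also pass this filter.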

With that query saved, let's see what we get. My Ancestry match file has over 21K entries, and my Gedmatch match file has over 12K. The query above ("A" kits, >=7 cM) reduced the Gedmatch list to just over 1,000 entries. Now we need enough identifying information to conclude that a person in the Ancestry list is the same person as in the Gedmatch list. I used the following fields, and linked the Full Name field from the Gedmatch query to the Admin name field in the Ancestry table:

[Image: query fields, with Full Name (Gedmatch query) joined to Admin name (Ancestry table)]

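Scripted outside Access, this name-to-name link is an inner join. Only usernames that appear in both lists survive it. Below is a minimal sketch with sqlite3.

The tables gedmatch_a_kits (standing in for the saved "A" kit query) and ancestry_matches are hypothetical, as are their columns and the sample rows. The MatchID format is an assumption. Carrying both cM values into the result is an extra I added, to make it easier to eyeball whether a pair is plausible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the saved "A" kit query and the Ancestry match table.
conn.execute("CREATE TABLE gedmatch_a_kits (kit_id TEXT, full_name TEXT, shared_cm REAL)")
conn.execute("CREATE TABLE ancestry_matches (match_id TEXT, admin_name TEXT, shared_cm REAL)")
conn.executemany(
    "INSERT INTO gedmatch_a_kits VALUES (?, ?, ?)",
    [("A123456", "jdoe1950", 22.4), ("A222222", "M. Jones", 9.8)],
)
conn.executemany(
    "INSERT INTO ancestry_matches VALUES (?, ?, ?)",
    [("M-001", "jdoe1950", 21.0), ("M-002", "kbrown", 15.2)],
)

# Inner join on the username: Full Name (Gedmatch) = Admin name (Ancestry).
rows = conn.execute(
    "SELECT g.kit_id, a.match_id, g.full_name, g.shared_cm, a.shared_cm "
    "FROM gedmatch_a_kits AS g "
    "JOIN ancestry_matches AS a ON g.full_name = a.admin_name"
).fetchall()
print(rows)
```

SQLite's = compares text case-sensitively. Appending COLLATE NOCASE to the ON clause relaxes that if usernames differ only in capitalization.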
This produced some interesting results! I was able to "verify" (I use the term loosely) the accounts of over 50 people. Here is an example of the results. Many were easy to determine, since people tend to use the same username everywhere. When a match is not obvious, don't keep it; we don't need to be creating false positives!

[Image: example of the matched results]

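"When it is not obvious, don't keep it" is a judgment made by eye. Part of it could be automated, though.

The sketch below applies one conservative rule to the joined pairs: a username is trusted only when it maps to exactly one Gedmatch kit and exactly one Ancestry match. This rule is an assumption for illustration, not something the post specifies. The candidate pairs are hypothetical.

```python
from collections import defaultdict

# Hypothetical (kit_id, match_id, username) pairs produced by the name join.
candidates = [
    ("A123456", "M-001", "jdoe1950"),
    ("A333333", "M-007", "Smith"),
    ("A444444", "M-007", "Smith"),  # same name on two Gedmatch kits: ambiguous
]

# Collect the distinct kits and matches seen for each username.
kits_by_name = defaultdict(set)
matches_by_name = defaultdict(set)
for kit_id, match_id, name in candidates:
    kits_by_name[name].add(kit_id)
    matches_by_name[name].add(match_id)

# Keep a pair only when its username points to exactly one kit and one match.
confident = [
    (kit_id, match_id, name)
    for kit_id, match_id, name in candidates
    if len(kits_by_name[name]) == 1 and len(matches_by_name[name]) == 1
]
print(confident)
```

A rule like this only removes obvious collisions. A unique username can still belong to two different people, so a manual look is still worthwhile.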
Save the results in a new table in the database. I specifically kept the KitID from Gedmatch and the MatchID from Ancestry in the above query, so this new table effectively becomes a join table: I can now link Gedmatch chromosome browser information to each match's Ancestry tree (assuming there is one).
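As a rough illustration of why keeping just the two IDs is enough, here is a sketch in sqlite3. It saves the verified pairs as a two-column link table, then uses that table to attach an Ancestry MatchID to Gedmatch chromosome segments.

All table names, column names, and the segment values are hypothetical. The real chromosome browser export will have its own layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical verified pairs, as they might come out of the join query.
conn.execute("CREATE TABLE verified_pairs (kit_id TEXT, match_id TEXT, username TEXT)")
conn.execute("INSERT INTO verified_pairs VALUES ('A123456', 'M-001', 'jdoe1950')")

# Keep only the two keys: this is the join (link) table.
conn.execute(
    "CREATE TABLE gedmatch_ancestry_link AS "
    "SELECT DISTINCT kit_id, match_id FROM verified_pairs"
)

# Hypothetical chromosome browser segments keyed by Gedmatch kit.
conn.execute(
    "CREATE TABLE gedmatch_segments (kit_id TEXT, chromosome INTEGER, "
    "start_pos INTEGER, end_pos INTEGER, cm REAL)"
)
conn.executemany(
    "INSERT INTO gedmatch_segments VALUES (?, ?, ?, ?, ?)",
    [("A123456", 7, 1000000, 15000000, 12.3), ("T987654", 2, 5000000, 9000000, 8.0)],
)

# Segments from linked kits now carry the Ancestry MatchID of the same person.
linked = conn.execute(
    "SELECT l.match_id, s.chromosome, s.start_pos, s.end_pos, s.cm "
    "FROM gedmatch_segments AS s "
    "JOIN gedmatch_ancestry_link AS l ON s.kit_id = l.kit_id"
).fetchall()
print(linked)
```

Segments from kits with no verified Ancestry account (like the "T" kit here) simply drop out of the inner join.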

We'll find out in the next post!