Sunday, November 12, 2017

Visualizing Ancestry Relationships using Google Fusion Tables (Lessons Learned - Part 1 of ?)

One of the really cool things about Fusion Tables is that the output is not limited to charts. You can also create maps! And once a map is made, you can export what you've created as a KML file and open it in Google Earth. You might also be able to use the information in your family tree software if it has a mapping feature.

You can also create a map!

Ah, but there is a hidden lurking monster waiting. You know it is there. You've recognized it long ago with the intention of dealing with it "some day". Well, at least for me anyway. What monster might that be? That monster is called "consistency" at its most basic level. Another way of framing it is "Standardization". Specifically, the area of consistency and standardization I am speaking of is location data.

For years, I had used Ancestry's Family Tree Maker software, upgrading to a new version every 5 years or so. Eventually, I didn't see the need for a desktop solution and have been using the Ancestry.com interface online. Just recently, I wanted to try out Legacy Family Tree (Standard - free version), and installed and started playing with it. 

I knew this already, but one difference I was quickly reminded of is that FamilySearch.org doesn't have the Wild West free-for-all that Ancestry does. There are pros and cons to each model: if customers are paying (Ancestry), let them be free and wild. However, if the service is free, then some modicum of control is necessary.

So FamilySearch uses a model similar to WikiTree's: before you add someone, you have to make sure you aren't creating a duplicate person. If the site finds a similar person in the existing database, it makes you compare the two on the spot and either confirm they are the same or, if there is not enough evidence to make that call, create a new person (they at least try to limit the Wild West duplication problem).

Both Ancestry and FamilySearch enforce some standardization with date formats and locations. It was only while playing with Fusion Tables that the monster reared its ugly head: "Lo, mighty genealogisssst," it hissed with a forked, serpent-like tongue. (My imaginary monster doesn't sound like John Goodman.) "You hath reaped that which you hath sown."

I looked at the monster, puzzled. He hissed again, "Garbage in, garbage out, dumbass."

And the proverbial lightbulb went on over my head. It is not enough to just use standard locations (as provided by Ancestry or FamilySearch); one must also be consistent.

People can look at two place names and make the connection that they are the same. Machines, however, are extremely literal, doing exactly what you tell them to. So while you and I may see these as the same place:

Asheville, NC USA or 
Asheville, North Carolina, USA, or 
Asheville, NC, United States, or 
Asheville, etc etc etc, 

a machine will read these all as different locations.

Please note that all of these are likely considered "Standard" but if you are not consistent in your usage, you will have a headache later (as I do). 
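To make the point concrete, here is a minimal Python sketch of what "literal" means. It is only an illustration, not what Fusion Tables does internally: it simply compares the place names above as plain text, which is roughly how a chart groups its values.

```python
from collections import Counter

# The four spellings from the list above, exactly as typed.
places = [
    "Asheville, NC USA",
    "Asheville, North Carolina, USA",
    "Asheville, NC, United States",
    "Asheville",
]

# A set keeps one copy of each distinct string. A human would expect
# one place; a literal comparison finds four.
distinct_places = set(places)
print(len(distinct_places))

# Counting occurrences shows each spelling is tallied on its own.
counts = Counter(places)
for place, count in counts.items():
    print(f"{count} x {place!r}")
```

Every spelling ends up in its own bucket with a count of one, so a chart built on this column would show four separate locations.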

This all came about when I uploaded my pedigree file (with all BMD info) to a Fusion Table in order to create a map. Here is an example of my ancestors' birth locations:

Map from fusion table showing Birth Locations.
I was curious what a Network Graph comparing birth locations to death locations would look like (is there any convergence or divergence?), and that is when I noticed how differences that seem small to us humans make a huge difference to computers, which interpret our input quite literally:
Eastham MA is counted twice because it was entered two different ways.
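One way to hunt for this kind of split before uploading is to group place names by town and flag any town that shows up with more than one spelling. The sketch below is a rough idea only: the sample values are made up for illustration (they are not my actual entries), and `town_key` and `find_variants` are hypothetical helper names.

```python
from collections import defaultdict

def town_key(place: str) -> str:
    """Crude grouping key: the text before the first comma, lowercased."""
    return place.split(",")[0].strip().lower()

def find_variants(places):
    """Return towns that appear with more than one distinct spelling."""
    groups = defaultdict(set)
    for place in places:
        groups[town_key(place)].add(place.strip())
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }

# Hypothetical sample data, standing in for a birth-location column.
sample = [
    "Eastham, MA",
    "Eastham, Barnstable, Massachusetts, USA",
    "Asheville, NC, USA",
    "Eastham, MA",
]

variants = find_variants(sample)
for town, spellings in variants.items():
    print(town, "->", spellings)
```

This only flags candidates for a human to review. Two different towns that share a name (in different states, say) would also be grouped together, so it should not be used to merge anything automatically.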

So why is this such a big deal? Hopefully you can see the ramifications:
1) Your own data will give you poor-quality results if your locations are non-standard
2) Even if they are standard, using them inconsistently will make it harder to tie people together
3) People who have the same ancestor in their tree will not necessarily be connected to your ancestor
4) On AncestryDNA, you won't get as many shared ancestor hints as you could

At some point in the future, I am hoping sites will adopt an "authoritative" or "most accurate" model and start locking down people who have been confirmed. That way, the Wild West is slowly tamed with civility. I believe WikiTree has started locking some of the better-known, well-established ancestors against public edits. If the goal is to establish a single unified tree (like WikiTree or FamilySearch), then limiting what can be changed (at some point) makes sense. If the data is always changing, say because some novice copied someone else's tree without doing the work themselves, then it won't ever fully develop. But I digress; this side topic could be its own post.

QC THY DATA! 
First, use standardized date and location information. 
Second, be consistent in how you apply the standard ("USA" or "United States"? Pick one and change everything in your tree to the same format. Yeah, not fun).
Third, make sure every new person added to your tree is formatted to your consistent standard.
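If your tree can be exported to a spreadsheet or CSV, part of that cleanup can be scripted. Here is a minimal sketch, assuming the format you picked is "Town, Full State Name, USA". The lookup tables are deliberately tiny and would need filling in for a real tree, and `normalize_place` is a hypothetical helper, not part of any genealogy software.

```python
# Assumed target format: "Town, Full State Name, USA".
# Both tables are partial examples; extend them for your own data.
STATE_NAMES = {"NC": "North Carolina", "MA": "Massachusetts"}
COUNTRY_ALIASES = {
    "usa": "USA",
    "us": "USA",
    "united states": "USA",
    "united states of america": "USA",
}

def normalize_place(place: str) -> str:
    """Rewrite a place name into the chosen format where possible."""
    parts = [p.strip() for p in place.split(",") if p.strip()]

    # Split pieces like "NC USA" that are missing a comma.
    expanded = []
    for part in parts:
        tokens = part.split()
        if (len(tokens) == 2
                and tokens[0].upper() in STATE_NAMES
                and tokens[1].lower() in COUNTRY_ALIASES):
            expanded.extend(tokens)
        else:
            expanded.append(part)

    # Swap abbreviations and aliases for the chosen spelling.
    cleaned = []
    for part in expanded:
        if part.upper() in STATE_NAMES:
            cleaned.append(STATE_NAMES[part.upper()])
        elif part.lower() in COUNTRY_ALIASES:
            cleaned.append(COUNTRY_ALIASES[part.lower()])
        else:
            cleaned.append(part)
    return ", ".join(cleaned)

raw = [
    "Asheville, NC USA",
    "Asheville, North Carolina, USA",
    "Asheville, NC, United States",
]
normalized = [normalize_place(p) for p in raw]
print(set(normalized))
```

The three spellings collapse to a single value. Note that a bare "Asheville" would come back unchanged: a script can reformat what is there, but it cannot guess a missing state or country, so those entries still need fixing by hand.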







