GEDCOM Project

In a prior blog post I discussed how I document my family tree using GEDCOM, an old but widely adopted standard. Today I want to discuss how I use it in another side project of mine: turning my family tree into a PDF and/or an e-book, complete with pictures, links, and roughly one to two pages per person listing who that individual is related to.

Parsing the GEDCOM Using Python

One of the first steps in the process is to parse the GEDCOM file and make sense of it. For that I use a Python library called python-gedcom. But this only gets me halfway to my goal. The next step is to (quickly) discover how everyone in the family tree is related to everyone else. That is, who are an individual's first cousins on their mother's side, 2nd cousins, 3rd cousins, 3rd cousins once removed, etc.

For this task I turned to Prolog, an old logic programming language that I was introduced to in college.

Introducing Prolog

I haven't used Prolog professionally at all, but I think it's perfect for this use case.

Prolog Facts

Prolog lets you define a series of facts like the following:

male(sergio).
male(steven).
female(carla).
parent(sergio, steven).
parent(sergio, carla).

The first three statements say that sergio and steven are male, and carla is female. The next two say that sergio is the parent of steven and carla.

With just these facts we can already ask Prolog whether sergio is male:

?- male(sergio).

And Prolog will respond with true (some implementations print yes instead). We can also ask Prolog who is male:

?- male(X).

And Prolog will respond first with sergio, then with steven.

Prolog Rules

Let's get Prolog to understand what a sibling is. The following statement is read as "X & Y are siblings if there is a parent P of X and P is also a parent of Y". We then ask who is a sibling of carla, and we get steven (and carla herself, because nothing in the rule says X and Y must be different people).

sibling(X, Y) :- parent(P, X), parent(P, Y).
?- sibling(carla, X).
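
For readers more at home in Python, here is a rough sketch of what that rule computes over the facts above. It is only an illustration: the `PARENT` set and the `siblings_of` helper are names invented for this example, and Prolog itself arrives at the answers through unification and backtracking rather than a set comprehension.

```python
# Facts from the example above, stored as (parent, child) pairs.
PARENT = {("sergio", "steven"), ("sergio", "carla")}


def siblings_of(person):
    # Mimics: sibling(person, Y) :- parent(P, person), parent(P, Y).
    # Note there is no "person != Y" check, just like the rule above.
    return {
        child
        for (p1, who) in PARENT if who == person
        for (p2, child) in PARENT if p2 == p1
    }


print(sorted(siblings_of("carla")))  # ['carla', 'steven']
```

As in the Prolog version, carla shows up as her own sibling because the rule never requires the two people to differ.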

So How Does Prolog Help Us With Our Goal?

It turns out that transforming a GEDCOM file into a set of Prolog facts is trivial. After that, the fun part is creating rules such as sibling, 1stcousin, 2ndcousin, 1stcousin_onceremoved, and so on. In fact (see what I did there?) here's a set of relations that I used in the project:
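
To give a feel for why the GEDCOM-to-facts step is simple, here is a minimal sketch of the idea. It is not the actual project code: it uses a tiny hand-written line reader instead of python-gedcom, the sample data is made up, and `atom` and `gedcom_to_facts` are hypothetical helper names. Real GEDCOM files contain many more tags and nesting levels than this handles.

```python
# A made-up, minimal GEDCOM fragment: two individuals and one family.
SAMPLE_GEDCOM = """\
0 HEAD
0 @I1@ INDI
1 NAME Sergio /Example/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Carla /Example/
1 SEX F
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 CHIL @I2@
0 TRLR
"""


def atom(xref):
    # Turn a GEDCOM pointer such as @I1@ into a Prolog atom such as i1.
    return xref.strip("@").lower()


def gedcom_to_facts(text):
    # Emit male/female/fams/famc facts for each individual (INDI) record.
    facts = []
    current = None  # atom of the individual being read, or None
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # A level-0 line starts a new record, e.g. "0 @I1@ INDI".
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current = atom(parts[1]) if is_indi else None
        elif level == "1" and current is not None:
            tag = parts[1]
            value = parts[2] if len(parts) == 3 else ""
            if tag == "SEX" and value in ("M", "F"):
                name = "male" if value == "M" else "female"
                facts.append(f"{name}({current}).")
            elif tag in ("FAMS", "FAMC") and value:
                facts.append(f"{tag.lower()}({current}, {atom(value)}).")
    return facts


print("\n".join(gedcom_to_facts(SAMPLE_GEDCOM)))
```

The resulting lines can be written to a file and loaded alongside the rules. The fams and famc facts are the ones the relations below build on.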

Let's start with some basics. All X are themselves. No surprises there.

self(X, X).

Two individuals are married to each other if they are both members of the same FAMS (spousal family). We add in a requirement that X can't be the same as Y.

married_to(X, Y) :- fams(X, F), fams(Y, F), X \= Y.

A parent P of a child C can be determined if P is a spousal member of a family and C is a child member of the same family. A mother is a female parent, and a father is a male parent.

parent(P, C) :- fams(P, F), famc(C, F).
mother(M, C) :- parent(M, C), female(M).
father(F, C) :- parent(F, C), male(F).
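
Each of these rules is essentially a join over the fams and famc facts. A rough Python rendering, assuming a made-up family f1 with spouses i1 and i2 and children i3 and i4 (the identifiers and helper names are hypothetical):

```python
# Hypothetical facts: i1 and i2 are the spouses of family f1,
# i3 and i4 are its children.
FAMS = {("i1", "f1"), ("i2", "f1")}
FAMC = {("i3", "f1"), ("i4", "f1")}
FEMALE = {"i2", "i4"}


def married_to(person):
    # married_to(X, Y) :- fams(X, F), fams(Y, F), X \= Y.
    families = {f for (x, f) in FAMS if x == person}
    return {y for (y, f) in FAMS if f in families and y != person}


def parents_of(child):
    # parent(P, C) :- fams(P, F), famc(C, F).
    families = {f for (c, f) in FAMC if c == child}
    return {p for (p, f) in FAMS if f in families}


def mothers_of(child):
    # mother(M, C) :- parent(M, C), female(M).
    return parents_of(child) & FEMALE


print(married_to("i1"))          # {'i2'}
print(sorted(parents_of("i3")))  # ['i1', 'i2']
print(mothers_of("i3"))          # {'i2'}
```

One difference worth noting: each Python helper only answers one direction of the question, while the Prolog rule can be queried with either argument unbound.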

sibling and grandparent are straightforward:

sibling(X, Y, P) :- parent(P, X), parent(P, Y), X \= Y.
sibling(X, Y) :- sibling(X, Y, _).
sibling_motherside(X, Y) :- mother(M, X), mother(M, Y), X \= Y.
sibling_fatherside(X, Y) :- father(F, X), father(F, Y), X \= Y.
grandparent(G, C) :- parent(G, P), parent(P, C).
grandparent_motherside(G, C) :- parent(G, M), mother(M, C).
grandparent_fatherside(G, C) :- parent(G, F), father(F, C).

N is the niece/nephew of an aunt/uncle A if we can find a parent P of N that is the sibling of A.

nnau(N, A) :- parent(P, N), sibling(P, A).
nnau_motherside(N, A) :- mother(M, N), sibling(M, A).
nnau_fatherside(N, A) :- father(F, N), sibling(F, A).

X & Y are cousins if their parents are siblings.

cousins(X, Y) :- parent(PX, X), parent(PY, Y), sibling(PX, PY).
cousins_motherside(X, Y) :- mother(PX, X), parent(PY, Y), sibling(PX, PY).
cousins_fatherside(X, Y) :- father(PX, X), parent(PY, Y), sibling(PX, PY).

ggp is short for great-grandparent. A grand-aunt/uncle GA of N is an aunt or uncle of one of N's parents.

ggp(GGP, GGC) :- parent(P, GGC), grandparent(GGP, P).
ggp_motherside(GGP, GGC) :- mother(M, GGC), grandparent(GGP, M).
ggp_fatherside(GGP, GGC) :- father(F, GGC), grandparent(GGP, F).
grandauntuncle(N, GA) :- parent(PN, N), nnau(PN, GA).
grandauntuncle_motherside(N, GA) :- mother(PN, N), nnau(PN, GA).
grandauntuncle_fatherside(N, GA) :- father(PN, N), nnau(PN, GA).

Here is where it starts to get a little tricky. First cousins once removed can be found two ways. Your parent's first cousins are your first cousins once removed. But so are your parent's sibling's grandchildren.

cousins1st1rem(X, Y) :- parent(PX, X), cousins(PX, Y).
cousins1st1rem(X, Y) :- parent(PX, X), sibling(PX, GPY), grandparent(GPY, Y).
cousins1st1rem_motherside(X, Y) :- mother(PX, X), cousins(PX, Y).
cousins1st1rem_motherside(X, Y) :- mother(PX, X), sibling(PX, GPY), grandparent(GPY, Y).
cousins1st1rem_fatherside(X, Y) :- father(PX, X), cousins(PX, Y).
cousins1st1rem_fatherside(X, Y) :- father(PX, X), sibling(PX, GPY), grandparent(GPY, Y).
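
The need for two clauses is easier to see on a small example. The sketch below uses a made-up five-person tree and hypothetical helper names. It is a Python approximation of the two routes, not a translation of how Prolog evaluates them.

```python
# Made-up tree: gg has children p and u; p has child x;
# u has child c; c has child d.
# x and c are first cousins, so x and d are first cousins once removed.
PARENT = {("gg", "p"), ("gg", "u"), ("p", "x"), ("u", "c"), ("c", "d")}


def parents(person):
    return {p for (p, c) in PARENT if c == person}


def children(person):
    return {c for (p, c) in PARENT if p == person}


def siblings(person):
    return {s for p in parents(person) for s in children(p)} - {person}


def cousins(person):
    # Children of my parents' siblings.
    return {c for p in parents(person) for s in siblings(p) for c in children(s)}


def grandchildren(person):
    return {g for c in children(person) for g in children(c)}


def cousins1st1rem(person):
    # Route 1: my parent's first cousins.
    via_parent = {y for p in parents(person) for y in cousins(p)}
    # Route 2: grandchildren of my parent's siblings.
    via_sibling = {
        y for p in parents(person) for s in siblings(p) for y in grandchildren(s)
    }
    return via_parent | via_sibling


print(cousins1st1rem("d"))  # {'x'}, found by route 1
print(cousins1st1rem("x"))  # {'d'}, found by route 2
```

Route 1 finds the relationship when asked from the younger generation, route 2 when asked from the older one, which is why both clauses are needed for the relation to work in either direction.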

Not surprisingly it gets somewhat repetitive when defining 2nd, 3rd, or 4th cousins.

cousins2nd(X, Y, PX, PY) :- parent(PX, X), parent(PY, Y), cousins(PX, PY).
cousins2nd(X, Y) :- cousins2nd(X, Y, _, _).
cousins2nd_motherside(X, Y) :- mother(PX, X), parent(PY, Y), cousins(PX, PY).
cousins2nd_fatherside(X, Y) :- father(PX, X), parent(PY, Y), cousins(PX, PY).

I have more relations but this should give an idea of what I'm doing.

Wrapping Up

I use SWI-Prolog, which has Python bindings that let me call Prolog and work out how everyone is related to everyone else. After that, it's a matter of writing the results in a format that can be turned into a PDF or an e-book. For that task I output org-mode and then use pandoc to convert it to PDF and epub.
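
As a rough idea of what that last step could look like, here is a small sketch that renders per-person relations as org-mode text. The `render_org` function and the shape of its input dictionary are assumptions made for this example, not the project's actual format.

```python
def render_org(relations):
    # relations maps a person's name to {relation label: [names]}.
    # Each person becomes a top-level org heading followed by a plain list.
    lines = []
    for person in sorted(relations):
        lines.append(f"* {person}")
        for label, names in sorted(relations[person].items()):
            if names:  # skip relations with nobody in them
                lines.append(f"- {label}: {', '.join(sorted(names))}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical input, as it might come back from the Prolog queries.
sample = {"carla": {"siblings": ["steven"], "cousins": []}}
print(render_org(sample))
```

Saved to a file such as tree.org, output like this could then be handed to pandoc with a command along the lines of `pandoc tree.org -o tree.epub`. PDF output additionally depends on having a PDF engine installed.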

Alternatives

Imagine writing these relations in Python. Yuck. This is Prolog's strong suit. However, note that I also considered miniKanren, as it has several native Python libraries. I just haven't had the time to wrap my mind around how kanren works. Plus SWI-Prolog has a cool swag store.