Teaser #1: How similar are your friends’ names?

Try calculating the overall average similarity of names (e.g. based on co-occurrences in Wikipedia) and the average similarity among your own and your friends’ names. Do these average similarity scores differ significantly?

Here are the results for the 20DC13 Team

First, our team members’ first names are Stephan, Andreas, Robert, Folke and Jürgen (ordered alphabetically by last name).

We constructed the name co-occurrence graph based on sentences within the English Wikipedia, as described in our papers. Each name can then be represented by its “context” vector, i.e., the corresponding row of the co-occurrence graph’s adjacency matrix. We then calculated the similarity between two names as the cosine similarity between the corresponding context vectors. (By the way, this is the similarity measure implemented in nameling, and the respective top 100 similar name pairs are available for download.)
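As a rough illustration of the similarity described above, here is a minimal Python sketch. The names and co-occurrence counts are made-up toy values, and the function name cosine_similarity is our own choice for this example, not taken from nameling:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A name that never co-occurs with anything has no direction;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical rows of a co-occurrence adjacency matrix: entry i of a row
# counts how often the name shares a sentence with the i-th name.
context = {
    "name_a": [0, 4, 2, 1],
    "name_b": [4, 0, 2, 1],
    "name_c": [0, 0, 0, 7],
}

for first, second in [("name_a", "name_b"), ("name_a", "name_c")]:
    score = cosine_similarity(context[first], context[second])
    print(first, second, round(score, 6))
```

Because co-occurrence counts are never negative, the resulting scores always fall between 0 and 1.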

Well, here are the pair-wise similarity scores for the 20DC13 team:

Name1	Name2	Similarity
Stephan	Andreas	0.901121
Stephan	Robert	0.789887
Stephan	Folke	0.549801
Stephan	Jürgen	0.806095
Andreas	Robert	0.688174
Andreas	Folke	0.558555
Andreas	Jürgen	0.849864
Robert	Folke	0.465395
Robert	Jürgen	0.569373
Folke	Jürgen	0.474674

On average, our team members’ pair-wise similarity score is accordingly 0.665294. The overall average pair-wise similarity is 0.02914, so our team’s similarity score is more than 22 times the overall average. Additionally, we repeatedly selected random groups of names of the same size as our team (100,000 repetitions) and calculated the respective average group similarity, resulting in the following histogram:

So yes, our team’s average name similarity is significantly larger than expected by chance!
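If you want to redo this comparison for your own group, the following Python sketch shows one way to go about it. The ten scores are copied from the table above; the helper names, the fixed random seed, and the empirical p-value as a summary of the histogram are our own assumptions for this example, not necessarily how the original numbers were produced:

```python
import itertools
import random

# Pair-wise scores of the team, copied from the table above.
team_scores = [
    0.901121, 0.789887, 0.549801, 0.806095, 0.688174,
    0.558555, 0.849864, 0.465395, 0.569373, 0.474674,
]
team_average = sum(team_scores) / len(team_scores)
overall_average = 0.02914  # overall average pair-wise similarity from the post

print(round(team_average, 6))
print(team_average / overall_average)


def average_group_similarity(names, similarity):
    """Mean similarity over all unordered pairs of names in the group."""
    pairs = list(itertools.combinations(names, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def empirical_p_value(all_names, similarity, group_size, observed,
                      repetitions=100_000, seed=0):
    """Share of random groups scoring at least as high as `observed`.

    `similarity` is any function taking two names and returning a score;
    plugging in a cosine similarity over context vectors is left to the caller.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        group = rng.sample(all_names, group_size)
        if average_group_similarity(group, similarity) >= observed:
            hits += 1
    # The +1 correction avoids reporting a p-value of exactly zero.
    return (hits + 1) / (repetitions + 1)
```

Calling empirical_p_value with the full list of names, a similarity function, group_size=5 and observed=team_average gives a number that plays the same role as reading the team's position off the histogram.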

Happy number crunching!
.folke

One comment

Mark March 11, 2013 at 10:58 am

Hi!

That’s pretty interesting and shows that the name similarity of people from the same country, generation and education level is much higher than that of a random group. I wonder about the other teams who organized this challenge before: how similar are their names? For instance, a French-Greek team organized the 2012 challenge. I’m pretty sure the similarities within the “Greek” part and within the “French” part will be higher than the similarity of the team as a whole. What do you think?

15th Discovery Challenge

organized in conjunction with ECML PKDD 2013