Try calculating the overall average similarity of names (e.g., based on co-occurrences in Wikipedia) and the average similarity of your own and your friends’ names. Do these two averages differ significantly?
Here are the results for the 20DC13 team:
First, our team members’ first names are Stephan, Andreas, Robert, Folke and Jürgen (ordered alphabetically by last name).
We constructed the name co-occurrence graph from sentences in the English Wikipedia, as described in our papers. Each name is then represented by its “context” vector, i.e., the corresponding row of the co-occurrence graph’s adjacency matrix. The similarity between two names is the cosine similarity of their context vectors. (This, by the way, is the similarity measure implemented in nameling; the scores for the respective top 100 most similar pairs of names are available for download.)
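A minimal sketch of this similarity computation in Python, assuming the co-occurrence graph is already available as a symmetric count matrix. The matrix `COOC` and its counts are hypothetical toy values, not data from Wikipedia or nameling, and the helper names `context_vector` and `name_similarity` are illustrative only.

```python
import math

# Hypothetical toy co-occurrence counts (NOT real Wikipedia data):
# COOC[i][j] = number of sentences in which NAMES[i] and NAMES[j] co-occur.
NAMES = ["Stephan", "Andreas", "Jürgen", "Folke"]
COOC = [
    [0, 5, 6, 1],
    [5, 0, 7, 1],
    [6, 7, 0, 0],
    [1, 1, 0, 0],
]


def context_vector(name):
    """Return the adjacency-matrix row of `name` (its context vector)."""
    return COOC[NAMES.index(name)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def name_similarity(name1, name2):
    """Cosine similarity of the two names' context vectors."""
    return cosine_similarity(context_vector(name1), context_vector(name2))


for a, b in [("Stephan", "Andreas"), ("Stephan", "Folke")]:
    print(a, b, round(name_similarity(a, b), 6))
```

With this formulation, two names score high when they co-occur with the same other names, not merely when they appear together.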
Well, here are the pair-wise similarity scores for the 20DC13 team:
Name 1 | Name 2 | Cosine similarity |
---|---|---:|
Stephan | Andreas | 0.901121 |
Stephan | Robert | 0.789887 |
Stephan | Folke | 0.549801 |
Stephan | Jürgen | 0.806095 |
Andreas | Robert | 0.688174 |
Andreas | Folke | 0.558555 |
Andreas | Jürgen | 0.849864 |
Robert | Folke | 0.465395 |
Robert | Jürgen | 0.569373 |
Folke | Jürgen | 0.474674 |
On average, our team’s pairwise similarity score is thus 0.665294. The overall average pairwise similarity is 0.02914, so our team’s score is more than 22 times the overall average. Additionally, we repeatedly drew random groups of names of the same size as our team (100,000 repetitions) and calculated each group’s average similarity, resulting in the following histogram:
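The team average and the ratio to the overall average can be checked with a few lines of Python. The pair scores are copied from the table, and `OVERALL_AVERAGE` is simply the 0.02914 reported in the text; how that overall value was computed over all name pairs is not reproduced here.

```python
import math

# Pairwise cosine similarities of the 20DC13 team, copied from the table.
TEAM_SIMILARITIES = {
    ("Stephan", "Andreas"): 0.901121,
    ("Stephan", "Robert"): 0.789887,
    ("Stephan", "Folke"): 0.549801,
    ("Stephan", "Jürgen"): 0.806095,
    ("Andreas", "Robert"): 0.688174,
    ("Andreas", "Folke"): 0.558555,
    ("Andreas", "Jürgen"): 0.849864,
    ("Robert", "Folke"): 0.465395,
    ("Robert", "Jürgen"): 0.569373,
    ("Folke", "Jürgen"): 0.474674,
}
OVERALL_AVERAGE = 0.02914  # reported average over all name pairs

# A team of 5 has C(5, 2) = 10 unordered pairs.
assert len(TEAM_SIMILARITIES) == math.comb(5, 2)

team_average = sum(TEAM_SIMILARITIES.values()) / len(TEAM_SIMILARITIES)
ratio = team_average / OVERALL_AVERAGE

print(f"team average: {team_average:.6f}")  # team average: 0.665294
print(f"ratio to overall average: {ratio:.1f}")  # ratio to overall average: 22.8
```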
So yes, our team’s average name similarity is significantly larger than expected by chance!
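A minimal sketch of such a random-group comparison, assuming a similarity function `sim(a, b)` over a name vocabulary. `TOY_NAMES` and `toy_sim` are hypothetical stand-ins (uniform random scores in [0, 0.06]) rather than nameling data, and the (k + 1)/(n + 1) p-value estimator is a common Monte Carlo convention, not necessarily what was used for the histogram. The returned list of group averages is what such a histogram would plot.

```python
import itertools
import random


def group_average_similarity(group, sim):
    """Mean of sim(a, b) over all unordered pairs of names in `group`."""
    pairs = list(itertools.combinations(group, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def monte_carlo_p_value(observed, vocabulary, sim, group_size=5,
                        n_samples=100_000, seed=0):
    """Compare `observed` with the averages of random same-size groups.

    Returns (p_value, averages), where p_value = (k + 1) / (n + 1) and k is
    the number of random groups whose average is >= observed.
    """
    rng = random.Random(seed)
    averages = []
    hits = 0
    for _ in range(n_samples):
        group = rng.sample(vocabulary, group_size)  # drawn without replacement
        avg = group_average_similarity(group, sim)
        averages.append(avg)
        if avg >= observed:
            hits += 1
    return (hits + 1) / (n_samples + 1), averages


# Hypothetical toy vocabulary and similarity scores (NOT nameling data).
TOY_NAMES = [f"name{i}" for i in range(200)]
_toy_rng = random.Random(42)
TOY_SIM = {pair: _toy_rng.uniform(0.0, 0.06)
           for pair in itertools.combinations(TOY_NAMES, 2)}


def toy_sim(a, b):
    """Symmetric lookup: TOY_SIM stores each pair in one orientation only."""
    return TOY_SIM[(a, b)] if (a, b) in TOY_SIM else TOY_SIM[(b, a)]


p_value, averages = monte_carlo_p_value(0.665294, TOY_NAMES, toy_sim,
                                        n_samples=10_000)
print(f"p-value estimate: {p_value:.6f}")
print(f"largest random group average: {max(averages):.4f}")
```

With 100,000 samples, as in the post, the smallest value this estimator can return is 1/100,001, reached when no random group matches the observed average.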
Happy number crunching!
.folke
Hi!
That’s pretty interesting, and it shows that the name similarity of people from the same country, generation and education level is much higher than that of a random group. I wonder about the teams who organized this challenge before: how similar are their names? For instance, a French-Greek team organized the 2012 challenge. I’m pretty sure the similarities within the “Greek” part and within the “French” part will be higher than the similarity of the team as a whole. What do you think, huh?