11th Leaderboard

A new team has rushed to the top of our leaderboard. Close behind is team disc, which has improved its score every week since joining the leaderboard. At this rate of progress, they could be at the top of the leaderboard by next week. We look forward to the next recommender result submissions for the coming leaderboard.

Pos  Diff  Team Name         Score
1          all your base     0,0357
2          disc              0,0324
3          Context           0,0318
4          TomFu             0,0309
5          sertão            0,0296
6          Labic             0,0291
7          ibayer            0,0259
8          cadejo            0,0200
9          thalesfc          0,0199
10         TeamUFCG          0,0156
11         PwrInfZC          0,0130
12         persona-non-data  0,0043

10th Leaderboard

This leaderboard comes a bit late as we enter the last month of the offline challenge, but it is full of changes. First of all, we welcome the new team Labic, which rushed straight up to 3rd place. Team disc was also successful, improving its score from 0,0242 to 0,0261. We look forward to the next recommender result submissions for the coming leaderboard.

Pos  Diff  Team Name         Score
1          Context           0,0318
2          TomFu             0,0309
3          Labic             0,0291
4          disc              0,0261
5          sertão            0,0234
6          cadejo            0,0199
7          thalesfc          0,0199
8          TeamUFCG          0,0156
9          PwrInfZC          0,0130
10         persona-non-data  0,0043

9th Leaderboard v2

We are sorry that the leaderboard we posted this morning contained the wrong data. Thanks to everyone who told us about the strange values on the leaderboard. We will be more careful in the future and apologize for the inconvenience.

Pos  Diff  Team Name         Score
1          Context           0,0318
2          TomFu             0,0309
3          thalesfc          0,0261
4          cadejo            0,0261
5          sertão            0,0246
6          disc              0,0242
7          TeamUFCG          0,0156
8          PwrInfZC          0,0130
9          persona-non-data  0,0043

8th Leaderboard

Two new teams joined the competition last week. Welcome, teams sertão and disc. We look forward to the next recommender result submissions for the coming leaderboard.

Pos  Diff  Team Name         Score
1          Context           0,0318
2          TomFu             0,0309
3          cadejo            0,0261
4          thalesfc          0,0258
5          sertão            0,0196
6          TeamUFCG          0,0156
7          PwrInfZC          0,0130
8          disc              0,0094
9          persona-non-data  0,0043

7th Leaderboard

We welcome the three new teams that made it onto our leaderboard. Team thalesfc in particular is off to a good start with a score of 0,0259. It will be interesting to see how the new teams perform over the coming week.
We look forward to the next recommender result submissions for the coming leaderboard.

Pos  Diff  Team Name         Score
1          Context           0,0318
2          TomFu             0,0309
3          cadejo            0,0263
4          thalesfc          0,0259
5          TeamUFCG          0,0156
6          PwrInfZC          0,0129
7          persona-non-data  0,0117

P.S.: From now on, we will release the weekly leaderboard every Monday instead of Friday.

6th Leaderboard

Not much changed last week. There were new submissions by teams TeamUFCG and cadejo. Sadly, their scores went down a bit. Nevertheless, keep using this leaderboard to check your current state of development. We look forward to the next recommender result submissions for the coming leaderboard.

Pos  Diff  Team Name         Score
1          Context           0,0318
2          TomFu             0,0309
3          cadejo            0,0195
4          TeamUFCG          0,0156

Note: Thanks to a hint from one of our participants, we were able to identify a small flaw in our evaluation script: a successfully recommended name was counted twice if it was recommended twice. Thanks for this information.
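To illustrate the kind of fix involved, here is a minimal sketch that counts each recommended name at most once. It is not the actual evaluation script: the function name count_hits, the list-of-names input, and the plain hit count (rather than the challenge's real scoring measure) are assumptions made for illustration.

```python
def count_hits(recommended, relevant):
    """Count distinct relevant names in a ranked recommendation list.

    `recommended` is a ranked list of names, `relevant` is the set of names
    that count as successful recommendations. Both are hypothetical inputs.
    """
    seen = set()
    hits = 0
    for name in recommended:
        if name in seen:
            continue  # a repeated recommendation must not score a second time
        seen.add(name)
        if name in relevant:
            hits += 1
    return hits


# Toy example: "anna" is recommended twice but only counted once.
hits = count_hits(["anna", "emma", "anna", "paul"], {"anna", "paul"})
```

Without the `seen` check, the toy list above would yield three hits instead of two, which is the behaviour described in the note.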

5th Leaderboard

There are some changes in this week's leaderboard. Team Context has taken first place from TomFu, and teams cadejo and TeamUFCG have changed places as well. Meanwhile, there have been many new team registrations, so hopefully there will be some new faces on the next leaderboard. Stay tuned.

Pos  Diff  Team Name         Score
1          Context           0,0318
2          TomFu             0,0309
3          cadejo            0,0209
4          TeamUFCG          0,0160

4th Leaderboard

One more week, one more participant. Team Context joined the competition and took second place. Meanwhile, team TomFu nearly doubled its score, while team cadejo's score dropped by roughly half. Don't worry: only the final submission counts, so keep testing your ideas. We look forward to the next recommender result submissions for the coming leaderboard.

Pos  Diff  Team Name         Score
1          TomFu             0,0309
2          Context           0,0305
3          TeamUFCG          0,0262
4          cadejo            0,0122

3rd Leaderboard

This week ends with a new leader on our leaderboard. Team cadejo joins the competition and takes first place by the narrowest of margins. We look forward to the next recommender result submissions for the coming leaderboard.

Pos  Diff  Team Name         Score
1          cadejo            0,0263
2          TeamUFCG          0,0262
3          TomFu             0,0158

Teaser #4: Namelings and ReTweet Links

Once again we have some nice results to share! Today we look at Twitter’s ReTweet graph, based on the same data set which was described in Teaser #3. The ReTweet graph was extracted from Jure Leskovec’s Twitter Sample, applying a simple RT @username filter (thereby ignoring “dark retweets”).
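For readers who want to try something similar, a filter of this kind could look roughly like the sketch below. The regular expression, the (author, text) input format, and the function name retweet_edges are assumptions for illustration; the exact pattern applied to the data set is not given here.

```python
import re
from collections import Counter

# Assumed pattern: the literal marker "RT @" followed by a user name.
RT_PATTERN = re.compile(r"\bRT @(\w+)")


def retweet_edges(tweets):
    """Count (retweeting user, retweeted user) pairs.

    `tweets` is an iterable of (author, text) tuples, a hypothetical format.
    User names are lower-cased because Twitter names are case-insensitive.
    """
    edges = Counter()
    for author, text in tweets:
        for target in RT_PATTERN.findall(text):
            edges[(author.lower(), target.lower())] += 1
    return edges


# Toy tweets, invented for illustration only.
edges = retweet_edges([
    ("alice", "RT @bob: nice plot"),
    ("alice", "totally agree RT @Bob again"),
    ("carol", "no retweet marker here"),
])
```

Retweets that do not carry the RT @username marker are invisible to such a pattern, which is why "dark retweets" are ignored.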

The resulting ReTweet graph comprises 826,104 users with 2,286,416 edges. Just considering the ReTweet frequencies (i.e., how often user A retweeted user B), we aggregated some average similarity scores. Firstly, we collected all hash tags for each user separately and represented the user by the resulting hash tag context vector (i.e., each component of a user's context vector contains the number of tweets in which the user applied the corresponding hash tag). Thus we can calculate the cosine similarity between pairs of users, based on the corresponding context vectors. Averaging these similarity scores per retweet frequency, we obtained the following plot (excluding self retweets):


As we can see, pairs of users who retweet each other more frequently tend to be more similar in their hash tag usage. This is not surprising, but nevertheless nice to see. Please note that the plots are log-log scaled and retweet frequencies are binned logarithmically.
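The aggregation described above can be sketched as follows. This is a minimal illustration, not the code behind the plot: the function names, the dictionary-based data layout, the choice of base-2 bins, and the decision to skip pairs where a user has no hash tag vector are all assumptions.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (hash tag -> count)."""
    dot = sum(count * v.get(tag, 0) for tag, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def log2_bin(freq):
    """Bin index k with 2**k <= freq < 2**(k + 1), for an integer freq >= 1."""
    return freq.bit_length() - 1


def mean_similarity_per_bin(edges, hashtags):
    """Average cosine similarity of user pairs per logarithmic retweet bin.

    `edges` maps (retweeter, retweeted) -> retweet frequency and `hashtags`
    maps user -> Counter of hash tags. Self retweets are excluded; pairs
    lacking a hash tag vector are skipped (an assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (retweeter, retweeted), freq in edges.items():
        if retweeter == retweeted:
            continue
        if retweeter not in hashtags or retweeted not in hashtags:
            continue
        k = log2_bin(freq)
        sums[k] += cosine(hashtags[retweeter], hashtags[retweeted])
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Toy data, invented for illustration only.
hashtags = {
    "alice": Counter({"recsys": 2, "ml": 1}),
    "bob": Counter({"recsys": 4, "ml": 2}),
    "carol": Counter({"tv": 3}),
}
edges = {("alice", "bob"): 5, ("alice", "carol"): 1, ("bob", "bob"): 7}
result = mean_similarity_per_bin(edges, hashtags)
```

In the toy data, alice and bob use the same hash tags in the same proportions, so their similarity is close to 1, while alice and carol share no hash tag and score 0.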

Secondly, we extracted geo locations for Twitter users and calculated the average geographic distance of user pairs, relative to the corresponding retweet count:

These results are not as clear as in the case of hash tag similarity, but nevertheless we can observe a tendency for user pairs with higher retweet counts to be located closer together. It is worth noting that the global average geographic distance of all users is 7,484 kilometres, so even low retweet frequencies yield significantly lower average distances. For your convenience, we also show the linear scale plot:
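As a rough illustration of how a geographic distance between two users can be computed from latitude and longitude, here is a sketch using the haversine great-circle formula. The choice of this formula, the mean Earth radius of 6,371 km, and the function name haversine_km are assumptions; the distance measure actually used for the plots is not stated.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on the equator, a quarter of the way around the globe apart.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Averaging such distances over all user pairs in a retweet-count bin would then give one point of the curve.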

Finally, the interesting part: Again, we heuristically determined given names for Twitter users by matching the user name with our list of known names. We thus collected names for 179,260 users, having 111,204 links in the ReTweet graph (excluding self retweets). We then calculated the average name similarity of user pairs based on the name co-occurrence graph derived from the English Wikipedia corpus (as described in the Nameling papers):

The result is rather unexpected: The average name similarity decreases with increasing retweet counts! That is, spontaneous retweets are more likely among users with similar names and user pairs which retweet often tend to have less similar names.

At this point, further investigation is due. Maybe these results are artefacts induced by the applied name similarity function. But other hypotheses may also explain these observations. Higher average name similarity for low retweet counts could be explained by assuming that spontaneous retweets are more likely related to topics relevant to the retweeting user's cultural background (e.g. local events, TV shows, etc.), whereas for user pairs who retweet often, name-correlated relations are less important, as these users share some focused interest (e.g. Recommender Systems).

These are of course only speculations and we welcome you to discuss these observations either via Twitter or in our forum!

Happy number crunching!