Description of the dataset

The dataset can be loaded into a MySQL database. The CREATE statements for the corresponding tables (each file = one table) can be found in the file tables.sql. The dataset consists of seven tab-separated files, described below.

Files tas and tas_spam: Tag ASsignments. Fact table; who attached which tag to which resource/content.
Files bookmark and bookmark_spam: Dimension table for bookmark data.
Files bibtex and bibtex_spam: Dimension table for BibTeX data.
File user: Mapping of non-spammer / spammer for each user. This file can be used for spam classification.
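
As a minimal sketch of working with these files without a database, the following Python snippet parses tab-separated lines and builds a user-to-label mapping from the user file. The column order (user id first, then a 0/1 flag where 1 means spammer) is an assumption made for illustration; the authoritative column definitions are the CREATE statements in tables.sql.

```python
import csv
import io


def read_tsv(handle):
    """Yield each line of a tab-separated file as a list of fields."""
    # QUOTE_NONE: treat quote characters as ordinary data, not as quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip blank lines
            yield row


def load_spam_labels(handle):
    """Map user id -> True (spammer) / False (non-spammer).

    Assumed layout (not confirmed by the description): column 0 is the
    user id, column 1 is a 0/1 spammer flag.
    """
    return {row[0]: row[1] == "1" for row in read_tsv(handle)}


# Tiny in-memory stand-in for the user file (made-up values).
sample = io.StringIO("1\t0\n2\t1\n3\t0\n")
labels = load_spam_labels(sample)
```

For a real file, pass an open file handle instead of the in-memory sample, for example `open("user", newline="", encoding="utf-8")`; the encoding here is likewise an assumption.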
Size of Files

Number of lines in files:
Additional Files

For the tag recommender competition, the tas table of the test dataset will not contain tags, since the task is to predict them. Instead, the tas table of the test dataset contains exactly one line per post, with the tag null. No information about the actual number of tag assignments will be given. You can download a version of the training tas file converted to the described format here: tas_testing_recommender.gz.
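
The test format described above can be sketched as follows: all tag assignments belonging to one post collapse into a single line whose tag is null. This is an illustration only, not the script used to produce tas_testing_recommender.gz. It assumes a column order of user, tag, content_id, content_type, date, and treats a (user, content_id) pair as one post; both assumptions should be checked against tables.sql.

```python
def to_test_format(rows, tag_col=1, post_cols=(0, 2)):
    """Collapse tag assignments to one row per post, with tag 'null'.

    rows: iterable of field lists read from a tas file.
    tag_col, post_cols: assumed column positions (hypothetical defaults).
    """
    seen = set()
    out = []
    for row in rows:
        post = tuple(row[i] for i in post_cols)
        if post in seen:
            continue  # this post was already emitted; drop further tags
        seen.add(post)
        masked = list(row)
        masked[tag_col] = "null"  # hide the tag that is to be predicted
        out.append(masked)
    return out


# Made-up example: user 7 tagged post 100 twice and post 101 once.
rows = [
    ["7", "web", "100", "1", "2008-01-01 00:00:00"],
    ["7", "tools", "100", "1", "2008-01-01 00:00:00"],
    ["7", "python", "101", "2", "2008-01-02 00:00:00"],
]
masked = to_test_format(rows)
```

Because every post yields exactly one output line, the result no longer reveals how many tags a post originally had.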