Datasets
Chess
As part of our analysis, all the games on Lichess with Stockfish analysis were processed into CSV files. These can be found here.
Gab
This is a nearly complete collection of Gab accounts, collected between August 10, 2016 and October 28, 2018. There are two files:
Users Information (Node List) (1.2GB)
A JSON Lines text format file, with one line per account. It contains many different fields; many come directly from the API, and most should be self-explanatory. A couple of notes though:
- username is globally unique
- created_at_month_label is when the account was created
- the users' bios (bio) are auto-populated with quotations, so empty bios are rare
- hash_tags, urls, has_hate_speech, and a few others are based on the user's posts
- hate_probs is the output of a simple hate-speech detector run on the bio
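Since the file is large (1.2GB), it is probably easier to stream it than to load it all at once. Here is a minimal sketch of one way to do that in Python, reading one account per line. The function names and whatever path you pass in are placeholders I made up for illustration; only the bio field name comes from the notes above.

```python
import json
from typing import Iterator


def iter_accounts(path: str) -> Iterator[dict]:
    """Yield one account record (a dict) per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_empty_bios(path: str) -> int:
    """Count accounts whose `bio` field is missing or empty (hypothetical helper)."""
    return sum(1 for account in iter_accounts(path) if not account.get("bio"))
```

Because iter_accounts is a generator, only one record is held in memory at a time, so the same loop works for filtering or for collecting just the fields you need.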
Edge List (2.0GB)
A CSV edge list file, with the source following and/or reposting the target. The is_follow column indicates whether the edge is a follow; if it is false, then the edge is only a repost. reposts_count shows how many times the source has reposted the target. Please note that the edges were generated for each user independently, so many edges are present twice.
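Because many edges appear twice, you will probably want to drop the repeats before building a graph. The sketch below shows one possible way to do that with Python's standard csv module. It assumes the file has a header row and that the endpoint columns are named source and target; those two column names are my guess for illustration and should be checked against the actual header. It also assumes that duplicate rows carry the same values, so keeping the first occurrence loses nothing.

```python
import csv
from typing import Iterator


def iter_unique_edges(
    path: str,
    source_col: str = "source",  # assumed column name, check the file header
    target_col: str = "target",  # assumed column name, check the file header
) -> Iterator[dict]:
    """Yield each (source, target) edge once, keeping its first occurrence."""
    seen = set()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            key = (row[source_col], row[target_col])
            if key in seen:
                continue  # the same directed edge was already emitted
            seen.add(key)
            yield row
```

Note that csv.DictReader returns every value as a string, so is_follow and reposts_count need converting before use. The seen set grows with the number of distinct edges, which may be a lot of memory for a 2.0GB file.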
More coming soon
I’m working on a collection of chess engines and will make them available once they’re working.