Gab


This is a nearly complete collection of Gab accounts, collected between August 10, 2016 and October 28, 2018. There are two files:

Users Information (Node List) (1.2GB)

A JSON Lines text file, with one line per account. It contains many fields; many come directly from the API, and most should be self-explanatory. A couple of notes, though:

  • username is globally unique
  • created_at_month_label is when the account was created
  • the users’ bios (bio) are auto-populated with quotations, so empty bios are rare
  • hash_tags, urls, has_hate_speech, and a few others are based on the user’s posts
  • hate_probs is the output of a simple hate-speech detector run on the bio
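
Since the file has one JSON object per line, it can be read as a stream rather than loaded whole. Below is a minimal sketch, assuming only the field names listed above; the helper names (iter_accounts, index_by_username), the file name in the usage note, and the sample records are made up for illustration and are not taken from the dataset.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_accounts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def index_by_username(lines: Iterable[str]) -> Dict[str, dict]:
    """Build a username -> account mapping (username is globally unique)."""
    return {account["username"]: account for account in iter_accounts(lines)}


# Made-up sample records; the real value formats may differ.
sample = [
    '{"username": "alice", "bio": "a quotation"}',
    "",
    '{"username": "bob", "bio": "another quotation"}',
]
accounts = index_by_username(sample)
print(sorted(accounts))
```

To run this on the real file, pass an open file handle instead of the sample list, for example open("users.jsonl", encoding="utf-8"), where the file name is a placeholder. At 1.2GB, holding every account in a dictionary may use a lot of memory, so looping over iter_accounts directly and keeping only the needed fields is likely the safer choice.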

Edge List (2.0GB)

A CSV edge list file, with the source following and/or reposting the target. The is_follow column indicates whether the edge is a follow; if it’s false, then the edge is only a repost. reposts_count shows how many times the source has reposted the target. Please note that the edges were generated for each user independently, so many edges are present twice.
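
Because many edges appear twice, it may help to drop repeats while reading. Here is a rough sketch that keeps the first occurrence of each (source, target) pair. The column names source and target are assumptions (only is_follow and reposts_count are named above), as are the helper name unique_edges and the sample rows, so check the file’s header row before relying on it.

```python
import csv
import io
from typing import Iterable, Iterator


def unique_edges(
    rows: Iterable[dict],
    src_col: str = "source",  # assumed column name
    dst_col: str = "target",  # assumed column name
) -> Iterator[dict]:
    """Yield each (source, target) edge once, keeping its first occurrence."""
    seen = set()
    for row in rows:
        key = (row[src_col], row[dst_col])
        if key not in seen:
            seen.add(key)
            yield row


# Made-up sample; the real file's header and value encoding may differ.
sample_csv = io.StringIO(
    "source,target,is_follow,reposts_count\n"
    "alice,bob,True,2\n"
    "bob,carol,False,1\n"
    "alice,bob,True,2\n"
)
edges = list(unique_edges(csv.DictReader(sample_csv)))
print(len(edges))
```

For the real file, replace the in-memory sample with an open file handle passed to csv.DictReader. The set of seen pairs grows with the number of distinct edges, so for a 2.0GB file it could need a fair amount of memory.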

More coming soon


I’m working on a collection of chess engines and will make them available once they’re working.