Phase 1 - Report: Maja Gwozdz
In the first phase of GSoC 2018, I started annotating political tweets. The corpus includes, for instance, tweets related to US, Canadian, UK, and Australian politics and current social affairs. Each entry in the database records the author, their gender, the political bias, the polarity of the entry (on a discrete scale: -1 for a negative utterance, 0 for a neutral one, 1 for a positive one), speech acts, the mood of the tweet (for instance, sarcasm or anger), any swear words or offensive language, and the keywords, that is, the concrete parts of the tweet that led to the polarity judgment.
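To make the annotation categories concrete, here is a minimal sketch of how one annotated entry could be represented in Python. The class name, field names, and the example values are assumptions made for illustration; they are not the actual column names of the database. Only the list of categories and the -1/0/1 polarity scale come from the description above.

```python
from dataclasses import dataclass, field
from typing import List

# The discrete polarity scale described above: negative, neutral, positive.
VALID_POLARITIES = {-1, 0, 1}


@dataclass
class TweetAnnotation:
    """One annotated tweet (hypothetical field names, for illustration only)."""
    text: str
    author: str
    gender: str
    political_bias: str
    polarity: int
    speech_acts: List[str] = field(default_factory=list)
    mood: List[str] = field(default_factory=list)             # e.g. sarcasm, anger
    offensive_terms: List[str] = field(default_factory=list)  # swear words, slurs
    keywords: List[str] = field(default_factory=list)         # spans behind the polarity judgment

    def __post_init__(self):
        # Reject anything outside the three-point polarity scale.
        if self.polarity not in VALID_POLARITIES:
            raise ValueError(f"polarity must be -1, 0 or 1, got {self.polarity!r}")


# A made-up entry, not taken from the corpus.
example = TweetAnnotation(
    text="Great, another brilliant plan. #Brexit",
    author="example_user",
    gender="unknown",
    political_bias="unknown",
    polarity=-1,
    speech_acts=["assertion"],
    mood=["sarcasm"],
    keywords=["another brilliant plan"],
)
```

Validating the polarity at construction time is one way to catch typos early when several people annotate in parallel.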
In order to obtain the relevant political tweets, I used Grasp and a list of popular political hashtags (to mention but a few: #MAGA, #TrudeauMustGo, #auspoli, #Brexit, #canpoli, #TheresaMay). I also prepared the annotation guidelines, so that other people interested in the project could offer their own judgment and provide additional annotations. Having more judgments will render the corpus more valuable. In the next stage of GSoC, I hope to have enough judgments from other people to estimate the agreement score and arrive at (more) objective scores.
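The agreement measure is not fixed yet. As one possible illustration, the sketch below computes Cohen's kappa for two annotators over the polarity labels. Choosing kappa is an assumption on my side, and the function name and the label lists are made up for the example rather than taken from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: computed from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical polarity judgments for six tweets.
annotator_1 = [1, 0, -1, -1, 0, 1]
annotator_2 = [1, 0, -1, 0, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.75
```

Here the annotators agree on five of six tweets (observed agreement 5/6), chance agreement is 1/3, and kappa is therefore 0.75. With more than two annotators, a measure such as Fleiss' kappa or Krippendorff's alpha would be needed instead.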
The database is currently available as a Google Sheet, which is a relatively easy way to store the data and allows for parallel annotation.