Masha Ivenskaya has implemented a text classification approach that checks whether a news article could be considered sensationalist. Copy and paste the headline and body of an article into the forms below to have the text analyzed.
The source code can be found on GitHub.
The classifier considers the following features:
- POS tags (unigrams and bigrams)
- Punctuation counts
- Average sentence length
- Number of all-caps tokens (excluding common abbreviations; normalized by length of text)
- Number of words that overlap with the Pattern Profanity word list (normalized by length of text)
- Polarity and Subjectivity scores (obtained through the Pattern Sentiment module)
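To make a few of the surface features above concrete, here is a minimal sketch using only the Python standard library. It is not the classifier's actual code: the function name `surface_features`, the abbreviation list, the regex tokenization, and the choice to measure sentence length in words are all assumptions made for illustration. The POS-tag, profanity, and sentiment features are left out because they depend on the Pattern library.

```python
import re
from collections import Counter

# Hypothetical exclusion list; the classifier's real abbreviation list is not shown in the source.
COMMON_ABBREVIATIONS = {"US", "UK", "EU", "UN", "FBI", "CIA", "NASA", "CEO"}


def surface_features(text):
    """Compute punctuation counts, average sentence length, and an all-caps ratio."""
    # Alphabetic tokens only; a real tokenizer would handle numbers, hyphens, etc.
    tokens = re.findall(r"[A-Za-z]+", text)
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punctuation = Counter(ch for ch in text if ch in "!?")
    # All-caps tokens of length > 1, skipping the assumed abbreviation list.
    all_caps = [
        t for t in tokens
        if len(t) > 1 and t.isupper() and t not in COMMON_ABBREVIATIONS
    ]
    return {
        "exclamation_marks": punctuation["!"],
        "question_marks": punctuation["?"],
        # Assumed unit: words per sentence.
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),
        # Normalized by the number of tokens in the text.
        "all_caps_ratio": len(all_caps) / max(len(tokens), 1),
    }


if __name__ == "__main__":
    sample = "SHOCKING news from the US! You will not BELIEVE this. Really?"
    print(surface_features(sample))
```

In a setup like this, the resulting dictionary would be combined with the remaining features into a single feature vector before being passed to whatever classifier is used.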