Data
The following data sets are available to researchers at recognised academic institutions. To request access to the data, please complete and submit the data application form.
- BBC News forum posts: 2,594,745 comments from selected BBC News forums and more than 1,000 human-classified sentiment strengths, each with a positive strength of 1-5 and a negative strength of 1-5. Each classification is the average of the ratings from three human classifiers.
- Digg post comments: 1,646,153 comments on Digg posts (typically highlighting news or technology stories) and more than 1,000 human-classified sentiment strengths, each with a positive strength of 1-5 and a negative strength of 1-5. Each classification is the average of the ratings from three human classifiers.
- MySpace (social network site) comments: six sets of systematic samples (three for the US and three for the UK) of all comments exchanged between pairs of friends (about 350 pairs for each UK sample and about 3,500 pairs for each US sample), drawn from a total of more than 100,000 members, and more than 1,000 human-classified sentiment strengths, each with a positive strength of 1-5 and a negative strength of 1-5. Each classification is the average of the ratings from three human classifiers.
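As an illustration of how the averaged human classifications described above could be computed, here is a minimal Python sketch. The input layout (one `(positive, negative)` pair per classifier) and the function name `average_strengths` are assumptions made for this example; the actual file format of the data sets is not described here, nor is it stated whether the published averages are rounded.

```python
def average_strengths(ratings):
    """Average (positive, negative) strength pairs from several human classifiers.

    `ratings` is a hypothetical layout: a list of (positive, negative) tuples,
    one per classifier, with each strength on the 1-5 scale.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for positive, negative in ratings:
        # Both strengths must lie on the 1-5 scale used in the data sets.
        if not (1 <= positive <= 5 and 1 <= negative <= 5):
            raise ValueError(f"strength out of range: {(positive, negative)}")
    count = len(ratings)
    mean_positive = sum(p for p, _ in ratings) / count
    mean_negative = sum(n for _, n in ratings) / count
    return mean_positive, mean_negative


# Made-up ratings from three classifiers for a single comment.
example_ratings = [(3, 1), (4, 2), (2, 1)]
print(average_strengths(example_ratings))
```

Because each comment carries both a positive and a negative strength, the two scales are averaged separately rather than combined into a single score.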
Software
The sentiment strength classifier SentiStrength is free to download.
An unsupervised, lexicon-based classifier, tested on social media (Twitter, Digg, MySpace), is also available for research purposes as: 1) a command-line application, 2) a Java .jar file with an open API and 3) a C++ .dll. The classifier provides ternary classification: objective, positive or negative. For more information, please send an email to g.paltoglou[at]wlv.ac.uk, indicating which version interests you.