Data

This is one of the "70 online databases that define our planet".

The following data sets were formerly available to researchers at recognised academic institutions but are no longer available, sorry.

  • BBC News forum posts: 2,594,745 comments from selected BBC News forums, of which more than 1,000 have human-classified sentiment strengths: a positive strength of 1-5 and a negative strength of 1-5. Each classification is the average of three human classifiers.
  • Digg post comments: 1,646,153 comments on Digg posts (typically highlighting news or technology stories), of which more than 1,000 have human-classified sentiment strengths: a positive strength of 1-5 and a negative strength of 1-5. Each classification is the average of three human classifiers.
  • MySpace (social network site) comments: six systematic samples (3 for the US and 3 for the UK) of all comments exchanged between pairs of friends (about 350 pairs for each UK sample and about 3,500 pairs for each US sample) from a total of more than 100,000 members, of which more than 1,000 comments have human-classified sentiment strengths: a positive strength of 1-5 and a negative strength of 1-5. Each classification is the average of three human classifiers.
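As an illustration of the labelling scheme described above, here is a minimal Python sketch of averaging three human classifiers' ratings for one comment. The function name and the input layout (a list of positive/negative pairs) are assumptions made for this example; they do not reflect the actual file format of the data sets.

```python
def average_strengths(ratings):
    """Average (positive, negative) sentiment strengths over human classifiers.

    `ratings` is a hypothetical layout: one (positive, negative) pair per
    classifier, each value on the 1-5 scale described above.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for positive, negative in ratings:
        # Both scales run from 1 (no sentiment) to 5 (very strong sentiment).
        if not (1 <= positive <= 5 and 1 <= negative <= 5):
            raise ValueError(f"rating out of range: {(positive, negative)}")
    count = len(ratings)
    mean_positive = sum(p for p, _ in ratings) / count
    mean_negative = sum(n for _, n in ratings) / count
    return mean_positive, mean_negative


# Three classifiers rating the same comment (made-up values).
example = average_strengths([(3, 1), (4, 2), (5, 1)])
```

Because the result is a mean, the final strengths can be fractional even though each individual rating is an integer.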

Software

The sentiment strength classifier SentiStrength is free to download.

An unsupervised, lexicon-based classifier, tested on social media (Twitter, Digg, MySpace), is also available for research purposes as: 1) a command-line application, 2) a Java .jar file with an open API and 3) a C++ .dll. The classifier provides ternary classification: objective, positive or negative. For more information, please send an email to g.paltoglou[at]wlv.ac.uk, indicating which version interests you.

 

Collective emotions in cyberspace
OFAI, Warsaw University of Technology, École Polytechnique Fédérale de Lausanne, Jacobs University, Statistical Cybermetrics Research Group, Gemius, Institut Jozef Stefan, ETH, ikm research