Several metadata extractors have been released for version 1.1.1 of the MICO platform. On the language-processing side, there is support for normalisation, parsing, and rudimentary sentiment analysis. Normalisation consists in cleaning up a text by removing, for example, XML formatting and redundant whitespace to make it easier to process algorithmically. Parsing can then be used to identify the grammatical structure of the text, which is a useful step towards a shallow understanding of its meaning. Sentiment analysis, finally, aims to derive subjective values from the text, which can in principle be any label, such as engaging, frightening, provoking, or romantic. The first sentiment-analysis component for the MICO platform looks at the positive versus negative polarity of sentences, since this is central to many applications.
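To make the normalisation step concrete, here is a minimal sketch in Python. It is not the MICO extractor itself: the function name `normalise` and the regular-expression approach are assumptions chosen for illustration, and a production component would more likely rely on a proper XML parser.

```python
import html
import re

# Hypothetical helper patterns, not part of the MICO platform.
TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an XML/HTML tag
WHITESPACE_RE = re.compile(r"\s+")  # runs of spaces, tabs, and newlines


def normalise(text: str) -> str:
    """Strip markup and redundant whitespace from a piece of text."""
    # Replace tags with a space so that adjacent words are not glued together.
    without_tags = TAG_RE.sub(" ", text)
    # Turn character entities such as &amp; back into plain characters.
    unescaped = html.unescape(without_tags)
    # Collapse whitespace runs into single spaces and trim both ends.
    return WHITESPACE_RE.sub(" ", unescaped).strip()


if __name__ == "__main__":
    raw = "<post><title>Galaxy  Zoo</title>\n  <body>I love   this!</body></post>"
    print(normalise(raw))
```

The cleaned string can then be handed to a parser or a sentiment classifier without markup getting in the way.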
A week after the platform release, researchers from Umeå University and Zooniverse met in Oxford to discuss the continued development of the textual metadata extractors. Sentiment analysis is relevant for Zooniverse because it might deepen their understanding of the volunteer community, which in turn would let them offer better support and service. However, what Zooniverse needs goes beyond standard polarity detection. Instead, they raise questions such as which subjects appeal to the users, or how to recognise the difference between frustration and engagement.
A challenge in going beyond polarity assessment is to find the gold-standard data needed for supervised machine learning. Given a set of forum posts, we would like to know how humans rate them in terms of confidence, frustration, engagement, and so forth. It seems natural enough, then, to set up a small, internal Zooniverse project and crowd-source sentiment analysis among the project members, thereby bootstrapping an automatic classification system. By doing so, Zooniverse becomes both a means and an end in the development of new extractors.
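One way such crowd-sourced ratings could be turned into gold-standard labels is sketched below in Python. Majority voting is an assumption made here for illustration, not a method the project has settled on, and the function name `aggregate_labels`, the `min_agreement` threshold, and the post identifiers are all hypothetical.

```python
from collections import Counter


def aggregate_labels(ratings, min_agreement=0.5):
    """Pick a gold label per post by majority vote among human raters.

    ratings maps a post identifier to the list of labels it received.
    A post is kept only if its most common label was chosen by strictly
    more than min_agreement of its raters (an assumed threshold).
    """
    gold = {}
    for post_id, labels in ratings.items():
        if not labels:
            continue  # nobody rated this post
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > min_agreement:
            gold[post_id] = label
    return gold


if __name__ == "__main__":
    # Made-up ratings for two hypothetical forum posts.
    ratings = {
        "post-1": ["frustration", "frustration", "engagement"],
        "post-2": ["confidence", "engagement"],
    }
    print(aggregate_labels(ratings))
```

Posts on which the raters disagree too much are dropped rather than guessed at, so the resulting training set is smaller but cleaner for supervised learning.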