This year I attended the SEMANTiCS 2014 conference in Leipzig and had the opportunity to present our work on Linked Open Data for the tourism sector, as well as our research activities in MICO. This post highlights the innovations and use cases I found inspiring for our work on cross-media analysis, and more specifically for the video news showcase we're building to validate MICO's technologies.
I have already shared the overall picture of the event as a collection of stories on the Insideout10 Blog; for those interested in the details, here is the link: SEMANTiCS 2014 – The Hockey Stick effect.
Our goal in MICO is to develop a platform and a methodology to analyse "media in context" by orchestrating different content extraction tools that can run simultaneously or in sequence and provide valuable metadata to third-party applications. The video news showcase is built around two core modules: a CMS (WordPress) and an online video platform (Helix Cloud, a platform we developed at Insideout10 in partnership with RealNetworks to offer cloud video services much like those offered by Brightcove, Ooyala and Wistia).
Here is a presentation describing the use cases for the video news showcase: News Video Showcase description.
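To make the idea of orchestration a little more concrete, here is a minimal Python sketch of the two modes mentioned above: independent extractors running side by side on the same item, and extractors chained in sequence so that each one can use the metadata produced by the previous ones. This is not MICO's API; the extractor functions (`detect_language`, `transcribe`, `tag_entities`, `extract_keyframes`) and the dict-based media item are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical extractor type: takes a media item (a plain dict) and returns new metadata.
Extractor = Callable[[dict], Dict[str, object]]


def run_in_sequence(item: dict, extractors: List[Extractor]) -> dict:
    """Chain extractors: each one sees the metadata accumulated so far."""
    context = dict(item)
    for extract in extractors:
        context.update(extract(context))
    return context


def run_in_parallel(item: dict, extractors: List[Extractor]) -> dict:
    """Run independent extractors concurrently on the same input and merge their results."""
    merged = dict(item)
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in submission order, so merging is deterministic
        for result in pool.map(lambda extract: extract(item), extractors):
            merged.update(result)
    return merged


# Hypothetical stand-in extractors (a real system would call analysis services here)
def detect_language(item: dict) -> dict:
    return {"language": "en"}


def transcribe(item: dict) -> dict:
    return {"transcript": "hello world"}


def extract_keyframes(item: dict) -> dict:
    return {"keyframes": [0.0, 12.5]}


def tag_entities(item: dict) -> dict:
    # Depends on the transcript produced by an earlier step
    words = item.get("transcript", "").split()
    return {"tags": [word.capitalize() for word in words]}


# Usage: independent steps first, then a step that needs their output
video = {"id": "news-001"}
step1 = run_in_parallel(video, [detect_language, extract_keyframes, transcribe])
result = run_in_sequence(step1, [tag_entities])
print(result["tags"])  # ['Hello', 'World']
```

The split between the two functions mirrors a dependency question: extractors that only need the raw media can run concurrently, while anything that consumes another tool's output has to wait for it.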
For these reasons I found the presentation given at SEMANTiCS 2014 by Sofia Angeletou from the BBC extremely relevant: she explained how the BBC is using Semantic Web technologies and starting to open access to its data (the news of the day) through its own Linked Data Architecture (yes, they call it a platform but, as it has nothing to do with the W3C Linked Data Platform recommendation, let's call it a Linked Data Architecture). Here is Sofia's presentation.
Even more important for the news sector are the BBC Ontologies, in action in news products such as the UK 2014 Election website. Although proprietary to the BBC, these ontologies have been designed specifically to support audience-facing applications like the election website and other editorial products. It is definitely worth evaluating how they compare to existing standards such as the Open Annotation Core Data Model.
The most interesting achievements of this project, in relation to our work in MICO, concern audio processing (a set of speech processing algorithms the BBC calls Kiwi):
- Speaker segmentation, identification and gender detection (using the LIUM diarization toolkit, diarize-jruby and ruby-lsh). An audio file is automatically divided into segments according to the identity of the speaker (not bad at all!).
- Speech-to-text for the detected speech segments (using CMU Sphinx). Interestingly, the system has been trained using models built from a wide range of BBC data.
- Automated tagging with DBpedia identifiers. The tagging process creates the searchable metadata that ultimately makes the archives much easier to access (the tool being developed for the tagging is called MANGO).
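To picture how the three steps could feed each other, here is a toy Python sketch that turns diarized, transcribed segments into annotations whose body is a DBpedia URI and whose target is a time range of the media file, written with the W3C Media Fragments `#t=start,end` syntax. The records are only loosely shaped like the Open Annotation model mentioned above. It is an illustration under assumptions, not the BBC's Kiwi or MANGO implementation: the `Segment` class, the dictionary-based tagger and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the file
    end: float
    speaker: str  # label assigned by the diarization step, e.g. "S0"
    gender: str   # "M" or "F", as a diarizer might report it
    text: str     # speech-to-text output for this segment


# Hypothetical lookup table standing in for a real entity-linking service
DBPEDIA = {
    "london": "http://dbpedia.org/resource/London",
    "election": "http://dbpedia.org/resource/Election",
}


def tag(text: str) -> List[str]:
    """Naive dictionary tagger: return the DBpedia URIs of known words, sorted."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return sorted(DBPEDIA[w] for w in words if w in DBPEDIA)


def to_annotations(media_uri: str, segments: List[Segment]) -> List[dict]:
    """Emit one annotation per (segment, entity) pair."""
    annotations = []
    for seg in segments:
        # Media Fragments temporal target, e.g. ...#t=0,12.5
        target = f"{media_uri}#t={seg.start:g},{seg.end:g}"
        for uri in tag(seg.text):
            annotations.append({
                "@type": "oa:Annotation",
                "oa:hasBody": uri,
                "oa:hasTarget": target,
                "speaker": seg.speaker,  # ad hoc extra field, not part of Open Annotation
            })
    return annotations


# Usage with made-up segments
segments = [
    Segment(0.0, 12.5, "S0", "F", "Polls open in London today."),
    Segment(12.5, 20.0, "S1", "M", "Turnout for the election looks high."),
]
anns = to_annotations("http://example.org/video/news-001.mp4", segments)
for a in anns:
    print(a["oa:hasTarget"], a["oa:hasBody"])
```

Pointing each tag at a time fragment rather than at the whole file is what would let a player jump straight to the moment a topic is mentioned.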
We will be keeping a close eye on these developments. It was certainly great to see the BBC using semantic technologies to build value propositions for its audiences in the news sector, and it validates the approach of our video news use case.
It's clear that the increasing demand for a more visual Web and the growth in time spent engaging with audio and video in a multi-screen environment of smart TVs, tablets and smartphones put enormous pressure on metadata extraction and management. Key players like the BBC are leading the R&D efforts this sector needs to keep up with new audience behaviours, and they can afford a long-term innovation strategy. MICO needs to bring these innovations to independent news providers, bloggers and SMEs that could not otherwise afford to keep up with these trends and… we're all highly motivated to do so, so keep following us!