Wouldn’t it be nice to address fragments of video and audio files on the web in a standardized way, so that everybody could access them directly without hours of fast-forwarding and rewinding? Wouldn’t it be great to interact with these fragments by adding platform-independent comments, descriptions or even links to other web resources in order to combine media with meaning? And wouldn’t it be awesome to use this information to retrieve exactly the seconds and ranges that you are actually interested in? Then Linked Media and SPARQL-MM could be interesting for you.

SPARQL-MM is an extension for SPARQL, the de facto standard query language for the Semantic Web. It introduces spatio-temporal filter and aggregation functions to handle media resources and fragments that follow the W3C standard for Media Fragment URIs. SPARQL-MM is currently available as a Sesame function set, which makes it easy to use with the Sesame API, e.g. simply by adding a dependency to your project. In combination with the W3C's Media Annotations Ontology it makes a powerful setup for semantic media management and retrieval. In this blog post I will show you how.
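Because SPARQL-MM ships as a plain Sesame function set, integration is essentially a build-configuration step: once the jar is on the classpath, Sesame discovers the functions through its function registry, and no further registration code is needed. A minimal sketch for a Maven project follows — the coordinates and version below are placeholders, so check the reference implementation on GitHub for the current ones:

<dependency>
    <groupId><!-- see the SPARQL-MM repository --></groupId>
    <artifactId>sparql-mm</artifactId> <!-- placeholder artifact name -->
    <version><!-- current release --></version>
</dependency>

After that, any SPARQL query evaluated through the Sesame API can use the mm: functions directly.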

An example
I am a big fan of extreme sports videos like cliff diving, free skiing and motocross. I know the athletes very well, of course have some favorites, and most of the time know exactly what I want to see. But crawling through the big pile of videos is sometimes really annoying. Answering a query like “Give me the spatio-temporal snippet that shows Lewis Jones right beside Connor Macfarlane” costs me a lot of time. Even if the videos are well annotated (which is the case for these kinds of clips), I currently wouldn’t get an answer to this query without major manual effort.

The background
As mentioned, SPARQL-MM is based on the W3C standards for the Semantic Web and (linked) media metadata representation. The picture below shows an annotated snippet using these standards. We use Media Fragment URIs to link annotations to specific spatio-temporal parts of the video. The specification provides a media-format-independent, standard means of addressing media fragments on the Web using Uniform Resource Identifiers. It defines name-value pairs, e.g. t=start,end for temporal and xywh=x,y,width,height for spatial fragments. In our example Connor Macfarlane appears from second 194 to 198 on the left side, while Lewis Jones is marked from second 193 to 198 on the right side. Even if the structure seems complicated, there are many tools for creating annotations like this. So the data is there – but how do I get back what I actually want (you remember: the two guys)? That’s where SPARQL-MM comes into the game.
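To make that concrete, the locators for the two athletes could look like this — the video URL and the pixel coordinates are made up for illustration, but the fragment syntax follows the Media Fragments URI 1.0 specification:

http://example.org/video.mp4#t=193,198&xywh=480,120,200,240
http://example.org/video.mp4#t=194,198&xywh=80,120,200,240

The part after the # selects the fragment: in the first URI, t=193,198 addresses the time range from second 193 to 198 (Lewis Jones), and xywh=480,120,200,240 addresses a 200×240 pixel region whose top-left corner sits at (480,120), i.e. on the right side of the frame. The second URI marks Connor Macfarlane on the left side from second 194 to 198.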

Media Fragments in extreme sports video

A semantic description of Media Fragments

SPARQL-MM
SPARQL-MM adds the missing link to utilize the hidden information (inside the Media Fragment URIs) for information retrieval via aggregation and filter functions. A detailed description of all functions in human- and machine-readable format (following the sparql-service-description extension for describing SPARQL extensions and function libraries) can be found in the source repository of our reference implementation. Each function is identified by a unique URI, and all of them share the same base URI mm: <http://linkedmultimedia.org/sparql-mm/functions#>. For the function set we follow well-known standards (e.g. DE-9IM) for topological and temporal relations.

Using SPARQL-MM functions we can now formulate the query “Give me the spatio-temporal snippet that shows Lewis Jones right beside Connor Macfarlane” as a SPARQL query like this:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#>
PREFIX ma: <http://www.w3.org/ns/ma-ont#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT (mm:boundingBox(?l1,?l2) AS ?left_right) WHERE {
    ?f1 ma:locator ?l1; dct:subject ?p1.
    ?p1 foaf:name "Lewis Jones".
    ?f2 ma:locator ?l2; dct:subject ?p2.
    ?p2 foaf:name "Connor Macfarlane".

    FILTER mm:rightBeside(?l1,?l2)
    FILTER mm:temporalOverlaps(?l1,?l2)
}

We use mm:temporalOverlaps to get fragments that overlap in time, mm:rightBeside handles the spatial relation, and mm:boundingBox merges each pair of fragments that matches the filters into one combined fragment. You can test this and other examples on our demo page.

Have fun!
Thomas

PS: We presented SPARQL-MM as a demo at the ESWC 2014 in Crete this year. Thanks to all the people there for the good discussions!

PPS: The reference implementation is Apache-licensed and freely available on GitHub. Check it out!