Organizing Videos Streams for
Clustering and Estimation of Popular Scenes

International Conference on Image Analysis and Processing (ICIAP), 2017

S. Battiato1, G. M. Farinella1, F. L. M. Milotta1,2,*, A. Ortis1,2, F. Stanco1
V. D'Amico2, G. Torrisi2, L. Addesso2

1University of Catania, Department of Mathematics and Computer Science - Italy, {battiato, gfarinella, milotta, ortis, fstanco}@dmi.unict.it
2TIM JOL Catania - Italy, {valeria1.damico, giovanni.torrisi, luca.addesso}@telecomitalia.it

Abstract

The wide diffusion of mobile devices with embedded cameras has opened new challenges in the automatic understanding of video streams acquired by multiple users during events such as sports matches, expos, and concerts. The popularity of visual content is an important cue that can be exploited in several fields, including estimating the mood of the crowd attending an event and estimating the interest in parts of a cultural heritage site.
The popularity of visual content can be obtained through the "visual consensus" among multiple video streams acquired by different users' devices. In this paper we address the problem of detecting and summarizing the "popular scenes" captured by users with mobile cameras during events. For this purpose, we have developed a framework called RECfusion, in which the key popular scenes of multiple streams are identified and tracked over time. Starting from a set of videos, the proposed system automatically generates a video that captures the interests of the crowd by considering the popularity of the scene content. Experiments are performed on a dataset composed of different video sequences (with high variability in terms of resolution) and contexts.

Experimental Results

The experimental results (reported in the following table) show that the proposed vote-based cluster tracking procedure reaches much higher TPR values than the threshold-based procedure [2], while TNR and ACC remain comparable between the two procedures. Only on the Meeting video set is the proposed vote-based procedure slightly outperformed, which exposes a limitation of the procedure. If intraflow analysis labels N scenes, the vote-based procedure can distinguish at most N scenes, because it tracks the clusters using the LoggedClustersIDs as defined in [2]. So, unlike the threshold-based procedure, which can generate many small sparse clusters if the threshold is not fine-tuned, the vote-based procedure tracks only a limited number of clusters. In the Meeting video set two people are recorded, and there are only two distinct clusters, each focusing on one of them. Sometimes interflow analysis generates a cluster containing both people. The vote-based cluster tracking procedure treats this cluster as Noise, since intraflow analysis has never labeled a scene in which the two people are recorded together.
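
As a rough illustration of the behaviour described above, the following Python sketch assigns a scene ID to one interflow cluster by majority vote over the IDs logged by intraflow analysis, and falls back to Noise when no logged scene wins. It is not the implementation used in RECfusion: the function names, the `NOISE` label, the use of `None` for "no matching logged scene", and the `min_share` threshold are all assumptions made for this example. The `rates` helper uses the standard definitions TPR = TP / (TP + FN), TNR = TN / (TN + FP), and ACC = (TP + TN) / (TP + FP + TN + FN).

```python
from collections import Counter

NOISE = "Noise"  # hypothetical label for clusters that match no logged scene


def vote_cluster_label(logged_ids, min_share=0.5):
    """Pick a scene ID for one interflow cluster by majority vote.

    logged_ids: the intraflow scene ID logged for each device in the cluster
                (None when a device's current frame matches no logged scene).
    min_share:  assumed fraction of all votes that the winner must exceed.
    """
    if not logged_ids:
        return NOISE
    # Count only the devices that actually vote for a logged scene.
    votes = Counter(i for i in logged_ids if i is not None)
    if not votes:
        return NOISE
    winner, count = votes.most_common(1)[0]
    # Abstaining devices still count in the denominator.
    return winner if count / len(logged_ids) > min_share else NOISE


def rates(tp, fp, tn, fn):
    """Return (TPR, TNR, ACC) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / total if total else 0.0
    return tpr, tnr, acc
```

Under these assumptions, a cluster can only ever receive one of the IDs already logged by intraflow analysis, which mirrors the "at most N scenes" limit. A cluster whose devices mostly match no logged scene, as when both people in the Meeting set appear together, ends up labeled as Noise.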

Demonstrative Videos

Download the collection of demonstrative videos showing the comparison between [2] and the newly proposed approach.

Notice how the newly proposed approach results in more stable cluster tracking than [2], and how it is now able to track Noise too.

Other works about RECfusion

[1] A. Ortis, G. M. Farinella, V. D'Amico, G. Torrisi, L. Addesso, and S. Battiato, "RECfusion: Automatic Video Curation Driven by Visual Content Popularity", Proc. ACM Multimedia, pp. 1179–1182 (2015). (webpage)
[2] F. L. M. Milotta, S. Battiato, F. Stanco, V. D'Amico, G. Torrisi, and L. Addesso, "RECfusion: Automatic Scene Clustering and Tracking in Videos from Multiple Sources", Proceedings of EI - Mobile Devices and Multimedia: Enabling Technologies, Algorithms, and Applications (2016). (webpage)