Egocentric Shopping Cart Localization

Emiliano Spera, Antonino Furnari, Sebastiano Battiato, Giovanni Maria Farinella

This work investigates the new problem of image-based egocentric shopping cart localization in retail stores. The contribution of our work is two-fold. First, we propose a novel large-scale dataset for image-based egocentric shopping cart localization. The dataset was collected using cameras placed on shopping carts in a large retail store. It contains a total of 19,531 image frames, each labelled with its six Degrees Of Freedom pose. We study the localization problem by analysing how cart locations should be represented and estimated, and how the localization results should be assessed. Second, we benchmark different image-based techniques to address the task. Specifically, we investigate two families of algorithms: classic methods based on image retrieval and emerging methods based on regression. Experimental results show that methods based on image retrieval largely outperform regression-based approaches. We also point out that deep metric learning techniques learn better visual representations than other architectures and improve the localization results of both retrieval-based and regression-based approaches. Our findings suggest that deep metric learning techniques can help bridge the gap between retrieval-based and regression-based methods.


The video shows the performance of the 1-NN approach based on image representations obtained using deep metric learning (Triplet Networks). On the right, we report the query image (top) and the closest training image selected by the 1-NN (bottom). On the left, we report the 2D positions of all training samples (black points) and the ground truth positions of the test video (red line). At each time step, a circle indicates the inferred position. The color of the circle indicates the position error made by the algorithm (see the color bar on the right for reference). A segment links each inferred position to the corresponding ground truth position.
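The 1-NN lookup described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: it assumes that each image has already been mapped to a feature vector (for example by a Triplet Network), that Euclidean distance is used to compare features, and that poses are stored as rows (x, y, u, v). The function name `nearest_neighbor_pose` and the toy arrays are hypothetical.

```python
import numpy as np


def nearest_neighbor_pose(query_feat, train_feats, train_poses):
    """Return the pose of the training image closest to the query in
    feature space, together with the index of that training image.

    Euclidean distance is an assumption made for this sketch.
    """
    # Distance from the query to every training feature vector.
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    idx = int(np.argmin(dists))
    return train_poses[idx], idx


# Toy data (hypothetical): three training images with 2-D features
# and poses stored as (x, y, u, v).
train_feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
train_poses = np.array([
    [2.0, 3.0, 1.0, 0.0],
    [5.0, 1.0, 0.0, 1.0],
    [7.0, 8.0, -1.0, 0.0],
])
query_feat = np.array([0.9, 0.1])

pose, idx = nearest_neighbor_pose(query_feat, train_feats, train_poses)
```

The pose of the retrieved training image is used directly as the estimate for the query, which is why the quality of the feature space matters so much for this family of methods.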


@inproceedings{spera2018egocentric,
  year = {2018},
  booktitle = {International Conference on Pattern Recognition (ICPR)},
  title = {Egocentric Shopping Cart Localization},
  author = {Emiliano Spera and Antonino Furnari and Sebastiano Battiato and Giovanni Maria Farinella}
}

Paper download


The proposed dataset was built from frames extracted from nine videos acquired in a retail store in southern Italy with a floor area of 782 m^2. The videos were acquired with two different ZED cameras mounted on a shopping cart, with their focal axes parallel to the floor of the store. Each video was temporally subsampled at 3 fps. The overall dataset consists of 19,531 labelled images. We divided the dataset into a training set and a test set, using 6 videos for training (13,360 frames) and 3 videos for testing (6,171 frames). Each subset contains images covering the entire store. Images are labelled with their 2D position and an angle indicating the orientation of the shopping cart in the 2D plane. Specifically, a camera pose is represented as p=(X,U), where X=(x,y) is a 2D vector representing the position of the shopping cart and U=(u,v) is a unit vector representing the orientation of the cart.
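To make the pose representation p=(X,U) concrete, the sketch below converts an orientation angle into the unit vector U=(u,v) and compares a predicted pose with a ground truth one. The two error measures used here (Euclidean distance between positions, angle between orientation vectors) are assumptions chosen for illustration and are not necessarily the measures used in the paper; all function names are hypothetical.

```python
import math


def angle_to_unit_vector(theta):
    """Convert an orientation angle (radians) into a unit vector (u, v)."""
    return (math.cos(theta), math.sin(theta))


def position_error(x_true, x_pred):
    """Euclidean distance between two 2D positions (assumed measure)."""
    return math.hypot(x_true[0] - x_pred[0], x_true[1] - x_pred[1])


def orientation_error_deg(u_true, u_pred):
    """Angle in degrees between two orientation vectors (assumed measure).

    Both vectors are normalized first, since a regressed vector is not
    guaranteed to have unit length.
    """
    n_true = math.hypot(u_true[0], u_true[1])
    n_pred = math.hypot(u_pred[0], u_pred[1])
    if n_true == 0.0 or n_pred == 0.0:
        raise ValueError("orientation vectors must be non-zero")
    dot = (u_true[0] * u_pred[0] + u_true[1] * u_pred[1]) / (n_true * n_pred)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))
```

Storing the orientation as a unit vector rather than a raw angle avoids the discontinuity at the wrap-around point of the angle, which is convenient when orientations are compared or regressed.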

Txt Files Format

Each row of test.txt and train.txt contains the quintuple (id,x,y,u,v) associated with one image. id is the identifier of the image associated with the row. x and y are the position coordinates of the camera. u and v are, respectively, the x and y components of the unit vector representing the camera orientation.
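A minimal sketch of a loader for these files is given below. The exact field separator is not stated here, so the parser assumes that the five fields are separated by commas and/or whitespace; the `PoseLabel` type, the function names and the sample rows are hypothetical and only serve the example.

```python
import re
from typing import Iterable, List, NamedTuple


class PoseLabel(NamedTuple):
    image_id: str
    x: float
    y: float
    u: float
    v: float


def parse_pose_lines(lines: Iterable[str]) -> List[PoseLabel]:
    """Parse rows of the form (id, x, y, u, v).

    Assumption: fields are separated by commas and/or whitespace.
    Blank lines are skipped; malformed rows raise ValueError.
    """
    labels = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        fields = [f for f in re.split(r"[,\s]+", line) if f]
        if len(fields) != 5:
            raise ValueError(f"row {line_no}: expected 5 fields, got {len(fields)}")
        image_id, x, y, u, v = fields
        labels.append(PoseLabel(image_id, float(x), float(y), float(u), float(v)))
    return labels


def load_pose_file(path: str) -> List[PoseLabel]:
    """Read a label file such as train.txt or test.txt."""
    with open(path, encoding="utf-8") as f:
        return parse_pose_lines(f)


# Hypothetical sample rows, one per supported separator style.
sample = ["img_0001 1.5 2.0 0.0 1.0", "", "img_0002,3.0,4.0,1.0,0.0"]
labels = parse_pose_lines(sample)
```

In practice, `load_pose_file("train.txt")` would return one `PoseLabel` per training frame; if the files turn out to use a different separator, only the regular expression needs to change.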


Sample images from the proposed dataset. Each column shows images with similar visual elements acquired in different parts of the store.


We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

Related Work
