This is a technical report on Multimedia Forensics published on ArXiv at https://arxiv.org/abs/1610.06347.
Image Forensics has already achieved great results for the source camera identification task on images. However, standard approaches cannot be applied to data coming from Social Network Platforms because of the processing these platforms perform (e.g., scaling and compression). Over 1 billion images are shared each day on the Internet, and information about their history from the moment they were acquired could be exploited for investigation purposes. In this paper, a classification engine for the reconstruction of the history of an image is presented. Specifically, exploiting K-NN and decision tree classifiers and a priori knowledge acquired through image analysis, we propose an automatic approach that can identify which Social Network Platform has processed an image and which software application was used to perform the upload. The engine uses the characteristic alterations introduced by each platform as features. Results, in terms of global accuracy on a dataset of 2720 images, confirm the effectiveness of the proposed strategy.
The alterations introduced on images by Social Network Services (SNSs) can be thought of as a unique fingerprint left by the SNS. The aim of the proposed study is to discover those fingerprints by analyzing the behavior of the most popular SNSs that allow image sharing. Hence, 10 platforms have been selected. First of all, Facebook and Google+ were taken into account as the two most popular platforms where users can share their statuses and multimedia content with a network of friends.
Twitter and Tumblr were considered as representative of the micro-blogging concept. We also included Flickr and Instagram as platforms focused on sharing high-quality artistic photos, with image editing and filtering capabilities. Imgur and Tinypic were also taken into consideration: although they are not strictly SNSs, they are very popular platforms for image sharing, and users usually link images hosted on them from forums and web sites all over the Internet.
Finally, WhatsApp and Telegram were selected as the two most popular mobile messaging platforms which, by allowing users to create chat groups, are another major channel for image sharing on the Internet. These last two services in particular are often involved in forensic investigations.
To discover how SNSs process images, we collected a set of photos with 4 different camera devices.
Images were acquired representing three different types of scenes: outdoor scenes with buildings (artificial environment), outdoor scenes without buildings (natural environment) and indoor scenes.
When taking a picture, we captured two versions: a High Quality (HQ) photo at the maximum resolution allowed by the device, and a Low Quality (LQ) photo. Capturing images in this way yielded a dataset with good variability in content and resolution.
The collected images were uploaded to each of the considered platforms with two different methods: through a web browser, and through the iOS and Android native apps. No further distinction among web browsers is needed, because we observed that the alterations are not browser-dependent.
Each download was performed by searching for the image file URL in the HTML code of the page showing the image itself. At the end of this phase, 2400 images had been collected.
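The "URL searching" step described above can be illustrated with a short sketch. It assumes, hypothetically, that the image file URL appears either in an `img` tag or in an `og:image` meta tag of the page; the actual page structure of each platform is not described here, and the names `ImageURLCollector` and `find_image_urls` are illustrative only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageURLCollector(HTMLParser):
    """Collects candidate image URLs from <img src> and <meta property="og:image">."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            # Resolve relative paths against the URL of the page.
            self.urls.append(urljoin(self.base_url, attributes["src"]))
        elif (tag == "meta" and attributes.get("property") == "og:image"
              and attributes.get("content")):
            self.urls.append(urljoin(self.base_url, attributes["content"]))


def find_image_urls(html: str, base_url: str) -> list:
    """Return candidate image URLs in document order (assumed page layout)."""
    collector = ImageURLCollector(base_url)
    collector.feed(html)
    return collector.urls
```

In practice the candidate list would still need to be filtered by hand or by a platform-specific rule to pick the file that corresponds to the uploaded image.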
The second upload method was carried out with the iOS and Android native apps of each social platform, except for Tinypic, which does not have an official app in the stores. Moreover, images were chosen for upload in two ways: by selecting a previously acquired image from the gallery (images from local gallery) and by acquiring the image with the camera embedded in the app itself (embedded camera app). After uploading all images as described above, all of them were downloaded through the "URL searching technique" previously described. In this way, 320 more images processed through 8 platforms were obtained.
All uploads were performed with default settings.
The overall dataset consists of 2720 images in JPEG format and it is available for research purposes.
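Since the dataset is made of JPEG files, alterations related to scaling and compression can be inspected directly in the file headers. The following is a minimal sketch, assuming (this is not stated above) that image dimensions and quantization tables are among the useful features; the function name `jpeg_features` is hypothetical, and the parser only handles the common header layout found before the start of the compressed data.

```python
import struct


def jpeg_features(data: bytes) -> dict:
    """Read image size and quantization tables from a JPEG byte stream."""
    if data[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG stream")
    features = {"width": None, "height": None, "quant_tables": {}}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("marker expected at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS (compressed data follows) or EOI
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:  # DQT: one or more quantization tables
            i = 0
            while i < len(payload):
                precision, table_id = payload[i] >> 4, payload[i] & 0x0F
                i += 1
                if precision == 0:  # 8-bit entries
                    values = list(payload[i:i + 64])
                    i += 64
                else:  # 16-bit entries
                    values = list(struct.unpack(">64H", payload[i:i + 128]))
                    i += 128
                # Values are kept in the zigzag order used in the file.
                features["quant_tables"][table_id] = values
        elif 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            # SOFn: precision (1 byte), height (2 bytes), width (2 bytes)
            features["height"], features["width"] = struct.unpack(">HH", payload[1:5])
        pos += 2 + length
    return features
```

Comparing these values before upload and after download is one simple way to see whether a platform rescaled or recompressed an image.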
Our Classification Engine performs the task of Image Ballistics with good accuracy: it predicts the SNS that processed an image, and the corresponding upload method and platform used, with accuracies of 96% and 97.69%, respectively.
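To make the classification step concrete, here is a minimal K-NN sketch using majority voting over Euclidean distances. It is not the engine itself: the feature vectors, the labels `platform_A` and `platform_B`, and the function name `knn_predict` are placeholders chosen for illustration, and the decision tree component is not shown.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training samples.

    train: list of (feature_vector, label) pairs with numeric features.
    query: feature vector with the same length as the training vectors.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical samples: (longest side in pixels, sum of a quantization table).
training_set = [
    ((2048, 300), "platform_A"),
    ((2048, 310), "platform_A"),
    ((1280, 520), "platform_B"),
    ((1280, 500), "platform_B"),
]
prediction = knn_predict(training_set, (1280, 510), k=3)
```

With features on very different scales, as in this toy example, normalizing each dimension before computing distances would usually be advisable.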
Heatmaps of the Confusion Matrices obtained from 5-fold cross-validation on our dataset. The reported values, coded in heatmap colors, are the averages over the 5 runs of cross-validation. (a) Confusion Matrix for social platform classification, (b) Confusion Matrix for upload method classification.
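The averaging described in the caption can be sketched as follows: one confusion matrix is built per cross-validation run, and the matrices are then averaged element by element. The function names and the two-class toy labels are assumptions made for illustration; how the folds were drawn is not specified here.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns index the predicted class."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def average_matrices(matrices):
    """Element-wise mean of equally sized matrices (one per run)."""
    runs = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / runs for c in range(cols)]
            for r in range(rows)]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total
```

Each cell of the averaged matrix is the value that a heatmap would encode as a color.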
This research was supported by iCTLab s.r.l. and the Image Processing Laboratory (IPLAB) of University of Catania. We thank all those people who helped in building the Social Image Dataset that made this research possible.