Abstract
In Implicit Neural Representations (INRs) a discrete signal is parameterized by a neural network that maps coordinates to the signal samples.
INRs have been successfully employed for encoding and compression, but such approaches are at an early stage and are still outperformed by traditional codecs and autoencoders. Despite this, they have recently gained the attention of the research community thanks to their promising results as a novel representation strategy for encoding visual content.
In this paper, we propose Neural Imaging Format (NIF), an open-source INR-based image compression codec built on a novel neural architecture that consists of two modules: a Genesis network, which maps coordinates to pixels through bottleneck layers with sinusoidal activation units, and a Modulation network, which varies the period of the sinusoidal activations. Additionally, a final weight quantization step improves the compression ratio.
Our proposal (NIF) consistently outperforms state-of-the-art INR-based compressors in terms of PSNR, achieving comparable or better results while encoding up to x26 faster. We also show that NIF narrows the gap between INR-based methods and traditional approaches. Interestingly, our approach outperforms established codecs such as JPEG and WebP when encoding high-resolution images at low bitrates. Extensive experiments on different datasets, a visual comparison, and an ablation study support the validity of the proposed approach.
Introduction to Implicit Neural Representations (INRs)
Implicit Neural Representation (INR) is a very recent paradigm for data encoding, where a data point is reinterpreted as a map from coordinates to some features.
This mapping is then approximated by a neural network, which is purposely trained to overfit the task, resulting in a continuous functional representation of the data point. This concept is the foundation for a variety of works that propose multiple techniques to derive functional representations of various data types such as 3D shapes, radiance fields, videos and images.
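As a rough illustration of the coordinate-to-feature mapping described above, the following sketch builds a normalized pixel grid and evaluates a small, untrained multilayer perceptron with sinusoidal activations. It is not the NIF architecture: the layer sizes, the fixed frequency factor `omega`, the sigmoid output, and all function names are assumptions made for this example, and the fixed `omega` only stands in for a period that NIF varies with a separate Modulation network.

```python
import math
import random

def make_grid(width, height):
    """Return (x, y) coordinates in [-1, 1] for every pixel, row by row."""
    def norm(i, n):
        return 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0
    return [(norm(x, width), norm(y, height))
            for y in range(height) for x in range(width)]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative init)."""
    bound = 1.0 / n_in
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(layer, inputs):
    """Affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def inr_forward(layers, coord, omega=30.0):
    """Map one (x, y) coordinate to three values in (0, 1)."""
    hidden = list(coord)
    for layer in layers[:-1]:
        # Sinusoidal activation; omega is an assumed, fixed frequency factor.
        hidden = [math.sin(omega * z) for z in dense(layer, hidden)]
    # Linear output layer squashed with a sigmoid (an assumption of this sketch).
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(layers[-1], hidden)]

rng = random.Random(0)
# 2 -> 16 -> 8 -> 16 -> 3: the narrow middle layer acts as a bottleneck.
layers = [init_layer(2, 16, rng), init_layer(16, 8, rng),
          init_layer(8, 16, rng), init_layer(16, 3, rng)]
grid = make_grid(4, 3)
image = [inr_forward(layers, c) for c in grid]
print(len(image), "pixels decoded from coordinates")
```

Because the network is untrained, the decoded values are meaningless; fitting the weights to a target image is what turns the network into a representation of that image.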
Neural Imaging Format (NIF), the INR-based compression approach we propose, relies on training a representational network that maps input coordinates to pixels.
The network parameters are then quantized and compressed to minimize the size of encoded data.
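To make the link between quantized parameters and encoded size concrete, here is a minimal sketch of uniform quantization and of a bits-per-pixel estimate. It assumes a plain min-max uniform quantizer and ignores any entropy coding or metadata, so it is not necessarily the scheme NIF uses; the function names and the parameter count of 10,000 are hypothetical.

```python
def quantize(weights, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] (min-max uniform)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]

def bits_per_pixel(n_params, bits, width, height):
    """Size of the quantized parameters divided by the number of pixels."""
    return n_params * bits / (width * height)

weights = [-0.5, -0.1, 0.0, 0.25, 0.5]
codes, lo, scale = quantize(weights, bits=8)
restored = dequantize(codes, lo, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Hypothetical network of 10,000 parameters encoding a 768x512 image.
bpp = bits_per_pixel(n_params=10_000, bits=8, width=768, height=512)
print(f"max quantization error: {max_error:.5f}, estimated bpp: {bpp:.3f}")
```

The reconstruction error of each weight is bounded by half a quantization step, so using fewer bits per weight lowers the bitrate at the cost of a coarser approximation of the trained network.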
INR-based compression methods are penalized by the computational cost of overfitting a neural network to the signal and by the time required to obtain a compressed representation.
Our proposal leverages various tweaks to aid the network during training and to reduce the time needed to obtain an optimal approximation of the target signal.
Training process
The training of a neural network typically consists of executing a number of epochs, each containing several iterations.
However, many state-of-the-art implicit image compression approaches lack a clear subdivision of the overall training process.
Our approach decomposes both the fitting and the quantization fine-tuning processes into steps, which makes it possible to apply optimisations to the network, such as weight restarts, at specific stages.
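The sketch below shows one way a step-wise training loop with weight restarts could be organised. It assumes that a "restart" means resuming each step from the best parameters found so far, which is our reading rather than a detail stated here, and it fits a toy linear model instead of an image network. All names and hyperparameters are hypothetical.

```python
import random

def mse(params, data):
    """Mean squared error of the line a*x + b over the data."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

def sgd_iteration(params, data, lr, rng):
    """One stochastic gradient step on a single random sample."""
    a, b = params
    x, y = rng.choice(data)
    err = a * x + b - y
    return (a - lr * 2 * err * x, b - lr * 2 * err)

def fit_stepwise(data, steps=5, iters_per_step=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    params = (0.0, 0.0)
    best_params, best_loss = params, mse(params, data)
    for _ in range(steps):
        # Assumed meaning of "weight restart": resume from the best weights.
        params = best_params
        for _ in range(iters_per_step):
            params = sgd_iteration(params, data, lr, rng)
            loss = mse(params, data)
            if loss < best_loss:
                best_params, best_loss = params, loss
    return best_params, best_loss

# Toy target signal: y = 2x + 1 sampled on [-1, 1].
data = [(i / 10 - 1, 2 * (i / 10 - 1) + 1) for i in range(21)]
initial_loss = mse((0.0, 0.0), data)
best_params, best_loss = fit_stepwise(data)
print(f"loss went from {initial_loss:.3f} to {best_loss:.2e}")
```

Having explicit step boundaries gives a natural place to hook in operations of this kind, which a single undivided stream of iterations does not offer.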
Compression speed improvements
The following results show that our method achieves results comparable to or better than those of COIN and the basic version of Strumpler et al., with much lower compression times.
We select configurations that achieve similar bits-per-pixel at full resolution and report the average PSNR and SSIM as reference.
The advantage is especially evident at lower bitrates. For instance, when encoding x2 downsampled pictures at an average of 0.3 bits-per-pixel, NIF compresses about x26 faster than Strumpler basic (2:42 vs 70:15), at the cost of a slightly lower SSIM (0.891 vs 0.920):
| Method | Time (min:sec) | PSNR (dB) | SSIM |
|---|---|---|---|
| COIN | 26:19 | 28.84 | 0.777 |
| Strumpler | 70:15 | 35.00 | 0.920 |
| NIF (Ours) | 2:42 | 32.33 | 0.891 |
At 1.0 bits-per-pixel on x4 downsampled images, NIF achieves the best results in terms of both PSNR and SSIM with a large time saving (about x10). Note that the running time of NIF is similar across all the configurations we investigated, while the running times of COIN and Strumpler basic increase drastically with the image resolution.
| Method | Time (min:sec) | PSNR (dB) | SSIM |
|---|---|---|---|
| COIN | 16:04 | 22.53 | 0.475 |
| Strumpler | 24:15 | 46.73 | 0.993 |
| NIF (Ours) | 2:27 | 47.53 | 0.997 |
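The speed-up factors quoted in this section follow directly from the reported compression times. As a quick check, this small sketch converts the min:sec values to seconds and takes their ratio; the helper names are ours.

```python
def to_seconds(mmss):
    """Convert a 'min:sec' string such as '70:15' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def speedup(baseline, ours):
    """How many times faster `ours` is than `baseline`."""
    return to_seconds(baseline) / to_seconds(ours)

# 0.3 bits-per-pixel, x2 downsampled images
print(round(speedup("70:15", "2:42"), 1))  # Strumpler vs NIF -> 26.0
print(round(speedup("26:19", "2:42"), 1))  # COIN vs NIF      -> 9.7

# 1.0 bits-per-pixel, x4 downsampled images
print(round(speedup("24:15", "2:27"), 1))  # Strumpler vs NIF -> 9.9
print(round(speedup("16:04", "2:27"), 1))  # COIN vs NIF      -> 6.6
```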
Quantitative results
INR-based image compression
Our proposal NIF outperforms previous works in the field in terms of PSNR on Kodak, while achieving comparable performance on CelebA.
It is fair to point out that the approach of Strumpler et al. achieves comparable or better PSNR on CelebA when using meta-learned initializations. This is not surprising: the base parameters are learned on the same CelebA dataset, whose images have limited variability, which allows the network to exploit consistent redundancies that are typically absent in real scenarios.
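For readers unfamiliar with the metric used in these comparisons, PSNR is derived from the mean squared error between the reference and the reconstructed pixels. The sketch below uses the standard definition for 8-bit images, with pixels given as flat lists of intensities; it is a generic helper, not the evaluation code behind the reported numbers.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - d) ** 2
              for r, d in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

original = [10, 20, 30, 40]
decoded = [11, 19, 31, 39]  # every sample is off by one, so MSE = 1
print(f"{psnr(original, decoded):.2f} dB")
```

Since the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.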
Visual comparison (ground truth: CelebA #189985, 178x218):

| Method | Bitrate (bpp) | PSNR (dB) | MS-SSIM |
|---|---|---|---|
| NIF (Ours) | 0.65 | 29.36 | 0.96 |
| Strumpler | 0.61 | 29.70 | 0.96 |
| WebP | 0.66 | 32.52 | 0.97 |
In this CelebA picture, the NIF reconstruction is notably less noisy than Strumpler's.
Although WebP captures some finer details, it suffers from evident block artifacts, while INR-based methods do not.
Comparison to traditional approaches
Kodak
On this dataset, our proposal NIF clearly outperforms JPEG and performs comparably to modern codecs, such as BPG and WebP, in terms of PSNR at low bitrates.
In terms of MS-SSIM, NIF performs similarly to JPEG2000.
These results clearly reduce the gap between INR-based compression and classical methods with respect to the previous baseline given by the basic version of Strumpler et al., which is reported in the plot for reference.
Visual comparison (ground truth: Kodak #8, 768x512):

| Method | Bitrate (bpp) | PSNR (dB) | MS-SSIM |
|---|---|---|---|
| NIF (Ours) | 0.28 | 22.12 | 0.89 |
| Strumpler | 0.29 | 22.18 | 0.88 |
| JPEG | 0.27 | 21.33 | 0.88 |
| WebP | 0.27 | 24.20 | 0.93 |
| BPG | 0.28 | 25.43 | 0.94 |
Although state-of-the-art methods, such as BPG and the approach of Xie et al., achieve a smoother reconstruction, they noticeably tend to remove many small details, reconstructing some patches as uniform colour blocks.
This is clear in Kodak picture #8.
In contrast, both NIF and Strumpler basic reconstruct the grainy structure of the picture, but NIF encodes uniform patches better.
Image Compression Benchmark (ICB)
On the high-resolution ICB dataset, NIF outperforms some baselines in terms of PSNR at low bitrates (0.08bpp), with a gain of +0.41dB over the modern WebP codec and of +3.07dB over JPEG.
At higher bitrates (0.22bpp), the results of NIF are comparable to those of WebP and JPEG, but NIF is still outperformed by BPG.
Visual comparison (ground truth: ICB deer, crop of 4043x2641):

| Method | Bitrate (bpp) | PSNR (dB) | MS-SSIM |
|---|---|---|---|
| NIF (Ours) | 0.05 | 29.08 | 0.85 |
| BPG | 0.05 | 29.36 | 0.86 |
| WebP | 0.05 | 28.66 | 0.85 |
| JPEG | 0.07 | 28.01 | 0.81 |
Visual comparison (ground truth: ICB flower foveon, crop of 2268x1512):

| Method | Bitrate (bpp) | PSNR (dB) | MS-SSIM |
|---|---|---|---|
| NIF (Ours) | 0.10 | 39.83 | 0.98 |
| BPG | 0.10 | 41.21 | 0.98 |
| WebP | 0.10 | 37.08 | 0.97 |
| JPEG | 0.18 | 38.16 | 0.97 |
On these crops, NIF obtains visual results similar to those of the traditional codecs.
In particular, in flower foveon, the NIF reconstruction presents fewer blocking artifacts, which we believe is a strong advantage of INR-based methods over traditional ones.
In deer, NIF tends to reconstruct grainy noise in the background, while classical codecs remove these details.
These promising results suggest that INR-based methods could be used as an effective tool to compress high-resolution images.
Conclusion
We have proposed NIF, a novel image compression method based on INRs which quantitatively and qualitatively outperforms current INR-based methods. In addition, the proposed approach drastically reduces the computational requirements, with a speed-up of x26 for some configurations, while preserving image quality. Visual comparisons show that the decompressed images are less noisy than those of previous INR-based methods and do not suffer from well-known distortions such as blocking artifacts, which are instead common in traditional approaches.
Like standard codecs, the reference encoder produces stand-alone compressed files packed with metadata that can be decompressed using a compliant decoder.
In future work, we plan to improve the reconstruction capabilities and speed of these methods, and to explore their efficiency in multiple use cases.