Abstract

In Implicit Neural Representations (INRs), a discrete signal is parameterized by a neural network that maps coordinates to signal samples. INRs have been successfully employed for encoding and compression, but such approaches are still at an early stage and are outperformed by traditional codecs and autoencoders. Despite this, they have recently gained the attention of the research community due to their promising results as novel strategies for representing visual content. In this paper, we propose the Neural Imaging Format (NIF), an open-source INR-based image compression codec built on a novel neural architecture consisting of two modules: a Genesis network, which maps coordinates to pixels through bottleneck layers with sinusoidal activation units, and a Modulation network, which varies the period of the sinusoidal activations. A final weight quantization step further improves the compression ratio. NIF consistently outperforms state-of-the-art INR-based compressors in terms of PSNR, achieving comparable or better results with encoding speed-ups of up to x26. We also show that NIF reduces the gap between INR-based methods and traditional approaches. Interestingly, our approach outperforms established codecs such as JPEG and WebP when encoding high-resolution images at low bitrates. Extensive experiments on different datasets, a visual comparison, and an ablation study demonstrate the validity of the proposed approach.

Introduction to Implicit Neural Representations (INRs)

Implicit Neural Representation (INR) is a recent paradigm for data encoding in which a data point is reinterpreted as a mapping from coordinates to features. This mapping is approximated by a neural network that is deliberately trained to overfit the data point, resulting in a continuous functional representation. This concept is the foundation of a variety of works that propose techniques to derive functional representations of diverse data types such as 3D shapes, radiance fields, videos and images.

The Neural Imaging Format (NIF), the INR-based compression approach we propose, relies on training a representational network that maps input coordinates to pixels. The network parameters are then quantized and compressed to minimize the size of the encoded data. INR-based methods are penalized by the computational cost of overfitting a neural network on the signal, i.e., the time required to obtain a compressed representation. Our proposal leverages several tweaks that aid the network during training and reduce the time needed to obtain an accurate approximation of the target signal.
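
To make the general INR encoding idea concrete, the following is a minimal sketch, in PyTorch, of overfitting a small coordinate network on a single image; the architecture, hyperparameters and function names are illustrative assumptions and not the reference NIF implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: overfit a small coordinate MLP on one image.
# In a full codec, the resulting weights would then be quantized and
# entropy-coded to obtain the compressed representation.

def coordinate_grid(h, w):
    # Normalized pixel coordinates in [-1, 1], shape (h*w, 2).
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
    return grid.reshape(-1, 2)

def fit_inr(image, hidden=128, steps=2000, lr=1e-3):
    # image: float tensor of shape (h, w, 3) with values in [0, 1].
    h, w, _ = image.shape
    coords = coordinate_grid(h, w)
    targets = image.reshape(-1, 3)

    model = nn.Sequential(
        nn.Linear(2, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 3),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(coords), targets)
        loss.backward()
        opt.step()
    return model  # its weights are the (uncompressed) representation
```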

Network architecture

The architecture consists of two modules, named Genesis network and Modulation network. The Genesis network is a representational multi-layer perceptron with sinusoidal activations (SIREN), responsible for computing the features of a pixel when fed with its coordinates. In contrast to traditional SIRENs, the period of each sinusoidal activation is altered according to the period variation provided by the Modulation network, a dedicated module that adjusts the hidden feature period and thereby adapts to variations in frequency across different regions of the image.
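
As a rough illustration of this idea, a sinusoidal layer whose period is scaled by an externally computed modulation signal could be sketched as follows; the layer sizes, the omega_0 constant and the way the modulation is produced are assumptions for illustration and do not reproduce the exact NIF architecture.

```python
import torch
import torch.nn as nn

class ModulatedSineLayer(nn.Module):
    # Sketch of a SIREN-style layer whose sinusoidal period is altered
    # by an externally provided modulation vector (one scale per feature).
    def __init__(self, in_features, out_features, omega_0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.omega_0 = omega_0

    def forward(self, x, modulation):
        # modulation: tensor broadcastable to the layer output; values > 1
        # shorten the period (higher frequency), values < 1 lengthen it.
        return torch.sin(self.omega_0 * modulation * self.linear(x))

class TinyModulatedSiren(nn.Module):
    # A Genesis-like stack of sine layers plus a small ReLU network that
    # produces the per-layer modulation (a simplified stand-in for the
    # Modulation network described above).
    def __init__(self, hidden=64, layers=3):
        super().__init__()
        self.genesis = nn.ModuleList(
            [ModulatedSineLayer(2 if i == 0 else hidden, hidden) for i in range(layers)]
        )
        self.modulator = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, layers * hidden)
        )
        self.head = nn.Linear(hidden, 3)
        self.hidden = hidden

    def forward(self, coords):
        # coords: (N, 2) normalized pixel coordinates.
        mods = self.modulator(coords).view(coords.shape[0], -1, self.hidden)
        h = coords
        for i, layer in enumerate(self.genesis):
            h = layer(h, mods[:, i, :])
        return self.head(h)
```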

Training process

The training of a neural network typically consists of a number of epochs, each containing several iterations. However, many state-of-the-art implicit image compression approaches lack a clear subdivision of the overall training process. Our approach leverages a step-wise decomposition of both the fitting and the quantization fine-tuning processes, as it enables specific optimisations, such as weight restarts, to be performed on the network at well-defined stages.
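
The kind of step-wise schedule we refer to can be sketched as follows; the stage boundaries, the simple uniform weight quantizer and the weight-restart criterion are illustrative assumptions rather than the exact NIF procedure.

```python
import copy
import torch

def quantize_weights(model, bits=8):
    # Naive per-tensor uniform quantization of the weights (illustrative only).
    with torch.no_grad():
        for p in model.parameters():
            scale = p.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
            p.copy_(torch.round(p / scale) * scale)

def train_stepwise(model, coords, targets, fit_steps=1000, finetune_steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    best_loss, best_state = float("inf"), None

    # Stage 1: fitting, tracked step by step so that the best weights can be
    # restored (a simple form of "weights restart") before the next stage.
    for _ in range(fit_steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(coords), targets)
        loss.backward()
        opt.step()
        if loss.item() < best_loss:
            best_loss, best_state = loss.item(), copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)

    # Stage 2: quantization followed by a short fine-tuning pass. A real
    # quantization-aware scheme (e.g. a straight-through estimator) would keep
    # the weights quantized during fine-tuning; here we simply re-quantize at
    # the end to keep the sketch short.
    quantize_weights(model, bits=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(finetune_steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(coords), targets)
        loss.backward()
        opt.step()
    quantize_weights(model, bits=8)
    return model
```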

Compression speed improvements

The following results show that our method achieves comparable or better quality than COIN and the basic version of Strumpler et al. at much lower compression times. We select configurations that achieve similar bits-per-pixel at full resolution and report the average PSNR and SSIM for reference. The advantage is especially evident at lower bitrates: when encoding x2 downsampled pictures at an average of 0.3 bits-per-pixel, NIF is about x26 faster (2:42 vs 70:15) at the cost of a slightly lower SSIM than Strumpler basic (0.891 vs 0.920):

Method        Time (min:sec)   PSNR (dB)   SSIM
COIN          26:19            28.84       0.777
Strumpler     70:15            35.00       0.920
NIF (Ours)     2:42            32.33       0.891

At 1.0 bits-per-pixel on x4 downsampled images, our proposal NIF achieves the best results in terms of both PSNR and SSIM with a large time saving (about x10). It is worth noting that the running time of NIF is roughly constant across all the configurations we investigated, while the running times of COIN and Strumpler basic grow drastically as the image resolution increases.

Method        Time (min:sec)   PSNR (dB)   SSIM
COIN          16:04            22.53       0.475
Strumpler     24:15            46.73       0.993
NIF (Ours)     2:27            47.53       0.997

Quantitative results

INR-based image compression

Our proposal NIF outperforms previous works in the field in terms of PSNR on Kodak, while achieving comparable performance on CelebA.

It is fair to point out that the approach of Strumpler et al. achieves comparable or better PSNR on CelebA when using meta-learned initializations; this is not surprising, since the base parameters are learned on the same CelebA dataset, whose limited image variability lets the network exploit consistent redundancies that are typically absent in real scenarios.

Figure: qualitative comparison against the ground truth on CelebA #189985 (178x218). NIF (Ours) at 0.65bpp: PSNR 29.36dB, MS-SSIM 0.96; Strumpler at 0.61bpp: PSNR 29.70dB, MS-SSIM 0.96; WebP at 0.66bpp: PSNR 32.52dB, MS-SSIM 0.97.


In this CelebA picture, the NIF reconstruction is notably less noisy than Strumpler's. Although WebP captures some finer details, it suffers from evident block artifacts, while the INR-based methods do not.


Comparison to traditional approaches

Kodak

On this dataset, our proposal NIF clearly outperforms JPEG and performs comparably to modern codecs, such as BPG and WebP, in terms of PSNR at low bitrates. In terms of MS-SSIM, NIF performs similarly to JPEG2000.

These results clearly reduce the gap between INR-based compression and classical methods compared to the previous baseline, the basic version of Strumpler et al., which is reported in the plot for reference.

Figure: qualitative comparison against the ground truth on Kodak #8 (768x512). NIF (Ours) at 0.28bpp: PSNR 22.12dB, MS-SSIM 0.89; Strumpler at 0.29bpp: PSNR 22.18dB, MS-SSIM 0.88; JPEG at 0.27bpp: PSNR 21.33dB, MS-SSIM 0.88; WebP at 0.27bpp: PSNR 24.20dB, MS-SSIM 0.93; BPG at 0.28bpp: PSNR 25.43dB, MS-SSIM 0.94.


Although state-of-the-art methods, such as BPG and the approach of Xie et al., achieve a smoother reconstruction, they tend to remove many small details, reconstructing some patches as uniform colour blocks. This is clear on Kodak picture #8. In contrast, both NIF and Strumpler basic reconstruct the grainy structure, while NIF better encodes uniform patches.


Image Compression Benchmark (ICB)

On the high-resolution images of the ICB dataset, our proposal NIF outperforms some baselines in terms of PSNR at low bitrates (0.08bpp), with a gain of +0.41dB over the modern WebP codec and of +3.07dB over JPEG. At higher bitrates (0.22bpp), the results of NIF are comparable to WebP and JPEG, but still outperformed by BPG.

Figure: qualitative comparison against the ground truth on ICB deer (crop of 4043x2641). NIF (Ours) at 0.05bpp: PSNR 29.08dB, MS-SSIM 0.85; BPG at 0.05bpp: PSNR 29.36dB, MS-SSIM 0.86; WebP at 0.05bpp: PSNR 28.66dB, MS-SSIM 0.85; JPEG at 0.07bpp: PSNR 28.01dB, MS-SSIM 0.81.

Figure: qualitative comparison against the ground truth on ICB flower foveon (crop of 2268x1512). NIF (Ours) at 0.10bpp: PSNR 39.83dB, MS-SSIM 0.98; BPG at 0.10bpp: PSNR 41.21dB, MS-SSIM 0.98; WebP at 0.10bpp: PSNR 37.08dB, MS-SSIM 0.97; JPEG at 0.18bpp: PSNR 38.16dB, MS-SSIM 0.97.


On these crops, NIF obtains visual results similar to the other codecs. In particular, on flower foveon, the NIF reconstruction presents fewer blocking artifacts, which we believe is a strong advantage of INR-based methods over traditional ones. On deer, NIF tends to reconstruct the grainy noise in the background, whereas classical codecs remove these details.


These promising results suggest that INR-based methods could be used as an effective tool to compress high-resolution images.

Conclusion

We have proposed NIF, a novel image compression method based on INRs that quantitatively and qualitatively outperforms current INR-based methods. In addition, the proposed approach drastically reduces the computational requirements, with a speed-up of up to x26 for some configurations, while preserving image quality. Visual comparisons show that the decompressed images are less noisy than those of previous INR-based methods and do not suffer from well-known distortions such as blocking artifacts, which are instead common in traditional approaches. Like standard codecs, the reference encoder produces stand-alone compressed files packed with metadata that can be decompressed by a compliant decoder. In future work, we plan to improve the reconstruction quality and speed of these methods, and to explore their efficiency in additional use cases.
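
As an illustration of what such a stand-alone file might contain, a minimal container could pack a small metadata header together with the compressed weight payload; this sketch is only illustrative and does not describe the actual NIF bitstream layout.

```python
import json
import struct
import zlib

# Hypothetical container sketch: a JSON metadata header (image size,
# architecture hyperparameters, quantization settings) followed by the
# zlib-compressed, quantized network weights. This is NOT the reference
# NIF file format, only an illustration of the idea.

def pack(metadata: dict, weight_bytes: bytes) -> bytes:
    header = json.dumps(metadata).encode("utf-8")
    payload = zlib.compress(weight_bytes)
    return struct.pack("<II", len(header), len(payload)) + header + payload

def unpack(blob: bytes):
    header_len, payload_len = struct.unpack("<II", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    payload = zlib.decompress(blob[8 + header_len:8 + header_len + payload_len])
    return header, payload
```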
