Deep learning-based generative models are an active area of research that has advanced considerably in recent years. Most generative models are built on convolutional neural network (CNN) architectures. In signal and image processing tasks such as super-resolution and 3D modeling, implicit neural representations (INRs) instead represent an image as a continuous function of its coordinate locations, with each pixel synthesized independently. This function is approximated by a deep neural network. Because an INR is queried through a coordinate grid, it allows straightforward image transformations and high-resolution up-sampling. INRs have therefore become effective for 3D scene reconstruction and rendering from very few training images. An INR is usually trained to represent a single scene, signal, or image. Recently, INRs have also been used as generative models that produce entire image datasets. They perform comparably to CNN-based generative models on highly curated datasets (e.g., human faces). However, they have yet to be scaled to large datasets with complex imagery and a diverse set of object categories.
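The coordinate-grid idea described above can be illustrated with a minimal Python sketch. It is not taken from the related publication: `toy_inr` is a hypothetical closed-form stand-in for a trained network, and `make_grid` and `render` are illustrative helper names. The sketch only shows that each pixel is computed independently from its own coordinate and that a higher-resolution image comes from querying the same function on a denser grid.

```python
import math

def make_grid(height, width):
    """Return normalized (x, y) coordinates in [0, 1] for every pixel."""
    return [[(c / (width - 1), r / (height - 1)) for c in range(width)]
            for r in range(height)]

def toy_inr(x, y):
    """Hypothetical stand-in for a trained network: one coordinate in, one intensity out."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)

def render(height, width):
    """Evaluate the function at every grid coordinate, one pixel at a time."""
    return [[toy_inr(x, y) for (x, y) in row] for row in make_grid(height, width)]

low = render(8, 8)      # coarse sampling of the continuous function
high = render(32, 32)   # same function, denser grid: up-sampling by re-querying
```

Both images sample the same underlying function, so no interpolation step is needed to move between resolutions.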
Researchers at Arizona State University have developed a deep learning model that generates high-quality polynomial INRs for large, diverse image datasets. The approach represents an image with a polynomial function of the pixel coordinates and eliminates the need for positional encodings. The model captures high-frequency information and performs comparably to state-of-the-art CNN-based generative models without using convolution, normalization, upsampling, or self-attention layers. It also outperforms positional embedding-based INR GAN models. The model has been demonstrated on tasks including interpolation, style-mixing, extrapolation, high-resolution sampling, and image inversion.
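To make "an image as a polynomial function" concrete, the following Python sketch evaluates an explicit two-variable polynomial on a coordinate grid. It is an illustration only and does not reproduce the researchers' architecture: the randomly drawn coefficients, the helper names (`random_coeffs`, `poly_pixel`, `render`), and the choice of degree 3 are all assumptions. In a generative model the coefficients would presumably be produced by the learned network rather than drawn at random.

```python
import random

def random_coeffs(degree, seed=0):
    """Hypothetical coefficients for terms x**i * y**j with i + j <= degree."""
    rng = random.Random(seed)
    return {(i, j): rng.uniform(-1.0, 1.0)
            for i in range(degree + 1)
            for j in range(degree + 1 - i)}

def poly_pixel(coeffs, x, y):
    """Evaluate the polynomial at one coordinate; raw (x, y) is used, with no positional encoding."""
    return sum(a * (x ** i) * (y ** j) for (i, j), a in coeffs.items())

def render(coeffs, height, width):
    """Evaluate the polynomial independently at every normalized pixel coordinate."""
    return [[poly_pixel(coeffs, c / (width - 1), r / (height - 1))
             for c in range(width)]
            for r in range(height)]

coeffs = random_coeffs(degree=3)   # 10 terms for degree 3
image = render(coeffs, 16, 16)
```

Raising `degree` adds higher-order terms, which lets the function vary more rapidly across the image; the coordinates enter directly as `x` and `y` rather than through sinusoidal features.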
Related Publication: Polynomial Implicit Neural Representations For Large Diverse Datasets
Potential Applications:
- Generative AI method for media content creation
- 3D scene reconstruction and rendering
Benefits and Advantages:
- Performs comparably to CNN-based generative models (e.g., StyleGAN-XL) with 3x to 4x fewer trainable parameters (depending on output resolution)
- Outperforms current INR models using a significantly smaller model
- Fewer trainable parameters lead to faster training with fewer GPU hardware resources