Rotation-Equivariant Conditional Spherical Neural Fields for Learning a Natural Illumination Prior

^{1} The University of York
^{2} Friedrich-Alexander-Universität Erlangen-Nürnberg

Inverse rendering is an ill-posed problem. Previous work has sought to resolve this by focussing on priors for object or scene shape or appearance. In this work, we instead focus on a prior for natural illuminations. Current methods rely on spherical harmonic lighting or other generic representations and, at best, a simplistic prior on the parameters. We propose a conditional neural field representation based on a variational auto-decoder with a SIREN network and, extending Vector Neurons, build equivariance directly into the network. Using this we develop a rotation-equivariant, high dynamic range neural illumination model that is compact and able to express complex, high-frequency features of natural environment maps. Training our model on a curated dataset of 1.6K HDR environment maps of natural scenes, we compare it against traditional representations, demonstrate its applicability for an inverse rendering task and show environment map completion from partial observations. Code, trained models and the dataset are available at the links above.

We extend Vector Neurons to a neural field representation for spherical images, constructing a generative model of spherical signals that is rotation-equivariant with respect to the latent representation of the signal. Because the neural field is equivariant, we do not need to augment our training data over the space of rotations: observing a spherical signal once means we can reconstruct it with the same accuracy under any rotation.
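The equivariance property can be illustrated with a minimal NumPy sketch. It assumes the latent code is a matrix `Z` with one 3-vector per row and that the decoder only sees rotation-invariant quantities (the Gram matrix of `Z` and the inner products of `Z` with the query direction `d`). The decoder `toy_field` and its random weights `W1`, `W2` are hypothetical stand-ins, not the trained SIREN network.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def invariant_features(Z, d):
    """Features that do not change when Z and d are rotated together.

    Z: (N, 3) latent code, one 3-vector per row. d: (3,) unit direction.
    """
    gram = Z @ Z.T   # (N, N) inner products between latent vectors
    proj = Z @ d     # (N,) inner products with the query direction
    return np.concatenate([gram.ravel(), proj])

def toy_field(Z, d, W1, W2):
    """Hypothetical decoder: one sine layer followed by a linear RGB head."""
    return W2 @ np.sin(W1 @ invariant_features(Z, d))

rng = np.random.default_rng(0)
N = 4
Z = rng.normal(size=(N, 3))
d = rng.normal(size=3)
d /= np.linalg.norm(d)
W1 = rng.normal(size=(16, N * N + N))
W2 = rng.normal(size=(3, 16))

R = rotation_matrix([1.0, 2.0, 3.0], 0.7)
original = toy_field(Z, d, W1, W2)          # value at direction d
rotated = toy_field(Z @ R.T, R @ d, W1, W2)  # rotated latent, rotated direction
print(np.allclose(original, rotated))
```

Rotating every latent vector by `R` and querying at `R @ d` returns the same value as the unrotated query, so rotating the latent code rotates the whole spherical signal without any retraining or augmentation.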

Applying our rotation-equivariant conditional spherical neural field, we create a statistical model of natural illumination. Natural environments have a canonical “up” direction (defined by gravity) but arbitrary rotation about this vertical axis. We therefore restrict the equivariance to rotations about the vertical (y) axis only. Training on our curated dataset of 1.6K HDR equirectangular images of natural scenes, we demonstrate better performance than both Spherical Harmonics and Spherical Gaussians at reconstructing natural illumination environments at equal latent dimensionality.
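Restricting the symmetry to the vertical axis can be sketched in the same spirit. The example below assumes a y-up convention and builds features from the horizontal (x, z) components only, passing the y components through unchanged. The function name `yaw_invariant_features` is illustrative and not taken from the released code.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the vertical (y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def rot_x(angle):
    """Rotation about the x axis, which tilts the 'up' direction."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def yaw_invariant_features(Z, d):
    """Features invariant only to joint rotations of Z and d about y.

    Z: (N, 3) latent code. d: (3,) unit direction.
    """
    xz = [0, 2]
    Z_xz, d_xz = Z[:, xz], d[xz]
    return np.concatenate([
        (Z_xz @ Z_xz.T).ravel(),  # horizontal Gram matrix
        Z_xz @ d_xz,              # horizontal inner products with d
        Z[:, 1],                  # latent heights, passed through
        d[1:2],                   # height of the query direction
    ])

rng = np.random.default_rng(1)
Z = rng.normal(size=(4, 3))
d = rng.normal(size=3)
d /= np.linalg.norm(d)

base = yaw_invariant_features(Z, d)
Ry, Rx = rot_y(1.1), rot_x(1.1)
after_yaw = yaw_invariant_features(Z @ Ry.T, Ry @ d)
after_tilt = yaw_invariant_features(Z @ Rx.T, Rx @ d)
print(np.allclose(base, after_yaw), np.allclose(base, after_tilt))
```

The features survive a rotation about y but change under a tilt, so a decoder built on them can still tell sky from ground while treating all headings alike.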

Sampling from, or interpolating through, RENI's latent space produces only plausible illumination environments. This can be used to constrain inverse rendering problems or to generate realistic synthetic data.

We implemented a normalised Blinn-Phong environment map shader in PyTorch3D, enabling fully differentiable rendering. We fit RENI to a render of a 3D object with fixed geometry, pose, camera and material parameters, so that only the lighting in the scene is unknown.
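As a rough illustration of environment map shading, the NumPy sketch below sums a normalised Blinn-Phong BRDF over every pixel of an equirectangular map for a single surface point. It is not the PyTorch3D shader: the y-up grid layout, the `(n + 8) / (8π)` specular normalisation (a common approximation) and all function and parameter names are assumptions made for this example.

```python
import numpy as np

def equirect_directions(height, width):
    """Unit directions (y-up) and per-pixel solid angles of an equirectangular grid."""
    theta = (np.arange(height) + 0.5) * np.pi / height    # polar angle from +y
    phi = (np.arange(width) + 0.5) * 2.0 * np.pi / width  # azimuth
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.cos(theta),
                     np.sin(theta) * np.sin(phi)], axis=-1)
    solid_angle = np.sin(theta) * (np.pi / height) * (2.0 * np.pi / width)
    return dirs.reshape(-1, 3), solid_angle.reshape(-1)

def shade_blinn_phong(normal, view, env_rgb, dirs, solid_angle,
                      kd=0.8, ks=0.2, shininess=32.0):
    """Shade one surface point lit by an environment map.

    normal, view: (3,) unit vectors. env_rgb: (M, 3) radiance per direction.
    Returns the outgoing RGB radiance, shape (3,).
    """
    n_dot_l = np.clip(dirs @ normal, 0.0, None)  # cosine term, zero below horizon
    half = dirs + view                           # unnormalised half vectors
    half /= np.linalg.norm(half, axis=-1, keepdims=True) + 1e-12
    n_dot_h = np.clip(half @ normal, 0.0, None)
    diffuse = kd / np.pi
    # Assumed normalisation: the common (n + 8) / (8 pi) approximation.
    specular = ks * (shininess + 8.0) / (8.0 * np.pi) * n_dot_h ** shininess
    weights = (diffuse + specular) * n_dot_l * solid_angle  # (M,)
    return weights @ env_rgb

dirs, solid_angle = equirect_directions(64, 128)
normal = np.array([0.0, 1.0, 0.0])
view = np.array([0.0, 0.6, 0.8])
white = np.ones((dirs.shape[0], 3))
diffuse_only = shade_blinn_phong(normal, view, white, dirs, solid_angle,
                                 kd=1.0, ks=0.0)
print(diffuse_only)
```

The shaded colour is a weighted sum of the environment radiance, so it is linear in `env_rgb`. With geometry, camera and material fixed, gradients of an image loss therefore flow directly back to the lighting, which is the only unknown in the fitting setup described above.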

RENI can hallucinate plausible completions of an environment when its fitting loss is computed over only a small cutout of the image. It makes sensible estimates of the possible colours and shapes of land and sky, and often predicts accurate sun locations even when the sun lies outside the image crop.
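The idea of fitting through a masked loss can be sketched with a toy example. Here a fixed random linear map stands in for the trained decoder (an assumption made purely for illustration; RENI's decoder is a nonlinear neural field optimised by gradient descent), and the latent code is fitted using only the pixels inside a small crop.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, latent_dim = 16, 32, 8

# Hypothetical stand-in for a trained decoder: a fixed linear map latent -> pixels.
decoder = rng.normal(size=(height * width, latent_dim))
true_latent = rng.normal(size=latent_dim)
environment = decoder @ true_latent  # full, flattened environment map

# Only a small rectangular crop is observed; the rest is masked out.
mask = np.zeros((height, width), dtype=bool)
mask[6:10, 12:20] = True
mask = mask.ravel()

def masked_loss(latent):
    """Squared error summed over observed pixels only."""
    residual = (decoder @ latent - environment)[mask]
    return float(residual @ residual)

# For a linear decoder the masked loss has a closed-form minimiser:
# least squares restricted to the observed rows.
fitted_latent, *_ = np.linalg.lstsq(decoder[mask], environment[mask], rcond=None)

# Decoding the fitted latent fills in every pixel, observed or not.
completion = decoder @ fitted_latent
print(masked_loss(fitted_latent), np.abs(completion - environment).max())
```

In this toy case the completion is exact because the signal lies exactly in the range of the decoder. With real environment maps the unobserved region is not determined by the crop, and the learned prior is what steers the fit towards a plausible completion rather than an exact one.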

We have curated a dataset of 1694 HDR equirectangular images of outdoor natural illumination environments, obtained either under a CC0 1.0 public domain license [Poly-Haven, iHDRI, GiantCowFilms] or with written permission from the providers to redistribute a low-resolution version of their datasets [HDRI Skies, Textures.com, HDRMaps, Whitemagus 3D]. The dataset can be downloaded here.

```
@misc{https://doi.org/10.48550/arxiv.2206.03858,
title = {Rotation-Equivariant Conditional Spherical Neural Fields for Learning a Natural Illumination Prior},
author = {Gardner, James A. D. and Egger, Bernhard and Smith, William A. P.},
publisher = {arXiv},
year = {2022},
doi = {10.48550/ARXIV.2206.03858},
url = {https://arxiv.org/abs/2206.03858}
}
```