Freeze-Frame with StaticNeRF: Uncertainty-Guided NeRF Map Reconstruction in Dynamic Scenes
IEEE Robotics and Automation Letters (RA-L) 2025

Juhui Lee INHA University dlwngml6635@gmail.com
Geonmo Yang INHA University ygm7422@gmail.com
Seungjun Ma Hyundai Robotics richard7714@hyundai.com
Younggun Cho INHA University yg.cho@inha.ac.kr

StaticNeRF enables high-fidelity dynamic object removal in diverse real-world scenes.

Teaser comparisons (Input vs. Ours vs. baseline):
Case A: Opposing camera motion to object movement (vs. NeRF-W)
Case B: Dominant camera motion, minor object movement (vs. WildGaussians)
Case C: Persistent crowd (vs. NeRF On-the-go)

Abstract

Recent advances in neural representations have shown great promise for enabling high-fidelity dense mapping in robotics. Given the inherently dynamic nature of real-world environments, many studies have focused on learning static scene representations from dynamic observations. However, existing methods often fail to remove subtly moving objects and struggle to recover occluded static backgrounds, leading to critical limitations in practice. Furthermore, when static neural maps are used for localization, dynamic content in query images must be handled effectively. To overcome these challenges, we propose a static neural mapping framework that is robust to diverse dynamic environments and capable of processing dynamic content during localization. We evaluate our approach through extensive experiments on both public and in-house datasets. Our method improves both dynamic object removal and localization robustness under dynamic conditions, representing a significant step toward resilient robot navigation in real-world environments.

Overview of our proposed static neural rendering pipeline. The top row illustrates stage-wise scene evolution during training, and the bottom row presents the corresponding method components aligned with each stage. Our framework follows a curriculum learning strategy consisting of three functional stages: (1) Initial Training learns coarse geometric and photometric representations through uniform sampling. (2) Uncertainty Compensation incorporates a CNN-based uncertainty network to resolve ambiguities that NeRF alone cannot, enabling better disentanglement of the static and transient fields through joint optimization. (3) High-fidelity Rendering adopts a data-driven sampling strategy to refine rendering quality.
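To give a feel for the uncertainty compensation stage, here is a minimal sketch of an uncertainty-weighted photometric loss in the spirit of NeRF-W's transient modeling, where high-uncertainty (likely dynamic) pixels are down-weighted. The exact loss, weighting, and regularizer below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def uncertainty_weighted_loss(pred, target, beta, reg=0.01):
    """Per-ray photometric loss weighted by predicted uncertainty beta
    (hypothetical sketch, NeRF-W-style). Rays the uncertainty network
    flags as unreliable contribute less to the static reconstruction."""
    residual = np.sum((pred - target) ** 2, axis=-1)   # per-ray squared color error
    nll = residual / (2.0 * beta ** 2) + np.log(beta)  # Gaussian negative log-likelihood
    return np.mean(nll) + reg * np.mean(beta)          # mild penalty against inflating beta

# One well-reconstructed static ray, one ray occluded by a moving object.
pred = np.array([[1.0, 0.0, 0.0], [0.2, 0.2, 0.2]])
target = np.array([[1.0, 0.0, 0.0], [0.9, 0.9, 0.9]])
beta = np.array([0.1, 2.0])  # confident static pixel vs. uncertain dynamic pixel
```

With a large beta on the dynamic ray, its residual is heavily down-weighted, so the transient content does not corrupt the static field; the log-beta term keeps the network from assigning high uncertainty everywhere.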

Static Neural Representation

We compared our method against baseline methods and demonstrated that our approach reliably removes dynamic objects while maintaining high rendering quality.

Qualitative comparisons: Ours vs. NeRF-W · Ours vs. WildGaussians · Ours vs. NeRF On-the-go

In addition, we observed that our method performs consistently well across a variety of datasets.

Input vs. Ours (three additional datasets)

We simulated low-light environments through pixel scaling during preprocessing, and trained an appearance embedding to capture illumination variations. Based on this design, we demonstrated that our method achieves robust dynamic object removal even under diverse lighting conditions.
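The paper does not spell out the pixel-scaling step, but darkening via intensity scaling can be sketched as below; the scale factor, gamma value, and function name are illustrative assumptions rather than the exact preprocessing used.

```python
import numpy as np

def simulate_low_light(image, scale=0.4, gamma=2.2):
    """Hypothetical low-light simulation: scale pixel intensities in
    approximately linear space, then convert back to display space.
    `scale` and `gamma` are illustrative values, not the paper's settings."""
    img = image.astype(np.float32) / 255.0
    linear = img ** gamma               # sRGB -> approximate linear intensity
    darkened = linear * scale           # reduce scene illumination uniformly
    out = darkened ** (1.0 / gamma)     # back to display (gamma-encoded) space
    return (out * 255.0).clip(0, 255).astype(np.uint8)
```

Scaling in linear space (rather than directly on the encoded pixels) keeps the darkening roughly photometrically consistent, which is what an appearance embedding trained on illumination variation would be expected to absorb.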

Lighting conditions: Normal · Low Light · Darkest

Acknowledgements

We sincerely thank the creators of the Replica, On-the-go, Wild-SLAM, and Bonn datasets for providing high-quality resources that made our evaluations possible. We also acknowledge the authors of NeRF-W and the other baseline methods for their great work and for openly releasing their code, which significantly supported our study. Finally, we would like to express our gratitude to the SPARO lab members for their support and effort during the collection of our own dataset. ❤️