Unexposed environments, such as lava tubes, mines, and tunnels, are among the most complex yet strategically significant domains for scientific exploration and infrastructure development. Accurate, real-time 3D meshing of these environments is essential for applications including automated structural assessment, robotic-assisted inspection, and safety monitoring. Implicit neural Signed Distance Fields (SDFs) have shown promising capabilities in online meshing; however, existing methods often suffer from large projection errors and rely on fixed reconstruction parameters, which limits their adaptability to complex, unstructured underground environments.
To address these challenges, this paper proposes ARMOR, a scene-adaptive and reinforcement learning-based framework for real-time 3D meshing in unexposed environments. ARMOR consists of three primary components: (1) a spatio-temporal geometry smoothing module that enhances point cloud density by leveraging temporal information and integrates a context-aware normal vector orientation strategy to improve SDF training in underground settings; (2) a reinforcement learning-based parameter optimization module that dynamically adjusts reconstruction parameters in response to complex environmental characteristics; and (3) a training scheme that learns online parameter tuning strategies from prior point cloud datasets through simulation and practice, further improving adaptability to unseen real environments.
The proposed method was validated across more than 3,000 meters of underground environments, including engineered tunnels, natural caves, and lava tubes. Experimental results demonstrate that ARMOR achieves superior performance in real-time mesh reconstruction, reducing geometric error by 3.96% compared to state-of-the-art baselines, while maintaining real-time efficiency. The method exhibits improved robustness, accuracy, and adaptability, indicating its potential for advanced 3D monitoring and mapping in challenging unexposed scenarios.
Architecture of ARMOR. The ARMOR pipeline begins with the preprocessing of sequential LiDAR and IMU data, followed by a spatio-temporal geometry smoothing module that leverages consistency constraints to generate high-quality normal estimations. A reinforcement learning agent then analyzes the characteristics of local neural maps from the previous state T_{i-1} and computes a multi-discrete probability distribution to infer optimal mapping parameters. These parameters guide the update of the local neural map and drive the adaptive meshing process. The updated neural representation is then used as the observation for the next timestamp T_{i+1}, forming a continuous loop for adaptive reconstruction.
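To make the "multi-discrete probability distribution" step concrete, the following is a minimal sketch rather than ARMOR's implementation. It assumes the agent emits one vector of logits per tuned parameter and that each parameter is chosen from a small grid of candidate values. The parameter names and grids in `PARAM_CHOICES` are hypothetical; the source does not list which reconstruction parameters are tuned.

```python
import math
import random

# Hypothetical parameter grids, used only for illustration.
PARAM_CHOICES = {
    "voxel_size_m": [0.2, 0.3, 0.4],
    "sdf_truncation_m": [0.1, 0.2],
    "train_iters": [10, 20, 40],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_parameters(logits_per_head, rng=None):
    """Pick one value per parameter from independent categorical heads.

    With rng=None the most probable choice is taken (greedy, as one might
    do at deployment); with a random.Random instance the choice is sampled
    (as one might do for exploration during training).
    """
    params = {}
    for name, logits in logits_per_head.items():
        probs = softmax(logits)
        if rng is None:
            idx = max(range(len(probs)), key=probs.__getitem__)
        else:
            idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        params[name] = PARAM_CHOICES[name][idx]
    return params


# Example logits, standing in for the agent's output at one timestamp.
example_logits = {
    "voxel_size_m": [0.1, 2.0, -1.0],
    "sdf_truncation_m": [0.0, 1.0],
    "train_iters": [3.0, 0.0, 0.0],
}
greedy_params = select_parameters(example_logits)
sampled_params = select_parameters(example_logits, rng=random.Random(0))
```

Treating each parameter as its own categorical head keeps the action space additive in the number of choices rather than multiplicative, which is the usual motivation for a multi-discrete action space.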
Scanblock formation process. Trajectory T(t) shows the continuous movement of the LiDAR over the interval [t0, tK). Within each interval, consecutive scan frames are merged through coordinate transformation, enhancing point cloud density and geometric completeness for improved normal estimation in the unexposed environment.
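The merging step described in this caption can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: each scan is assumed to come with a sensor-to-world pose (R, t) sampled from the trajectory, and the block is expressed in the frame of one reference scan. The function names `build_scan_block`, `invert_pose`, `transform_point`, and `yaw_pose` are hypothetical.

```python
import math


def transform_point(pose, p):
    """Apply a rigid pose (R, t) to a 3D point: R @ p + t."""
    R, t = pose
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def invert_pose(pose):
    """Invert a rigid pose: (R, t) -> (R^T, -R^T t)."""
    R, t = pose
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = tuple(-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3))
    return Rt, t_inv


def build_scan_block(scans, poses, ref_index=0):
    """Merge consecutive scans into the frame of scans[ref_index].

    scans: list of point lists, each expressed in its own sensor frame.
    poses: list of sensor-to-world poses (R, t), one per scan.
    """
    world_to_ref = invert_pose(poses[ref_index])
    block = []
    for scan, pose in zip(scans, poses):
        for p in scan:
            p_world = transform_point(pose, p)
            block.append(transform_point(world_to_ref, p_world))
    return block


def yaw_pose(yaw, t):
    """Helper for the example: rotation about z by `yaw` plus translation t."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]], t


# Two sensor positions observing the same world point (2, 0, 0).
scans = [[(2.0, 0.0, 0.0)], [(0.0, -1.0, 0.0)]]
poses = [
    yaw_pose(0.0, (0.0, 0.0, 0.0)),
    yaw_pose(math.pi / 2, (1.0, 0.0, 0.0)),
]
block = build_scan_block(scans, poses)
```

In the example both observations land on the same location in the reference frame, which is the effect the scan block relies on: overlapping frames densify the local surface before normals are estimated. A real system would also compensate for motion within a single sweep, which this sketch omits.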
Architecture of the agent network. The network extracts latent local features and then uses an MLP to select the optimal actions.
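A minimal sketch of this feature-then-MLP structure is shown below. It is not the network from the paper: mean pooling over per-point latent features, the single hidden layer, the layer sizes, and the random weights are all assumptions made for illustration, and the class name `AgentSketch` is hypothetical.

```python
import random


def linear(x, W, b):
    """Dense layer: W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(n_in, n_out, rng):
    """Random weights and zero biases for an n_in -> n_out layer."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class AgentSketch:
    """Pools local latent features, then maps them to per-head action logits."""

    def __init__(self, feat_dim, hidden_dim, head_sizes, seed=0):
        rng = random.Random(seed)
        self.hidden = init_linear(feat_dim, hidden_dim, rng)
        # One output head per tuned parameter (multi-discrete action space).
        self.heads = [init_linear(hidden_dim, n, rng) for n in head_sizes]

    def forward(self, local_features):
        """local_features: list of equal-length feature vectors."""
        n = len(local_features)
        pooled = [sum(col) / n for col in zip(*local_features)]  # mean pooling
        h = relu(linear(pooled, *self.hidden))
        return [linear(h, W, b) for W, b in self.heads]

    def act(self, local_features):
        """Greedy action: index of the largest logit in each head."""
        return [
            max(range(len(logits)), key=logits.__getitem__)
            for logits in self.forward(local_features)
        ]


agent = AgentSketch(feat_dim=4, hidden_dim=8, head_sizes=[3, 2, 3])
features = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
head_logits = agent.forward(features)
actions = agent.act(features)
```

Pooling before the MLP makes the output independent of the order and number of local features, which is one plausible reason to summarize a neural map this way; the paper's actual feature extractor may differ.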
Simulation Dataset Result. Visual comparison of different methods on the synthetic datasets.
SuperLoc Dataset Result. Comparative reconstruction results for the SuperLoc Cave01 dataset. The figure presents an incremental comparison among the baseline approach (PIN-SLAM), our method without reinforcement learning (w/o RL), and our complete method, with both the mesh reconstructions and corresponding error maps visualized. Our complete method demonstrates superior geometric accuracy and detail preservation in complex cave environments.
WHU-Helmet Dataset Result. Qualitative reconstruction and error visualization results on the WHU-Helmet unexposed tunnel dataset are presented. The figure shows (top) the reconstructed mesh using our method, (middle) a comparison with the ground truth, and (bottom left) a schematic of the data acquisition process.
An in-house helmet-based mapping system. To validate the effectiveness of our method, we developed a portable scanner. (a) Deployment illustration showing an operator wearing the integrated helmet system during an experiment; (b) Close-up view of the core mapping equipment; (c) Hardware architecture diagram highlighting key components, including the Livox Mid360 LiDAR, IMU, camera, WiFi, and RK3588S processor.
Qualitative comparison of reconstruction results for the Xianren Lava Tube. Significant differences between our method and the PIN-SLAM baseline are highlighted by bounding boxes.
This research was inspired by several outstanding works.
PIN-SLAM is a state-of-the-art SLAM framework that seamlessly integrates implicit neural maps to achieve enhanced efficiency and accuracy.
Reinforcement Learning Meets Visual Odometry is a pioneering approach that leverages reinforcement learning to optimize visual odometry performance for robust navigation.
@article{zhang2025armor,
author = {Zhang, Yizhe and Li, Jianping and Zhao, Xin and Liang, Fuxun and Dong, Zhen and Yang, Bisheng},
title = {ARMOR: Adaptive Meshing with Reinforcement Optimization for Real-time 3D Monitoring in Unexposed Scenes},
journal = {arXiv preprint},
year = {2025},
}