Forests and plantations play a vital role in carbon sequestration, yet accurately monitoring their growth remains costly and labor-intensive. Researchers have developed an advanced artificial intelligence (AI) model that produces high-resolution canopy height maps using only standard RGB imagery. By integrating a large vision foundation model with self-supervised enhancement, this method achieves near-lidar accuracy, enabling precise, low-cost monitoring of forest biomass and carbon storage over large areas.
Monitoring forest canopy structure is essential for understanding global carbon cycles, assessing tree growth, and managing plantation resources. Traditional lidar systems provide accurate height data but are limited by high costs and technical complexity, while optical remote sensing often lacks the structural precision required for small-scale plantations. Deep learning methods have improved canopy height estimation but still demand massive labeled datasets and often lose fine spatial details. Moreover, global models struggle to adapt to fragmented plantation landscapes with uniform tree structures. Given these challenges, developing a cost-effective, high-resolution, and generalizable approach for mapping canopy height and biomass has become an urgent research priority.
A joint research team from Beijing Forestry University, Manchester Metropolitan University, and Tsinghua University has developed a new AI-driven vision model that estimates tree heights from RGB satellite images with sub-meter accuracy. The study, published in the Journal of Remote Sensing on October 20, 2025 (DOI: 10.34133/remotesensing.0880), introduces a novel framework that combines large vision foundation models (LVFMs) with self-supervised learning. The approach addresses the long-standing problem of balancing cost, precision, and scalability in forest monitoring, offering a promising tool for managing plantations and tracking carbon sequestration under initiatives such as China's Certified Emission Reduction program.
The researchers built a canopy height estimation network with three modules: a feature extractor based on the DINOv2 large vision foundation model, a self-supervised feature enhancement unit that retains fine spatial details, and a lightweight convolutional height estimator. Compared with airborne lidar measurements, the model achieved a mean absolute error of only 0.09 m and an R² of 0.78, outperforming conventional convolutional neural network (CNN) and transformer-based methods. It also enabled single-tree detection with over 90% accuracy, and its outputs correlated strongly with measured above-ground biomass (AGB). Beyond its accuracy, the model generalized well across forest types, making it suitable for both regional and national-scale carbon accounting.
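For readers unfamiliar with the two accuracy figures, the sketch below shows how mean absolute error (MAE) and R² are commonly computed when a predicted canopy height map is compared pixel by pixel with a lidar reference. It assumes R² is defined as 1 − SS_res/SS_tot against the lidar heights; the study may have used a different variant (for example, the squared Pearson correlation), and the height values are hypothetical toy numbers, not data from the paper.

```python
# Minimal sketch: compare predicted canopy heights with lidar-derived reference
# heights using MAE and R². The pixel values are hypothetical toy numbers.

def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference heights (m)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def r_squared(predicted, reference):
    """Assumed definition: R² = 1 - SS_res / SS_tot, relative to the lidar heights."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for p, r in zip(predicted, reference))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference heights must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical flattened canopy height pixels, in metres.
lidar_chm = [12.0, 15.0, 9.0, 18.0]
model_chm = [11.5, 15.5, 9.0, 17.0]

mae = mean_absolute_error(model_chm, lidar_chm)
r2 = r_squared(model_chm, lidar_chm)
print(f"MAE = {mae:.3f} m, R^2 = {r2:.3f}")  # MAE = 0.500 m, R^2 = 0.967
```

In a real evaluation the two lists would be the co-registered, flattened pixels of the predicted map and the lidar-derived map, usually with no-data pixels masked out first.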
The model was tested in the Fangshan District of Beijing, an area of fragmented plantations composed primarily of Populus tomentosa, Pinus tabulaeformis, and Ginkgo biloba. Using one-meter-resolution Google Earth imagery as input and lidar-derived heights as reference, the model produced canopy height maps that closely matched the ground truth data. It significantly outperformed global canopy height model (CHM) products, capturing subtle variations in tree crown structure that existing products often missed. The generated maps supported individual-tree segmentation and plantation-level biomass estimation, with R² values exceeding 0.9 for key species. When applied to a geographically distinct forest in Saihanba, the network maintained robust accuracy, confirming its cross-regional adaptability. Its ability to reconstruct annual growth trends from archived satellite imagery offers a scalable solution for long-term carbon sink monitoring and precision forestry management. The method bridges the gap between expensive lidar surveys and low-resolution optical methods, enabling detailed forest assessment with minimal data requirements.
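The article does not say how individual trees were segmented from the height maps. A common first step in that kind of workflow is treetop detection with a local-maximum filter, and the sketch below illustrates it under that assumption. The function name `find_treetops`, the window `radius`, and the `min_height` threshold are hypothetical choices for illustration only.

```python
# Minimal sketch of local-maximum treetop detection on a canopy height grid.
# Assumption: this stands in for whatever segmentation method the study used.
# A pixel counts as a treetop if it is at least `min_height` tall and no
# neighbour within `radius` pixels is taller (equal-height ties are all kept).

def find_treetops(chm, radius=1, min_height=2.0):
    rows, cols = len(chm), len(chm[0])
    treetops = []
    for i in range(rows):
        for j in range(cols):
            h = chm[i][j]
            if h < min_height:
                continue  # skip ground and low vegetation
            is_peak = True
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    # Compare against in-bounds neighbours, excluding the pixel itself.
                    if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                        if chm[ni][nj] > h:
                            is_peak = False
            if is_peak:
                treetops.append((i, j, h))
    return treetops


# Hypothetical 5 x 5 canopy height grid (metres) containing two crowns.
chm = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 8.0, 6.0, 2.0, 0.0],
    [0.5, 6.0, 5.0, 7.5, 3.0],
    [0.0, 2.0, 4.0, 9.0, 4.0],
    [0.0, 0.0, 1.0, 3.0, 1.0],
]
print(find_treetops(chm))  # [(1, 1, 8.0), (3, 3, 9.0)]
```

The detected peaks would typically seed a crown-delineation step such as watershed segmentation. The window radius matters in practice: too small and a single broad crown yields several peaks, too large and neighbouring trees merge.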
"Our model demonstrates that large vision foundation models (LVFMs) can fundamentally transform forestry monitoring," said Dr. Xin Zhang, corresponding author at Manchester Metropolitan University. "By combining global image pretraining with local self-supervised enhancement, we achieved lidar-level precision using ordinary RGB imagery. This approach drastically reduces costs and expands access to accurate forest data for carbon accounting and environmental management."
The team employed an end-to-end deep learning framework that combines pre-trained LVFM features with a self-supervised enhancement process. High-resolution Google Earth imagery from 2013 to 2020 was used as input, and UAV-based lidar data served as the reference for training and validation. The model was implemented in PyTorch and trained with the fastai library on an NVIDIA RTX A6000 GPU. Comparative experiments against established networks (U-Net and the Dense Prediction Transformer, DPT) and global CHM datasets confirmed the model's superior accuracy and efficiency, supporting its potential for scalable canopy height mapping and biomass estimation.
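Because the input archive spans 2013 to 2020, one canopy height map per year can in principle be turned into an annual growth rate. The article does not describe how its growth trends were derived, so the sketch below simply assumes an ordinary least-squares slope of height against year, computed pixel by pixel. The helper names `annual_growth_rate` and `growth_rate_map` and the toy heights are hypothetical.

```python
# Minimal sketch (assumption: the article does not describe how growth trends
# were derived). Fits an ordinary least-squares line to canopy height versus
# year and returns its slope as an annual growth rate in metres per year.

def annual_growth_rate(years, heights):
    n = len(years)
    if n != len(heights) or n < 2:
        raise ValueError("need at least two matching (year, height) pairs")
    mean_year = sum(years) / n
    mean_height = sum(heights) / n
    cov = sum((y - mean_year) * (h - mean_height) for y, h in zip(years, heights))
    var = sum((y - mean_year) ** 2 for y in years)
    if var == 0:
        raise ValueError("years must not all be identical")
    return cov / var


def growth_rate_map(years, chm_stack):
    """Apply annual_growth_rate pixel by pixel to same-sized yearly CHM grids."""
    rows, cols = len(chm_stack[0]), len(chm_stack[0][0])
    return [
        [annual_growth_rate(years, [chm[i][j] for chm in chm_stack]) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 1 x 2 pixel CHMs for four archive years: one growing pixel,
# one pixel whose height does not change.
years = [2013, 2015, 2017, 2020]
chm_stack = [[[4.0, 10.0]], [[5.0, 10.0]], [[6.0, 10.0]], [[7.5, 10.0]]]
print(growth_rate_map(years, chm_stack))  # [[0.5, 0.0]]
```

A least-squares slope tolerates irregular gaps between archive years, which is why it is used here rather than simple year-to-year differences. Real imagery would also need co-registration across years and masking of harvested or cloud-covered pixels.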
The AI-based mapping framework offers a powerful and affordable approach for tracking forest growth, optimizing plantation management, and verifying carbon credits. Its adaptability across ecosystems makes it suitable for global afforestation and reforestation monitoring programs. Future research will extend this method to natural and mixed forests, integrate automated species classification, and support real-time carbon monitoring platforms. As the world advances toward net-zero goals, such intelligent, scalable mapping tools could play a central role in achieving sustainable forestry and climate-change mitigation.