As PM2.5 concentrations have declined continuously, O3 pollution has become increasingly prominent and has been targeted by the Government of China to protect the climate, ecosystems, and human health. Satellite retrievals of column O3 have been available for decades, and nationwide monitoring of ground-level O3 has been in place in China since 2013. Even so, the climatological variability of ground-level O3 remains unknown, which impedes understanding of the long-term drivers and impacts of O3 pollution in China. Here we develop an eXtreme Gradient Boosting (XGBoost) model that integrates high-resolution meteorological data, satellite retrievals of trace gases, and other predictors to reconstruct daily ground-level O3 over 2005–2021 in China. Model validation confirms the robustness of this dataset, with an R² of 0.89 for sample-based cross-validation. The accuracy of the long-term variations has also been confirmed with independent historical observations from urban, rural, and background sites covering the same period. Our dataset covers the long period of 2005–2021 on gap-free 0.1° × 0.1° grids, which can facilitate climatological, ecological, and health research.
| item | value |
|---|---|
| collection period | 2005/01/01 - 2021/12/31 |
| collection location | China |
| data size | 100.8 MiB |
| data format | NetCDF (.nc) |
| coordinate system | |
The hourly ground-level O3 observations for mainland China over 2013–2021 used in this study come from the monitoring network of the China National Environmental Monitoring Center, which grew from about 900 stations in 2013 to about 1600 stations in 2021. We excluded negative O3 values and then calculated the daily maximum 8-hour average (MDA8) O3 concentration at each monitoring site. Because the abundance of O3 in the troposphere is governed by emissions (anthropogenic and natural) and by meteorological conditions, meteorological variables, anthropogenic emission inventories, altitude, land use, the normalized difference vegetation index (NDVI), and other variables are used as inputs to the machine learning model.
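The MDA8 step can be illustrated with a minimal Python sketch. The text does not say how the 8-hour windows are formed or how missing hours are handled, so the function below assumes windows starting at hours 0 through 16 (so each window stays within the calendar day) and treats a window as valid when at least six of its eight hours are usable. Both choices, and the function name `mda8`, are illustrative assumptions, not the authors' processing code.

```python
from typing import Optional, Sequence


def mda8(hourly: Sequence[Optional[float]], min_hours: int = 6) -> Optional[float]:
    """Daily maximum 8-hour average of one day's 24 hourly O3 values.

    Negative and missing (None) values are discarded, as described in the text.
    ASSUMPTIONS (not stated in the source): windows start at hours 0..16 so they
    stay within the day, and a window needs at least `min_hours` usable hours.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    # Drop negative readings by treating them like missing hours.
    clean = [v if v is not None and v >= 0 else None for v in hourly]
    best = None
    for start in range(17):
        window = [v for v in clean[start:start + 8] if v is not None]
        if len(window) < min_hours:
            continue  # too few valid hours for this 8-hour window
        avg = sum(window) / len(window)
        if best is None or avg > best:
            best = avg
    return best  # None if no window was valid


# Example with synthetic values: a flat background of 20 and an
# 8-hour episode of 100 from hour 12 through hour 19.
day = [20.0] * 24
day[12:20] = [100.0] * 8
print(mda8(day))  # 100.0
```

A day with no valid window returns `None`, so such days can simply be dropped from the training targets.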
We employed the extreme gradient boosting (XGBoost) algorithm (Chen and Guestrin 2016) to predict ground-level ozone (O3) concentrations from a set of related predictor variables. XGBoost is a highly efficient machine learning algorithm based on gradient tree boosting and has been widely applied to many tasks; we previously adopted it to correct the systematic bias of a chemical transport model (Yin et al., 2021). XGBoost is an ensemble learning technique that combines several weak models (e.g., decision trees) into a strong model with better performance. Ensemble methods differ in how they combine their base models; in boosting, which XGBoost uses, trees are added sequentially and each new tree is fitted to reduce the errors of the ensemble built so far. To evaluate the overall model performance in estimating daily MDA8 O3 concentrations, we adopted both sample-based (out-of-sample) and station-based (out-of-station) 10-fold cross-validation (CV). In the station-based CV, all records from a given station are held out together, so it tests how well the model predicts O3 at locations without monitors.
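To make the boosting idea and the two cross-validation schemes concrete, here is a toy pure-Python sketch. It is not XGBoost: it boosts depth-one regression trees (stumps) on a single predictor under squared-error loss, whereas XGBoost uses deeper trees, a regularized objective, and second-order gradient information. The helper names (`fit_stump`, `fit_boosted`, `kfold_groups`, `cross_validate`), the synthetic data, and the use of R² = 1 - SS_res/SS_tot as the score are all assumptions for illustration. The only difference between the two CV schemes is the grouping key: each record is its own key in sample-based CV, while in station-based CV all records from one station share a key and are held out together.

```python
import random
from statistics import mean


def fit_stump(x, r):
    """Hypothetical helper: least-squares depth-1 regression tree on one feature."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    rs = [r[i] for i in order]
    n, total = len(rs), sum(rs)
    best, left_sum = None, 0.0
    for j in range(n - 1):
        left_sum += rs[j]
        if xs[j] == xs[j + 1]:
            continue  # no split between identical feature values
        nl, nr = j + 1, n - j - 1
        lv, rv = left_sum / nl, (total - left_sum) / nr
        # Minimising the split SSE is equivalent to maximising nl*lv^2 + nr*rv^2.
        score = nl * lv * lv + nr * rv * rv
        if best is None or score > best[0]:
            best = (score, (xs[j] + xs[j + 1]) / 2, lv, rv)
    if best is None:  # all x identical: fall back to a constant
        c = total / n
        return lambda v: c
    _, t, lv, rv = best
    return lambda v: lv if v <= t else rv


def fit_boosted(x, y, n_trees=100, learning_rate=0.1):
    """Toy gradient boosting with squared loss: each stump fits the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5*(y - f)^2
        stump = fit_stump(x, residuals)
        trees.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + learning_rate * sum(t(v) for t in trees)


def kfold_groups(keys, k=10, seed=0):
    """Assign each distinct key to one of k folds; records sharing a key share a fold."""
    unique = sorted(set(keys))
    random.Random(seed).shuffle(unique)
    fold_of = {key: i % k for i, key in enumerate(unique)}
    return [fold_of[key] for key in keys]


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed CV score)."""
    m = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def cross_validate(x, y, folds, k=10):
    """Predict each record with a model trained on the other k-1 folds; return CV R^2."""
    pred = [None] * len(y)
    for f in range(k):
        train = [i for i, fi in enumerate(folds) if fi != f]
        test = [i for i, fi in enumerate(folds) if fi == f]
        if not test:
            continue
        model = fit_boosted([x[i] for i in train], [y[i] for i in train])
        for i in test:
            pred[i] = model(x[i])
    return r_squared(y, pred)


# Synthetic data (assumption): 20 stations x 10 days, one predictor.
rng = random.Random(42)
stations = [s for s in range(20) for _ in range(10)]
x = [rng.uniform(0, 10) for _ in stations]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
sample_folds = kfold_groups(list(range(len(y))))  # out-of-sample CV
station_folds = kfold_groups(stations)            # out-of-station CV
print(cross_validate(x, y, sample_folds), cross_validate(x, y, station_folds))
```

In a real workflow, `xgboost.XGBRegressor` (with its `fit` and `predict` methods) could take the place of `fit_boosted` while keeping the same fold assignments.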
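The data quality is good: sample-based cross-validation gives an R² of 0.89, and the reconstructed long-term variations agree with independent historical observations from urban, rural, and background sites.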
This work is licensed under a Creative Commons Attribution 4.0 International License.
| # | title | file size |
|---|---|---|
| 1 | _ncdc_meta_.json | 5.5 KiB |
| 2 | monthly_O3.zip | 92.0 MiB |
| 3 | yearly_O3.zip | 8.8 MiB |
| # | category | title | authors | year |
|---|---|---|---|---|
| 1 | paper | Reconstructed daily ground-level O3 in China over 2005–2021 for climatological, ecological, and health research | C. Zhou, F. Wang, Y. Guo, C. Liu, D. Ji, Y. Wang, X. Xu, X. Lu, G. Carmichael, M. Gao | 2022 |
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)

