Difference in Regional Seismic Landslide Risk Prediction Results Based on Different Feature Selection Methods—A Case Study of Wenchuan Earthquake Area
AI Xiao, ZHANG Jian, FU Jimin

School of Architecture and Civil Engineering, Heilongjiang University of Science and Technology, Harbin 150022, China

Wenchuan earthquake; Seismic landslide risk; Principal component analysis; Gini index; Artificial neural network

DOI: 10.13512/j.hndz.2024.02.06

The regional seismic landslide risk assessment model is a key tool for evaluating the probability and severity of landslides in a specific area when an earthquake occurs. Currently, mathematical modeling methods represented by machine learning have become the primary means of constructing such assessment models. However, little research has addressed how the complexity and diversity of the influencing factors cause the prediction results of these models to differ. This study considered 11 influencing factors in the Wenchuan earthquake area and applied three feature selection methods, namely the correlation coefficient, principal component analysis, and the Gini index, to create three types of datasets. Seismic landslide risk assessment models for the area were then constructed by combining each dataset with an artificial neural network model, and the differences in their prediction results were analyzed in detail. The results indicate that the assessment model built on the dataset obtained by principal component analysis achieves the highest accuracy in delineating areas with a very high risk level. In addition, its frequency ratio accuracy reaches 92% and its prediction accuracy from the receiver operating characteristic (ROC) curve reaches 93.3%, both the highest among the three groups of assessment models. This research aims to provide some ideas for researchers constructing seismic landslide risk assessment models. It also provides a theoretical basis for later developing a universally applicable feature selection method from datasets of different dimensions that combine multiple seismic regions and multiple groups of features.
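The workflow summarized above (three feature selection methods, one artificial neural network per resulting dataset, comparison by ROC accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for the 11 influencing factors, the number of retained features `K`, the network size, and the use of scikit-learn are all assumptions, and the abstract does not say how each method was applied. Here the correlation coefficient is taken against the landslide label and the Gini index is taken as random forest Gini importance, both of which are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 11 influencing factors (NOT the Wenchuan data).
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5,
                           n_redundant=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardize using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

K = 6  # hypothetical number of features / components to keep


def select_by_correlation(X_tr, y_tr, k):
    """Indices of the k features with the largest |Pearson r| against the label."""
    r = np.array([np.corrcoef(X_tr[:, j], y_tr)[0, 1]
                  for j in range(X_tr.shape[1])])
    return np.argsort(-np.abs(r))[:k]


def select_by_gini(X_tr, y_tr, k):
    """Indices of the k features with the largest random forest Gini importance."""
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X_tr, y_tr)
    return np.argsort(-forest.feature_importances_)[:k]


corr_idx = select_by_correlation(X_train, y_train, K)
gini_idx = select_by_gini(X_train, y_train, K)
pca = PCA(n_components=K).fit(X_train)

# Three datasets, one per feature selection method.
datasets = {
    "correlation": (X_train[:, corr_idx], X_test[:, corr_idx]),
    "pca": (pca.transform(X_train), pca.transform(X_test)),
    "gini": (X_train[:, gini_idx], X_test[:, gini_idx]),
}

# Train one small neural network per dataset and score it by ROC AUC.
auc = {}
for name, (tr, te) in datasets.items():
    ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
    ann.fit(tr, y_train)
    auc[name] = roc_auc_score(y_test, ann.predict_proba(te)[:, 1])

for name, value in auc.items():
    print(f"{name}: ROC AUC = {value:.3f}")
```

On synthetic data the ranking of the three methods carries no meaning for the Wenchuan area. The sketch only shows how the same network can be trained on differently selected inputs so that any difference in scores can be attributed to the feature selection step.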