Application of shale TOC prediction model using the XGBoost machine learning algorithm: a case study of the Qiongzhusi Formation in central Sichuan Basin
Abstract: Total organic carbon (TOC) content, indicative of organic richness, serves as a pivotal metric for evaluating the hydrocarbon generative capacity of source rocks. A well-defined correlation exists between well-logging parameters and TOC content. However, conventional TOC prediction models, such as multiple linear regression and the Delta logR method, often exhibit lower prediction accuracy in complex geological settings. This study introduces a TOC content prediction approach using the Extreme Gradient Boosting (XGBoost) decision tree model, validated through application to the Cambrian Qiongzhusi Formation shale in the central Sichuan Basin. Initially, the characteristics of the dataset were delineated, and XGBoost, along with two prevalent machine learning models, was selected and trained using effective training strategies. Subsequently, geological data were normalized using the Gaussian initialization method to compile the dataset, leading to the development of three predictive machine learning models for shale TOC content. Ultimately, a comparative analysis of the prediction accuracies of XGBoost and the other two models was conducted. Experimental findings indicate that the geological dataset is characterized by a limited sample size and an uneven distribution of data. Within this dataset, the training efficacy of the XGBoost algorithm markedly exceeds that of the other models. Compared with prevalent models such as the K-Nearest Neighbors (KNN) and backpropagation (BP) neural network, XGBoost demonstrates superior predictive precision. This is evidenced by a high coefficient of determination (R²) of up to 0.81, a comparatively lower mean squared error (MSE), and smaller error fluctuations, indicating greater accuracy and reliability of the model. Conversely, both the KNN and BP neural networks exhibit inferior predictive accuracy. The KNN model is substantially impacted by data imbalance. Moreover, disparities in distance metrics and inappropriate selections of the K-value can significantly degrade the predictive accuracy of the model. Concurrently, the BP neural network may suffer from local optima during its initialization phase due to unstable connection weights and thresholds, thereby diminishing the model's overall predictive efficacy. The findings of this study are crucial for developing an intelligent TOC content forecasting model, promoting further innovation and development of oil exploration technologies, and reducing human labor costs in oilfield exploration.
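The evaluation quantities named in the abstract (normalization, R², MSE, and the sensitivity of KNN to the K-value) can be illustrated with a small sketch. This is not the authors' code or data: it assumes that the normalization step is an ordinary z-score standardization, uses a single hypothetical log feature, and relies on a toy one-dimensional KNN regressor written in plain Python. All numeric values are invented for illustration.

```python
import math

def zscore(values):
    """Standardize to zero mean and unit population standard deviation.
    Assumption: this stands in for the normalization step in the abstract."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / k

def leave_one_out(x, y, k):
    """Predict each sample from all the others (suits very small datasets)."""
    preds = []
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(rest_x, rest_y, x[i], k))
    return preds

# Hypothetical gamma-ray readings and TOC values (wt%), invented for illustration.
gamma_ray = [80.0, 92.0, 110.0, 125.0, 150.0, 180.0]
toc = [0.5, 0.8, 1.1, 1.9, 2.6, 3.2]

x = zscore(gamma_ray)
pred_k1 = leave_one_out(x, toc, k=1)
pred_k5 = leave_one_out(x, toc, k=5)

r2_k1, r2_k5 = r2(toc, pred_k1), r2(toc, pred_k5)
mse_k1, mse_k5 = mse(toc, pred_k1), mse(toc, pred_k5)
print(f"k=1: R2={r2_k1:.3f}, MSE={mse_k1:.3f}")
print(f"k=5: R2={r2_k5:.3f}, MSE={mse_k5:.3f}")
```

On this toy data, setting K equal to the number of remaining samples reduces the prediction to the mean of the other points, so R² falls below zero. This shows how a poorly chosen K-value can degrade a KNN model on a small dataset.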
Keywords: TOC prediction; Shale; Well logging; XGBoost; Qiongzhusi Formation; Central Sichuan Basin
ISSN: 0891-2556
Volume, issue, pages: Volume 40, Issue 1
Publication date: 2025-02-01
Journal ranking (for SCI journals, the Chinese Academy of Sciences partition): Tier 4
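Indexed in: SCI (Science Citation Index, print edition), SCIE (Science Citation Index Expanded, online edition)
Journal: CARBONATES AND EVAPORITES
Contributing authors: 张本健, 武鲁亚
Corresponding authors: 吴秋龙, 马奎友, 火勋港
First authors: 庞宏, 姜福杰, 陈君青
Paper type: Journal article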
Paper summary: 吴秋龙, 庞宏, 张本健, 姜福杰, 武鲁亚, 陈君青, 马奎友, 火勋港, Application of shale TOC prediction model using the XGBoost machine learning algorithm: a case study of the Qiongzhusi Formation in central Sichuan Basin, CARBONATES AND EVAPORITES, 2025, Volume 40, Issue 1
Paper title: Application of shale TOC prediction model using the XGBoost machine learning algorithm: a case study of the Qiongzhusi Formation in central Sichuan Basin