[1] ZHANG Shichao, WANG Jianbin, MENG Hao. A Keyword Extraction Method for Chinese Text of Scientific Research Papers Based on Semantic Features and TextRank Algorithm[J]. South China Journal of Seismology, 2025, (03): 188-194. [doi:10.13512/j.hndz.2025.09.16]

A Keyword Extraction Method for Chinese Text of Scientific Research Papers Based on Semantic Features and TextRank Algorithm

South China Journal of Seismology (华南地震) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2025, No. 03
Pages:
188-194
Column:
Emergency Management and Practice
Publication date:
2025-09-30

Article Info

Title:
A Keyword Extraction Method for Chinese Text of Scientific Research Papers Based on Semantic Features and TextRank Algorithm
Article ID:
1001-8662(2025)03-0188-7
Author(s):
ZHANG Shichao1 (张世超), WANG Jianbin2 (王建宾), MENG Hao2 (孟浩)
1. Shandong SGIT Technology Co., Ltd., Jinan 250001, China; 2. State Grid Shandong Electric Power Company, Jinan 250001, China
Keywords:
Semantic feature; TextRank algorithm; Scientific research paper; Chinese text; Keyword extraction; Convolutional neural network
Classification number (CLC):
TP391
DOI:
10.13512/j.hndz.2025.09.16
Document code:
A
Abstract:
This paper studies a keyword extraction method for the Chinese text of scientific research papers, based on semantic features and the TextRank algorithm, with the aim of extracting keywords accurately and ranking them correctly. The method has two stages. In the first stage, semantic-feature-based candidate screening, the Chinese text is converted into word vectors with the Word2Vec tool, and these vectors serve as the semantic features of the text. The semantic features are fed into a convolutional neural network, which classifies them; the features classified as the candidate-keyword type are retained, and the words they belong to become the candidate keywords. In the second stage, TextRank-based extraction, three features of each candidate keyword (average information entropy, part of speech, and position) are used as extraction indicators to build a graph model. The comprehensive weight of each candidate keyword is computed, the candidates are sorted in descending order of weight, and the top-ranked candidates are taken as the final keywords. Tests show that the method improves both the accuracy of keyword extraction and the accuracy of keyword ranking for Chinese text of scientific research papers.
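The ranking stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so everything specific here is an assumption. In particular, "average information entropy" is taken to be the mean of a word's left- and right-neighbor entropy, the three features are combined by a simple product, title words get a fixed bonus, and the combined feature weight enters TextRank as a teleport prior (a personalized-PageRank variant). The function names, the window size, the damping factor of 0.85, and the part-of-speech weights are all hypothetical. The input is assumed to be already segmented, and the candidate list is assumed to come from the upstream Word2Vec and CNN screening stage.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(tokens, word, offset):
    """Shannon entropy (bits) of the tokens found at `offset` from each occurrence of `word`."""
    neighbors = Counter(
        tokens[i + offset]
        for i, tok in enumerate(tokens)
        if tok == word and 0 <= i + offset < len(tokens)
    )
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum(c / total * math.log2(total / c) for c in neighbors.values())


def feature_weight(tokens, word, pos_tags, pos_weight, title_len):
    """ASSUMED combination of the three features: a plain product."""
    avg_entropy = (neighbor_entropy(tokens, word, -1) + neighbor_entropy(tokens, word, 1)) / 2
    pos_factor = pos_weight.get(pos_tags.get(word), 1.0)
    # ASSUMPTION: a word whose first occurrence is in the title counts double.
    position_factor = 2.0 if tokens.index(word) < title_len else 1.0
    return (1.0 + avg_entropy) * pos_factor * position_factor


def weighted_textrank(tokens, candidates, pos_tags, pos_weight,
                      title_len=0, window=3, damping=0.85, iterations=50):
    """Rank candidate keywords; returns (word, score) pairs in descending score order."""
    cand = set(candidates) & set(tokens)
    if not cand:
        return []
    # Undirected co-occurrence graph restricted to candidate keywords.
    edges = defaultdict(Counter)
    for i, a in enumerate(tokens):
        if a not in cand:
            continue
        for b in tokens[i + 1:i + window]:
            if b in cand and b != a:
                edges[a][b] += 1
                edges[b][a] += 1
    # Normalized feature weights act as the teleport prior.
    prior = {w: feature_weight(tokens, w, pos_tags, pos_weight, title_len) for w in cand}
    norm = sum(prior.values())
    prior = {w: v / norm for w, v in prior.items()}
    score = dict(prior)
    for _ in range(iterations):
        new_score = {}
        for w in cand:
            incoming = sum(
                score[u] * edges[u][w] / sum(edges[u].values()) for u in edges[w]
            )
            new_score[w] = (1 - damping) * prior[w] + damping * incoming
        score = new_score
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Toy, pre-segmented input; the first five tokens play the role of the title.
tokens = ["语义", "特征", "关键词", "提取", "方法",
          "方法", "利用", "语义", "特征", "筛选", "候选", "关键词",
          "图", "模型", "计算", "关键词", "权重"]
candidates = ["语义", "特征", "关键词", "提取", "方法", "模型", "权重"]
pos_tags = {"语义": "n", "特征": "n", "关键词": "n", "提取": "v",
            "方法": "n", "模型": "n", "权重": "n"}
pos_weight = {"n": 1.0, "v": 0.6}  # hypothetical part-of-speech weights

ranked = weighted_textrank(tokens, candidates, pos_tags, pos_weight, title_len=5)
for word, score in ranked[:3]:
    print(word, round(score, 4))
```

Putting the feature weight into the teleport term keeps the graph itself a plain co-occurrence graph, so the three indicators can be changed without rebuilding the edges. A different design, such as scaling the edge weights by the features, would be equally compatible with the abstract.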


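Memo

Received: 2024-08-31
Funding: Science and Technology Project of State Grid Shandong Electric Power Company (2024A-158)
About the author: ZHANG Shichao (born 1988), male, senior engineer; main research interests: network security and artificial intelligence. E-mail: ayleeiyuan@sina.com
Last Update: 2025-09-30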