HMMER-Extractor: an auxiliary toolkit for identifying genomic macromolecular metabolites based on Hidden Markov Models

DOI 10.1016/j.ijbiomac.2024.137666
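Journal Int J Biol Macromol
Year, volume (issue) 2024, 283(Pt 2)
Authors Jing Yang, Siqi Sun, Ning Sun, Li Lu, Chengwu Zhang, Wanyu Shi, Yunhe Zhao, Shulei Jia
Affiliations 1 Shanxi Medical University;
2 Peking Union Medical College Hospital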

Abstract
The human microbiome contains various microbial macromolecules with important biological functions. Hidden Markov Models (HMMs) can detect distantly related sequences with low similarity and are widely implemented in sequence alignment software. However, HMM-based sequence alignments can generate a large number of results, and quickly screening and batch-extracting target homologs from microbiomes remains the major sticking point. An integrated gene filtering and extraction pipeline is therefore needed to screen homologs quickly and accurately. Here, we introduce HMMER-Extractor, a supporting toolkit for extracting amino acid or nucleotide sequences that provides filtering scores and an iterative keyword matching (IKM) logic. To make it more user-friendly and accessible, we further present a visualized web server platform. Its interactive HTML output lets users browse homologous annotations and sequence extractions, and the web server gives the community a streamlined interface for analyzing microbiomes. Using HMMER-Extractor, we constructed a cardiovascular disease-related gene dataset for the macromolecular metabolites trimethylamine (TMA) and lipopolysaccharide (LPS) based on 46,699 bacterial genomes from the human gut. Approximately 21,014 and 1,961 bacterial strains were identified as containing the cnt or cut operon of TMA and the waa gene cluster of LPS, respectively. Escherichia coli accounted for the largest proportion among all bacterial species, which belonged to the phylum Firmicutes. The HMMER-Extractor toolkit is an integrated pipeline that has proven accurate and fast in extracting target macromolecule-encoding genes from microbial genomes.
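The abstract describes two filtering ideas: a score threshold and an iterative keyword matching (IKM) logic. The record does not give the toolkit's actual algorithm or interface, so the following Python sketch is only one possible reading, not the authors' implementation. It assumes hits come from HMMER's `--tblout` table (18 whitespace-separated columns followed by a free-text description) and treats IKM as successive rounds of keyword filtering on the description. The names `Hit`, `parse_tblout`, `filter_hits` and the sample rows are hypothetical and made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Sequence


class Hit(NamedTuple):
    target: str
    query: str
    evalue: float
    score: float
    description: str


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse rows of a HMMER --tblout table, skipping comments and blanks."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns, then everything else is the target description.
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue  # malformed row
        description = fields[18].strip() if len(fields) > 18 else ""
        hits.append(
            Hit(fields[0], fields[2], float(fields[4]), float(fields[5]), description)
        )
    return hits


def filter_hits(
    hits: Iterable[Hit],
    min_score: float,
    keyword_rounds: Sequence[Sequence[str]],
) -> List[Hit]:
    """Keep hits above a score cutoff, then narrow them round by round.

    Assumption: each round keeps a hit if its description contains at
    least one keyword of that round (case-insensitive).
    """
    kept = [h for h in hits if h.score >= min_score]
    for keywords in keyword_rounds:
        lowered = [k.lower() for k in keywords]
        kept = [h for h in kept if any(k in h.description.lower() for k in lowered)]
    return kept


# Made-up rows in tblout layout, for illustration only.
SAMPLE = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seq1 - cutC_model - 1e-50 170.2 0.1 1e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 choline trimethylamine-lyase CutC",
    "seq2 - cutC_model - 1e-05 25.0 0.0 1e-05 24.8 0.0 1.0 1 0 0 1 1 1 1 choline trimethylamine-lyase",
    "seq3 - cutC_model - 1e-40 140.0 0.2 1e-40 139.5 0.2 1.0 1 0 0 1 1 1 1 glycerol dehydratase",
]

hits = parse_tblout(SAMPLE)
selected = filter_hits(hits, min_score=50.0, keyword_rounds=[["lyase"], ["choline"]])
print([h.target for h in selected])
```

In this sketch the score cutoff removes weak matches first, and each keyword round only ever shrinks the surviving set, so the order of rounds does not change the final result.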
Keywords
Hidden Markov Models (HMMs); Keyword logic; Macromolecular metabolites; Orthologous genes; Threshold
Pages 137666

Jing Yang, Siqi Sun, Ning Sun, Li Lu, Chengwu Zhang, Wanyu Shi, Yunhe Zhao, Shulei Jia. HMMER-Extractor: an auxiliary toolkit for identifying genomic macromolecular metabolites based on Hidden Markov Models [J]. Int J Biol Macromol. 2024; 283(Pt 2): 137666.
