Application of real-time feature lakes in large-scale risk control: high-concurrency writing and low-latency recall


DOI: 10.12208/j.jer.20250394
Journal: Journal of Engineering Research
Year, Volume (Issue): 2025, 4(9)
Author: Liu Chao
Affiliation: Devz AI technologies Inc., America

Abstract
With the rapid growth of e-commerce and financial services, real-time risk control systems face stricter requirements for data timeliness, consistency, and scalability. Traditional architectures that keep stream and batch processing separate, built on Hive, Kafka, and Flink, show clear bottlenecks in feature management, data updates, and query efficiency. Drawing on the author's hands-on experience building big data and risk control platforms at several internet companies, this paper proposes and implements a real-time feature lake architecture based on Apache Paimon that balances high-concurrency data writes with low-latency feature recall. The system has been deployed in risk control scenarios at several global e-commerce platforms. There it processes hundreds of billions of events per day, reduces feature update latency to the order of seconds, and improves query performance by more than threefold, offering a reusable architectural pattern for large-scale real-time risk control. The paper details the system architecture, key technology choices, and optimization strategies. It also summarizes the challenges met during deployment and how they were resolved.
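The abstract does not describe how the Paimon-based lake reconciles high-concurrency writes with low-latency recall. The toy Python sketch below is offered only as an illustration of the general idea behind Apache Paimon's primary-key tables, which store data in an LSM tree. Writers apply upserts to an in-memory buffer that is flushed as immutable sorted runs. Readers serve point lookups by binary search over each run and merge versions so that newer values win, and compaction periodically folds runs together.

The per-column "partial update" behaviour loosely mirrors Paimon's partial-update merge engine, but whether the paper uses that engine is an assumption. `ToyFeatureLSM`, its methods, and the feature names are hypothetical. This is not Paimon's API or the author's implementation.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Features = Dict[str, float]  # feature name -> value (hypothetical schema)


@dataclass
class ToyFeatureLSM:
    """Toy primary-key feature table: a mutable buffer plus immutable sorted runs.

    Hypothetical illustration of LSM-style storage, not Apache Paimon's API.
    """

    flush_threshold: int = 4
    memtable: Dict[str, Features] = field(default_factory=dict)
    runs: List[List[Tuple[str, Features]]] = field(default_factory=list)  # oldest first

    def upsert(self, key: str, features: Features) -> None:
        # Partial update: new columns are merged into the buffered row for this key.
        self.memtable.setdefault(key, {}).update(features)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Turn the buffer into a new sorted, immutable run; older runs are never rewritten.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[Features]:
        # Merge-on-read: visit runs oldest -> newest, then the buffer, so newer values win.
        merged: Features = {}
        found = False
        for run in self.runs:
            i = bisect_left(run, (key,))  # binary search by primary key
            if i < len(run) and run[i][0] == key:
                merged.update(run[i][1])
                found = True
        if key in self.memtable:
            merged.update(self.memtable[key])
            found = True
        return merged if found else None

    def compact(self) -> None:
        # Fold all runs into one so lookups touch fewer runs; newer values overwrite older.
        combined: Dict[str, Features] = {}
        for run in self.runs:
            for key, row in run:
                combined.setdefault(key, {}).update(row)
        self.runs = [sorted(combined.items())] if combined else []


store = ToyFeatureLSM(flush_threshold=2)
store.upsert("user:1", {"txn_count_1h": 3.0})
store.upsert("user:2", {"txn_count_1h": 1.0})    # buffer holds 2 keys -> flushed to a run
store.upsert("user:1", {"avg_amount_1h": 42.5})  # partial update lands in the new buffer
print(store.get("user:1"))  # {'txn_count_1h': 3.0, 'avg_amount_1h': 42.5}
```

In this toy, a flush only appends a new run and never rewrites existing ones, which keeps the write path cheap. The cost moves to reads, which must consult every run until compaction bounds their number. A real system would also persist runs, handle deletes, and compact asynchronously.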
Keywords
Real-time feature lake; risk control system; Apache Paimon; high-concurrency writes; low-latency queries; stream-batch integration; feature engineering
Pages: 18-22
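Cite this article:
Liu Chao. Application of real-time feature lakes in large-scale risk control: high-concurrency writing and low-latency recall [J]. Journal of Engineering Research, 2025, 4(9): 18-22.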
