A Feature-Enhanced Diffusion Model for Text-Guided Sound Effect Generation
Hunan University of Technology and Business, Changsha, Hunan
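Miao Xiangyang (苗向阳). A feature-enhanced diffusion model for text-guided sound effect generation [J]. 国际计算机科学进展 (International Advances in Computer Science), 2025, 5(1): 18-22.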