AudioToken: Audio-to-Image Generation Using a Text-to-Image Generation Model
Sep 16, 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

Overview: The method combines a pre-trained text-to-image generation model (Stable Diffusion) with a pre-trained audio encoder (BEATs). An MLP + attentive pooling are trained (with a conditional diffusion loss + classification loss).

Contribution: Audio-based conditioning is implemented using a pre-trained text-conditioned generative model, and a new evaluation framework is proposed for this purpose: Audio-Image Similarity, Image-Image Similarity,...
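
To make the "MLP + attentive pooling" step more concrete, here is a minimal sketch in plain Python of how a sequence of audio-encoder frame features could be turned into a single token-sized vector. It is an illustration under assumptions, not the paper's implementation: the order (per-frame MLP first, pooling second), the single learnable `query` vector used for scoring, the ReLU activation, and all names and dimensions (`init_params`, `audio_to_token`, `feat_dim`, `hidden_dim`, `token_dim`) are hypothetical. The training losses and the Stable Diffusion and BEATs models themselves are not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vec, weight, bias):
    """Affine map: weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def mlp(vec, params):
    """Two-layer MLP with a ReLU in between (activation is an assumption)."""
    hidden = [max(0.0, h) for h in linear(vec, params["w1"], params["b1"])]
    return linear(hidden, params["w2"], params["b2"])


def attentive_pool(frames, query):
    """Weighted average of frames; weights come from a softmax over
    dot-product scores between each frame and a learnable query vector."""
    scores = [sum(q * x for q, x in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


def audio_to_token(frames, params):
    """Map T audio frame features to one vector of size token_dim.
    Assumed order: per-frame MLP, then attentive pooling over time."""
    projected = [mlp(frame, params) for frame in frames]
    return attentive_pool(projected, params["query"])


def init_params(feat_dim, hidden_dim, token_dim, seed=0):
    """Small random parameters; in practice these would be learned."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "w1": matrix(hidden_dim, feat_dim), "b1": [0.0] * hidden_dim,
        "w2": matrix(token_dim, hidden_dim), "b2": [0.0] * token_dim,
        "query": [rng.uniform(-0.1, 0.1) for _ in range(token_dim)],
    }
```

For example, calling `audio_to_token(frames, init_params(4, 8, 6))` on five 4-dimensional frames returns a 6-dimensional vector plus five attention weights that sum to one. In the setting described above, such a vector would take the place of a text token embedding fed to the text-conditioned model.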