Vines' Log

Paper Review - Audio-Visual Related Research (WIP)

Created: 2025-02-25 | ML/CV/NLP
Author: Vines
Link: http://vinesmsuic.github.io/paper-survey-av/
Copyright Notice: All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.
Tag: Literature Review
Contents
  1. Sounding Video Editing
    1.1. Why it matters
    1.2. Challenges and opportunities
    1.3. What people have done previously that can be used
    1.4. Which editing tasks need to change both visuals and audio
      1.4.1. Edit both visuals and audio
        1.4.1.1. Object-Level Editing
        1.4.1.2. Character & Face Editing
        1.4.1.3. Scene & Environment Editing
        1.4.1.4. Motion Editing
        1.4.1.5. Audio-Visual Synchronization
  2. Any-length Video Editing
    2.1. VideoPainter
      2.1.1. Method
    2.2. DynVFX, No Code
      2.2.1. Method
      2.2.2. Insight
  3. Text-to-Sound Effect (Foley)
  4. Video-to-Audio
    4.1. LVAS-Agent (2025)
    4.2. MMAudio (CVPR 2025)
    4.3. MultiFoley (CVPR 2025)
    4.4. Diff-Foley (NeurIPS 2023)
  5. Binaural Audio Generation Based on Video
    5.1. CCStereo (2025)
    5.2. PseudoBinaural (CVPR 2021)
  6. Audio Editing Based on Visuals
    6.1. AVEdit (ACCV 2024), No Code
      6.1.1. Method
      6.1.2. Limitations
      6.1.3. Other Contributions
  7. Text-guided Audio Editing
    7.1. WavCraft (ICLR 2024 Workshop on LLM Agents)
      7.1.1. Method
      7.1.2. Tasks
      7.1.3. Code Study
    7.2. ZETA (ICML 2024)
      7.2.1. Code Study
    7.3. PPAE (ICML 2024), No Code
    7.4. AUDIT (NeurIPS 2023)
      7.4.1. Method
      7.4.2. Limitations
      7.4.3. Baseline Systems
      7.4.4. Code Study
©2019 - 2025 By Vines