
Video Dubbing with SoftVC VITS Singing Voice Conversion

This is a deep-learning-based tool that replaces the voice of a singer or narrator in a source video with a cloned voice.

It uses vocal-remover to separate the voice from the rest of the audio in the source video, then uses SoftVC VITS Singing Voice Conversion (so-vits-svc) to convert the extracted voice.

Installation

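Requirements

The setup steps below use:

  - git
  - curl and unzip
  - Docker with the Compose v2 plugin (the docker compose command)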
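The setup steps below invoke git, curl, unzip and docker compose on the host. A minimal preflight sketch, assuming those are the only host tools needed (GPU drivers or anything the container itself requires are not checked), could look like this. `require_cmds` is a hypothetical helper name, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of this repository).
# Prints every listed command that is missing from PATH to stderr
# and returns non-zero if at least one of them is missing.
require_cmds() {
  local status=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `require_cmds git curl unzip docker && docker compose version`. The second command is there because the `docker` binary can be installed without the Compose v2 plugin that provides `docker compose`.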

Setup

  1. Clone this repository with submodules

    git clone --recursive https://gogs.justprojects.de/subDesTagesMitExtraKaese/video-dubbing-svc.git
    cd video-dubbing-svc
    
  2. Create the data folders

    mkdir -p data/output data/ingest data/models
    
  3. Download the pretrained so-vits-svc model and place it in the data/models folder

  4. Download the vocal-remover release and copy the pretrained model models/baseline.pth into the vocal-remover/models folder

    curl -fL https://github.com/tsurumeso/vocal-remover/releases/download/v5.1.0/vocal-remover-v5.1.0.zip -o /tmp/vocal-remover.zip
    unzip /tmp/vocal-remover.zip -d /tmp/vocal-remover
    cp /tmp/vocal-remover/models/baseline.pth vocal-remover/models/
    rm -rf /tmp/vocal-remover /tmp/vocal-remover.zip
    
  5. Build the Docker image

    docker compose build
    
  6. Place your source video in the data/ingest folder

  7. Start the container with Docker Compose

    docker compose up
    
  8. The output video will be in the data/output folder
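Before running step 7 it can help to confirm that steps 2 to 6 left the expected files in place, for example that vocal-remover/models/baseline.pth is a real, non-empty file rather than the result of a failed download. The sketch below is built on assumptions: it only checks the paths named in the steps above, it treats any entry in data/models and data/ingest as valid, and `check_layout` is a hypothetical helper, not something this repository provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): checks the paths
# the setup steps are supposed to populate, relative to the given
# repository root (defaults to the current directory).
check_layout() {
  local root="${1:-.}" status=0 dir

  # Step 2: all three data folders must exist.
  for dir in data/output data/ingest data/models; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing folder: $dir" >&2
      status=1
    fi
  done

  # Steps 3 and 6: the model and ingest folders must not be empty.
  for dir in data/models data/ingest; do
    if [ -d "$root/$dir" ] && [ -z "$(ls -A "$root/$dir")" ]; then
      echo "empty folder: $dir" >&2
      status=1
    fi
  done

  # Step 4: the vocal-remover weights must exist and be non-empty.
  if [ ! -s "$root/vocal-remover/models/baseline.pth" ]; then
    echo "missing or empty: vocal-remover/models/baseline.pth" >&2
    status=1
  fi

  return "$status"
}
```

Usage from the repository root: `check_layout && docker compose up`. The check is deliberately shallow; it cannot tell whether a file in data/models is a valid so-vits-svc model.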