Video Dubbing with SoftVC VITS Singing Voice Conversion

This is a deep-learning-based tool to clone the voice of a singer/narrator from a source video.

It uses vocal-remover to separate the vocals from the audio of the source video, and then uses SoftVC VITS Singing Voice Conversion (so-vits-svc) to convert the separated voice.

Installation

Requirements

- Docker (https://docs.docker.com/get-docker/)
- A pretrained so-vits-svc model (https://huggingface.co/models?search=so-vits-svc)
- The pretrained vocal-remover model (https://github.com/tsurumeso/vocal-remover/)

Setup

Clone this repository with submodules

git clone --recursive https://gogs.justprojects.de/subDesTagesMitExtraKaese/video-dubbing-svc.git
cd video-dubbing-svc

Create the data folder

mkdir -p data/output data/ingest data/models

Download the pretrained so-vits-svc model and place it in the data/models folder
- The path to the G_*.pth file should be given as the MODEL_PATH environment variable
- The path to the config.json file should be given as the MODEL_CONFIG_PATH environment variable
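
As a minimal sketch of the two bullets above, the following shell function looks for a checkpoint and a config file in a models folder and exports both variables. The helper name `set_model_env` is hypothetical and not part of this repository. It is also an assumption that the compose setup reads these variables from the calling shell and that the paths are meant relative to the repository root; check the compose file if the container expects different paths.

```shell
# set_model_env is a hypothetical helper, not part of this repository.
# It exports MODEL_PATH and MODEL_CONFIG_PATH for a model stored in a folder.
set_model_env() {
    models_dir="${1:-data/models}"
    model=""
    # Pick the first G_*.pth checkpoint found; adjust if several are present.
    for candidate in "$models_dir"/G_*.pth; do
        if [ -f "$candidate" ]; then
            model="$candidate"
            break
        fi
    done
    if [ -z "$model" ]; then
        echo "no G_*.pth checkpoint found in $models_dir" >&2
        return 1
    fi
    if [ ! -f "$models_dir/config.json" ]; then
        echo "no config.json found in $models_dir" >&2
        return 1
    fi
    MODEL_PATH="$model"
    MODEL_CONFIG_PATH="$models_dir/config.json"
    export MODEL_PATH MODEL_CONFIG_PATH
}
```

Calling `set_model_env data/models` before starting the container fails early with a message if either file is missing, instead of failing later inside the container.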

Download the vocal-remover release and copy the pretrained model models/baseline.pth into the vocal-remover/models folder

curl -L https://github.com/tsurumeso/vocal-remover/releases/download/v5.1.0/vocal-remover-v5.1.0.zip -o /tmp/vocal-remover.zip
unzip /tmp/vocal-remover.zip -d /tmp/vocal-remover
cp /tmp/vocal-remover/models/baseline.pth vocal-remover/models/
rm -rf /tmp/vocal-remover /tmp/vocal-remover.zip
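
A failed download or an unexpected path inside the archive leaves the model file missing or empty, which may only surface once the container runs. The sketch below is one way to check the result of the copy step; `check_nonempty` is a hypothetical helper, not part of this repository or of vocal-remover.

```shell
# check_nonempty is a hypothetical helper: it succeeds only if the
# given file exists and has a size greater than zero.
check_nonempty() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}
```

For example, `check_nonempty vocal-remover/models/baseline.pth` can be run from the repository root after the copy step.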

Build the docker image
```
docker compose build
```
Insert your source video into the data/ingest folder

Run the docker image
```
docker compose up
```
The output video will be in the data/output folder