# Video Dubbing with SoftVC VITS Singing Voice Conversion

This is a deep-learning-based tool for dubbing a source video with a cloned voice: the voice of the original singer or narrator is replaced by the voice of a pretrained *so-vits-svc* model. It uses [vocal-remover](https://github.com/tsurumeso/vocal-remover) to separate the voice from the rest of the source video's audio, and then uses [SoftVC VITS Singing Voice Conversion](https://github.com/voicepaw/so-vits-svc-fork) to convert the separated voice.

## Installation

### Requirements

- Docker with the Compose plugin (https://docs.docker.com/get-docker/)
- A pretrained *so-vits-svc* model (https://huggingface.co/models?search=so-vits-svc)
- The pretrained *vocal-remover* model (https://github.com/tsurumeso/vocal-remover/)

### Setup

1. Clone this repository with its submodules:

   ```bash
   git clone --recursive https://gogs.justprojects.de/subDesTagesMitExtraKaese/video-dubbing-svc.git
   cd video-dubbing-svc
   ```

2. Create the `data` folders:

   ```bash
   mkdir -p data/output data/ingest data/models
   ```

3. Download the pretrained *so-vits-svc* model and place it in the `data/models` folder.
   - Set the `MODEL_PATH` [environment variable](./docker-compose.yml) to the path of the `G_*.pth` file.
   - Set the `MODEL_CONFIG_PATH` [environment variable](./docker-compose.yml) to the path of the `config.json` file.

4. Download the *vocal-remover* release and copy the pretrained model `models/baseline.pth` into the `vocal-remover/models` folder. The `-L` flag makes `curl` follow GitHub's redirect to the actual release file:

   ```bash
   curl -L https://github.com/tsurumeso/vocal-remover/releases/download/v5.1.0/vocal-remover-v5.1.0.zip -o /tmp/vocal-remover.zip
   unzip /tmp/vocal-remover.zip -d /tmp/vocal-remover
   cp /tmp/vocal-remover/models/baseline.pth vocal-remover/models/
   rm -rf /tmp/vocal-remover /tmp/vocal-remover.zip
   ```

5. Build the Docker image:

   ```bash
   docker compose build
   ```

6. Copy your source video into the `data/ingest` folder.

7. Start the container:

   ```bash
   docker compose up
   ```

8. The output video is written to the `data/output` folder.
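
Optionally, before starting the container in step 7, a short script can check that the files from steps 2 to 6 are in place. This is a sketch, not part of this repository: the function names `check`, `has_file`, and `preflight` are hypothetical, and the script only looks for files under the default paths used in this README. It does not read `docker-compose.yml`, so it cannot tell whether `MODEL_PATH` and `MODEL_CONFIG_PATH` actually point at the files it finds.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of this repository): verifies that the
# files the setup steps ask for exist before running `docker compose up`.
# It does NOT parse docker-compose.yml; it only checks the default README paths.

# has_file DIR PATTERN: true if DIR contains, at any depth, a file matching PATTERN
has_file() {
  [ -n "$(find "$1" -type f -name "$2" -print -quit 2>/dev/null)" ]
}

# check DESCRIPTION COMMAND...: runs COMMAND, prints ok/MISSING, and increments
# the caller's `missing` counter on failure (bash functions see the caller's locals)
check() {
  local desc="$1"; shift
  if "$@"; then
    echo "ok       $desc"
  else
    echo "MISSING  $desc"
    missing=$((missing + 1))
  fi
}

# preflight ROOT: checks the layout relative to the repository root ROOT;
# returns non-zero if anything is missing
preflight() {
  local root="${1:-.}"
  local missing=0

  check "data/output folder (step 2)" test -d "$root/data/output"
  check "data/ingest folder (step 2)" test -d "$root/data/ingest"
  check "so-vits-svc checkpoint G_*.pth in data/models (step 3)" has_file "$root/data/models" 'G_*.pth'
  check "config.json in data/models (step 3)" has_file "$root/data/models" 'config.json'
  check "vocal-remover/models/baseline.pth (step 4)" test -f "$root/vocal-remover/models/baseline.pth"
  check "at least one file in data/ingest (step 6)" has_file "$root/data/ingest" '*'

  [ "$missing" -eq 0 ]
}

# Run against the repository root, or a path given as the first argument
preflight "${1:-.}" || echo "Fix the MISSING items above before running: docker compose up"
```

Save it as, for example, `preflight.sh` (a hypothetical name) in the repository root and run `bash preflight.sh`. Each line marked `MISSING` names the setup step that still needs to be done.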