# Video Dubbing with SoftVC VITS Singing Voice Conversion

This deep-learning-based tool dubs a video by converting the voice of its singer or narrator into the voice of a pretrained model.

It uses [vocal-remover](https://github.com/tsurumeso/vocal-remover) to separate the voice from the rest of the audio in the source video, and then uses
[SoftVC VITS Singing Voice Conversion](https://github.com/voicepaw/so-vits-svc-fork) to convert that voice.

## Installation

### Requirements

- Docker (https://docs.docker.com/get-docker/)
- A pretrained *so-vits-svc* model (https://huggingface.co/models?search=so-vits-svc)
- The pretrained *vocal-remover* model (https://github.com/tsurumeso/vocal-remover/)
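
Before starting, it can help to confirm that the command-line tools used during setup are available. The sketch below is not part of this repository: `have_cmd` is a hypothetical helper, and the tool list (git, curl, unzip, docker) is simply taken from the commands in the setup steps.

```bash
# Hypothetical helper: succeeds if the named command is on PATH
have_cmd() { command -v "$1" >/dev/null 2>&1; }

for tool in git curl unzip docker; do
  if have_cmd "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done

# "docker compose" is the Compose v2 plugin, so check it separately
if have_cmd docker && docker compose version >/dev/null 2>&1; then
  echo "found:   docker compose"
else
  echo "missing: docker compose"
fi
```

Any `missing:` line points to a tool that should be installed before continuing.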

### Setup

1. Clone this repository with submodules
   ```bash
   git clone --recursive https://gogs.justprojects.de/subDesTagesMitExtraKaese/video-dubbing-svc.git
   cd video-dubbing-svc
   ```
2. Create the `data` folder
   ```bash
   mkdir -p data/output data/ingest data/models
   ```
3. Download the pretrained *so-vits-svc* model and place it in the `data/models` folder
   - The path to the `G_*.pth` file should be given as the `MODEL_PATH` [environment variable](./docker-compose.yml)
   - The path to the `config.json` file should be given as the `MODEL_CONFIG_PATH` [environment variable](./docker-compose.yml)
4. Download the *vocal-remover* release and copy the pretrained model `models/baseline.pth` into the `vocal-remover/models` folder
   ```bash
   curl -L https://github.com/tsurumeso/vocal-remover/releases/download/v5.1.0/vocal-remover-v5.1.0.zip -o /tmp/vocal-remover.zip
   unzip /tmp/vocal-remover.zip -d /tmp/vocal-remover
   cp /tmp/vocal-remover/models/baseline.pth vocal-remover/models/
   rm -rf /tmp/vocal-remover /tmp/vocal-remover.zip
   ```

5. Build the Docker image
   ```bash
   docker compose build
   ```
6. Place your source video in the `data/ingest` folder
7. Run the Docker image
   ```bash
   docker compose up
   ```
8. The output video will be in the `data/output` folder
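
If `docker compose up` fails or produces no output, checking the files from steps 3, 4, and 6 can narrow down the cause. The following is a minimal sketch, not part of this repository: `preflight` is a hypothetical function name, and it assumes the *so-vits-svc* files sit in `data/models` or one folder below it.

```bash
# Preflight check for the setup steps (hypothetical helper, not shipped with the repo).
# Usage: preflight [repository-root]   (defaults to the current directory)
preflight() {
  local root="${1:-.}" missing=0

  # Step 3: so-vits-svc checkpoint and config (assumed at most one folder deep)
  if ! find "$root/data/models" -maxdepth 2 -name 'G_*.pth' 2>/dev/null | grep -q .; then
    echo "missing: G_*.pth under data/models"; missing=1
  fi
  if ! find "$root/data/models" -maxdepth 2 -name 'config.json' 2>/dev/null | grep -q .; then
    echo "missing: config.json under data/models"; missing=1
  fi

  # Step 4: vocal-remover baseline model
  if [ ! -f "$root/vocal-remover/models/baseline.pth" ]; then
    echo "missing: vocal-remover/models/baseline.pth"; missing=1
  fi

  # Step 6: at least one source file to process
  if ! find "$root/data/ingest" -type f 2>/dev/null | grep -q .; then
    echo "missing: a source video in data/ingest"; missing=1
  fi

  if [ "$missing" -eq 0 ]; then
    echo "ok: ready for 'docker compose up'"
  fi
  return "$missing"
}
```

Source the snippet and run `preflight` from the repository root. It returns a non-zero status if anything is missing. It does not verify that `MODEL_PATH` and `MODEL_CONFIG_PATH` in `docker-compose.yml` point at these files, so check those values by hand.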