Open Voice V1 돌리기

Notice

2025년 매일 기록

Recent Posts

Recent Comments

Link

« 2025/08 »
일	월	화	수	목	금	토
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

Tags more

Archives

Today

Total

관리 메뉴

Wookang makes AI

Open Voice V1 돌리기 본문

AI 음성

Open Voice V1 돌리기

푸른깃발🏳️ 2024. 5. 10. 17:53

https://github.com/myshell-ai/OpenVoice

GitHub - myshell-ai/OpenVoice: Instant voice cloning by MyShell.

Instant voice cloning by MyShell. Contribute to myshell-ai/OpenVoice development by creating an account on GitHub.

github.com

OpenVoice에 관심을 가지게 된 이유는, 하나다.

"목소리 복제를 하고 싶다. 내가 좋아하는 목소리로 한글 텍스트를 무제한으로 읽어줬으면 좋겠다"

기존 tts는 길이제한이 있고 사용하기도 복잡했기에 이것을 도다리처럼 아주 간단하게

1. 대용량 텍스트 파일 첨부

2. 목소리 생성 버튼 클릭

3. wav파일 생성

해주는 앱을 만들고 싶었다. 현재까지 r&d 결과로는 부정적이지만, 이 부정적 결론까지 도달한 과정을 기록으로 남기려한다.

일단, V1은 다국어 지원이 안된다.

영어와 중국어만 된다. 만약 한국어 를 읽게 하면 아래와 같이 읽어준다. 네이티브 미국인이 한국어 말하는거 같다.

안녕하세요! 오늘은 날씨가 정말 좋네요.

* 깃헙에서 설치하기

git clone https://github.com/myshell-ai/OpenVoice.git open_voice
cd open_voice

* 환경만들기 - 파이썬 버전이 꼬여서 conda를 이용했다.

conda create -n ov python=3.9
conda activate ov
pip install -r requirements.txt

* conda환경에서 ffmpeg가 없다고 뜬다면 아래처럼 꼭 ffmpeg를 설치해줘야 한다.

conda install ffmpeg

* cpu로 돌리기 - cuda를 사용한다면 pass!

- 현재 open voice에서는 cpu를 사용할수가 없다. 코드가 누락되어있고 이 부분을 수정해주고 있지 않아서 직접 수정해줘야 한다.

먼저 se_extractor.py 파일로 간후 22번째 줄의 아래 코드를

device = "cuda" if torch.cuda.is_available() else "cpu"
model = WhisperModel(model_size, device=device, compute_type="float16")

아래와 같이 바꿔준다.

device, compute_type = ("cuda","float16") if torch.cuda.is_available() else ("cpu", "int8")
model = WhisperModel(model_size, device=device, compute_type=compute_type)

* 다 됐다. 이제 돌려보자. 한국어를 읽어보게 했다. 영어를 사용하고 싶다면 아래 주석을 풀면된다.

import os
import torch
from openvoice import se_extractor
from openvoice.api import BaseSpeakerTTS, ToneColorConverter

ckpt_base = 'checkpoints/base_speakers/EN'
ckpt_converter = 'checkpoints/converter'
device="cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs'

base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)
base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)

source_se = torch.load(f'{ckpt_base}/en_default_se.pth').to(device)


# reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone
reference_speaker = 'resources/lympe.mp3' # This is the voice you want to clone

target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)

# inference
save_path = f'{output_dir}/output_en_default.wav'

# Run the base speaker tts
# text = "This audio is generated by OpenVoice."
text = "안녕하세요! 오늘은 날씨가 정말 좋네요."

src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='English', speed=1.0)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path, 
    src_se=source_se, 
    tgt_se=target_se, 
    output_path=save_path,
    message=encode_message)
    
source_se = torch.load(f'{ckpt_base}/en_style_se.pth').to(device)
save_path = f'{output_dir}/output_whispering.wav'

# Run the base speaker tts
# text = "This audio is generated by OpenVoice."
text = "안녕하세요! 오늘은 날씨가 정말 좋네요."

src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='whispering', language='English', speed=0.9)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path, 
    src_se=source_se, 
    tgt_se=target_se, 
    output_path=save_path,
    message=encode_message)


ckpt_base = 'checkpoints/base_speakers/ZH'
base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)
base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')

source_se = torch.load(f'{ckpt_base}/zh_default_se.pth').to(device)
save_path = f'{output_dir}/output_chinese.wav'

# Run the base speaker tts
# text = "今天天气真好，我们一起出去吃饭吧。"
text = "안녕하세요! 오늘은 날씨가 정말 좋네요."

src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='Chinese', speed=1.0)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path, 
    src_se=source_se, 
    tgt_se=target_se, 
    output_path=save_path,
    message=encode_message)

* 결론

V1은 윈도우나 맥, 모두에서 잘 돌아갔다. 영어 성능은 v1도 충분히 좋았다.
다음 포스트에서 정리할 V2는 한국어의 경우 cuda 환경에서만 가능하고 - 영어나 중국어는 여전히 cpu에서 돌아간다 - 여기에 목소리 트레이닝도 가능하다. 물론 성능은 그다지 만족스럽지 않지만 여러 테스트를 해보니 어떤 목소리는 꽤나 잘 복제해냈다.

자세한 사항은 V2에 남기겠다.

저작자표시 비영리 변경금지 (새창열림)

'AI 음성' 카테고리의 다른 글

xtts-webui로 coqui 설치하기 (1)	2024.07.01
coqui tts(xtts) v2 사용기 정리 (0)	2024.05.17
xtts가 훨씬 좋다 (1)	2024.05.11
[Whisper-WebUI] 자동 자막 생성 및 추출 & 번역까지 한번에 (2)	2023.06.27

'AI 음성' Related Articles

Wookang makes AI

Open Voice V1 돌리기 본문

Open Voice V1 돌리기

'AI 음성' 카테고리의 다른 글

티스토리툴바