---
license: cc-by-nc-sa-4.0
language:
- en
pipeline_tag: text-to-audio
tags:
- audiocraft
- audiogen
- styletts2
- shift-tts
- sound
- audio-generation
- text-to-speech
- mimic3
---

Audionar - A phonetic variation of StyleTTS2 blended with AudioGen soundscapes

[![Beta Text 2 Speech Tool](assets/shift_banner.png?raw=true)](https://shift-europe.eu/)


# SHIFT TTS / Audionar

A phonetic variation of [SHIFT TTS](https://audeering.github.io/shift/) blended with [AudioGen soundscapes](https://huggingface.co/dkounadis/artificial-styletts2/discussions/3)
  - [Analysis of the emotion of SHIFT TTS](https://huggingface.co/dkounadis/artificial-styletts2/discussions/2)
  - [Listen also to foreign languages](https://huggingface.co/dkounadis/artificial-styletts2/discussions/4)

## Listen to Voices


<a href="https://huggingface.co/dkounadis/artificial-styletts2/discussions/1">Native English</a> / <a href="https://huggingface.co/dkounadis/artificial-styletts2/discussions/1#6783e3b00e7d90facec060c6">Non-native English: Accents</a> / <a href="https://huggingface.co/dkounadis/artificial-styletts2/discussions/1#6782c5f2a2f852eeb1027a32">Foreign languages</a>

## TTS Demo

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/.hf7/ CUDA_VISIBLE_DEVICES=0 python demo.py
```
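
The environment variables in the command above pin the run to one GPU and to a specific Hugging Face cache: `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes CUDA number the GPUs in the same order as `nvidia-smi`, `CUDA_VISIBLE_DEVICES=0` exposes only the first GPU to the process, and `HF_HOME` sets the directory where Hugging Face downloads are cached. A minimal sketch of launching the same command from Python follows; the helper names `build_env` and `run_demo` are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def build_env(gpu="0", hf_home="/data/.hf7/"):
    """Return a copy of the current environment with GPU and cache settings applied."""
    env = dict(os.environ)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # enumerate GPUs by PCI bus ID
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)   # expose only this GPU to the process
    env["HF_HOME"] = hf_home                 # Hugging Face cache directory
    return env


def run_demo(script="demo.py", **kwargs):
    """Run the script with the current interpreter; raises if it exits non-zero."""
    return subprocess.run([sys.executable, script], env=build_env(**kwargs), check=True)
```

Calling `run_demo(gpu=1)` from the repository root would then start `demo.py` on the second GPU, assuming one is present.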

## Flask API

<details>
<summary>
Build virtualenv & run api.py
</summary>

The [TTS Demo](https://huggingface.co/dkounadis/artificial-styletts2/blob/main/demo.py) above is a standalone script that loads the TTS & AudioGen models and synthesizes a text file. We also provide a Flask `api.py` that allows faster inference by
loading TTS & [AudioGen](https://huggingface.co/dkounadis/artificial-styletts2/tree/main/audiocraft) only once.
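
The speed-up comes from keeping the models in memory between requests instead of reloading them on every run. The sketch below illustrates that load-once pattern with the standard library only; `load_models` and `synthesize` are hypothetical stand-ins and do not reproduce the code in `api.py`.

```python
import functools

LOAD_COUNT = 0  # counts how often the expensive load actually runs


@functools.lru_cache(maxsize=None)
def load_models():
    """Stand-in for an expensive model load; cached so it runs once per process."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"tts": "tts-model", "audiogen": "audiogen-model"}  # placeholder objects


def synthesize(text):
    """Hypothetical request handler: reuses the cached models on every call."""
    models = load_models()
    return f"{models['tts']} -> {text}"


# Three requests are served, but the models are loaded a single time.
for sentence in ["first", "second", "third"]:
    synthesize(sentence)
```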

Clone

```
git clone https://huggingface.co/dkounadis/artificial-styletts2
```
Install

```
cd artificial-styletts2
virtualenv --python=python3.10 .env0
source .env0/bin/activate
pip install -r requirements.txt
```

Start the Flask API in a 2nd terminal

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/.hf7/ CUDA_VISIBLE_DEVICES=0 python api.py
```

The following examples need `api.py` to be running. [Set this IP](https://huggingface.co/dkounadis/artificial-styletts2/blob/main/tts.py#L93) to the IP shown when `api.py` starts.

```
# git lfs pull # to download assets/ocr.jpg
python tts.py --text assets/ocr.txt --image assets/ocr.jpg --soundscape "battle hero" --voice romanian
```

</details>

## Landscape 2 Soundscapes

The following needs `api.py` to be already running.

```
# TTS & soundscape - output .mp4 saved in ./out/
python landscape2soundscape.py
```

Made for the SHIFT demo in collaboration with [SMB](https://www.smb.museum/home/)
  - YouTube videos


[![01](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____01_Schick_AII840_001.jpg)](https://youtu.be/SSi3gUO4GtY)

[![02](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____02_Constable_AI555_001.jpg)](https://youtu.be/2YjxAPkdXIc)

[![03](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____03_Schinkel_WS200-002.jpg)](https://youtu.be/BhMh02knkco)



[![05](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____05_Blechen_FV40_001.jpg)](https://youtu.be/a3qk9S87v60)

[![06](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____06_Menzel_AI900_001.jpg)](https://youtu.be/3M0y9OYzDfU)

[![07](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____07_Courbet_AI967_001.jpg)](https://youtu.be/56MH7zOHrNQ)

[![08](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____08_Monet_AI1013_001.jpg)](https://youtu.be/gnGCYLcdLsA)

[![10](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____10_Boecklin_967648_NG2-80_001_rsz.jpg)](https://www.youtube.com/watch?v=Y8QyYUgLaCg)

[![11](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____11_Liebermann_NG4-94_001.jpg)](https://youtu.be/RhUuS9HMLhg)

[![12](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____12_Slevogt_AII1022_001.jpg)](https://youtu.be/NzzhhrUeKVY)




# SoundScape Live Demo - Paplay

Flask API for playing sounds live

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/dkounadis/.hf7/ CUDA_VISIBLE_DEVICES=4 python api.py
```

Describe any sound via text; the TTS & soundscape are then played back

```
python live_demo.py  # type a text prompt; the AudioGen sound & TTS are played back
```
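
For playback, PulseAudio's `paplay` command can play a WAV file from the command line. The sketch below is an assumption about how such playback could be wired up, not the code in `live_demo.py`: the hypothetical `write_tone` writes a short placeholder tone in place of generated audio, and the hypothetical `play` hands the file to `paplay` when it is installed.

```python
import math
import shutil
import struct
import subprocess
import wave


def write_tone(path, freq=440.0, seconds=0.5, rate=16000):
    """Write a mono 16-bit sine tone to a WAV file (placeholder for generated audio)."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(frames)
    return n_frames


def play(path):
    """Play a WAV file with paplay; return False when paplay is not installed."""
    if shutil.which("paplay") is None:
        return False
    subprocess.run(["paplay", path], check=True)
    return True
```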

# Audiobook

Create an audiobook from a `.docx` file. Listen to it on YouTube: [male voice](https://youtu.be/fUGpfq_o_CU) / [female voice](https://www.youtube.com/watch?v=tlRdRV5nm40)

```
# the audiobook will be saved in ./tts_audiobooks
python audiobook.py
```
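
A `.docx` file is a ZIP archive whose main text lives in `word/document.xml`, so its paragraphs can be read with the standard library before being passed to a TTS step. The sketch below shows one way to do that; it is not taken from `audiobook.py`, which may use a different approach, and `docx_paragraphs` and `make_sample_docx` are hypothetical helper names.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the paragraph (w:p) and text (w:t) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the non-empty paragraphs of a .docx given a path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def make_sample_docx(lines):
    """Build a minimal in-memory archive with a document.xml, for demonstration only."""
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


# Each extracted paragraph could then be synthesized as one audio segment.
chapters = docx_paragraphs(make_sample_docx(["Chapter one.", "It was a quiet night."]))
```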