	### 20230618 Update
	- Added two new v2 pre-trained models: 32k and 48k
	- Fixed inference errors with non-f0 models
	- For training sets longer than one hour, the indexing phase now automatically reduces the features with k-means to speed up index training, adding, and querying
	- Included a voice-to-guitar toy repository
	- Data preprocessing now removes outlier slices
	- Added an ONNX export tab
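
The k-means reduction mentioned above can be illustrated with a minimal pure-Python sketch. It is not the project's code: the function names, the `max_vectors` threshold, and the value of `k` are hypothetical, and a real pipeline would more likely use an optimized library implementation. The idea is only that a large feature set is replaced by its cluster centroids before the index is built.

```python
import random


def kmeans_reduce(features, k, iters=10, seed=0):
    """Return k centroids summarizing `features` (a list of equal-length float lists)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(features, k)]
    for _ in range(iters):
        # Assignment step: group each vector with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in features:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def maybe_reduce(features, max_vectors, k):
    """Hypothetical helper: reduce only when the feature set is large."""
    if len(features) <= max_vectors:
        return features
    return kmeans_reduce(features, k)
```

The index is then trained on the returned centroids instead of on every feature vector, which is what shortens training, adding, and querying.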

	Failed experiments:
	- ~~Adding a time-series dimension to feature retrieval: failed, no effect~~
	- ~~Adding an optional PCAR dimensionality reduction to feature retrieval: failed; for large datasets k-means already reduces the data volume, and for small datasets the reduction takes longer than the matching time it saves~~
	- ~~ONNX inference support (with a small inference-only package): failed; PyTorch is still needed to generate nsf~~
	- ~~Randomly augmenting the input during training (pitch, gender, EQ, noise, etc.): failed, no effect~~

	todolist:
	- Investigate integrating a small vocoder
	- Support crepe for training-set pitch extraction
	- Sync crepe precision with RVC-config
	- Integrate an F0 editor


	### 20230528 Update
	- Added a v2 Jupyter notebook, a Korean changelog, and some environment dependencies
	- Added a protection mode for breath sounds, voiceless consonants, and sibilants
	- Added support for crepe-full inference
	- UVR5 vocal/accompaniment separation: added 3 de-echo models and an MDX-Net dereverberation model, plus the HP3 vocal extraction model
	- Index names now include the version and experiment name
	- Added export audio format options for vocal/accompaniment separation and batch inference
	- Dropped training of the 32k model

	### 20230513 Update
	- One-click package: removed the leftover infer_pack and uvr5_pack from the old runtime
	- Fixed the pseudo multi-processing bug in training set preprocessing
	- Added a median filter option for harvest pitch extraction to reduce mute sounds, with an adjustable filter radius
	- Added post-processing resampling for exported audio
	- The training n_cpu setting changed from "adjusts f0 extraction only" to "adjusts data preprocessing and f0 extraction"
	- Index paths under the logs folder are detected automatically and offered in a drop-down list
	- Added an "FAQ" tab (see also github-rvc-wiki)
	- Added a pitch cache for inference on input audio with the same path (purpose: with harvest pitch extraction, the whole pipeline repeats a long pitch extraction step on every run; without the cache, users experimenting with different timbres, indices, and pitch median filter radii face a very painful wait after the first test)
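
The median filter and the pitch cache described above work together, as this minimal sketch shows. Every name in it is hypothetical and the `extract` callback stands in for a real pitch extractor: the raw pitch track is cached per input path, so trying a different filter radius does not trigger a second extraction.

```python
from statistics import median

_pitch_cache = {}  # hypothetical in-memory cache: input path -> raw f0 track


def cached_pitch(path, extract):
    """Run the (slow) extractor only the first time a given path is seen."""
    if path not in _pitch_cache:
        _pitch_cache[path] = extract(path)
    return _pitch_cache[path]


def median_filter(f0, radius):
    """Median-filter a pitch track with a window of 2*radius+1 frames.

    Windows are truncated at the edges; radius <= 0 disables filtering.
    """
    if radius <= 0:
        return list(f0)
    return [median(f0[max(0, i - radius): i + radius + 1]) for i in range(len(f0))]
```

A single dropped frame such as `[100.0, 100.0, 0.0, 100.0, 100.0]` is restored to `100.0` with `radius=1`, which is the kind of isolated mute frame the filter is meant to suppress.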

	### 20230514 Update
	- Added volume envelope alignment with input mixing (alleviates the "silent input, faint noise in output" problem; not recommended when the input has heavy background noise; off by default, where a value of 1 counts as off)
	- Extracted small models can now be saved at a specified frequency (very useful if you want to compare inference at different epochs without keeping every large checkpoint and manually extracting a small model each time)
	- Fixed browser connection errors caused by a global system proxy on the server by setting environment variables
	- Added support for v2 pre-trained models (currently only the 40k version is open for testing; the other 2 sample rates are not yet fully trained)
	- Excessive volume above 1 is now limited before inference
	- Fine-tuned data preprocessing parameters
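
As an illustration of limiting volume above 1 before inference, here is a minimal sketch. The function name and the `headroom` value of 0.95 are assumptions, not taken from the changelog. The signal is scaled down only when its peak would exceed full scale, so quieter inputs pass through unchanged.

```python
def limit_peak(samples, headroom=0.95):
    """Scale the signal down only if its peak exceeds the allowed range.

    `headroom` is a hypothetical safety margin: after limiting, the peak
    equals `headroom` rather than exactly 1.0.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    scaled_peak = peak / headroom
    if scaled_peak > 1.0:
        return [s / scaled_peak for s in samples]
    return list(samples)
```

Scaling the whole signal by one factor keeps the waveform shape intact, unlike hard clipping, which would distort the loud passages.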


	### 20230409 Update
	- Corrected training parameters to improve average GPU utilization: A100 from 25% to about 90%, V100 from 50% to 90%, 2060S from 60% to 85%, P40 from 25% to 95%; training is significantly faster
	- Changed parameter: batch_size now means the batch size per GPU rather than the total
	- Changed total_epoch: maximum raised from 100 to 1000; default raised from 10 to 20
	- Fixed ckpt extraction misdetecting whether a model uses pitch, which caused abnormal inference
	- Fixed every rank saving its own ckpt in distributed training
	- Feature extraction now filters out nan features
	- Fixed silent input producing random consonants or noise (models from older versions need the training set rebuilt and must be retrained)
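
The nan feature filtering above can be sketched as follows. This is an assumption about the behavior rather than the project's code, with hypothetical names and data layout: any file whose extracted features contain a NaN is skipped entirely so that it cannot poison training.

```python
import math


def has_nan(frames):
    """True if any value in a list of feature frames is NaN."""
    return any(math.isnan(x) for frame in frames for x in frame)


def filter_nan_features(features_by_file):
    """Split files into those with clean features and those to skip.

    `features_by_file` maps a file name to a list of feature frames
    (each frame a list of floats); this layout is illustrative only.
    """
    kept, skipped = {}, []
    for name, frames in features_by_file.items():
        if has_nan(frames):
            skipped.append(name)
        else:
            kept[name] = frames
    return kept, skipped
```

Returning the skipped names alongside the kept features makes it easy to log which slices were dropped.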

	### 20230416 Update
	- Added a local real-time voice changing mini GUI; double-click go-realtime-gui.bat to start it
	- Training and inference both filter out the frequency band below 50 Hz
	- For pyworld pitch extraction in training and inference, the minimum pitch is lowered from the default 80 to 50, so male bass between 50 and 80 Hz is no longer muted
	- WebUI language now follows the system locale (currently supports en_US, ja_JP, zh_CN, zh_HK, zh_SG, zh_TW; unsupported locales default to en_US)
	- Fixed detection of some GPUs (for example, V100-16G and P4 previously failed to be recognized)
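
The locale fallback described above amounts to a simple lookup. In this minimal sketch the function name is hypothetical and the locale list is copied from the entry above: a supported system locale is used as-is, and anything else falls back to en_US.

```python
SUPPORTED_LOCALES = ("en_US", "ja_JP", "zh_CN", "zh_HK", "zh_SG", "zh_TW")


def pick_language(system_locale, supported=SUPPORTED_LOCALES, default="en_US"):
    """Return the system locale if it is supported, otherwise the default."""
    return system_locale if system_locale in supported else default
```

For example, `pick_language("ja_JP")` keeps Japanese, while an unsupported locale such as `"fr_FR"` resolves to `"en_US"`.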

	### 20230428 Update
	- Upgraded the faiss index settings for faster speed and higher quality
	- Removed the total_npy dependency; sharing models no longer requires filling in total_npy
	- Removed the restriction on 16-series GPUs; GPUs with 4 GB of VRAM get 4 GB inference settings
	- Fixed a UVR5 vocal/accompaniment separation bug with some audio formats
	- Real-time voice change mini gui adds support for non-40k and non-slack pitch models

	### Follow-up plan:
	Features:
	- Support a multi-speaker training tab (up to 4 speakers)

	Base model:
	- Collect breath wav files and add them to the training set to fix breath sounds being converted into electronic noise
	- A base model trained on a singing dataset is currently in training and will be released publicly in the future