[fast_inference] Roll back strategies, reduce the impact of padding, expose options, sync code (#986)

* Update README
* Optimize-English-G2P
* docs: change akward expression
* docs: update Changelog_KO.md
* Fix CN punc in EN,add 's match
* Adjust normalize and g2p logic
* Update zh_CN.json
* Update README (#827)
  Update README.md
  Update some outdated file paths and commands
* Fix English homographs, adjust dictionary hot reload, add name matching (#869)
* Fix homograph dict
* Add JSON in dict
* Adjust hot dict to hot reload
* Add English name dict
* Adjust get name dict logic
* Make API Great Again (#894)
* Add zh/jp/en mix
* Optimize code readability and formatted output.
* Try OGG streaming
* Add stream mode arg
* Add media type arg
* Add cut punc arg
* Eliminate punc risk
* Update README (#895)
* Update README
* Update README
* update README
* update README
* fix typo s/Licence /License (#904)
* fix reformat cmd (#917)
  Co-authored-by: starylan <starylan@outlook.com>
* Update README.md
* Normalize chinese arithmetic operations (#947)
* Change the mask strategy used during training and inference to fix the repeated-output problem that occurs when batch_size > 1
* Sync code from the main branch; add a "keep random" option
* Fix the missing uvr5 model when running colab_webui.ipynb in Colab (#968)
  Downloading the uvr5 model with git in Colab fails with:
  fatal: destination path 'uvr5_weights' already exists and is not an empty directory.
  Deleting the uvr5_weights folder that was originally downloaded from this repository before the download resolves the problem.
* [ASR] Fix FasterWhisper failing to traverse the input path (#956)
* remove glob
* rename
* reset mirror pos
* Roll back the mask strategy; roll back the pad strategy; add padding_mask to T2SBlock to reduce the influence of padding; expose the repetition_penalty parameter so users can tune the strength of the repetition penalty themselves; add the parallel_infer parameter to turn parallel inference on or off (when off, behavior matches the 0307 version); add a "keep random" option to the webui; sync code from the main branch.
* Remove unused comments

---------

Co-authored-by: Lion <drain.daters.0p@icloud.com>
Co-authored-by: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
Co-authored-by: KamioRinn <snowsdream@live.com>
Co-authored-by: Pengoose <pengoose_dev@naver.com>
Co-authored-by: Yuan-Man <68322456+Yuan-ManX@users.noreply.github.com>
Co-authored-by: XXXXRT666 <157766680+XXXXRT666@users.noreply.github.com>
Co-authored-by: KamioRinn <63162909+KamioRinn@users.noreply.github.com>
Co-authored-by: Lion-Wu <130235128+Lion-Wu@users.noreply.github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: SapphireLab <36986837+SapphireLab@users.noreply.github.com>
Co-authored-by: starylan <starylan@outlook.com>
Co-authored-by: shadow01a <141255649+shadow01a@users.noreply.github.com>
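
The commit message says a padding_mask was added to T2SBlock to reduce the influence of padding, but the T2SBlock change itself is not part of the api_v2.py hunks shown here. As a rough illustration of the general idea only, the sketch below builds a 0/1 mask from sequence lengths and zeroes the hidden vectors at padded positions. The function names `build_padding_mask` and `apply_padding_mask` are hypothetical, and the real implementation may work differently (for example, by masking attention scores instead).

```python
from typing import List


def build_padding_mask(lengths: List[int], max_len: int) -> List[List[float]]:
    """Return a 0/1 mask: 1.0 for real tokens, 0.0 for padded positions."""
    return [[1.0 if t < n else 0.0 for t in range(max_len)] for n in lengths]


def apply_padding_mask(
    hidden: List[List[List[float]]], mask: List[List[float]]
) -> List[List[List[float]]]:
    """Zero the feature vector at every padded position of every sequence."""
    return [
        [[v * m for v in vec] for vec, m in zip(seq, seq_mask)]
        for seq, seq_mask in zip(hidden, mask)
    ]


# Two sequences padded to length 3; the second has only 2 real tokens.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 1.0], [0.5, 0.5]],  # last position holds padding noise
]
mask = build_padding_mask(lengths=[3, 2], max_len=3)
masked = apply_padding_mask(hidden, mask)
```

With the mask applied, whatever values sit in the padded slots of a shorter sequence no longer leak into later layers, which is the effect the commit message describes as reducing the influence of padding in batched inference.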
2024-04-19 14:35:28 +08:00
parent 959269b5ae
commit 29f22115fb
25 changed files with 119437 additions and 114148 deletions
--- a/api_v2.py
+++ b/api_v2.py
@@ -22,7 +22,7 @@ POST:
 ```json
 {
    "text": "",                   # str.(required) text to be synthesized
-    "text_lang": "",               # str.(required) language of the text to be synthesized
+    "text_lang": "",              # str.(required) language of the text to be synthesized
    "ref_audio_path": "",         # str.(required) reference audio path.
    "prompt_text": "",            # str.(optional) prompt text for the reference audio
    "prompt_lang": "",            # str.(required) language of the prompt text for the reference audio
@@ -32,12 +32,14 @@ POST:
    "text_split_method": "cut5",  # str.(optional) text split method, see text_segmentation_method.py for details.
    "batch_size": 1,              # int.(optional) batch size for inference
    "batch_threshold": 0.75,      # float.(optional) threshold for batch splitting.
-    "split_bucket": true,          # bool.(optional) whether to split the batch into multiple buckets.
+    "split_bucket": true,         # bool.(optional) whether to split the batch into multiple buckets.
    "speed_factor":1.0,           # float.(optional) control the speed of the synthesized audio.
    "fragment_interval":0.3,      # float.(optional) to control the interval of the audio fragment.
    "seed": -1,                   # int.(optional) random seed for reproducibility.
    "media_type": "wav",          # str.(optional) media type of the output audio, support "wav", "raw", "ogg", "aac".
    "streaming_mode": false,      # bool.(optional) whether to return a streaming response.
+    "parallel_infer": true,       # bool.(optional) whether to use parallel inference.
+    "repetition_penalty": 1.35    # float.(optional) repetition penalty for T2S model.
 }
 ```
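
A minimal client-side sketch of a POST body that sets the two new fields documented above. The host, port, and `/tts` path are assumptions (the route is not visible in these hunks), and the text, language codes, and audio path are placeholders. The request is only constructed, not sent, so the snippet runs offline.

```python
import json
import urllib.request

# Assumed for illustration only: this URL does not appear in the diff.
API_URL = "http://127.0.0.1:9880/tts"

payload = {
    "text": "Hello there.",         # placeholder text
    "text_lang": "en",              # placeholder language code
    "ref_audio_path": "ref.wav",    # placeholder path
    "prompt_lang": "en",            # placeholder language code
    "parallel_infer": False,        # turn parallel inference off
    "repetition_penalty": 1.5,      # stronger than the 1.35 default
}

# json.dumps converts Python's False to the JSON literal false.
body = json.dumps(payload).encode("utf-8")

request = urllib.request.Request(
    API_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted to keep this offline.
```

Fields left out of the payload fall back to the defaults declared on `TTS_Request`, so a client that does not care about the new options can ignore them.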

@@ -159,6 +161,8 @@ class TTS_Request(BaseModel):
    seed:int = -1
    media_type:str = "wav"
    streaming_mode:bool = False
+    parallel_infer:bool = True
+    repetition_penalty:float = 1.35

 ### modify from https://github.com/RVC-Boss/GPT-SoVITS/pull/894/files
 def pack_ogg(io_buffer:BytesIO, data:np.ndarray, rate:int):
@@ -287,6 +291,8 @@ async def tts_handle(req:dict):
                "seed": -1,                   # int. random seed for reproducibility.
                "media_type": "wav",          # str. media type of the output audio, support "wav", "raw", "ogg", "aac".
                "streaming_mode": False,      # bool. whether to return a streaming response.
+                "parallel_infer": True,       # bool.(optional) whether to use parallel inference.
+                "repetition_penalty": 1.35    # float.(optional) repetition penalty for T2S model.          
            }
    returns:
        StreamingResponse: audio stream response.
@@ -354,6 +360,8 @@ async def tts_get_endpoint(
                        seed:int = -1,
                        media_type:str = "wav",
                        streaming_mode:bool = False,
+                        parallel_infer:bool = True,
+                        repetition_penalty:float = 1.35
                        ):
    req = {
        "text": text,
@@ -373,6 +381,8 @@ async def tts_get_endpoint(
        "seed":seed,
        "media_type":media_type,
        "streaming_mode":streaming_mode,
+        "parallel_infer":parallel_infer,
+        "repetition_penalty":float(repetition_penalty)
    }
    return await tts_handle(req)
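
The `repetition_penalty` value forwarded in `req` is described only as the repetition penalty for the T2S model; how the model applies it is not shown in this diff. A minimal sketch of one widely used formulation, given here as an assumption rather than the repository's code: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative, so those tokens become less likely either way. The function name `apply_repetition_penalty` is hypothetical.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float = 1.35,
) -> List[float]:
    """Return a copy of logits with already-generated tokens made less likely."""
    out = list(logits)
    for token_id in set(generated_ids):  # penalize each seen token once
        score = out[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        out[token_id] = score / penalty if score > 0 else score * penalty
    return out


logits = [2.7, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=1.35)
```

Under this formulation a penalty of 1.0 leaves the logits unchanged, and larger values suppress repeats more strongly, which matches the commit message's intent of letting users tune the strength themselves.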