Conversation
Contributor
Actually, there's another approach: starting from …
Author
That would be the best option, but the model-loading logic would still have to be pulled out; otherwise everything still gets loaded into GPU0. My aim with this change was to keep the diff minimal to improve the PR's chances of being accepted (the maintainer doesn't seem to accept PRs readily), though that doesn't seem to have helped.
Author
That said, I've stepped away from this multi-GPU inference work for now, so we'll see how it goes.
Current problem
Today, anyone who wants to build small tools on top of GPT-SoVITS's inference module pretty much has to import inference_webui.py. But the models are loaded onto the GPU at import time, which causes a number of problems: merely importing the module occupies GPU memory, everything lands on the default GPU, and there is no way to choose the inference device.
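Schematically, the import side effect looks like this (illustrative, not the file's literal contents):

```python
# Today, merely importing the inference module loads the models onto the GPU:
import inference_webui  # GPU memory is allocated here, as an import side effect

# ...even if the importing tool never calls get_tts_wav() at all.
```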
Changes
This PR solves the problem above by consolidating all model-loading logic into a single function (def load_models(device_override):). It also adds a device_override parameter to load_models and get_tts_wav to select the inference device. device_override should be a CUDA index string such as "cuda:0"/"cuda:1"; when the override is not set it defaults to None, and the existing logic applies (a global, currently "cuda" or "cpu"). No other function names or file names were changed.
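For reference, a minimal sketch of what the consolidated entry point could look like (the model variable names below are illustrative, not necessarily the PR's exact code):

```python
import torch

# Existing global default in inference_webui.py ("cuda" or "cpu").
device = "cuda" if torch.cuda.is_available() else "cpu"

def load_models(device_override=None):
    """Load every inference model onto a single device.

    device_override: a CUDA index string such as "cuda:0" or "cuda:1";
    None (the default) falls back to the global `device`.
    """
    target = device_override if device_override is not None else device
    # Load each model exactly as the module did before, but explicitly
    # onto `target` instead of implicitly onto the GPU at import time, e.g.:
    #   vq_model = ...              # SoVITS weights (illustrative name)
    #   vq_model = vq_model.to(target)
    #   t2s_model = ...             # GPT weights (illustrative name)
    #   t2s_model = t2s_model.to(target)
    return target
```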
Compatibility
inference_cli and inference_gui received small changes to stay compatible with this PR. inference_webui_fast was left untouched, as it does not appear to reuse the inference_webui logic.
Usage and future optimization
For single-GPU inference, add a single load_models() call before inference; nothing else changes. For multi-GPU inference, spawn one process per GPU; each process calls load_models() and then get_tts_wav(), passing the corresponding CUDA index, as sketched below. Given the compatibility concerns and @XXXXRT666's note that "inference_webui is not loaded via import", the current approach keeps the changes as small as possible. Follow-up work should refactor the code, including splitting out the inference logic so it can run under threading and reduce CPU memory usage.
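A hypothetical sketch of that per-process pattern (the get_tts_wav arguments are abbreviated here; the real signature takes reference audio, prompt text, languages, and so on):

```python
import multiprocessing as mp

def worker(gpu_id, texts):
    # Import inside the process so each process loads its own model copy.
    import inference_webui as iw
    dev = f"cuda:{gpu_id}"
    iw.load_models(device_override=dev)            # one load per process
    for text in texts:
        iw.get_tts_wav(text, device_override=dev)  # inference stays on this GPU

if __name__ == "__main__":
    # One process per GPU; each gets its own batch of work.
    batches = [["text for GPU 0"], ["text for GPU 1"]]
    procs = [mp.Process(target=worker, args=(i, b)) for i, b in enumerate(batches)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```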