How to consolidate the weight matrix from model parallelism?


#1

Hi,

The group context can be leveraged to implement model parallelism. A common scenario is to split a weight matrix into sub-matrices and distribute them across different GPUs. However, the resulting model is then also represented as sub-matrices spread over different GPUs, which means prediction requires multiple GPUs as well. Is there a way to consolidate the model by merging the split matrices so that prediction can be done on a single GPU?
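For illustration, the consolidation step itself could look like the following minimal NumPy sketch, assuming the weight matrix was split along one axis across two GPUs and you have already pulled each shard back to host memory. The names `part_0`, `part_1`, the helper `merge_split_weights`, and the row-wise split axis are all assumptions for this example, not part of any framework API:

```python
import numpy as np

def merge_split_weights(parts, axis=0):
    """Concatenate per-GPU sub-matrices back into one weight matrix.

    `parts` is the list of sub-matrices in device order; `axis` is the
    dimension along which the original matrix was split (an assumption
    here -- check how your setup actually sharded the weight).
    """
    return np.concatenate(parts, axis=axis)

# Hypothetical example: a 4x3 weight split row-wise across two GPUs.
part_0 = np.arange(6).reshape(2, 3)       # rows 0-1, held on GPU 0
part_1 = np.arange(6, 12).reshape(2, 3)   # rows 2-3, held on GPU 1
full = merge_split_weights([part_0, part_1], axis=0)
print(full.shape)  # (4, 3)
```

The merged array could then be written back into a single-device parameter file in place of the per-GPU shards; the device order and split axis must match how the matrix was originally partitioned or the merged weight will be scrambled.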


#2

You can set the context to the same GPU for all groups.


#3

Just modify the checkpoint JSON file, right? The params file doesn't need to be modified?