I want to alternately optimize two different objective functions with respect to two different but overlapping parameter sets. My current approach is to create two Trainers (both SGD), but with two Trainers the GPU memory easily runs out. Could you explain what exactly happens when I use two Trainers? Does an additional Trainer really consume that much more memory? If so, what is the best way to achieve this goal? Thanks!
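For concreteness, here is a minimal, framework-agnostic sketch of the alternating scheme I have in mind. It uses plain NumPy and two made-up quadratic objectives that share the parameter `w1`; in my actual code, the two hand-written `sgd_step` calls correspond to the two Trainers:

```python
import numpy as np

# Three scalar parameters; w1 is shared by both objectives.
# (All names and objectives here are made up for illustration.)
params = {"w1": 0.0, "w2": 0.0, "w3": 0.0}
lr = 0.1

def grad_f1(p):
    # f1(w1, w2) = (w1 - 1)^2 + (w2 - 2)^2, optimized over {w1, w2}
    return {"w1": 2 * (p["w1"] - 1), "w2": 2 * (p["w2"] - 2)}

def grad_f2(p):
    # f2(w1, w3) = (w1 - 3)^2 + w3^2, optimized over {w1, w3}
    return {"w1": 2 * (p["w1"] - 3), "w3": 2 * p["w3"]}

def sgd_step(p, grads, lr):
    # Update only the parameters that this objective's set contains
    for k, g in grads.items():
        p[k] -= lr * g

for step in range(200):
    sgd_step(params, grad_f1(params), lr)  # "Trainer 1": updates {w1, w2}
    sgd_step(params, grad_f2(params), lr)  # "Trainer 2": updates {w1, w3}
```

The non-shared parameters converge to their respective minimizers (`w2` to 2, `w3` to 0), while the shared `w1` settles between the two objectives' preferred values (1 and 3). This is the behaviour I want; the question is how to get it in practice without the memory overhead of two separate Trainers.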