Prune a trained model and retrain

WIll-Xu35 · April 29, 2019, 7:39am

Hi all,

I have a trained network (in fp32) and I want to optimize it for mobile devices.

I tried int8 quantization using the ncnn platform, which brings a 30% speedup. But that is not very impressive, and the first and last layers have to stay in floating point, otherwise the accuracy drop is massive. (By the way, does full int8 computation usually hurt accuracy that badly? The model size is around 20MB, and I have seen similarly sized models give good full-int8 results.) So I’m now considering pruning my model.

I’ve gone through this forum, and the only information about pruning is that there are several pruned ResNet models using the Gluon API. However, my model uses the Module API and is not exactly a ResNet structure. So is there any guide for pruning a trained model (using the Module API) and then retraining it?

Moreover, what is the right order for quantization and pruning? Quantize first or prune first?

Any help or discussion is appreciated. Thanks.

sad · April 29, 2019, 7:59pm

Hi,

Model pruning is very much an art rather than a science, so if you want to prune a custom model of your own, it will take a bit of time to experiment, because you can’t just port the learnings from a paper. However, here’s a paper that delves a little bit into it: https://arxiv.org/pdf/1512.08571.pdf
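
As a rough illustration of the basic prune-and-retrain loop (magnitude pruning with a fixed mask), here is a framework-agnostic sketch in plain Python. It is not taken from the paper above and does not use the MXNet Module API; the helper names, the toy weights, the toy data, and the 50% sparsity target are all assumptions made for the example. The idea is to zero the smallest-magnitude weights once, then re-apply the same mask after every update so the pruned weights stay at zero during retraining.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1.0] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0.0
    return mask


def grad_mse(weights, x, y):
    """Gradient of 0.5 * (w.x - y)^2 for a single linear neuron."""
    err = sum(w * xi for w, xi in zip(weights, x)) - y
    return [err * xi for xi in x]


def masked_sgd_step(weights, grads, mask, lr):
    """One SGD step; multiplying by the mask keeps pruned positions at zero."""
    return [m * (w - lr * g) for w, g, m in zip(weights, grads, mask)]


# Hypothetical "trained" weights of a single linear neuron.
weights = [0.9, -0.05, 0.4, 0.02, -0.7, 0.1]

# Step 1: prune the 50% of weights with the smallest magnitude.
mask = magnitude_mask(weights, sparsity=0.5)
weights = [w * m for w, m in zip(weights, mask)]

# Step 2: retrain on toy data, re-applying the mask at every update.
data = [
    ([1.0, 0.5, -1.0, 2.0, 0.3, 1.5], 0.2),
    ([0.2, -1.0, 0.7, 0.1, 1.0, -0.4], -0.5),
]
for _ in range(200):
    for x, y in data:
        grads = grad_mse(weights, x, y)
        weights = masked_sgd_step(weights, grads, mask, lr=0.05)
```

In a real framework the same pattern would apply per layer: compute a mask from the trained parameters, then multiply either the weights or their gradients by that mask after each optimizer step. How to choose the sparsity per layer is the part that needs experimentation.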

With regard to order, I would say prune first and then quantize.
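
One way to see why this order works is the minimal sketch below, again in plain Python rather than ncnn or MXNet code. It assumes symmetric per-tensor int8 quantization, which is only one of several possible schemes; the weight values and the helper names `quantize_int8` and `dequantize` are made up for illustration. Under a symmetric scheme, zero maps exactly to zero, so the sparsity created by pruning survives quantization.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (integer values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floating-point weights."""
    return [v * scale for v in q]


# Hypothetical weights that have already been pruned (zeros mark removed weights).
pruned = [0.0, 0.82, 0.0, -1.27, 0.0, 0.31]

q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

# Fraction of zero weights before and after quantization.
sparsity_before = pruned.count(0.0) / len(pruned)
sparsity_after = q.count(0) / len(q)
```

Asymmetric schemes with a non-zero zero point would need the zero point handled explicitly to keep this property, so the sketch should not be read as a statement about what ncnn does internally.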

WIll-Xu35 · April 30, 2019, 1:37am

Hi,

Thank you for your reply. I’ve read papers regarding pruning and quantization but not this one. I’ll take a close look at it:)

WIll-Xu35 · April 30, 2019, 9:11am

@sad

I now understand which operations are needed to prune a trained model and then retrain it. Now the problem becomes how to implement those operations.

The only related code resource I can find is the DSD training example on an MLP. What would you suggest? Should I treat that as a starting point, or start from somewhere else?

Many thanks:)

WIll-Xu35 · May 5, 2019, 1:22am

It seems that the final product of DSD training is also a dense network, so it is a dead end.

Any other suggestions?

QueensGambit · May 5, 2019, 7:55am

A different approach is to select an architecture of your choice that is optimized for mobile devices, like:

Mobilenet v2: https://arxiv.org/pdf/1801.04381.pdf
ShuffleNet V2: https://arxiv.org/abs/1807.11164
EffNet: https://arxiv.org/abs/1801.06434v1

Afterwards, make use of network distillation by transferring the knowledge of your current trained network:

Distilling the Knowledge in a Neural Network: https://arxiv.org/pdf/1503.02531.pdf
Deep Mutual Learning: https://arxiv.org/pdf/1706.00384.pdf

This is done by training the new network against the final output of your current model instead of the actual labels, which typically yields higher accuracy than training on the labels alone.
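
For reference, a distillation loss in the style of the first paper can be sketched as follows. This is plain Python for a single example, not MXNet code; the temperature of 4.0, the weight `alpha` of 0.7, and the function names are assumptions chosen for illustration. It combines a cross-entropy against the teacher's temperature-softened outputs with an ordinary cross-entropy against the true label.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.7):
    """alpha * T^2 * CE(teacher_soft, student_soft) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # Soft term: cross-entropy between softened teacher and student distributions.
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student_soft))
    # Hard term: usual cross-entropy with the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


# Hypothetical logits for a 3-class problem with true class 0.
teacher = [2.0, 1.0, 0.1]
loss_matching = distillation_loss([2.0, 1.0, 0.1], teacher, label=0)
loss_mismatched = distillation_loss([0.1, 1.0, 2.0], teacher, label=0)
```

A student whose logits agree with the teacher gets a lower loss than one that disagrees, which is the signal used to train the smaller network. Setting `alpha` to 1.0 would train purely against the teacher's outputs.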

WIll-Xu35 · May 6, 2019, 6:32am

Thank you for your reply.

During development, I already used compact models. But when deploying to embedded devices, those compact dense models alone are not enough. The usual approach seems to be pruning and quantization.

And distillation seems to produce just another compact dense model, so it might not be good enough.

Anyway, thank you so much for your advice:)
