Abstract: This article shares the PyTorch -> Caffe -> om model conversion process.
Standard network
Baseline: PytorchToCaffe
The main files are:
PytorchToCaffe
+-- Caffe
| +-- caffe.proto
| +-- layer_param.py
+-- example
| +-- resnet_pytorch_2_caffe.py
+-- pytorch_to_caffe.py
For direct use, refer to resnet_pytorch_2_caffe.py. If every operation in the network is already implemented in the baseline, the model can be converted to Caffe directly.
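For reference, a minimal conversion script follows the same pattern as resnet_pytorch_2_caffe.py. The trans_net / save_prototxt / save_caffemodel calls below are taken from that example and should be checked against the version of the repo you are using; treat this as a sketch rather than the exact script:

import torch
from torchvision.models import resnet18
import pytorch_to_caffe

name = 'resnet18'
net = resnet18()          # load your trained weights here if needed
net.eval()
dummy_input = torch.ones([1, 3, 224, 224])

# Trace the network once; every replaced op is recorded as a Caffe layer
pytorch_to_caffe.trans_net(net, dummy_input, name)
pytorch_to_caffe.save_prototxt('{}.prototxt'.format(name))
pytorch_to_caffe.save_caffemodel('{}.caffemodel'.format(name))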
Adding custom operations
If you encounter an operation that is not implemented, there are two cases to consider.
Caffe has a corresponding operation
Taking arg_max as an example, here is how to add an operation.
First, look at the parameters of the corresponding layer in Caffe. caffe.proto defines the layers and parameters for the corresponding Caffe version; you can see that ArgMax has three parameters, out_max_val, top_k, and axis:
message ArgMaxParameter {
  // If true produce pairs (argmax, maxval)
  optional bool out_max_val = 1 [default = false];
  optional uint32 top_k = 2 [default = 1];
  // The axis along which to maximise -- may be negative to index from the
  // end (e.g., -1 for the last axis).
  // By default ArgMaxLayer maximizes over the flattened trailing dimensions
  // for each index of the first / num dimension.
  optional int32 axis = 3;
}
These parameters are consistent with the Caffe operator boundary documentation.
layer_param.py builds instances of the parameter classes during conversion and transfers the operation parameters from PyTorch to Caffe:
def argmax_param(self, out_max_val=None, top_k=None, dim=1):
    argmax_param = pb.ArgMaxParameter()
    if out_max_val is not None:
        argmax_param.out_max_val = out_max_val
    if top_k is not None:
        argmax_param.top_k = top_k
    if dim is not None:
        argmax_param.axis = dim
    self.param.argmax_param.CopyFrom(argmax_param)
pytorch_to_caffe.py defines the Rp class, which wraps a PyTorch operation so that it can be converted into a Caffe operation:
class Rp(object):
    def __init__(self, raw, replace, **kwargs):
        self.obj = replace   # the conversion function
        self.raw = raw       # the original PyTorch operation

    def __call__(self, *args, **kwargs):
        if not NET_INITTED:
            return self.raw(*args, **kwargs)
        # Walk the call stack to find the calling layer and log its name
        for stack in traceback.walk_stack(None):
            if 'self' in stack[0].f_locals:
                layer = stack[0].f_locals['self']
                if layer in layer_names:
                    log.pytorch_layer_name = layer_names[layer]
                    print('984', layer_names[layer])
                    break
        out = self.obj(self.raw, *args, **kwargs)
        return out
When adding an operation, replace the original operation with an Rp instance:
torch.argmax = Rp(torch.argmax, torch_argmax)
Next, implement the conversion function for this operation:
def torch_argmax(raw, input, dim=1):
    x = raw(input, dim=dim)
    layer_name = log.add_layer(name='argmax')
    top_blobs = log.add_blobs([x], name='argmax_blob')
    layer = caffe_net.Layer_param(name=layer_name, type='ArgMax',
                                  bottom=[log.blobs(input)], top=top_blobs)
    layer.argmax_param(dim=dim)
    log.cnet.add_layer(layer)
    return x
With this, the conversion of the argmax operation from PyTorch to Caffe is complete.
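To check the new mapping end to end, one can trace a tiny network that calls torch.argmax and inspect the generated prototxt for an ArgMax layer. A minimal sketch, assuming the trans_net / save_prototxt interface used by the example script (adjust to the actual API of your copy of the repo):

import torch
import torch.nn as nn
import pytorch_to_caffe

class ArgMaxNet(nn.Module):
    def __init__(self):
        super(ArgMaxNet, self).__init__()
        self.conv = nn.Conv2d(3, 10, kernel_size=3, padding=1)

    def forward(self, x):
        # torch.argmax has already been replaced by Rp(torch.argmax, torch_argmax)
        return torch.argmax(self.conv(x), dim=1)

net = ArgMaxNet().eval()
dummy_input = torch.ones([1, 3, 32, 32])
pytorch_to_caffe.trans_net(net, dummy_input, 'argmax_test')
pytorch_to_caffe.save_prototxt('argmax_test.prototxt')  # should now contain an ArgMax layer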
Caffe has no directly corresponding operation
If the operation to be converted has no directly corresponding layer implementation in Caffe, there are two main solutions:
1) Decompose the unsupported operation into supported operations in PyTorch:
For example, nn.InstanceNorm2d: during conversion, instance normalization is implemented with BatchNorm, which does not support affine=True or track_running_stats=True and defaults to use_global_stats: false. However, om conversion requires use_global_stats to be true, so the model can be converted to Caffe but is unfriendly to om.
InstanceNorm normalizes each channel of the feature map separately, so nn.InstanceNorm2d can be implemented as:
class InstanceNormalization(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super(InstanceNormalization, self).__init__()
        self.gamma = nn.Parameter(torch.FloatTensor(dim))
        self.beta = nn.Parameter(torch.FloatTensor(dim))
        self.eps = eps
        self._reset_parameters()

    def _reset_parameters(self):
        self.gamma.data.uniform_()
        self.beta.data.zero_()

    def __call__(self, x):
        n = x.size(2) * x.size(3)
        t = x.view(x.size(0), x.size(1), n)
        mean = torch.mean(t, 2).unsqueeze(2).unsqueeze(3).expand_as(x)
        var = torch.var(t, 2).unsqueeze(2).unsqueeze(3).expand_as(x)
        gamma_broadcast = self.gamma.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        beta_broadcast = self.beta.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        out = (x - mean) / torch.sqrt(var + self.eps)
        out = out * gamma_broadcast + beta_broadcast
        return out
However, when verifying against the HiLens Caffe operator boundary, we found that om model conversion does not support sum or mean operations over dimensions other than the channel dimension. To work around this, nn.InstanceNorm2d can be re-implemented with supported operators:
class InstanceNormalization(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super(InstanceNormalization, self).__init__()
        # gamma/beta are plain tensors here; fill them with the desired affine
        # parameters before use
        self.gamma = torch.FloatTensor(dim)
        self.beta = torch.FloatTensor(dim)
        self.eps = eps
        self.adavg = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Per-channel mean/variance via pooling + upsampling
        # (assumes a square feature map, since scale_factor is taken from h)
        mean = nn.Upsample(scale_factor=h)(self.adavg(x))
        var = nn.Upsample(scale_factor=h)(self.adavg((x - mean).pow(2)))
        gamma_broadcast = self.gamma.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        beta_broadcast = self.beta.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        out = (x - mean) / torch.sqrt(var + self.eps)
        out = out * gamma_broadcast + beta_broadcast
        return out
After verification, this re-implementation is equivalent to the original operation, and the model can be converted to Caffe.
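The equivalence can be spot-checked against nn.InstanceNorm2d directly in PyTorch before conversion. A minimal sketch, assuming the re-implementation above is in scope (square input, gamma filled with ones and beta with zeros to match a non-affine InstanceNorm):

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 8, 16, 16)   # square feature map, as assumed by the implementation above

custom = InstanceNormalization(8)
custom.gamma.fill_(1.0)         # scale 1, shift 0 -> matches a non-affine InstanceNorm
custom.beta.fill_(0.0)

reference = nn.InstanceNorm2d(8, eps=1e-5, affine=False)

diff = (custom(x) - reference(x)).abs().max()
print(diff)                      # expected to be close to 0 (floating point error only)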
2) Implement the operation in Caffe by combining existing operations:
During the PyTorch to Caffe conversion we found that operations involving a constant, such as featuremap + 6, cause a problem: the constant has no corresponding blob. Let's first look at how the add operation is converted in pytorch_to_caffe.py:
def _add(input, *args):
    x = raw__add__(input, *args)
    if not NET_INITTED:
        return x
    layer_name = log.add_layer(name='add')
    top_blobs = log.add_blobs([x], name='add_blob')
    if log.blobs(args[0]) == None:
        log.add_blobs([args[0]], name='extra_blob')
    else:
        layer = caffe_net.Layer_param(name=layer_name, type='Eltwise',
                                      bottom=[log.blobs(input), log.blobs(args[0])], top=top_blobs)
        layer.param.eltwise_param.operation = 1  # sum is 1
        log.cnet.add_layer(layer)
    return x
You can see that the case where the operand has no blob is already detected but not really handled; we only need to modify the branch where log.blobs(args[0]) == None. Since the Caffe Scale layer computes y = weight * x + bias per channel, a natural idea is to implement the constant add with a Scale layer whose weight is all ones and whose bias is the constant:
def _add(input, *args):
    x = raw__add__(input, *args)
    if not NET_INITTED:
        return x
    layer_name = log.add_layer(name='add')
    top_blobs = log.add_blobs([x], name='add_blob')
    if log.blobs(args[0]) == None:
        # featuremap + constant: Scale layer with weight = 1 and bias = constant
        layer = caffe_net.Layer_param(name=layer_name, type='Scale',
                                      bottom=[log.blobs(input)], top=top_blobs)
        layer.param.scale_param.bias_term = True
        weight = torch.ones((input.shape[1]))
        bias = torch.tensor(args[0]).squeeze().expand_as(weight)
        layer.add_data(weight.cpu().data.numpy(), bias.cpu().data.numpy())
        log.cnet.add_layer(layer)
    else:
        layer = caffe_net.Layer_param(name=layer_name, type='Eltwise',
                                      bottom=[log.blobs(input), log.blobs(args[0])], top=top_blobs)
        layer.param.eltwise_param.operation = 1  # sum is 1
        log.cnet.add_layer(layer)
    return x
Similarly, a simple multiplication such as featuremap * 6 can be handled in the same way.
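For reference, here is a sketch of the analogous change for multiplication. It assumes pytorch_to_caffe.py wraps the tensor __mul__ operator the same way it wraps __add__ (with a saved raw__mul__ handle and a _mul wrapper); if it does not, the wrapper can first be added with the Rp pattern shown above. In Caffe, the Eltwise PROD operation handles blob * blob, and a Scale layer whose weight is the constant (with no bias) handles blob * constant:

def _mul(input, *args):
    x = raw__mul__(input, *args)
    if not NET_INITTED:
        return x
    layer_name = log.add_layer(name='mul')
    top_blobs = log.add_blobs([x], name='mul_blob')
    if log.blobs(args[0]) == None:
        # featuremap * constant: Scale layer with weight = constant and no bias
        layer = caffe_net.Layer_param(name=layer_name, type='Scale',
                                      bottom=[log.blobs(input)], top=top_blobs)
        layer.param.scale_param.bias_term = False
        weight = torch.ones(input.shape[1]) * args[0]
        layer.add_data(weight.cpu().data.numpy())
        log.cnet.add_layer(layer)
    else:
        layer = caffe_net.Layer_param(name=layer_name, type='Eltwise',
                                      bottom=[log.blobs(input), log.blobs(args[0])], top=top_blobs)
        layer.param.eltwise_param.operation = 0  # product is 0 in EltwiseParameter
        log.cnet.add_layer(layer)
    return x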
Pitfalls
- Pooling: PyTorch defaults to ceil_mode=False while Caffe defaults to ceil_mode=True, which can change the output size. If there is a size mismatch, check whether the Pooling parameters are correct. In addition, although it is not mentioned in the documentation, a model with kernel_size > 32 can still be converted, but inference will report an error; this can be worked around by splitting the pooling into two Pooling layers (see the sketch after this list).
- Upsample: in the om operator boundary, the scale_factor parameter of the Upsample layer must be an int and cannot be a size. If the existing model uses the size parameter, the PyTorch to Caffe conversion will run normally, but the Upsample parameters will be empty. In that case, consider switching the parameter from size to scale_factor, or implementing the upsampling with Deconvolution.
- ConvTranspose2d: in PyTorch, the output_padding parameter is added to the output size, but Caffe cannot do this, so the converted output feature map would be smaller. The workaround is to let the deconvolution produce a larger feature map and then cut it with a Crop layer so that it matches the size of the corresponding PyTorch layer. In addition, deconvolution inference in om is slow, so it is better to avoid it and use Upsample + Convolution instead.
- Pad: PyTorch supports many kinds of pad operations, but Caffe can only pad symmetrically in the H and W dimensions. If PyTorch contains an asymmetric pad such as h = F.pad(x, (1, 2, 1, 2), "constant", 0), the options are:
- If the asymmetric pad does not cause a dimension mismatch later in the network, first evaluate how the pad affects the result; in some tasks the impact is negligible and no change is needed.
- If there is a dimension mismatch, consider padding with the larger parameter and then cropping, or merging a preceding (0, 0, 1, 1) pad and a following (1, 1, 0, 0) pad into a single (1, 1, 1, 1) pad, depending on the specific network structure.
- If the pad is along the channel dimension, such as F.pad(x, (0, 0, 0, 0, 0, channel_pad), "constant", 0), consider concatenating the output of a zero convolution onto the feature map:
zero = nn.Conv2d(in_channels, channel_pad, kernel_size=3, padding=1, bias=False)
nn.init.constant_(zero.weight, 0)  # all-zero kernel, so the output is an all-zero pad
pad_tensor = zero(x)
x = torch.cat([x, pad_tensor], dim=1)
- Some operations can be converted to Caffe, but standard om does not support all Caffe operations; if the goal is to convert to om, check the model against the operator boundary documentation.
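As mentioned in the Pooling item above, a kernel larger than 32 can be split into two stacked Pooling layers. A minimal sketch with hypothetical sizes; for average pooling the two-stage result matches the single large pool when the kernel divides evenly, and for max pooling the split is exactly equivalent:

import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)

# A single pool with kernel_size > 32 converts, but om inference reports an error
big_pool = nn.AvgPool2d(kernel_size=64)

# Two stacked pools with kernel_size <= 32 produce the same result
split_pool = nn.Sequential(nn.AvgPool2d(kernel_size=8), nn.AvgPool2d(kernel_size=8))

print((big_pool(x) - split_pool(x)).abs().max())   # ~0, up to floating point error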
This article is shared from the Huawei Cloud community post "Pytorch->Caffe Model Conversion", original author: Du Fu built a house.