Abstract: This article shares the Pytorch -> Caffe -> om model conversion process.
Baseline: PytorchToCaffe
The main functional code is organized as follows:
PytorchToCaffe
+-- Caffe
| +-- caffe.proto
| +-- layer_param.py
+-- example
| +-- resnet_pytorch_2_caffe.py
+-- pytorch_to_caffe.py
For direct use, refer to resnet_pytorch_2_caffe.py: if every operation in the network is already implemented in the Baseline, the model can be converted to Caffe directly, along the lines of the sketch below.
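A minimal conversion script in the spirit of example/resnet_pytorch_2_caffe.py might look like this; the entry points trans_net, save_prototxt and save_caffemodel are assumed to be those exposed by the Baseline repo:
import torch
from torchvision.models import resnet
import pytorch_to_caffe

name = 'resnet18'
net = resnet.resnet18(pretrained=True)
net.eval()  # the conversion traces a single inference pass

dummy_input = torch.ones([1, 3, 224, 224])  # fixed input shape for tracing
pytorch_to_caffe.trans_net(net, dummy_input, name)
pytorch_to_caffe.save_prototxt('{}.prototxt'.format(name))      # network definition
pytorch_to_caffe.save_caffemodel('{}.caffemodel'.format(name))  # weights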
If you encounter an operation that has not been implemented, there are two situations to consider.
Taking arg_max as an example, let's look at how to add an operation.
The first thing to look at is the parameters of the corresponding layer in Caffe. caffe.proto contains the layer and parameter definitions for the corresponding Caffe version; you can see that ArgMax defines three parameters, out_max_val, top_k and axis:
message ArgMaxParameter {
  // If true produce pairs (argmax, maxval)
  optional bool out_max_val = 1 [default = false];
  optional uint32 top_k = 2 [default = 1];
  // The axis along which to maximise -- may be negative to index from the
  // end (e.g., -1 for the last axis).
  // By default ArgMaxLayer maximizes over the flattened trailing dimensions
  // for each index of the first / num dimension.
  optional int32 axis = 3;
}
These parameters are consistent with those listed in the Caffe operator boundary documentation.
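For reference, an ArgMax layer produced by the conversion would appear in the generated prototxt roughly as follows (the layer and blob names here are illustrative, not taken from an actual run):
layer {
  name: "argmax1"
  type: "ArgMax"
  bottom: "conv_blob1"
  top: "argmax_blob1"
  argmax_param {
    axis: 1
  }
}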
layer_param.py builds the parameter-class instances used in the actual conversion, transferring operation parameters from Pytorch to Caffe:
def argmax_param(self, out_max_val=None, top_k=None, dim=1):
    argmax_param = pb.ArgMaxParameter()
    if out_max_val is not None:
        argmax_param.out_max_val = out_max_val
    if top_k is not None:
        argmax_param.top_k = top_k
    if dim is not None:
        argmax_param.axis = dim
    self.param.argmax_param.CopyFrom(argmax_param)
pytorch_to_caffe.py defines the Rp class, which wraps a Pytorch operation so that it can be converted into the corresponding Caffe operation:
class Rp(object):
    def __init__(self, raw, replace, **kwargs):
        self.obj = replace
        self.raw = raw

    def __call__(self, *args, **kwargs):
        if not NET_INITTED:
            return self.raw(*args, **kwargs)
        for stack in traceback.walk_stack(None):
            if 'self' in stack[0].f_locals:
                layer = stack[0].f_locals['self']
                if layer in layer_names:
                    log.pytorch_layer_name = layer_names[layer]
                    print('984', layer_names[layer])
                    break
        out = self.obj(self.raw, *args, **kwargs)
        return out
When adding an operation, use the Rp class to replace the original Pytorch function:
torch.argmax = Rp(torch.argmax, torch_argmax)
Next, implement the replacement function:
def torch_argmax(raw, input, dim=1):
    x = raw(input, dim=dim)
    layer_name = log.add_layer(name='argmax')
    top_blobs = log.add_blobs([x], name='argmax_blob')
    layer = caffe_net.Layer_param(name=layer_name, type='ArgMax',
                                  bottom=[log.blobs(input)], top=top_blobs)
    layer.argmax_param(dim=dim)
    log.cnet.add_layer(layer)
    return x
This completes the conversion of the argmax operation from Pytorch to Caffe.
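As a quick hedged sketch (not from the original article) of how the replacement kicks in: once trans_net starts the traced forward pass and NET_INITTED is set, any call to torch.argmax goes through Rp -> torch_argmax, returns the normal Pytorch result via raw, and logs an ArgMax layer into the Caffe net. trans_net and save_prototxt are assumed to be the Baseline's entry points:
import torch
import torch.nn as nn
import pytorch_to_caffe

class ArgMaxHead(nn.Module):
    def forward(self, x):
        # routed through Rp -> torch_argmax while trans_net traces the network
        return torch.argmax(x, dim=1)

net = ArgMaxHead().eval()
dummy = torch.randn(1, 19, 64, 64)  # e.g. segmentation logits
pytorch_to_caffe.trans_net(net, dummy, 'argmax_demo')
pytorch_to_caffe.save_prototxt('argmax_demo.prototxt')  # should contain a layer of type "ArgMax"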
If the operation to be converted has no directly corresponding layer implementation in Caffe, there are two main solutions:
1) Decompose the unsupported operation into supported operations on the Pytorch side:
For example nn.InstanceNorm2d: in the conversion, instance normalization is implemented with BatchNorm, which does not support affine=True or track_running_stats=True and defaults to use_global_stats: false; however, the om conversion requires use_global_stats to be true, so the model can be converted to Caffe but is unfriendly to the further conversion to om.
InstanceNorm normalizes each channel of the feature map independently, so nn.InstanceNorm2d can be implemented as:
class InstanceNormalization(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super(InstanceNormalization, self).__init__()
        self.gamma = nn.Parameter(torch.FloatTensor(dim))
        self.beta = nn.Parameter(torch.FloatTensor(dim))
        self.eps = eps
        self._reset_parameters()

    def _reset_parameters(self):
        self.gamma.data.uniform_()
        self.beta.data.zero_()

    def __call__(self, x):
        n = x.size(2) * x.size(3)
        t = x.view(x.size(0), x.size(1), n)
        mean = torch.mean(t, 2).unsqueeze(2).unsqueeze(3).expand_as(x)
        var = torch.var(t, 2).unsqueeze(2).unsqueeze(3).expand_as(x)
        gamma_broadcast = self.gamma.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        beta_broadcast = self.beta.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        out = (x - mean) / torch.sqrt(var + self.eps)
        out = out * gamma_broadcast + beta_broadcast
        return out
However, when checking against the HiLens Caffe operator boundary we found that the om model conversion does not support sum or mean operations over dimensions other than the channel dimension. To work around this, we can re-implement nn.InstanceNorm2d with supported operators:
class InstanceNormalization(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super(InstanceNormalization, self).__init__()
        self.gamma = torch.FloatTensor(dim)
        self.beta = torch.FloatTensor(dim)
        self.eps = eps
        self.adavg = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        n, c, h, w = x.shape
        mean = nn.Upsample(scale_factor=h)(self.adavg(x))
        var = nn.Upsample(scale_factor=h)(self.adavg((x - mean).pow(2)))
        gamma_broadcast = self.gamma.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        beta_broadcast = self.beta.unsqueeze(1).unsqueeze(1).unsqueeze(0).expand_as(x)
        out = (x - mean) / torch.sqrt(var + self.eps)
        out = out * gamma_broadcast + beta_broadcast
        return out
After verification this is equivalent to the original operation, and it can be converted to a Caffe model.
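A quick hedged sanity check (not from the original article) is sketched below; it neutralizes the affine parameters and assumes a square feature map, since nn.Upsample(scale_factor=h) only reproduces the full spatial size when h == w:
import torch
import torch.nn as nn

x = torch.randn(2, 8, 16, 16)
ref = nn.InstanceNorm2d(8, eps=1e-5)(x)  # reference: plain instance normalization

inorm = InstanceNormalization(8)
inorm.gamma = torch.ones(8)   # neutralize scale
inorm.beta = torch.zeros(8)   # neutralize shift
out = inorm(x)

print(torch.allclose(ref, out, atol=1e-4))  # expected: True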
2) Implement the operation in Caffe by combining existing operations:
During the Pytorch-to-Caffe conversion we found that operations involving constants, such as featuremap + 6, run into the problem that the constant operand has no corresponding blob. Let's first look at how the add operation is converted in pytorch_to_caffe.py:
def _add(input, *args):
    x = raw__add__(input, *args)
    if not NET_INITTED:
        return x
    layer_name = log.add_layer(name='add')
    top_blobs = log.add_blobs([x], name='add_blob')
    if log.blobs(args[0]) == None:
        log.add_blobs([args[0]], name='extra_blob')
    else:
        layer = caffe_net.Layer_param(name=layer_name, type='Eltwise',
                                      bottom=[log.blobs(input), log.blobs(args[0])], top=top_blobs)
        layer.param.eltwise_param.operation = 1  # sum is 1
        log.cnet.add_layer(layer)
    return x
You can see that the case where the second operand has no blob (log.blobs(args[0]) == None) only registers an extra blob and emits no layer, so adding a constant is not really handled. All we need to do is modify that branch; a natural idea is to implement the add-a-constant case with a Scale layer:
def _add(input, *args):
    x = raw__add__(input, *args)
    if not NET_INITTED:
        return x
    layer_name = log.add_layer(name='add')
    top_blobs = log.add_blobs([x], name='add_blob')
    if log.blobs(args[0]) == None:
        layer = caffe_net.Layer_param(name=layer_name, type='Scale',
                                      bottom=[log.blobs(input)], top=top_blobs)
        layer.param.scale_param.bias_term = True
        weight = torch.ones((input.shape[1]))
        bias = torch.tensor(args[0]).squeeze().expand_as(weight)
        layer.add_data(weight.cpu().data.numpy(), bias.cpu().data.numpy())
        log.cnet.add_layer(layer)
    else:
        layer = caffe_net.Layer_param(name=layer_name, type='Eltwise',
                                      bottom=[log.blobs(input), log.blobs(args[0])], top=top_blobs)
        layer.param.eltwise_param.operation = 1  # sum is 1
        log.cnet.add_layer(layer)
    return x
Similarly, a simple multiplication by a constant such as featuremap * 6 can be handled in the same way, as sketched below.
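The following is a hedged sketch, not code from the Baseline; it assumes the same raw__mul__, log, caffe_net and NET_INITTED machinery that _add relies on. The constant branch becomes a Scale layer whose per-channel weight holds the constant (no bias), while the tensor-tensor branch stays an element-wise Eltwise product:
def _mul(input, *args):
    x = raw__mul__(input, *args)
    if not NET_INITTED:
        return x
    layer_name = log.add_layer(name='mul')
    top_blobs = log.add_blobs([x], name='mul_blob')
    if log.blobs(args[0]) == None:
        # constant operand: per-channel Scale with the constant as weight, no bias
        layer = caffe_net.Layer_param(name=layer_name, type='Scale',
                                      bottom=[log.blobs(input)], top=top_blobs)
        layer.param.scale_param.bias_term = False
        # use torch.full instead of tensor * scalar so we don't re-enter a patched __mul__
        weight = torch.full((input.shape[1],), float(args[0]))
        layer.add_data(weight.cpu().data.numpy())
    else:
        # tensor * tensor: element-wise product
        layer = caffe_net.Layer_param(name=layer_name, type='Eltwise',
                                      bottom=[log.blobs(input), log.blobs(args[0])], top=top_blobs)
        layer.param.eltwise_param.operation = 0  # PROD is 0 in Caffe's EltwiseOp
    log.cnet.add_layer(layer)
    return x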
Another example in the same spirit is padding extra channels: a zero-initialized, bias-free convolution produces the padding tensor, which is then concatenated along the channel dimension:
zero = nn.Conv2d(in_channels, self.channel_pad, kernel_size=3, padding=1, bias=False)
nn.init.constant_(zero.weight, 0)
pad_tensor = zero(x)
x = torch.cat([x, pad_tensor], dim=1)
This article is shared from the Huawei Cloud community article "Pytorch->Caffe Model Conversion", original author: Du Fu built a house.