Generally speaking, the objective function of supervised learning consists of a loss function and a regularization term (Objective = Loss + Regularization).

In PyTorch, the loss function is generally specified when training the model.

Note that the parameter order of PyTorch's built-in loss functions differs from TensorFlow's: in PyTorch y_pred comes first and y_true second, whereas in TensorFlow y_true comes first and y_pred second.
For regression models, the commonly used built-in loss function is the mean squared error loss, nn.MSELoss.

For binary classification models, the usual choice is the binary cross-entropy loss nn.BCELoss (whose input is already the output of the sigmoid activation) or nn.BCEWithLogitsLoss (whose input has not yet passed through nn.Sigmoid).
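As a quick illustration of the difference between the two, here is a minimal sketch (the logits and labels below are made up for demonstration): nn.BCEWithLogitsLoss applied to raw logits gives the same value as nn.BCELoss applied to the sigmoid-activated outputs.

import torch
from torch import nn

# Hypothetical raw scores (logits) and binary labels
logits = torch.tensor([[2.0], [-1.0], [0.5]])
labels = torch.tensor([[1.0], [0.0], [1.0]])

# nn.BCEWithLogitsLoss takes the raw logits ...
loss_with_logits = nn.BCEWithLogitsLoss()(logits, labels)

# ... while nn.BCELoss expects probabilities that already went through sigmoid
loss_bce = nn.BCELoss()(torch.sigmoid(logits), labels)

print(loss_with_logits, loss_bce)  # the two values should match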
For multi-class classification models, the cross-entropy loss nn.CrossEntropyLoss is generally recommended (y_true needs to be one-dimensional and contain class indices; y_pred must not have passed through nn.Softmax).

In addition, if y_pred has already been activated with nn.LogSoftmax, the nn.NLLLoss loss function (negative log likelihood loss) can be used for multi-class classification. This approach is equivalent to using nn.CrossEntropyLoss directly.
If necessary, you can also define a custom loss function. A custom loss function needs to take two tensors, y_pred and y_true, as input arguments and output a scalar as the loss value.

Regularization terms in PyTorch are usually added to the loss function in a custom way, and together they form the objective function.

If you only need L2 regularization, you can also use the optimizer's weight_decay parameter to achieve the same effect (see the per-parameter example at the end of this section).
import numpy as np
import pandas as pd
import torch
from torch import nn
import torch.nn.functional as F
y_pred = torch.tensor([[10.0,0.0,-10.0],[8.0,8.0,8.0]])
y_true = torch.tensor([0,2])
# Call the cross-entropy loss directly
ce = nn.CrossEntropyLoss()(y_pred,y_true)
print(ce)
# Equivalent: apply nn.LogSoftmax first, then call nn.NLLLoss
y_pred_logsoftmax = nn.LogSoftmax(dim = 1)(y_pred)
nll = nn.NLLLoss()(y_pred_logsoftmax,y_true)
print(nll)
tensor(0.5493)
tensor(0.5493)
The built-in loss functions come in two forms: a class implementation and a functional implementation.

For example, nn.BCELoss and F.binary_cross_entropy are both binary cross-entropy losses; the former is the class implementation and the latter is the functional implementation.

In fact, the class implementation is usually obtained by calling the functional implementation and wrapping it with nn.Module.

In general we use the class form. These classes live in the torch.nn module, and their names end with Loss.
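Here is a minimal sketch (with made-up tensors) showing that the class form and the functional form compute the same value:

import torch
from torch import nn
import torch.nn.functional as F

# Hypothetical probabilities (already sigmoid-activated) and labels
y_prob = torch.tensor([[0.9], [0.2]])
y_label = torch.tensor([[1.0], [0.0]])

# Class form: instantiate the loss object, then call it
loss_class = nn.BCELoss()(y_prob, y_label)

# Functional form: call the function directly
loss_func = F.binary_cross_entropy(y_prob, y_label)

print(loss_class, loss_func)  # the two values should be identical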
Some commonly used built-in loss functions are described below.
nn.MSELoss (mean squared error loss, also called L2 loss, used for regression)

nn.L1Loss (L1 loss, also called absolute error loss, used for regression)

nn.SmoothL1Loss (smooth L1 loss, behaves like an L2 loss when the error is between -1 and 1, used for regression)

nn.BCELoss (binary cross-entropy, used for binary classification; the input must already be activated with nn.Sigmoid; for unbalanced datasets the weight parameter can be used to adjust class weights)

nn.BCEWithLogitsLoss (binary cross-entropy, used for binary classification; the input must not have passed through nn.Sigmoid)

nn.CrossEntropyLoss (cross-entropy, used for multi-class classification; labels must be sparse class indices, and the input must not have passed through nn.Softmax; for unbalanced datasets the weight parameter can be used to adjust class weights)

nn.NLLLoss (negative log likelihood loss, used for multi-class classification; labels must be sparse class indices, and the input must already be activated with nn.LogSoftmax)

nn.CosineSimilarity (cosine similarity, can be used for multi-class classification)

nn.AdaptiveLogSoftmaxWithLoss (a loss function suitable for a very large number of very unevenly distributed classes; it adaptively groups the many rare classes into clusters)
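As an example of the weight parameter mentioned above, here is a minimal sketch (the class weights and tensors are made up) of up-weighting a rare class in nn.CrossEntropyLoss on an unbalanced dataset:

import torch
from torch import nn

# Hypothetical logits for 3 classes and sparse integer labels
y_pred = torch.tensor([[1.0, 2.0, 0.5], [0.3, 0.2, 2.5]])
y_true = torch.tensor([1, 2])

# Assume class 2 is rare and give it twice the weight of the others
weighted_ce = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 1.0, 2.0]))
unweighted_ce = nn.CrossEntropyLoss()

print(weighted_ce(y_pred, y_true), unweighted_ce(y_pred, y_true))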
For more on loss functions, see the following article:
《The Eighteen Loss Functions of PyTorch》
https://zhuanlan.zhihu.com/p/61379965
A custom loss function takes two tensors, y_pred and y_true, as input arguments and outputs a scalar as the loss value.

It can also be written as a subclass of nn.Module that overrides the forward method to implement the loss computation, which gives a class implementation of the loss function.
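For reference, a minimal sketch of the function form (a hypothetical mean squared error, just to show the expected signature):

import torch

def my_mse_loss(y_pred, y_true):
    # Takes two tensors and returns a scalar tensor
    return torch.mean((y_pred - y_true) ** 2)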
Below is a demonstration of a custom implementation of Focal Loss. Focal Loss is an improved form of binary_crossentropy. Compared with binary_crossentropy, it has a clear advantage when the samples are unbalanced and many samples are easy to classify.

It has two adjustable parameters, alpha and gamma. The alpha parameter mainly attenuates the weight of negative samples, and the gamma parameter mainly attenuates the weight of easy-to-train samples. This makes the model focus more on positive samples and hard samples, which is why this loss function is called Focal Loss.
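In formula form (a restatement of the implementation below, using the same alpha and gamma):

focal_loss = -alpha_t * (1 - p_t)^gamma * log(p_t)

where p_t = y_pred when y_true = 1 and p_t = 1 - y_pred when y_true = 0, and alpha_t = alpha when y_true = 1 and alpha_t = 1 - alpha when y_true = 0. With gamma = 0 and alpha = 0.5 this reduces, up to a constant factor, to the ordinary binary cross-entropy.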
See 《Understand Focal Loss and GHM in 5 Minutes: A Sharp Weapon for Solving Sample Imbalance》
https://zhuanlan.zhihu.com/p/80594704
class FocalLoss(nn.Module):

    def __init__(self, gamma=2.0, alpha=0.75):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, y_pred, y_true):
        # Element-wise binary cross-entropy, no reduction yet
        bce = torch.nn.BCELoss(reduction="none")(y_pred, y_true)
        # p_t: the predicted probability of the true class
        p_t = (y_true * y_pred) + ((1 - y_true) * (1 - y_pred))
        # alpha_t: attenuates the weight of negative samples
        alpha_factor = y_true * self.alpha + (1 - y_true) * (1 - self.alpha)
        # (1 - p_t)^gamma: attenuates the weight of easy samples
        modulating_factor = torch.pow(1.0 - p_t, self.gamma)
        loss = torch.mean(alpha_factor * modulating_factor * bce)
        return loss
# Hard samples
y_pred_hard = torch.tensor([[0.5],[0.5]])
y_true_hard = torch.tensor([[1.0],[0.0]])
# Easy samples
y_pred_easy = torch.tensor([[0.9],[0.1]])
y_true_easy = torch.tensor([[1.0],[0.0]])
focal_loss = FocalLoss()
bce_loss = nn.BCELoss()
print("focal_loss(hard samples):", focal_loss(y_pred_hard,y_true_hard))
print("bce_loss(hard samples):", bce_loss(y_pred_hard,y_true_hard))
print("focal_loss(easy samples):", focal_loss(y_pred_easy,y_true_easy))
print("bce_loss(easy samples):", bce_loss(y_pred_easy,y_true_easy))
# As you can see, focal_loss scales the loss on the easy samples down to 0.0005/0.1054 = 0.00474 of the bce value,
# while the loss on the hard samples is only scaled down to 0.0866/0.6931 = 0.12496 of the bce value.
# So, relatively speaking, focal_loss attenuates the weight of the easy samples.
focal_loss(hard samples): tensor(0.0866)
bce_loss(hard samples): tensor(0.6931)
focal_loss(easy samples): tensor(0.0005)
bce_loss(easy samples): tensor(0.1054)
For a complete example of using FocalLoss, see the "Custom L1 and L2 regularization terms" example below, which demonstrates both how to customize regularization terms and how to use FocalLoss.

It is commonly believed that L1 regularization produces a sparse weight matrix, i.e. a sparse model, and can therefore be used for feature selection, while L2 regularization prevents the model from overfitting. To a certain extent, L1 can also prevent overfitting.

Taking a binary classification problem as an example, we now show how to add custom L1 and L2 regularization terms to the model's objective function. This example also demonstrates the use of the FocalLoss defined in the previous part.
1, Prepare the data
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import Dataset,DataLoader,TensorDataset
import torchkeras
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
# Number of positive and negative samples
n_positive,n_negative = 200,6000
# Generate positive samples, distributed on a small circle
r_p = 5.0 + torch.normal(0.0,1.0,size = [n_positive,1])
theta_p = 2*np.pi*torch.rand([n_positive,1])
Xp = torch.cat([r_p*torch.cos(theta_p),r_p*torch.sin(theta_p)],axis = 1)
Yp = torch.ones_like(r_p)
# Generate negative samples, distributed on a big circle
r_n = 8.0 + torch.normal(0.0,1.0,size = [n_negative,1])
theta_n = 2*np.pi*torch.rand([n_negative,1])
Xn = torch.cat([r_n*torch.cos(theta_n),r_n*torch.sin(theta_n)],axis = 1)
Yn = torch.zeros_like(r_n)
# Aggregate samples
X = torch.cat([Xp,Xn],axis = 0)
Y = torch.cat([Yp,Yn],axis = 0)
# visualization
plt.figure(figsize = (6,6))
plt.scatter(Xp[:,0],Xp[:,1],c = "r")
plt.scatter(Xn[:,0],Xn[:,1],c = "g")
plt.legend(["positive","negative"]);
ds = TensorDataset(X,Y)
ds_train,ds_valid = torch.utils.data.random_split(ds,[int(len(ds)*0.7),len(ds)-int(len(ds)*0.7)])
dl_train = DataLoader(ds_train,batch_size = 100,shuffle=True,num_workers=2)
dl_valid = DataLoader(ds_valid,batch_size = 100,num_workers=2)
2, Define the model
class DNNModel(torchkeras.Model):
    def __init__(self):
        super(DNNModel, self).__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 8)
        self.fc3 = nn.Linear(8, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        y = nn.Sigmoid()(self.fc3(x))
        return y

model = DNNModel()
model.summary(input_shape=(2,))
3, Train the model
# Accuracy
def accuracy(y_pred, y_true):
    y_pred = torch.where(y_pred > 0.5, torch.ones_like(y_pred, dtype=torch.float32),
                         torch.zeros_like(y_pred, dtype=torch.float32))
    acc = torch.mean(1 - torch.abs(y_true - y_pred))
    return acc
# L2 regularization
def L2Loss(model, alpha):
    l2_loss = torch.tensor(0.0, requires_grad=True)
    for name, param in model.named_parameters():
        if 'bias' not in name:  # bias terms are generally not regularized
            l2_loss = l2_loss + (0.5 * alpha * torch.sum(torch.pow(param, 2)))
    return l2_loss

# L1 regularization
def L1Loss(model, beta):
    l1_loss = torch.tensor(0.0, requires_grad=True)
    for name, param in model.named_parameters():
        if 'bias' not in name:
            l1_loss = l1_loss + beta * torch.sum(torch.abs(param))
    return l1_loss
# Add the L2 and L1 regularization terms to the FocalLoss to form the objective function
def focal_loss_with_regularization(y_pred, y_true):
    focal = FocalLoss()(y_pred, y_true)
    l2_loss = L2Loss(model, 0.001)  # pay attention to the regularization coefficients
    l1_loss = L1Loss(model, 0.001)
    total_loss = focal + l2_loss + l1_loss
    return total_loss

model.compile(loss_func=focal_loss_with_regularization,
              optimizer=torch.optim.Adam(model.parameters(), lr=0.01),
              metrics_dict={"accuracy": accuracy})

dfhistory = model.fit(30, dl_train=dl_train, dl_val=dl_valid, log_step_freq=30)
If you only need L2 regularization, you can also use the optimizer's weight_decay parameter to achieve it. The weight_decay parameter decays the parameters during training, which has the same effect as L2 regularization.

PyTorch optimizers support per-parameter options, that is, specifying a dedicated learning rate, weight decay rate, and so on for each parameter group, to meet more fine-grained requirements.
weight_params = [param for name, param in model.named_parameters() if "bias" not in name]
bias_params = [param for name, param in model.named_parameters() if "bias" in name]

optimizer = torch.optim.SGD([{'params': weight_params, 'weight_decay': 1e-5},
                             {'params': bias_params, 'weight_decay': 0}],
                            lr=1e-2, momentum=0.9)
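As a sanity check on this equivalence, here is a minimal sketch (with a made-up linear model and random data, and plain SGD without momentum) showing that the optimizer's weight_decay produces the same parameter update as adding an explicit 0.5 * weight_decay * ||w||^2 penalty to the loss:

import torch
from torch import nn

torch.manual_seed(0)
wd = 1e-2  # hypothetical weight decay coefficient
x = torch.randn(8, 2)
y = torch.randn(8, 1)

# Model A: let the optimizer apply weight_decay
model_a = nn.Linear(2, 1)
# Model B: identical initial weights, L2 penalty added to the loss by hand
model_b = nn.Linear(2, 1)
model_b.load_state_dict(model_a.state_dict())

opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1, weight_decay=wd)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)

loss_a = nn.MSELoss()(model_a(x), y)
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

penalty = 0.5 * wd * sum(torch.sum(p ** 2) for p in model_b.parameters())
loss_b = nn.MSELoss()(model_b(x), y) + penalty
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

# The updated parameters should match up to floating point error
print(torch.allclose(model_a.weight, model_b.weight, atol=1e-6))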