Accelerated Python learning: 7-day model layers

The sky is full of stars_ 2020-11-13 00:38:58


A deep learning model is usually composed of various model layers.

torch.nn has many model layers built in. They are all subclasses of nn.Module and come with parameter management functionality.

For example:

  • nn.Linear, nn.Flatten, nn.Dropout, nn.BatchNorm2d

  • nn.Conv2d, nn.AvgPool2d, nn.Conv1d, nn.ConvTranspose2d

  • nn.Embedding, nn.GRU, nn.LSTM

  • nn.Transformer

If these built-in model layers do not meet our requirements, we can also build custom model layers by inheriting from the nn.Module base class.

In fact, PyTorch makes no distinction between a model and a model layer; both are built by inheriting from nn.Module.

Therefore, we only need to inherit from the nn.Module base class and implement the forward method to define a custom model layer.

1. Built-in model layers

import numpy as np
import torch
from torch import nn 

Some commonly used built-in model layers are described below.

Base layers

  • nn.Linear: fully connected layer. Number of parameters = number of input features × number of output features (weight) + number of output features (bias). (A small parameter-count check appears in the sketch below.)

  • nn.Flatten: flattening layer, used to flatten each multi-dimensional sample into a one-dimensional sample.

  • nn.BatchNorm1d: one-dimensional batch normalization layer. It rescales each input batch to a stable mean and standard deviation via an affine transformation. It improves the model's adaptability to different input distributions, speeds up training, and has a slight regularization effect. It is usually placed before the activation function. The affine parameter controls whether the layer contains trainable parameters.

  • nn.BatchNorm2d: two-dimensional batch normalization layer.

  • nn.BatchNorm3d: three-dimensional batch normalization layer.

  • nn.Dropout: one-dimensional random dropout layer; a means of regularization.

  • nn.Dropout2d: two-dimensional random dropout layer.

  • nn.Dropout3d: three-dimensional random dropout layer.

  • nn.Threshold: thresholding layer. Input values that fall outside the threshold range are truncated.

  • nn.ConstantPad2d: two-dimensional constant padding layer. Pads two-dimensional tensor samples with a constant value over the specified extension length.

  • nn.ReplicationPad1d: one-dimensional replication padding layer. Pads one-dimensional tensor samples by replicating the edge values.

  • nn.ZeroPad2d: two-dimensional zero padding layer. Pads the edges of two-dimensional tensor samples with zeros.

  • nn.GroupNorm: group normalization. An alternative to batch normalization that splits the channels into groups and normalizes within each group. It is not limited by the batch size and is reported to perform better than BatchNorm.

  • nn.LayerNorm: layer normalization. Less commonly used.

  • nn.InstanceNorm2d: instance normalization. Less commonly used.

For the various normalization techniques, refer to the following article: 《FAIR's He Kaiming et al. propose Group Normalization: a replacement for batch normalization that is not limited by batch size》

https://zhuanlan.zhihu.com/p/34858971
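
The following is a minimal sketch (the tensor shapes are chosen purely for illustration) that verifies the nn.Linear parameter-count formula above and passes a small batch through Flatten, BatchNorm1d and Dropout.

import torch
from torch import nn

# nn.Linear(20, 30): parameters = 20*30 (weight) + 30 (bias) = 630
linear = nn.Linear(20, 30)
print(sum(p.numel() for p in linear.parameters()))  # 630

# Flatten each sample of shape (3, 4, 5) into a vector of length 60
flatten = nn.Flatten()
x = torch.randn(8, 3, 4, 5)
print(flatten(x).shape)  # torch.Size([8, 60])

# BatchNorm1d (affine=True keeps trainable scale and shift) followed by Dropout
bn = nn.BatchNorm1d(60, affine=True)
drop = nn.Dropout(p=0.5)
y = drop(torch.relu(bn(flatten(x))))
print(y.shape)  # torch.Size([8, 60])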

Convolutional network layers

  • nn.Conv1d: ordinary one-dimensional convolution, often used on text. Number of parameters = number of input channels × kernel size (e.g. 3) × number of kernels + number of kernels (bias).

  • nn.Conv2d: ordinary two-dimensional convolution, often used on images. Number of parameters = number of input channels × kernel size (e.g. 3×3) × number of kernels + number of kernels (bias).
    Setting the dilation parameter to a value greater than 1 turns it into a dilated (atrous) convolution, which enlarges the receptive field of the kernel.
    Setting the groups parameter to a value greater than 1 turns it into a grouped convolution, in which each group of input channels is convolved with its own kernels, significantly reducing the number of parameters.
    When groups equals the number of input channels, it is equivalent to the two-dimensional depthwise convolution layer in TensorFlow, tf.keras.layers.DepthwiseConv2D.
    Combining a grouped (depthwise) convolution with a 1×1 convolution builds the equivalent of the two-dimensional depthwise separable convolution layer in Keras, tf.keras.layers.SeparableConv2D. (See the first sketch after this list.)

  • nn.Conv3d: ordinary three-dimensional convolution, often used on video. Number of parameters = number of input channels × kernel size (e.g. 3×3×3) × number of kernels + number of kernels (bias).

  • nn.MaxPool1d: one-dimensional max pooling.

  • nn.MaxPool2d: two-dimensional max pooling. A downsampling method with no trainable parameters.

  • nn.MaxPool3d: three-dimensional max pooling.

  • nn.AdaptiveMaxPool2d: two-dimensional adaptive max pooling. The output size is fixed no matter how the input image size changes.
    Internally, the pooling parameters such as padding and stride are derived from the input size and the requested output size.

  • nn.FractionalMaxPool2d: two-dimensional fractional max pooling. With ordinary pooling the input size is usually an integer multiple of the output size; with fractional max pooling the ratio need not be an integer. Fractional max pooling uses a random sampling strategy, which has a certain regularization effect, so it can be used in place of ordinary max pooling plus a Dropout layer.

  • nn.AvgPool2d: two-dimensional average pooling.

  • nn.AdaptiveAvgPool2d: two-dimensional adaptive average pooling. The output size is fixed no matter how the input size changes.

  • nn.ConvTranspose2d: two-dimensional transposed convolution layer, commonly (if loosely) called deconvolution. It is not the mathematical inverse of convolution: rather, with the same kernel, when the input size equals the output size of a convolution, the output size of the transposed convolution is exactly the input size of that convolution. It can be used for upsampling in semantic segmentation.

  • nn.Upsample: upsampling layer, the opposite of pooling. The mode parameter controls the upsampling strategy, e.g. "nearest" for nearest-neighbour interpolation or "linear" for linear interpolation.

  • nn.Unfold: sliding-window extraction layer. Its parameters mirror those of the convolution operation nn.Conv2d. In fact, a convolution can be expressed as a combination of nn.Unfold, nn.Linear and nn.Fold:
    nn.Unfold extracts the values in each sliding window from the input and flattens them into one dimension; nn.Linear multiplies the unfolded output by the convolution kernel; and
    nn.Fold converts the result back into the output image shape. (See the second sketch after this list.)

  • nn.Fold: the inverse of the sliding-window extraction layer.
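
Below is a minimal sketch (channel counts and kernel sizes are arbitrary) that checks the nn.Conv2d parameter-count formula and builds a depthwise separable convolution from a grouped convolution followed by a 1×1 convolution, as described above.

import torch
from torch import nn

x = torch.randn(1, 8, 32, 32)  # (batch, channels, height, width)

# Ordinary Conv2d: parameters = 8 x 3x3 x 16 (weight) + 16 (bias) = 1168
conv = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 1168

# Depthwise convolution: groups equal to the number of input channels
depthwise = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8)
# Pointwise 1x1 convolution mixes the channels
pointwise = nn.Conv2d(8, 16, kernel_size=1)
print(pointwise(depthwise(x)).shape)  # torch.Size([1, 16, 32, 32])

# Far fewer parameters than the ordinary convolution above: 80 + 144 = 224
print(sum(p.numel() for p in depthwise.parameters())
      + sum(p.numel() for p in pointwise.parameters()))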
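
A second minimal sketch (shapes again arbitrary) shows the equivalence mentioned in the nn.Unfold item: a convolution reproduced by unfolding the input and multiplying by the flattened kernel.

import torch
from torch import nn
import torch.nn.functional as F

x = torch.randn(1, 2, 5, 5)
conv = nn.Conv2d(2, 3, kernel_size=3, padding=1, bias=False)

# Unfold extracts each 3x3 window (over 2 channels) as a flattened column
unfolded = F.unfold(x, kernel_size=3, padding=1)   # shape (1, 2*3*3, 25)
weight = conv.weight.view(3, -1)                   # shape (3, 2*3*3)
out = (weight @ unfolded).view(1, 3, 5, 5)         # matrix multiply, then reshape

print(torch.allclose(out, conv(x), atol=1e-6))     # True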

Recurrent network layers

  • nn.Embedding: embedding layer. A more efficient way of encoding discrete features than one-hot encoding. It is generally used to map input words to dense vectors. The parameters of the embedding layer are learned. (See the sketch after this list.)

  • nn.LSTM: long short-term memory recurrent network layer [multi-layer support]. The most commonly used recurrent layer. It has a cell state (carry track) plus forget, update and output gates. It effectively alleviates the vanishing-gradient problem and can therefore be applied to problems with long-range dependencies. Setting bidirectional=True gives a bidirectional LSTM. Note that the default input and output shape is (seq, batch, feature); if you need the batch dimension first (dimension 0), set the batch_first parameter to True.

  • nn.GRU: gated recurrent network layer [multi-layer support]. A lightweight version of LSTM without the cell state; it has fewer parameters than LSTM and trains faster.

  • nn.RNN: simple recurrent network layer [multi-layer support]. Prone to vanishing gradients, so it cannot handle long-range dependencies. Rarely used.

  • nn.LSTMCell: long short-term memory cell. Whereas nn.LSTM iterates over the whole sequence, it performs only one step of the iteration. Rarely used.

  • nn.GRUCell: gated recurrent network cell. Whereas nn.GRU iterates over the whole sequence, it performs only one step of the iteration. Rarely used.

  • nn.RNNCell: simple recurrent network cell. Whereas nn.RNN iterates over the whole sequence, it performs only one step of the iteration. Rarely used.
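
A minimal sketch (vocabulary size, hidden size and sequence length chosen arbitrarily) combining nn.Embedding with a bidirectional nn.LSTM using batch_first=True, to illustrate the shapes described above.

import torch
from torch import nn

embedding = nn.Embedding(num_embeddings=100, embedding_dim=10)  # vocabulary of 100 words
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
               batch_first=True, bidirectional=True)

tokens = torch.randint(0, 100, (4, 7))   # (batch=4, seq=7) integer word ids
x = embedding(tokens)                    # (4, 7, 10) dense vectors
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 7, 40]) -> 2 directions * hidden_size
print(h_n.shape)     # torch.Size([4, 4, 20]) -> num_layers * 2 directions, batch, hidden_size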

Transformer-related layers

  • nn.Transformer: the Transformer network structure. The Transformer is an architecture that can replace recurrent networks; it addresses their two weaknesses, namely that they are hard to parallelize and struggle to capture long-range dependencies. It is the main building block of today's mainstream NLP models. The Transformer consists of a TransformerEncoder and a TransformerDecoder, whose core component is the MultiheadAttention multi-head attention layer. (See the sketch at the end of this section.)

  • nn.TransformerEncoder: the Transformer encoder structure, composed of multiple nn.TransformerEncoderLayer encoder layers.

  • nn.TransformerDecoder: the Transformer decoder structure, composed of multiple nn.TransformerDecoderLayer decoder layers.

  • nn.TransformerEncoderLayer: a single Transformer encoder layer.

  • nn.TransformerDecoderLayer: a single Transformer decoder layer.

  • nn.MultiheadAttention: multi-head attention layer.

For an introduction to the principles behind the Transformer, see the article 《Detailed Explanation of the Transformer (Attention Is All You Need)》:

https://zhuanlan.zhihu.com/p/48508221
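
A minimal sketch (model dimension, head count and layer count chosen arbitrarily) that stacks nn.TransformerEncoderLayer into an nn.TransformerEncoder and shows the default (seq, batch, feature) input layout.

import torch
from torch import nn

# A single encoder layer: multi-head self-attention + feed-forward network
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4)

# Stack several identical layers to form the full encoder
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.randn(10, 4, 32)   # (seq_len=10, batch=4, d_model=32), the default layout
out = encoder(src)
print(out.shape)               # torch.Size([10, 4, 32])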

2. Custom model layers

If the built-in model layers of PyTorch cannot meet our requirements, we can also build custom model layers by inheriting from the nn.Module base class.

In fact, PyTorch makes no distinction between a model and a model layer; both are built by inheriting from nn.Module.

Therefore, we only need to inherit from the nn.Module base class and implement the forward method to define a custom model layer.

Below is the source code of PyTorch's nn.Linear layer, which we can follow to define our own model layers.

import math
import torch
from torch import nn
import torch.nn.functional as F


class Linear(nn.Module):
    __constants__ = ['in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        # The weight has shape (out_features, in_features)
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        # Kaiming uniform initialization for the weight, uniform initialization for the bias
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )


linear = nn.Linear(20, 30)
inputs = torch.randn(128, 20)
output = linear(inputs)
print(output.size())
# torch.Size([128, 30])
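
Following the same pattern, here is a minimal custom-layer sketch (the layer name ScaledShift and its behaviour are made up purely for illustration): inherit nn.Module, register trainable parameters with nn.Parameter, and implement forward.

import torch
from torch import nn

class ScaledShift(nn.Module):
    # Hypothetical example layer: multiplies the input by a learnable scale and adds a learnable shift
    def __init__(self, num_features):
        super(ScaledShift, self).__init__()
        self.scale = nn.Parameter(torch.ones(num_features))
        self.shift = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        return x * self.scale + self.shift

layer = ScaledShift(30)
x = torch.randn(128, 30)
print(layer(x).shape)                                  # torch.Size([128, 30])
print([name for name, _ in layer.named_parameters()])  # ['scale', 'shift']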

 

Copyright notice
This article was written by [The sky is full of stars_]. Please include a link to the original when reposting. Thank you.
