[Self-study] Introduction to Deep Learning: Python-Based Theory and Implementation, Lesson 10

Rachel MuZy 2022-09-09 01:24:58


Table of Contents

Preface

I. Updating Parameters

1. Drawbacks of SGD

2. Momentum

3. AdaGrad

4. Adam

5. Comparing the Update Methods on the MNIST Dataset

II. Initial Values of the Weights

1. The Initial Weight Values Must Not Be Set to 0

Summary


Preface

This section describes optimization methods for the weight parameters, that is, methods for searching for the optimal weight parameters.


I. Updating Parameters

1. Drawbacks of SGD

If the shape of the function is not isotropic, for example if it is elongated (stretched) in one direction, the search path of SGD becomes very inefficient. The root cause is that the direction of the gradient does not point toward the direction of the minimum.
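For reference, here is a minimal SGD sketch in the same style as the optimizer classes below (it assumes params and grads are dictionaries of NumPy arrays keyed by parameter name); it simply applies W ← W − η · ∂L/∂W to every parameter:

class SGD:
    """Plain stochastic gradient descent: W <- W - lr * dL/dW"""
    def __init__(self, lr=0.01):
        self.lr = lr
    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]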

 

2. Momentum

W denotes the weight parameter, η (eta) denotes the learning rate, and v corresponds to the velocity, i.e., the force the object receives from the gradient.

The αv term represents the friction or air resistance the object experiences when no other force acts on it, gradually decelerating it.
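For reference, the Momentum update rules (they match the code below; α is the momentum coefficient, set to 0.9 in the code):

v ← αv − η · ∂L/∂W
W ← W + v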

The code implementation is as follows:

import numpy as np

class Momentum:
    """Momentum SGD"""
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None
    def update(self, params, grads):
        if self.v is None:
            # on the first call, create a velocity entry with the same shape as each parameter
            self.v = {}
            for key, val in params.items():
                self.v[key] = np.zeros_like(val)
        for key in params.keys():
            self.v[key] = self.momentum*self.v[key] - self.lr*grads[key]
            params[key] += self.v[key]

Analysis:

(1) At initialization, nothing is stored in v. When update() is called for the first time, v saves, in dictionary form, data with the same structure as the parameters.

(2) np.zeros_like(val):

import numpy as np
a = np.arange(12)
a = a.reshape(2, 2, 3)   # 12 consecutive integers reshaped into a 2x2x3 array
b = np.zeros_like(a)     # array of zeros with the same shape and dtype as a
print(a)
print(b)

Result:

[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
[[[0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]]]

(3)

What is the significance of this part of the code? What exactly is params.items()? This point was unclear to me at first; see the short note below.

...........................................
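For what it's worth, here is a small sketch of my own (the parameter names 'W1' and 'b1' are just placeholders, not from the book): dict.items() yields the (key, value) pairs of a dictionary, so the loop builds v with exactly the same keys and array shapes as params:

import numpy as np

params = {'W1': np.random.randn(2, 3), 'b1': np.zeros(3)}
v = {}
for key, val in params.items():   # yields ('W1', <array>), ('b1', <array>)
    v[key] = np.zeros_like(val)   # zero array with the same shape as the parameter
print(v['W1'].shape)              # (2, 3)
print(v['b1'].shape)              # (3,)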

Advantages of Momentum:

Compared with SGD, the zigzag ("之"-shaped) movement is alleviated. This is because, although the force along the x-axis is very small, it always acts in the same direction, so the update steadily accelerates along that axis; the force along the y-axis is large, but it alternates between the positive and negative directions, so it largely cancels out (its sum is close to 0). As a result, Momentum can approach the minimum more quickly than SGD.

 

3. AdaGrad

AdaGrad adjusts the learning rate individually for each element of the parameters while learning proceeds. That is, as learning progresses, the effective learning rate is gradually reduced.

The ⊙ symbol in its update rule denotes element-wise multiplication of matrices.

The learning rate decays more strongly for those elements of the parameters that have been updated by large amounts.
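For reference, the AdaGrad update rules (they match the code below; ⊙ is element-wise multiplication, and the small constant 1e-7 in the code guards against division by zero):

h ← h + ∂L/∂W ⊙ ∂L/∂W
W ← W − η · (1/√h) ⊙ ∂L/∂W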

Code implementation:

import numpy as np

class AdaGrad:
    """AdaGrad"""
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None
    def update(self, params, grads):
        if self.h is None:
            self.h = {}
            for key, val in params.items():
                self.h[key] = np.zeros_like(val)
        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            # 1e-7 prevents division by zero while h is still 0
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)

The weight-update path is shown in the figure below (not reproduced here): the function moves efficiently toward the minimum value.

4. Adam

Adam fuses the two preceding methods (AdaGrad and Momentum) together.

import numpy as np

class Adam:
    """Adam (http://arxiv.org/abs/1412.6980v8)"""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None
    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = {}, {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)
        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)
        for key in params.keys():
            #self.m[key] = self.beta1*self.m[key] + (1-self.beta1)*grads[key]
            #self.v[key] = self.beta2*self.v[key] + (1-self.beta2)*(grads[key]**2)
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)
            #unbias_m += (1 - self.beta1) * (grads[key] - self.m[key]) # correct bias
            #unbisa_b += (1 - self.beta2) * (grads[key]*grads[key] - self.v[key]) # correct bias
            #params[key] += self.lr * unbias_m / (np.sqrt(unbisa_b) + 1e-7)

Adam takes three hyperparameters: the learning rate, beta1, and beta2. According to the paper, the standard settings are beta1 = 0.9 and beta2 = 0.999; in most cases these work well.
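A quick usage sketch of the Adam class defined above (the parameter names 'W1' and 'b1' are placeholders I chose for illustration; it assumes the class and numpy have already been imported):

import numpy as np

params = {'W1': np.random.randn(2, 3), 'b1': np.zeros(3)}
grads  = {'W1': np.random.randn(2, 3), 'b1': np.random.randn(3)}

optimizer = Adam(lr=0.001, beta1=0.9, beta2=0.999)
optimizer.update(params, grads)   # the parameters are updated in place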

5. Comparing the Update Methods on the MNIST Dataset

# coding: utf-8
import os
import sys
sys.path.append(os.pardir)  # setting for importing files from the parent directory
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from common.util import smooth_curve
from common.multi_layer_net import MultiLayerNet
from common.optimizer import *

# 0: Load the MNIST data ==========
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

train_size = x_train.shape[0]  # number of training samples
batch_size = 128
max_iterations = 2000

# 1: Set up the experiment ==========
optimizers = {}
optimizers['SGD'] = SGD()
optimizers['Momentum'] = Momentum()
optimizers['AdaGrad'] = AdaGrad()
optimizers['Adam'] = Adam()
#optimizers['RMSprop'] = RMSprop()

networks = {}
train_loss = {}
for key in optimizers.keys():
    networks[key] = MultiLayerNet(
        input_size=784, hidden_size_list=[100, 100, 100, 100],
        output_size=10)
    train_loss[key] = []

# 2: Start training ==========
for i in range(max_iterations):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    for key in optimizers.keys():
        grads = networks[key].gradient(x_batch, t_batch)
        optimizers[key].update(networks[key].params, grads)
        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)
    if i % 100 == 0:  # print the loss every 100 iterations
        print("===========" + "iteration:" + str(i) + "===========")
        for key in optimizers.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))

# 3: Plot the results ==========
markers = {"SGD": "o", "Momentum": "x", "AdaGrad": "s", "Adam": "D"}
x = np.arange(max_iterations)
for key in optimizers.keys():
    plt.plot(x, smooth_curve(train_loss[key]), marker=markers[key], markevery=100, label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 1)
plt.legend()
plt.show()

Output:

 

Analysis:

(1) np.random.choice():

import numpy as np
a = np.random.choice(10, 8)
# draw 8 numbers from [0, 10) to form the one-dimensional array a
print(a)
b = np.random.choice(a, 5)
# draw 5 numbers from the one-dimensional array a to form the array b
# note: a must be one-dimensional
print(b)

Result:

[1 3 7 5 7 5 1 7]
[5 7 7 5 1]

II. Initial Values of the Weights

What initial values the weights are given has a large bearing on whether neural network learning succeeds.

1. The Initial Weight Values Must Not Be Set to 0

The initial weight values must not be set to 0 (or, more generally, all to the same value): if they are, every weight receives exactly the same gradient during backpropagation and all weights are updated identically, which defeats the purpose of having many weights.
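As a small illustration of my own (not from the book), consider a toy two-layer network whose weights all start at the same value (0.5 here rather than exactly 0, so the gradients are visibly nonzero): every hidden unit receives exactly the same gradient, so the hidden units can never become different from one another.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny 2-layer net: 2 inputs -> 3 hidden (sigmoid) -> 1 output, no biases for brevity.
x = np.array([[1.0, 2.0]])     # one input sample, shape (1, 2)
W1 = np.full((2, 3), 0.5)      # every weight gets the same initial value
W2 = np.full((3, 1), 0.5)

# forward pass
h = sigmoid(x.dot(W1))         # all 3 hidden activations are identical
y = h.dot(W2)

# backward pass for a squared-error loss L = 0.5 * (y - t)^2
t = np.array([[1.0]])
dy = y - t                     # dL/dy
dW2 = h.T.dot(dy)              # dL/dW2
dh = dy.dot(W2.T)              # dL/dh
dz = dh * h * (1 - h)          # back through the sigmoid
dW1 = x.T.dot(dz)              # dL/dW1

print(dW1)   # every column is identical -> all hidden units get the same update
print(dW2)   # every row is identical, too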

Until now, the initial weights have been set using small random values drawn from a Gaussian distribution and scaled by 0.01:

import numpy as np
a = 0.01 * np.random.randn(10, 100)  # Gaussian random numbers with standard deviation 0.01

Supplement: an introduction to np.random.randn:

import numpy as np
a = 0.01 * np.random.randn(2, 4, 3)  # a 2x4x3 array; the arguments give the dimensions of the generated array
b = np.random.randn(2, 4)
print(f'a is {a}')
print(f'b is {b}')

Result:

a is [[[-0.01141521  0.00021992 -0.00668211]
  [-0.00799102 -0.01430591  0.00065054]
  [ 0.00253524 -0.01118892 -0.01097236]
  [-0.00580513  0.00963655 -0.00336067]]

 [[ 0.00232957 -0.00983508  0.00066577]
  [-0.01303359  0.02022611 -0.00138892]
  [-0.00026297 -0.00356707 -0.01244644]
  [ 0.00965091  0.00946335  0.00834518]]]
b is [[ 1.3743193  -1.40996427  0.11132154 -0.37661421]
 [ 0.61963745 -0.37448273 -0.69203084 -1.4140828 ]]

 


Summary

Personally, I do not fully understand the parameter update methods yet, but in concrete applications it does not seem necessary to understand their mechanisms in depth, so I will not dwell on them further here. The next section focuses on the initial values of the weights.

Copyright notice: This article was written by [Rachel MuZy]. Please include a link to the original when reposting. Thank you. https://pythonmana.com/2022/252/202209090111011977.html