## A series of hand-written algorithms for Python machine learning -- Bayesian Optimization


Bayesian Optimization: a family of design strategies for finding the global optimum of a black-box function, without requiring derivatives. (Wikipedia)

# The problem of finding the optimal solution

The simplest way to get the best solution is grid search (Grid Search).

If grid search costs a bit too much, you can try random search (Random Search).

If the function is convex (Convex Function), we can use Gradient Descent. Many machine learning algorithms, such as linear regression and logistic regression, use it.

If the black-box function is very expensive to evaluate and is not convex, we consider Bayesian optimization.
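
To make the cost comparison concrete, here is a minimal sketch (my own illustration, not from the original post) of grid search and random search maximizing a one-dimensional function over [0, 1]:

```python
import random

def grid_search(f, n=100):
    # Evaluate f at n evenly spaced points and keep the best one.
    candidates = [i / (n - 1) for i in range(n)]
    return max(candidates, key=f)

def random_search(f, n=100):
    # Evaluate f at n uniformly random points and keep the best one.
    candidates = [random.uniform(0, 1) for _ in range(n)]
    return max(candidates, key=f)
```

Both spend their full budget of n objective evaluations; Bayesian optimization tries to get away with far fewer.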

# Bayesian optimization concept

In Bayesian Optimization, we call this black-box function the objective function (Objective Function). Because evaluating the objective function is costly, we find an approximation for it, called the surrogate function (Surrogate Function). The surrogate function gives a mean curve and the corresponding standard deviation (Standard Deviation). With the surrogate function in hand, we can decide where to explore next. This step is carried out by an acquisition function (Acquisition Function).

Bayesian optimization works within a given search space (search space).

The whole process is as follows:

1. Choose a few starting points X in the search space
2. Evaluate the objective function at the points X to get the corresponding values y
3. Update the surrogate function
4. Use the acquisition function to pick the next sample point
5. Go to 2

[Flow chart of the process, in Chinese and English]

The surrogate function is usually a Gaussian Process.

There are many choices of Acquisition Function. Common ones are:

1. Probability of Improvement (PI).
2. Expected Improvement (EI).
3. Upper/Lower Confidence Bound (LCB/UCB).

Here we use UCB.

$$a_{ucb}(x;\beta) = \mu(x) + \beta\sigma(x)$$

$\beta > 0$ is a weight that adjusts the relative importance of the mean and the standard deviation. This is an exploitation vs. exploration trade-off: the mean term is exploitation, the standard deviation term is exploration. For example, with $\mu(x) = 0.5$ and $\sigma(x) = 0.2$, $\beta = 1$ gives $a_{ucb} = 0.7$, while $\beta = 3$ gives $1.1$, weighting uncertain points more strongly.

# Code

First, we define the search space as [0,1]. The objective function is:

```python
import math

def objective(x):
    return (x - 0.47)**2 * math.sin(3 * x)
```


As the plot shows, I deliberately gave the function two local optima, so it cannot be solved with Gradient Descent.
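
The plot itself is not reproduced here; as a quick sketch (assuming the objective defined above), you can draw it with matplotlib and see the two local optima:

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 1, 200)
plt.plot(xs, [objective(x) for x in xs])  # objective as defined above
plt.xlabel('x')
plt.ylabel('objective(x)')
plt.show()
```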

For the surrogate function, we directly use the Gaussian Process from sklearn:

```python
from sklearn.gaussian_process import GaussianProcessRegressor

surrogate = GaussianProcessRegressor()
```
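
As a small usage sketch (assuming the objective defined above), fitting the surrogate on the two endpoints and querying its mean and standard deviation looks like this:

```python
import numpy as np

X = np.array([[0.0], [1.0]])               # the two endpoints of [0, 1]
y = np.array([objective(x[0]) for x in X])
surrogate.fit(X, y)

grid = np.linspace(0, 1, 100).reshape(-1, 1)
mean, std = surrogate.predict(grid, return_std=True)  # mu(x) and sigma(x)
```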


For the acquisition function, we use UCB. The choice of beta is not explored further here; it is fixed at 1.

```python
import numpy as np

# Upper Confidence Bound with beta = 1
def acquisition(X, surrogate):
    yhat, std = surrogate.predict(X, return_std=True)
    yhat = yhat.flatten()
    upper = yhat + std          # mu(x) + beta * sigma(x)
    max_at = np.argmax(upper)   # index of the highest upper bound
    return X[max_at]
```


At the start, the samples are the two endpoints of the search space, i.e. [0, 1]. At this point the surrogate function is a straight line, and the standard deviation is large in the middle and small at both ends. The acquisition function selects 0.55 (the maximum of the dotted upper-bound curve, projected onto the objective function) as the next exploration point. We now have three sample points. The surrogate function and the next sample point (next sample) are shown in the plots. Continuing the iteration, after four iterations we find the maximum 0.0811051.
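
Putting the pieces together, a minimal driving loop for this experiment might look like the following. This is my own sketch rather than the original post's exact code; the candidate grid of 101 points is an assumption, but with it the loop should land near the maximum of about 0.081 reported above.

```python
import numpy as np

X = [[0.0], [1.0]]                        # start from the two endpoints
y = [objective(x[0]) for x in X]

candidates = np.linspace(0, 1, 101).reshape(-1, 1)
for _ in range(4):                        # four iterations, as above
    surrogate.fit(np.array(X), np.array(y))      # 3. update the surrogate
    x_next = acquisition(candidates, surrogate)  # 4. next point via UCB
    X.append([x_next[0]])
    y.append(objective(x_next[0]))               # 2. evaluate the objective

print(max(y))  # best value found, about 0.081
```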

# Explaining Bayesian Optimization with Hyperopt

Hyperopt can only find minima, so we put a minus sign in front of the objective function.

```python
from hyperopt import fmin, tpe, hp

best = fmin(
    fn=lambda x: -objective(x),
    space=hp.uniform('x', 0, 1),
    algo=tpe.suggest,
    max_evals=100)
print(best)
```


After one hundred iterations, the maximum found is 0.08111392201180685.
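
Note that fmin returns the best parameters (here a dict like {'x': ...}), not the best objective value; a short sketch to recover both, using the objective defined earlier:

```python
best_x = best['x']
print(best_x, objective(best_x))  # x near the maximum and its value, about 0.081
```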