Bayesian Optimization is a family of strategies for finding the global optimum of a black-box function without requiring derivatives. (Wikipedia)
The simplest way to find the best solution is grid search (Grid Search).
If grid search is too costly, you can try random search (Random Search).
If the function is convex (Convex Function), we can use Gradient Descent. Many machine learning algorithms rely on it, such as linear regression and logistic regression.
If the black-box function is very expensive to evaluate and is not convex, we consider Bayesian optimization.
In Bayesian Optimization, we call this black-box function the objective function (Objective Function). Because evaluating the objective function is very costly, we build an approximation of it, called the surrogate function (Surrogate Function). The surrogate function gives a mean curve and the corresponding standard deviation (Standard Deviation). Once we have a surrogate function, we can decide where to explore next. This step is carried out by an acquisition function (Acquisition Function).
Bayesian optimization works within a given search space (search space).
The whole process is as follows:
The flow chart, in Chinese and English, is as follows:
The surrogate function is usually a Gaussian Process.
There are many choices for the acquisition function. Common ones include Probability of Improvement (PI), Expected Improvement (EI), and the Upper Confidence Bound (UCB).
Here we use UCB:
a_{ucb}(x;\beta) = \mu(x) + \beta\sigma(x)
β > 0 is a weight that adjusts the relative importance of the mean and the standard deviation. This is an exploitation vs. exploration trade-off: the mean term corresponds to exploitation, and the standard deviation term to exploration.
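As a quick numeric illustration (the numbers below are made up for this example, not taken from the article), a candidate with a slightly lower mean but a much larger standard deviation can win under UCB:
import numpy as np

mu = np.array([0.05, 0.03, 0.07])     # surrogate means at three hypothetical candidate points
sigma = np.array([0.01, 0.08, 0.02])  # surrogate standard deviations at the same candidates
beta = 1.0                            # exploration weight

ucb = mu + beta * sigma               # a_ucb(x; beta) = mu(x) + beta * sigma(x)
print(ucb)           # [0.06 0.11 0.09]
print(ucb.argmax())  # 1 -> the most uncertain candidate wins despite its lower mean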
First, we define the search space as [0, 1]. The objective function is:
import math

def objective(x):
    return (x - 0.47) ** 2 * math.sin(3 * x)
The plot of the objective function is shown below.
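For reference, a plot like the one above can be produced with a few lines of matplotlib (this snippet is my own addition, not part of the original notebook):
import math
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 1, 200)
ys = [objective(x) for x in xs]   # evaluate the objective on a dense grid
plt.plot(xs, ys)
plt.xlabel('x')
plt.ylabel('objective(x)')
plt.show()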
Here I deliberately created two local optima, so the problem cannot be solved with Gradient Descent.
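To see why a local method gets stuck, here is a rough sketch of plain gradient ascent started near the smaller bump (the starting point, step size, and hand-derived gradient are my own choices for illustration):
import math

x = 0.1   # start near the smaller local maximum
for _ in range(1000):
    # derivative of (x - 0.47)**2 * sin(3x)
    grad = 2 * (x - 0.47) * math.sin(3 * x) + 3 * (x - 0.47) ** 2 * math.cos(3 * x)
    x += 0.01 * grad   # gradient ascent step
print(x, objective(x))   # stalls near x ≈ 0.15 (the local maximum), missing the global one near x ≈ 0.86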
For the surrogate function, we use the Gaussian Process from sklearn directly:
from sklearn.gaussian_process import GaussianProcessRegressor

surrogate = GaussianProcessRegressor()
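Before the Gaussian Process can provide a mean and standard deviation, it has to be fitted to the samples gathered so far. A minimal sketch of that step (variable names here are my own, not from the notebook):
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# initial samples: the two endpoints of the search space [0, 1]
X_sample = np.array([[0.0], [1.0]])
y_sample = np.array([objective(x[0]) for x in X_sample])

surrogate = GaussianProcessRegressor()
surrogate.fit(X_sample, y_sample)

# surrogate mean and standard deviation on a grid of candidate points
X = np.linspace(0, 1, 100).reshape(-1, 1)
yhat, std = surrogate.predict(X, return_std=True)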
For the acquisition function, we use UCB. We will not investigate the choice of beta further here.
import numpy as np

# upper confidence bound (UCB) acquisition, with beta = 1
def acquisition(X, surrogate):
    # surrogate mean and standard deviation at the candidate points X
    yhat, std = surrogate.predict(X, return_std=True)
    yhat = yhat.flatten()
    upper = yhat + std           # UCB score: mean + 1 * std
    max_at = np.argmax(upper)    # index of the most promising candidate
    return X[max_at]
At the start, the samples are the two endpoints of the search space, namely 0 and 1. At this point the surrogate function is a straight line, and its standard deviation is large in the middle and small at the two ends. The acquisition function selects 0.55 (where the dashed curve above reaches its maximum; this point is then projected onto the objective function) as the next point to explore.
Now we have three sample points. The surrogate function and the next sample point (next sample) look like this:
We continue iterating:
After four iterations, we have found the maximum, 0.0811051.
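Putting the pieces together, the iteration can be written roughly like this (my own reconstruction from the objective, surrogate, and acquisition defined above; the actual loop in the notebook may differ in details):
import numpy as np

X = np.linspace(0, 1, 100).reshape(-1, 1)   # candidate points in the search space

# start from the two endpoints of the search space
X_sample = [[0.0], [1.0]]
y_sample = [objective(0.0), objective(1.0)]

for _ in range(4):
    surrogate.fit(np.array(X_sample), np.array(y_sample))   # refit the surrogate on all samples
    next_x = acquisition(X, surrogate)                       # pick the next point via UCB
    X_sample.append([next_x[0]])                             # add the new sample
    y_sample.append(objective(next_x[0]))                    # evaluate the expensive objective there

print(max(y_sample))   # best objective value found so far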
Hyperopt can only find minima, so we put a minus sign in front of the objective function.
from hyperopt import fmin, tpe, hp

best = fmin(
    fn=lambda x: -objective(x),   # negate: hyperopt minimizes, we want the maximum
    space=hp.uniform('x', 0, 1),  # search space [0, 1]
    algo=tpe.suggest,             # Tree-structured Parzen Estimator
    max_evals=100)
print(best)
After one hundred evaluations, the maximum value found is 0.08111392201180685.
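Note that fmin returns the best parameter values rather than the best function value, so the maximum quoted above is obtained by plugging the returned x back into the objective:
# best is a dict such as {'x': 0.86...}
print(objective(best['x']))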
Source code address:
https://github.com/juwikuang/machine_learning_step_by_step/blob/master/bayesian_optimization.ipynb
Project address: