Conjugate Gradient

Algorithm Description

The Nonlinear Conjugate Gradient (CG) algorithm is used to solve optimization problems of the form

\[\min_{x \in X} f(x)\]

where \(f\) is convex and (at least) once differentiable. The algorithm requires the gradient function to be known.

The updating rule for CG is described below. Let \(x^{(i)}\) denote the function input values at stage \(i\) of the algorithm.

  1. Compute the descent direction using:

\[d^{(i)} = - [\nabla_x f(x^{(i)})] + \beta^{(i)} d^{(i-1)}\]
  2. Determine if \(\beta^{(i)}\) should be reset (to zero), which occurs when

\[\dfrac{| [\nabla f(x^{(i)})]^\top [\nabla f(x^{(i-1)})] |}{ [\nabla f(x^{(i)})]^\top [\nabla f(x^{(i)})] } > \nu\]

where \(\nu\) is set via cg_settings.restart_threshold.

  3. Compute the optimal step size using line search:

\[\alpha^{(i)} = \arg \min_{\alpha} f(x^{(i)} + \alpha \times d^{(i)})\]
  4. Update the candidate solution vector using:

\[x^{(i+1)} = x^{(i)} + \alpha^{(i)} \times d^{(i)}\]

The algorithm stops when at least one of the following conditions is met:

  1. the norm of the gradient vector, \(\| \nabla f \|\), is less than grad_err_tol;

  2. the relative change between \(x^{(i+1)}\) and \(x^{(i)}\) is less than rel_sol_change_tol;

  3. the total number of iterations exceeds iter_max.
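
To make steps 1-4 and the stopping checks concrete, the following is a minimal, self-contained sketch (not the library's implementation) of a CG loop applied to a simple quadratic \(f(x) = \frac{1}{2} x^\top A x - b^\top x\), for which the exact line-search step size has a closed form. It uses the Fletcher-Reeves updating rule from the next section; the matrix \(A\), the vector \(b\), and the tolerances are illustrative choices.

#include <Eigen/Dense>
#include <iostream>

int main()
{
    // illustrative quadratic: f(x) = 0.5*x'Ax - b'x, with gradient A*x - b
    Eigen::MatrixXd A(2,2);
    A << 4.0, 1.0,
         1.0, 3.0;
    Eigen::VectorXd b(2);
    b << 1.0, 2.0;

    Eigen::VectorXd x = Eigen::VectorXd::Zero(2);
    Eigen::VectorXd grad = A*x - b;
    Eigen::VectorXd d = -grad;                    // initial descent direction

    const double grad_err_tol = 1E-08;            // stopping tolerance on the gradient norm
    const int iter_max = 100;                     // maximum number of iterations

    for (int iter = 0; iter < iter_max && grad.norm() > grad_err_tol; ++iter) {
        // step 3: exact line search for a quadratic: alpha = -(g'd) / (d'Ad)
        double alpha = - grad.dot(d) / d.dot(A*d);

        // step 4: update the candidate solution
        x += alpha * d;

        Eigen::VectorXd grad_new = A*x - b;

        // Fletcher-Reeves beta (see the updating rules below)
        double beta = grad_new.dot(grad_new) / grad.dot(grad);

        // step 1: new descent direction
        d = -grad_new + beta * d;
        grad = grad_new;
    }

    std::cout << "solution:\n" << x << std::endl;

    return 0;
}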


Updating Rules

  • cg_settings.method = 1 Fletcher–Reeves (FR):

    \[\beta_{\text{FR}} = \dfrac{ [\nabla_x f(x^{(i)})]^\top [\nabla_x f(x^{(i)})] }{ [\nabla_x f(x^{(i-1)})]^\top [\nabla_x f(x^{(i-1)})] }\]
  • cg_settings.method = 2 Polak-Ribiere (PR):

    \[\beta_{\text{PR}} = \dfrac{ [\nabla_x f(x^{(i)})]^\top ([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]) }{ [\nabla_x f(x^{(i-1)})]^\top [\nabla_x f(x^{(i-1)})] }\]
  • cg_settings.method = 3 FR-PR Hybrid:

    \[\beta = \begin{cases} - \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} < - \beta_{\text{FR}} \\ \beta_{\text{PR}} & \text{ if } |\beta_{\text{PR}}| \leq \beta_{\text{FR}} \\ \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} > \beta_{\text{FR}} \end{cases}\]
  • cg_settings.method = 4 Hestenes-Stiefel:

    \[\beta_{\text{HS}} = \dfrac{[\nabla_x f(x^{(i)})] \cdot ([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})])}{([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]) \cdot d^{(i-1)}}\]
  • cg_settings.method = 5 Dai-Yuan:

    \[\beta_{\text{DY}} = \dfrac{[\nabla_x f(x^{(i)})] \cdot [\nabla_x f(x^{(i)})]}{([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]) \cdot d^{(i-1)}}\]
  • cg_settings.method = 6 Hager-Zhang:

    \[\begin{aligned} \beta_{\text{HZ}} &= \left( y^{(i)} - 2 \times \dfrac{[y^{(i)}] \cdot y^{(i)}}{y^{(i)} \cdot d^{(i-1)}} \times d^{(i-1)} \right) \cdot \dfrac{[\nabla_x f(x^{(i)})]}{y^{(i)} \cdot d^{(i-1)}} \\ y^{(i)} &:= [\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})] \end{aligned}\]

Finally, we set:

\[\beta^{(i)} = \max \{ 0, \beta_{*} \} \]

where \(\beta_{*}\) denotes the value computed by the chosen updating rule.
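
As a small illustration of these updating rules (a sketch only, not library code), the snippet below evaluates the FR, PR, HS, and DY formulas for a pair of consecutive gradients and the previous direction \(d^{(i-1)}\), and then applies the \(\max \{ 0, \beta_{*} \}\) safeguard; the numerical values are arbitrary.

#include <Eigen/Dense>
#include <algorithm>
#include <iostream>

int main()
{
    // illustrative gradients at stages i-1 and i, and the previous direction d^(i-1)
    Eigen::VectorXd grad_prev(2), grad(2), d_prev(2);
    grad_prev << 3.0, -1.0;
    grad      << 1.5,  0.5;
    d_prev    << -3.0, 1.0;

    Eigen::VectorXd y = grad - grad_prev;   // gradient difference y^(i)

    double beta_fr = grad.dot(grad) / grad_prev.dot(grad_prev);  // Fletcher-Reeves
    double beta_pr = grad.dot(y)    / grad_prev.dot(grad_prev);  // Polak-Ribiere
    double beta_hs = grad.dot(y)    / y.dot(d_prev);             // Hestenes-Stiefel
    double beta_dy = grad.dot(grad) / y.dot(d_prev);             // Dai-Yuan

    // final safeguard: beta^(i) = max{0, beta_*}
    double beta = std::max(0.0, beta_pr);

    std::cout << "FR: "   << beta_fr << ", PR: " << beta_pr
              << ", HS: " << beta_hs << ", DY: " << beta_dy
              << ", safeguarded PR: " << beta << std::endl;

    return 0;
}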


Function Declarations

bool cg(ColVec_t &init_out_vals, std::function<fp_t(const ColVec_t &vals_inp, ColVec_t *grad_out, void *opt_data)> opt_objfn, void *opt_data)

The Nonlinear Conjugate Gradient (CG) Optimization Algorithm.

Parameters
  • init_out_vals – a column vector of initial values, which will be replaced by the solution upon successful completion of the optimization algorithm.

  • opt_objfn – the function to be minimized, taking three arguments:

    • vals_inp a vector of inputs;

    • grad_out a vector to store the gradient; and

    • opt_data additional data passed to the user-provided function.

  • opt_data – additional data passed to the user-provided function.

Returns

a boolean value indicating successful completion of the optimization algorithm.

bool cg(ColVec_t &init_out_vals, std::function<fp_t(const ColVec_t &vals_inp, ColVec_t *grad_out, void *opt_data)> opt_objfn, void *opt_data, algo_settings_t &settings)

The Nonlinear Conjugate Gradient (CG) Optimization Algorithm.

Parameters
  • init_out_vals – a column vector of initial values, which will be replaced by the solution upon successful completion of the optimization algorithm.

  • opt_objfn – the function to be minimized, taking three arguments:

    • vals_inp a vector of inputs;

    • grad_out a vector to store the gradient; and

    • opt_data additional data passed to the user-provided function.

  • opt_data – additional data passed to the user-provided function.

  • settings – parameters controlling the optimization routine.

Returns

a boolean value indicating successful completion of the optimization algorithm.


Optimization Control Parameters

The basic control parameters are:

  • fp_t grad_err_tol: the error tolerance value controlling how small the \(L_2\) norm of the gradient vector \(\| \nabla f \|\) should be before ‘convergence’ is declared.

  • fp_t rel_sol_change_tol: the error tolerance value controlling how small the proportional change in the solution vector should be before ‘convergence’ is declared.

    The relative change is computed using:

    \[\left\| \dfrac{x^{(i)} - x^{(i-1)}}{ |x^{(i-1)}| + \epsilon } \right\|_1\]

    where \(\epsilon\) is a small number added for numerical stability.

  • size_t iter_max: the maximum number of iterations/updates before the algorithm exits.

  • bool vals_bound: whether the search space of the algorithm is bounded. If true, then

    • ColVec_t lower_bounds: defines the lower bounds of the search space.

    • ColVec_t upper_bounds: defines the upper bounds of the search space.

Additional settings:

  • int cg_settings.method: Update method.

    • Default value: 2.

  • fp_t cg_settings.restart_threshold: parameter \(\nu\) from step 2 in the algorithm description.

    • Default value: 0.1.

  • bool use_rel_sol_change_crit: whether to enable the rel_sol_change_tol stopping criterion.

    • Default value: false.

  • fp_t cg_settings.wolfe_cons_1: Line search tuning parameter that controls the tolerance on the Armijo sufficient decrease condition.

    • Default value: 1E-03.

  • fp_t cg_settings.wolfe_cons_2: Line search tuning parameter that controls the tolerance on the curvature condition.

    • Default value: 0.10.

  • int print_level: Set the level of detail for printing updates on optimization progress.

    • Level 0: Nothing (default).

    • Level 1: Print the iteration count and current error values.

    • Level 2: Level 1 plus the current candidate solution values, \(x^{(i+1)}\).

    • Level 3: Level 2 plus the direction vector, \(d^{(i)}\), and the gradient vector, \(\nabla_x f(x^{(i+1)})\).

    • Level 4: Level 3 plus \(\beta^{(i)}\).
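
The sketch below shows how these control parameters might be set and passed to the second cg overload; it reuses the sphere function from the examples that follow, and assumes the fields are accessed through the settings object under the names listed above (e.g., settings.cg_settings.method).

#define OPTIM_ENABLE_EIGEN_WRAPPERS
#include "optim.hpp"

#include <iostream>

inline
double
sphere_fn(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* opt_data)
{
    if (grad_out) {
        *grad_out = 2.0*vals_inp;
    }

    return vals_inp.dot(vals_inp);
}

int main()
{
    Eigen::VectorXd x = Eigen::VectorXd::Ones(5); // initial values (1,1,...,1)

    optim::algo_settings_t settings;

    settings.iter_max = 500;                      // maximum number of iterations
    settings.grad_err_tol = 1E-10;                // tolerance on the gradient norm
    settings.cg_settings.method = 4;              // Hestenes-Stiefel updates
    settings.cg_settings.restart_threshold = 0.2; // restart parameter nu
    settings.print_level = 1;                     // print iteration count and error values

    bool success = optim::cg(x, sphere_fn, nullptr, settings);

    std::cout << "converged: " << success << "\nsolution:\n" << x << std::endl;

    return 0;
}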


Examples

Sphere Function

Code to run this example is given below.

Armadillo Code

#define OPTIM_ENABLE_ARMA_WRAPPERS
#include "optim.hpp"

inline
double
sphere_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
    double obj_val = arma::dot(vals_inp,vals_inp);

    if (grad_out) {
        *grad_out = 2.0*vals_inp;
    }

    return obj_val;
}

int main()
{
    const int test_dim = 5;

    arma::vec x = arma::ones(test_dim,1); // initial values (1,1,...,1)

    bool success = optim::cg(x, sphere_fn, nullptr);

    if (success) {
        std::cout << "cg: sphere test completed successfully." << "\n";
    } else {
        std::cout << "cg: sphere test completed unsuccessfully." << "\n";
    }

    arma::cout << "cg: solution to sphere test:\n" << x << arma::endl;

    return 0;
}

Eigen Code

#define OPTIM_ENABLE_EIGEN_WRAPPERS
#include "optim.hpp"

inline
double
sphere_fn(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* opt_data)
{
    double obj_val = vals_inp.dot(vals_inp);

    if (grad_out) {
        *grad_out = 2.0*vals_inp;
    }

    return obj_val;
}

int main()
{
    const int test_dim = 5;

    Eigen::VectorXd x = Eigen::VectorXd::Ones(test_dim); // initial values (1,1,...,1)

    bool success = optim::cg(x, sphere_fn, nullptr);

    if (success) {
        std::cout << "cg: sphere test completed successfully." << "\n";
    } else {
        std::cout << "cg: sphere test completed unsuccessfully." << "\n";
    }

    std::cout << "cg: solution to sphere test:\n" << x << std::endl;

    return 0;
}

Booth’s Function

Code to run this example is given below.

Armadillo Code

#define OPTIM_ENABLE_ARMA_WRAPPERS
#include "optim.hpp"

inline
double
booth_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
    double x_1 = vals_inp(0);
    double x_2 = vals_inp(1);

    double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);

    if (grad_out) {
        (*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
        (*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
    }

    return obj_val;
}

int main()
{
    arma::vec x_2 = arma::zeros(2,1); // initial values (0,0)

    bool success_2 = optim::cg(x_2, booth_fn, nullptr);

    if (success_2) {
        std::cout << "cg: Booth test completed successfully." << "\n";
    } else {
        std::cout << "cg: Booth test completed unsuccessfully." << "\n";
    }

    arma::cout << "cg: solution to Booth test:\n" << x_2 << arma::endl;

    return 0;
}

Eigen Code

#define OPTIM_ENABLE_EIGEN_WRAPPERS
#include "optim.hpp"

inline
double
booth_fn(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* opt_data)
{
    double x_1 = vals_inp(0);
    double x_2 = vals_inp(1);

    double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);

    if (grad_out) {
        (*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
        (*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
    }

    return obj_val;
}

int main()
{
    Eigen::VectorXd x = Eigen::VectorXd::Zero(2); // initial values (0,0)

    bool success_2 = optim::cg(x, booth_fn, nullptr);

    if (success_2) {
        std::cout << "cg: Booth test completed successfully." << "\n";
    } else {
        std::cout << "cg: Booth test completed unsuccessfully." << "\n";
    }

    std::cout << "cg: solution to Booth test:\n" << x_2 << std::endl;

    return 0;
}