Java - Neural network with backpropagation not converging

Basically, I'm trying to implement backpropagation in a network. I know the backpropagation algorithm is somewhat hard-coded, but I'm trying to make it functional first.

It works for one set of inputs and outputs, but beyond one training set the network converges on one solution while the other output converges on 0.5.

I.e. the output for one trial is:

[0.9969527919933012, 0.003043774988797313]
[0.5000438200377985, 0.49995612243030635]

Network.java

import java.util.ArrayList;

public class Network {

    private ArrayList<ArrayList<ArrayList<Double>>> weights;
    private ArrayList<ArrayList<Double>> nodes;

    private final double LEARNING_RATE = -0.25;
    private final double DEFAULT_NODE_VALUE = 0.0;

    private double momentum = 1.0;

    public Network() {
        weights = new ArrayList<ArrayList<ArrayList<Double>>>();
        nodes = new ArrayList<ArrayList<Double>>();
    }

    /**
     * Method used to add a layer with {@code n} nodes to the network.
     * @param n number of nodes for the layer
     */
    public void addLayer(int n) {
        nodes.add(new ArrayList<Double>());
        for (int i = 0; i < n; i++)
            nodes.get(nodes.size()-1).add(DEFAULT_NODE_VALUE);
    }

    /**
     * Method that generates the weights used to link the layers together.
     */
    public void createWeights() {
        // There are only weights between layers, so we have one less weight layer than node layers
        for (int i = 0; i < nodes.size()-1; i++) {
            weights.add(new ArrayList<ArrayList<Double>>());

            // For each node above the weight
            for (int j = 0; j < nodes.get(i).size(); j++) {
                weights.get(i).add(new ArrayList<Double>());

                // For each node below the weight
                for (int k = 0; k < nodes.get(i+1).size(); k++)
                    weights.get(i).get(j).add(Math.random()*2-1);
            }
        }
    }

    /**
     * Utilizes the differentiated sigmoid function to change the weights in the network
     * @param out   the desired output pattern for the network
     */
    private void propogateBackward(double[] out) {
        /*
         * Error calculation using the squared error formula and the sigmoid derivative
         *
         * Output node : dk = Ok(1-Ok)(Ok-Tk)
         * Hidden node : dj = Oj(1-Oj)SummationkEK(dkWjk)
         *
         * k is an output node
         * j is a hidden node
         *
         * dw = LEARNING_RATE*d*outputOfPreviousLayer(not weighted)
         * w = dw + w
         */

        // Update the last layer of weights first because it is a special case
        double dkW = 0;

        for (int i = 0; i < nodes.get(nodes.size()-1).size(); i++) {

            double outputK = nodes.get(nodes.size()-1).get(i);
            double deltaK = outputK*(1-outputK)*(outputK-out[i]);

            for (int j = 0; j < nodes.get(nodes.size()-2).size(); j++) {
                weights.get(1).get(j).set(i, weights.get(1).get(j).get(i) + LEARNING_RATE*deltaK*nodes.get(nodes.size()-2).get(j));
                dkW += deltaK*weights.get(1).get(j).get(i);
            }
        }

        for (int i = 0; i < nodes.get(nodes.size()-2).size(); i++) {

            // Hidden node : dj = Oj(1-Oj)SummationkEK(dkWjk)
            double outputJ = nodes.get(1).get(i);
            double deltaJ = outputJ*(1-outputJ)*dkW*LEARNING_RATE;

            for (int j = 0; j < nodes.get(0).size(); j++) {
                weights.get(0).get(j).set(i, weights.get(0).get(j).get(i) + deltaJ*nodes.get(0).get(j));
            }
        }
    }

    /**
     * Propagates an array of input values through the network
     * @param in    an array of inputs
     */
    private void propogateForward(double[] in) {
        // Pass the inputs to the input layer
        for (int i = 0; i < in.length; i++)
            nodes.get(0).set(i, in[i]);

        // Propagate through the rest of the network
        // For each layer after the first layer
        for (int i = 1; i < nodes.size(); i++)

            // For each node in the layer
            for (int j = 0; j < nodes.get(i).size(); j++) {

                // For each node in the previous layer
                for (int k = 0; k < nodes.get(i-1).size(); k++)

                    // Add to the node the weighted output from k to j
                    nodes.get(i).set(j, nodes.get(i).get(j)+weightedNode(i-1, k, j));

                // Once the node has received all of its inputs we can apply the activation function
                nodes.get(i).set(j, activation(nodes.get(i).get(j)));
            }
    }

    /**
     * Method that returns the activation value of an input
     * @param   in the total input of a node
     * @return  the sigmoid function at the input
     */
    private double activation(double in) {
        return 1/(1+Math.pow(Math.E,-in));
    }

    /**
     * Weighted output for a node.
     * @param layer     the layer which the transmitting node is on
     * @param node      the index of the transmitting node
     * @param nextNode  the index of the receiving node
     * @return  the output of the transmitting node times the weight between the two nodes
     */
    private double weightedNode(int layer, int node, int nextNode) {
        return nodes.get(layer).get(node)*weights.get(layer).get(node).get(nextNode);
    }

    /**
     * Method that resets all of the nodes to their default value
     */
    private void resetNodes() {
        for (int i = 0; i < nodes.size(); i++)
            for (int j = 0; j < nodes.get(i).size(); j++)
                nodes.get(i).set(j, DEFAULT_NODE_VALUE);
    }

    /**
     * Teach the network correct responses for certain input values.
     * @param in    an array of input values
     * @param out   an array of desired output values
     * @param n     number of iterations to perform
     */
    public void train(double[] in, double[] out, int n) {
        for (int i = 0; i < n; i++) {
            propogateForward(in);
            propogateBackward(out);
            resetNodes();
        }
    }

    public void getResult(double[] in) {
        propogateForward(in);
        System.out.println(nodes.get(2));
        resetNodes();
    }
}

SnapSolve.java

public class SnapSolve {

    public SnapSolve() {

        Network net = new Network();
        net.addLayer(2);
        net.addLayer(4);
        net.addLayer(2);
        net.createWeights();

        double[] l = {0, 1};
        double[] p = {1, 0};

        double[] n = {1, 0};
        double[] r = {0, 1};

        for (int i = 0; i < 100000; i++) {
            net.train(l, p, 1);
            net.train(n, r, 1);
        }

        net.getResult(l);
        net.getResult(n);
    }

    public static void main(String[] args) {
        new SnapSolve();
    }
}

Suggestions

The initial weights you're using in your network are pretty large. Typically you want to initialize the weights in a sigmoid-activation neural network proportionally to the inverse of the square root of the fan-in of the unit. So, for units in layer i of the network, choose initial weights between positive and negative n^{-1/2}, where n is the number of units in layer i-1. (See http://www.willamette.edu/~gorr/classes/cs449/precond.html for more information.)
The learning rate parameter that you seem to be using is also fairly large, which can cause your network to "bounce around" during training. I'd experiment with different values for this, on a log scale: 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, ... until you find one that appears to work better.
You're really only training on two examples (though the network you're using should be able to model these two points easily). You can increase the diversity of your training dataset by adding noise to the existing inputs and expecting the network to produce the correct output. I've found that this sometimes helps when using a squared-error loss (like you're using) and trying to learn a binary boolean operator like XOR, since there are very few input-output pairs in the true function domain to train with.
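To make the first and third suggestions concrete, here is a minimal sketch. FanInInit, createLayerWeights and addNoise are hypothetical names, not part of the Network class in the question, and the sketch uses plain arrays rather than the nested ArrayList layout. It draws each weight uniformly from [-n^{-1/2}, n^{-1/2}) for a layer with fan-in n, and returns a copy of an input with zero-mean Gaussian noise added.

```java
import java.util.Random;

// Hypothetical helper for illustration; not part of the question's Network class.
final class FanInInit {
    private FanInInit() {}

    /**
     * Returns a fanIn x fanOut weight matrix whose entries are drawn uniformly
     * from [-1/sqrt(fanIn), 1/sqrt(fanIn)), where fanIn is the number of units
     * in the previous layer.
     */
    static double[][] createLayerWeights(int fanIn, int fanOut, Random rng) {
        double limit = 1.0 / Math.sqrt(fanIn);
        double[][] w = new double[fanIn][fanOut];
        for (int j = 0; j < fanIn; j++) {
            for (int k = 0; k < fanOut; k++) {
                // nextDouble() is in [0, 1); rescale it to [-limit, limit)
                w[j][k] = (rng.nextDouble() * 2 - 1) * limit;
            }
        }
        return w;
    }

    /**
     * Returns a copy of the input with zero-mean Gaussian noise of the given
     * standard deviation added to every value. The original array is untouched.
     */
    static double[] addNoise(double[] in, double stdDev, Random rng) {
        double[] noisy = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            noisy[i] = in[i] + rng.nextGaussian() * stdDev;
        }
        return noisy;
    }
}
```

In the question's code this would roughly correspond to scaling Math.random()*2-1 in createWeights() by 1/Math.sqrt(nodes.get(i).size()), and to calling something like net.train(FanInInit.addNoise(l, 0.05, rng), p, 1) inside the training loop. The noise level 0.05 is only a guess and would need tuning.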

Monitoring

Also, I'd like to make a general suggestion that might help in your approach to problems like this: add a little bit of code that will allow you to monitor the current error of the network when given a known input-output pair (or an entire "validation" dataset).

If you can monitor the error of the network during training, it will help you see more clearly when your network is converging: the error should decrease steadily as you train the network. If it bounces around, you'll know that you're either using too large a learning rate or need to otherwise adapt your training dataset. If the error increases, something is wrong with your gradient computations.
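A minimal sketch of such a monitor follows. It assumes the network can return its outputs as a double[]; the question's getResult() only prints them, so the predict function here is a hypothetical stand-in you would have to add. ErrorMonitor and meanSquaredError are likewise illustrative names.

```java
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the question's code.
final class ErrorMonitor {
    private ErrorMonitor() {}

    /**
     * Returns the squared error summed over the output units and averaged
     * over all examples. The predict function maps one input array to the
     * network's output array for that input.
     */
    static double meanSquaredError(Function<double[], double[]> predict,
                                   double[][] inputs, double[][] targets) {
        double total = 0.0;
        for (int n = 0; n < inputs.length; n++) {
            double[] out = predict.apply(inputs[n]);
            for (int k = 0; k < out.length; k++) {
                double diff = out[k] - targets[n][k];
                total += diff * diff;
            }
        }
        return total / inputs.length;
    }
}
```

Printing this value every thousand iterations or so of the training loop should be enough to tell a steady decrease apart from bouncing or a steady increase.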
