
Decision Trees & Neural Nets

What is decision tree induction?


Decision tree induction is a technique for building a model that classifies inputs into discrete values. The induction algorithm is data driven: it works from a set of predefined, pre-classified training examples. Because the training data is labelled in advance, this type of learning is regarded as a supervised method.

Decision tree induction works by recursively subdividing the input data, at each step searching for the most effective split. Where an attribute is continuous in nature, the recursive technique must deduce the most suitable split value from the initial training data.

After every subdivision the induction procedure grows the branches by treating each new node as the root of a subtree. The process is repeated, subdividing the data, until all the values in a node are of a single classification.
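The recursive growth described above can be sketched in a few lines. This is a minimal, illustrative implementation (the function names and the information-gain criterion are my assumptions, not taken from the text): a node is grown until it is pure, otherwise the data is subdivided on the most effective split and each half is treated as the root of a new subtree.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(rows, labels):
    """Find the (attribute index, value) split with the highest information gain."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for attr in range(len(rows[0])):
        for value in {row[attr] for row in rows}:
            left = [l for row, l in zip(rows, labels) if row[attr] == value]
            right = [l for row, l in zip(rows, labels) if row[attr] != value]
            if not left or not right:
                continue  # split does not actually divide the data
            gain = base - (len(left) / len(labels)) * entropy(left) \
                        - (len(right) / len(labels)) * entropy(right)
            if gain > best_gain:
                best_gain, best = gain, (attr, value)
    return best

def grow(rows, labels):
    """Recursively subdivide until every node holds a single classification."""
    if len(set(labels)) == 1:      # pure node: stop growing this branch
        return labels[0]
    split = best_split(rows, labels)
    if split is None:              # no useful split left: majority vote
        return Counter(labels).most_common(1)[0][0]
    attr, value = split
    match = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
    rest  = [(r, l) for r, l in zip(rows, labels) if r[attr] != value]
    # each half becomes the root of a new subtree
    return (attr, value,
            grow([r for r, _ in match], [l for _, l in match]),
            grow([r for r, _ in rest],  [l for _, l in rest]))
```

For example, `grow([['sunny'], ['rain'], ['sunny']], ['play', 'stay', 'play'])` splits once on the weather attribute and yields two pure leaves.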

If the tree is allowed to grow uncontrolled, the partitioning can easily become far too large and over-fit the training data. With this in mind one must be wary when defining the initial split point, since the more compact the tree, the more quickly a classification for an input will be returned.

One of the major factors in the induction of a poor decision tree is poor data. If the training set has missing or bad values, the rules deduced will have varying levels of validity. Since the training data is the only knowledge on which the induction technique can base its partitioning decisions, its cleanliness is crucial.

As mentioned earlier, an input attribute may be continuous. Such attributes cause problems as the inductive engine searches for the best-fit groupings. How will it assess the validity of the chosen split points without over-fitting the model to the training data? Questions arise: where do we split, how large should each group be, and do all the groups even need to be the same size (e.g. 10-19, 20-29, etc.)? Could we get better results if we used more, or fewer, partitions?
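One common answer to the "where do we split" question for a continuous attribute is to scan the candidate thresholds that lie midway between adjacent sorted values and keep the one with the highest information gain. The sketch below assumes that criterion (the function names are illustrative, not from the text):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_threshold(values, labels):
    """Scan midpoints between adjacent sorted values; return the
    threshold whose <= / > split yields the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue  # no boundary between equal values
        t = (a + b) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - (len(left) / len(labels)) * entropy(left) \
                    - (len(right) / len(labels)) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t
```

On ages `[18, 25, 31, 45]` labelled `['young', 'young', 'old', 'old']` this picks the midpoint 28.0, rather than forcing equal-width bands such as 10-19, 20-29.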



What is a hidden layer of neurons in a neural net?


A hidden layer in a neural net is a set of nodes between the input and output nodes of the network. These hidden nodes hold the extra knowledge used to arrive at the resultant output, and they are where the actual computation is carried out. If the inputs were connected directly to the output nodes, the neural net would only be able to supply simple classifications. With the introduction of intermediate layers, the input signal can be passed through further calculations.

In the hidden layer each path between nodes is given a 'weighting', a multiplier that sets the signal strength. The greater the signal strength, the greater the importance of that piece of knowledge to the particular input; the pathway's signal multiplier therefore represents the importance of the pathway. As the signals arrive at the next node, the weighted values are combined and compared against the node's activation threshold. If the value exceeds the threshold, the node is triggered and produces an output signal.
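The weighted-sum-and-threshold behaviour of a single node can be written directly. This is a minimal sketch of the mechanism just described (a step-threshold unit; the function name is my own):

```python
def fires(inputs, weights, threshold):
    """A node triggers when the weighted sum of its incoming
    signals exceeds its activation threshold."""
    combined = sum(x * w for x, w in zip(inputs, weights))
    return 1 if combined > threshold else 0
```

With weights `[0.6, 0.6]` and threshold `1.0`, the node fires only when both inputs are active, since a single input contributes just 0.6.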

This resultant output value is passed on through the network to the output nodes. As the input propagates its way through the network, the hidden layer of nodes applies a series of rules, or knowledge, to the signal, forcing it down the weighted paths. In a learning neural net, the most often used paths have their weightings increased to raise the importance of those pieces of knowledge. Over time and extended use the neural net becomes more and more accurate and reliable, as the learning method adjusts the path weightings to best reflect the domain in which it resides.
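Propagation through a hidden layer can be sketched as applying the node rule layer by layer. The weights below are hand-set for illustration, not learned; they show a classic case where the hidden layer matters, since XOR cannot be computed by connecting inputs directly to a single threshold output:

```python
def step(x, threshold):
    """Fire (1) when the combined signal exceeds the threshold."""
    return 1 if x > threshold else 0

def forward(inputs, layer):
    """Propagate a signal through one layer: each node combines its
    weighted inputs and fires if its activation threshold is exceeded."""
    return [step(sum(x * w for x, w in zip(inputs, weights)), threshold)
            for weights, threshold in layer]

# Hand-set weights (illustrative, not trained):
hidden = [([1.0, 1.0], 0.5),      # fires when either input is on (OR)
          ([-1.0, -1.0], -1.5)]   # fires unless both inputs are on (NAND)
output = [([1.0, 1.0], 1.5)]      # fires when both hidden nodes fire (AND)

def xor(a, b):
    """Pass the input through the hidden layer, then the output layer."""
    return forward(forward([a, b], hidden), output)[0]
```

Tracing `(1, 1)`: the OR node fires, the NAND node does not, so the output node's combined signal of 1.0 stays below its 1.5 threshold and the net returns 0, as XOR requires.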

One of the most recent uses in the data arena is a technology known as Data Mining, in which a neural net is used to discover patterns within very large data stores. Neural nets are able to classify inputs into groups that are not separated by a linear boundary. The weighting aspect of the neural net allows the modeller to define the importance of any such pattern, so when a pattern is highlighted the analyst can be assured of its level of interestingness.

Another big advantage of using neural nets for pattern recognition in very large data stores is their parallel processing nature. To process such vast quantities of information quickly, you would wish to apply as much parallelism as possible.