Decision Tree

Entropy of a categorical attribute with probability distribution Π = (π_1, ..., π_K):
H(Π) = -Σ_i π_i log_2 π_i

e.g. for a training set containing p positive and n negative examples, we have:
H(p/(p+n), n/(p+n)) = - (p/(p+n)) log_2 (p/(p+n)) - (n/(p+n)) log_2 (n/(p+n))
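As a sketch, the two formulas above in Python (log base 2, so entropy is measured in bits; the function names are my own):

```python
import math

def entropy(probs):
    """H(Pi) = -sum(pi * log2 pi); zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_pn(p, n):
    """Entropy of a set with p positive and n negative examples."""
    total = p + n
    return entropy([p / total, n / total])
```

A 50/50 set has entropy 1 bit; a pure set has entropy 0.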

How to pick attributes?
An attribute A with K distinct values divides the training set E into subsets E_1, ..., E_K.

Expected entropy remaining after trying attribute A (with branches k = 1, 2, ..., K):
EH(A) = Σ_{k=1}^{K} (p_k+n_k)/(p+n) H(p_k/(p_k+n_k), n_k/(p_k+n_k))
i.e. each child's entropy weighted by the proportion of examples that take the kth attribute value,

where p_k + n_k is the number of examples (positive or negative) in the kth subset E_k.

Information Gain for this attribute is:

I(A) = H(p/(p+n), n/(p+n)) - EH(A)

Pick the attribute with the largest I(A)!
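A sketch of the gain computation, assuming the counts (p_k, n_k) per child have already been tallied (the numbers in the tests are made up for illustration):

```python
import math

def entropy_pn(p, n):
    """H(p/(p+n), n/(p+n)), skipping zero terms."""
    total = p + n
    return -sum(q * math.log2(q) for q in (p / total, n / total) if q > 0)

def information_gain(p, n, subsets):
    """I(A) = H(parent) - EH(A); subsets = [(p_k, n_k), ...] holds the
    positive/negative counts in each of the K children."""
    remainder = sum((pk + nk) / (p + n) * entropy_pn(pk, nk)
                    for pk, nk in subsets)
    return entropy_pn(p, n) - remainder
```

An attribute whose children are as mixed as the parent gives zero gain; perfectly pure children give the maximum gain.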

Once the attribute is set, move down to each child and repeat the process.
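The recursion can be sketched end to end: a minimal ID3-style builder over dict-valued examples (the representation, names, and stopping cases are my own, not from the notes):

```python
import math
from collections import Counter

def entropy_of(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, labels, attr):
    # I(attr) = H(parent) - expected entropy after splitting on attr
    n = len(labels)
    remainder = 0.0
    for value in set(ex[attr] for ex in examples):
        sub = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
        remainder += len(sub) / n * entropy_of(sub)
    return entropy_of(labels) - remainder

def build_tree(examples, labels, attrs):
    """Pick the highest-gain attribute, split, recurse into each child."""
    if len(set(labels)) == 1:      # pure node: stop
        return labels[0]
    if not attrs:                  # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, labels, a))
    children = {}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        children[value] = build_tree([examples[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, children)
```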

If the data is continuous, treat each point as a cut point: use the projections of the points in the x direction and the y direction as attributes.

In the continuous case, pick x, y values as attribute thresholds and, depending on whether a point's value is greater or smaller than the threshold, send it to the left or right child respectively.

Use the highest information gain, as before, to decide which attribute to split on first.
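A sketch for one continuous attribute: scan candidate thresholds along one axis (midpoints between consecutive distinct values, an assumed convention) and keep the one with the highest gain, for boolean labels:

```python
import math

def entropy_pn(p, n):
    """H(p/(p+n), n/(p+n)), with the entropy of an empty set taken as 0."""
    total = p + n
    if total == 0:
        return 0.0
    return -sum(q * math.log2(q) for q in (p / total, n / total) if q > 0)

def best_threshold(points, labels, axis):
    """Try midpoints of consecutive sorted values on one axis as cut
    points; return (best_gain, best_threshold)."""
    p = sum(labels)                 # count of positive (True) labels
    n = len(labels) - p
    parent = entropy_pn(p, n)
    values = sorted(set(pt[axis] for pt in points))
    best_gain, best_t = 0.0, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for pt, lab in zip(points, labels) if pt[axis] < t]
        right = [lab for pt, lab in zip(points, labels) if pt[axis] >= t]
        eh = (len(left) * entropy_pn(sum(left), len(left) - sum(left))
              + len(right) * entropy_pn(sum(right), len(right) - sum(right))
              ) / len(labels)
        if parent - eh > best_gain:
            best_gain, best_t = parent - eh, t
    return best_gain, best_t
```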

But if we had 20,000-dimensional vectors we would go for a random forest.

If you want to do unsupervised clustering using a random forest, for each attribute try each candidate split: fit a Gaussian (mean, variance) to the data on each side, and select the split with the highest information gain.

To avoid overfitting:

Pre-pruning: stop splitting a node
  if # records < threshold, or
  if information gain < threshold

Post-pruning: start bottom-up and remove a node if
  its impact is less than a threshold, i.e.
  removing it doesn't change accuracy, or its removal increases validation-set accuracy
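The pre-pruning rule above can be sketched as a single predicate (the threshold values are illustrative, not from the notes):

```python
def should_stop(num_records, best_gain, min_records=5, min_gain=0.01):
    """Pre-pruning: stop expanding a node when it has too few records
    or the best available split's information gain is too small.
    (min_records=5 and min_gain=0.01 are illustrative defaults.)"""
    return num_records < min_records or best_gain < min_gain
```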

For continuous features: sort the values; candidate cut points are all available values, or the means of consecutive values.
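A sketch of the second variant (candidate cut points halfway between consecutive distinct sorted values):

```python
def candidate_cut_points(values):
    """Sort the distinct values and return the mean of each
    consecutive pair as a candidate cut point."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]
```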