top of page

What is Classification And Regression In Machine Learning? Get Help In Regression And Classification

Classification and Regression

Decision Trees can be used for both.


  • Spam / not Spam

  • Admit to ICU /not

  • Lend money / deny

  • Intrusion detections


  • Predict stock returns

  • Pricing a house or a car

  • Weather predictions (temp, rain fall etc)

  • Economic growth predictions

  • Predicting sports scores

Decision Trees

  • The general idea is that we will segment the space into a number of simple regions.

  • The segmentation can be illustrated as a tree

  • The end nodes can have a category (classification) or a continuous number (regression)

  • These methods, while quite simple are very powerful.

Visualizing Classification as a Tree


  • Algorithms for constructing decision trees usually work top-down, by choosing a variable at each step that best splits the set of items.

  • Different algorithms use different metrics for measuring “best"

  • These metrics measure how similar a region or a node is. They are said to measure the impurity of a region.

  • Larger these impurity metrics the larger the “dissimilarity” of a nodes/regions data.

  • Examples: Gini impurity, Entropy, Variance

Algorithms for building Decision Trees

Popular ones include

  • CART (Classification And Regression Tree)

  • CHAID (CHi-squared Automatic Interaction Detector)

  • C4.5

We will focus on CART, that uses the Gini impurity as its impurity measure.

Gini impurity

  • Used by the CART

  • Is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset.

Splitting using Gini impurity

  • When splitting, the Gini impurity of the two resulting nodes are combined using a weighted average.

  • With weights being the fraction of data on each node.

  • The CART algorithm simply chooses the right “split” by finding the split that maximizes the “decrease in Gini impurity” - also called the Gini Gain.

Decision trees are prone to ‘overfitting’

  • Decision Tree is a powerful algorithm that can adapt well and capture various patterns in the data

  • If allowed to grow fully, they become over-complex & tend to fit even the noise 10

  • Thus, a fully grown tree may not ‘generalize’ well on test or new unseen data


  • Ideally we would like a tree that does not over-fit the given data

  • One popular and simple way to prune a decision tree is by limiting the depth of the tree to avoid over fitting.

  • For example the tree on the right below is generated with a max depth of 2 while the tree on the left has no depth restriction (and hence overfits the data)


  • Stop growing the tree before it grows too big

  • This can be achieved by bounding hyperparameters 12 Hyperparameters in Decision Trees

- max_depth - The maximum depth of the tree. If set to ‘None’, then nodes are

expanded until all leaves are pure. Higher the value, more complex the tree

- min_samples_split - The minimum number of samples required to split a node.

Doesn’t split any node that is smaller than this number. Higher the values, less

complex the tree

- min_samples_leaf - The minimum number of samples required at a leaf node. All

leaf nodes have at least these many data points. Higher the value, less

complex the tree

Hyperparameter Tuning using Grid Search

  • Grid Search is a process of searching the best combination of hyperparameters from a predefined set of values

  • A parameter grid (Hyperparameters and corresponding values) is provided as an input to the Grid-search function

  • It tries all the combinations of the values passed and evaluates the model for each combination

  • It returns the combination of hyperparameter values that works best as per the metric provided for model evaluation

  • GridSearchCV( ) is an implementation of Grid Search with Cross Validation.

Cross Validation

- Cross Validation is a common Machine Learning technique that splits the data into n non-overlapping groups, and runs n experiments:

  • In each experiment, n-1 groups are used to train a model and the model is tested on the left out group.

  • The results are summarized over the n experiments.

- It gives a mechanism that allows us to test a model repeatedly on data that was not used to build the model.

- For Decision Trees, a very common approach is simply to choose the tree with minimum cross validation error.

Post-Pruning: Cost-complexity pruning

- Starting from the Full tree, create a sequence of trees that are sequentially smaller (pruned)

- At each step the algorithm

  • try removing each possible subtree

  • find the ‘relative error decrease per node’ for that subtree - Complexity parameter,

  • And remove the subtree with the minimum

- With the list of subtrees, one usually reverts back to using crossvalidation errors to find the best final pruned tree

Few Additional Thoughts

  • Other impurity measures

  • Regression Trees

  • Pros and Cons

Impurity Measures in Decision Trees

Regression Trees

Pros and Cons of Decision Trees

Pros -

  • Easy to understand and interpret

  • Useful in data exploration as it gives the splitting based on the significance of variables • Not influenced by the outlier/Null values and hence requires less data cleaning. Requires less time and effort during data pre-processing than other algorithms.

  • Can handle both continuous and categorical variables

  • Does not require any underlying assumptions in data. Works with both linearly and nonlinearly related variables.


  • A small change in the data-set can result in large change in the structure of the decision tree causing instability in the model.

  • Large trees can be difficult to interpret.

  • Tends to overfit.

bottom of page