What is a Decision Tree for Regression?
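A decision tree for regression predicts continuous values, such as house prices. It splits the data into groups and predicts the average target value of each group.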
How Does It Work?
- Start at the Root: Begin with the entire dataset at the root node.
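- Split the Data: Choose the feature and threshold that split the data into two groups with the lowest combined error. For regression, this error is commonly the mean squared error (MSE) of each group, weighted by the number of points in the group.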
- Repeat: Continue splitting each group until you meet a stopping criterion (like a maximum depth or minimum samples per leaf).
- Make Predictions: Each leaf node predicts the mean target value of the training points that fall into it. A new data point receives the prediction of the leaf it reaches.
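The steps above can be sketched in Python. This is a minimal, single-feature sketch under stated assumptions: the split criterion is weighted MSE, candidate thresholds are midpoints between consecutive distinct feature values, and the names `build_tree`, `best_split`, `max_depth`, and `min_samples_leaf` are illustrative rather than taken from any library. A version for several features would repeat the threshold search for each feature and keep the best one.

```python
from statistics import mean


def mse(values):
    """Mean squared error of values around their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(xs, ys, min_samples_leaf=1):
    """Return (threshold, weighted_mse) for the best midpoint split, or None."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue  # enforce the minimum-samples-per-leaf stopping rule
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if best is None or score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


def build_tree(xs, ys, depth=0, max_depth=2, min_samples_leaf=1):
    """Grow the tree recursively; each leaf stores the mean target value."""
    split = best_split(xs, ys, min_samples_leaf) if depth < max_depth else None
    # Stop at max depth, when no valid split exists, or when splitting does not lower the MSE.
    if split is None or split[1] >= mse(ys):
        return {"value": mean(ys)}
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": build_tree(*zip(*left), depth + 1, max_depth, min_samples_leaf),
        "right": build_tree(*zip(*right), depth + 1, max_depth, min_samples_leaf),
    }


def predict(tree, x):
    """Follow thresholds from the root down to a leaf and return its value."""
    while "value" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]
```

For example, `build_tree([1, 2, 3, 4], [1, 1, 5, 5])` splits once at 2.5, and both children become leaves because neither can lower the MSE further.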
Example with Calculation
Let's take a simple example to illustrate how a decision tree works for regression.
Dataset
Consider a small dataset of house prices based on the size of the house. The data is sorted by size, which makes it easy to evaluate potential split points between consecutive sizes:
House Size (sq ft) | Price (in $1000s) |
---|---|
1100 | 199 |
1400 | 245 |
1425 | 319 |
1550 | 219 |
1600 | 312 |
1700 | 279 |
1700 | 255 |
1875 | 308 |
2350 | 405 |
2450 | 324 |
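Sorting the data and listing the candidate split points can be sketched in Python as follows. The variable names are assumptions made for this sketch; the values are copied from the table.

```python
# House sizes (sq ft) and prices ($1000s), copied from the table above.
sizes = [1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450]
prices = [199, 245, 319, 219, 312, 279, 255, 308, 405, 324]

# Sort (size, price) pairs by size so that neighboring sizes are adjacent.
data = sorted(zip(sizes, prices))

# Candidate thresholds are midpoints between consecutive *distinct* sizes;
# the two 1700 sq ft houses cannot be separated, so no split falls between them.
distinct_sizes = sorted({size for size, _ in data})
candidates = [(a + b) / 2 for a, b in zip(distinct_sizes, distinct_sizes[1:])]
print(candidates)
# [1250.0, 1412.5, 1487.5, 1575.0, 1650.0, 1787.5, 2112.5, 2400.0]
```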
Step-by-Step Calculation
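Calculate the Mean Squared Error (MSE) for the Root Node:
The root node contains all 10 houses. Its prediction is the mean price, and its MSE is the average squared difference between each price and that mean:
- Mean price = 2865 / 10 = 286.5
- MSE = (1/10) × Σ (price - 286.5)² = 32600.5 / 10 = 3260.05
Evaluate Potential Split Points:
Evaluate potential splits at the midpoint between each pair of consecutive distinct house sizes. Let's consider the split between 1550 and 1600 sq ft, whose midpoint is 1575 sq ft.
Split at 1575 sq ft
- Left Split (size ≤ 1575): prices 199, 245, 319, 219; mean = 982 / 4 = 245.5; MSE = 8267 / 4 = 2066.75
- Right Split (size > 1575): prices 312, 279, 255, 308, 405, 324; mean = 1883 / 6 ≈ 313.83; MSE ≈ 13126.83 / 6 ≈ 2187.81
- Calculate the Weighted MSE for the Split: (4/10) × 2066.75 + (6/10) × 2187.81 ≈ 2139.38
Compare this with the MSE of the root node (3260.05). Since 2139.38 < 3260.05, the split improves the prediction.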
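The root-node MSE and the weighted MSE of the 1575 sq ft split can be checked with a short Python sketch. `mse` and `weighted_mse` are illustrative helper names, and the sketch assumes that houses with `size <= threshold` go to the left group.

```python
from statistics import mean

sizes = [1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450]
prices = [199, 245, 319, 219, 312, 279, 255, 308, 405, 324]


def mse(values):
    """Average squared deviation of values from their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def weighted_mse(threshold):
    """MSE of the groups size <= threshold and size > threshold, weighted by group size."""
    left = [p for s, p in zip(sizes, prices) if s <= threshold]
    right = [p for s, p in zip(sizes, prices) if s > threshold]
    return (len(left) * mse(left) + len(right) * mse(right)) / len(prices)


print(round(mse(prices), 2))         # 3260.05
print(round(weighted_mse(1575), 2))  # 2139.38

# Score every candidate threshold and keep the one with the lowest weighted MSE.
candidates = [1250, 1412.5, 1487.5, 1575, 1650, 1787.5, 2112.5, 2400]
best = min(candidates, key=weighted_mse)
print(best, round(weighted_mse(best), 2))
```

Scored this way, 1575 is not the lowest candidate: 2112.5 gives a weighted MSE of about 1739.05, so a greedy search that always takes the best split would start there. The walkthrough continues with 1575 as an illustrative split.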
Further Splits
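Next, split each child node once more to add another level of depth.
Split on Left Node at 1412.5 sq ft (midpoint between 1400 and 1425):
- Left-Left Split (size ≤ 1412.5): prices 199, 245; mean = 222; MSE = 529
- Left-Right Split (size > 1412.5): prices 319, 219; mean = 269; MSE = 2500
- Weighted MSE = (2/4) × 529 + (2/4) × 2500 = 1514.5, lower than the left node's MSE of 2066.75
Split on Right Node at 2012.5 sq ft (midpoint between 1875 and 2350):
- Right-Left Split (size ≤ 2012.5): prices 312, 279, 255, 308; mean = 288.5; MSE = 536.25
- Right-Right Split (size > 2012.5): prices 405, 324; mean = 364.5; MSE = 1640.25
- Weighted MSE = (4/6) × 536.25 + (2/6) × 1640.25 = 904.25, lower than the right node's MSE of about 2187.81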
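The two second-level splits can be checked the same way. This sketch hard-codes the houses in each child of the 1575 sq ft split; the helper name `split_score` is hypothetical.

```python
from statistics import mean


def mse(values):
    """Average squared deviation of values from their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def split_score(node, threshold):
    """Split (size, price) pairs at size <= threshold; return both price groups and their weighted MSE."""
    left = [p for s, p in node if s <= threshold]
    right = [p for s, p in node if s > threshold]
    score = (len(left) * mse(left) + len(right) * mse(right)) / len(node)
    return left, right, score


left_node = [(1100, 199), (1400, 245), (1425, 319), (1550, 219)]  # size <= 1575
right_node = [(1600, 312), (1700, 279), (1700, 255), (1875, 308),
              (2350, 405), (2450, 324)]  # size > 1575

for name, node, threshold in [("left", left_node, 1412.5), ("right", right_node, 2012.5)]:
    before = mse([p for _, p in node])
    low, high, after = split_score(node, threshold)
    print(f"{name} node: MSE {before:.2f} -> {after:.2f}; leaf means {mean(low)} and {mean(high)}")
```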
Final Predictions
The final tree and predictions (prices in $1000s) are:
- Left-Left Split (size ≤ 1412.5): Predicted Price = 222
- Left-Right Split (1412.5 < size ≤ 1575): Predicted Price = 269
- Right-Left Split (1575 < size ≤ 2012.5): Predicted Price = 288.5
- Right-Right Split (size > 2012.5): Predicted Price = 364.5
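The final tree fits in a few lines of Python. The hand-coded `predict_price` function below is an illustrative assumption, not part of the original example; the sketch also measures the tree's MSE on the ten training houses.

```python
sizes = [1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450]
prices = [199, 245, 319, 219, 312, 279, 255, 308, 405, 324]


def predict_price(size):
    """Final tree from the walkthrough; returns a price in $1000s."""
    if size <= 1575:
        return 222 if size <= 1412.5 else 269  # Left-Left / Left-Right leaves
    return 288.5 if size <= 2012.5 else 364.5  # Right-Left / Right-Right leaves


# Training MSE of the final tree: average squared gap between prediction and actual price.
errors = [(predict_price(s) - p) ** 2 for s, p in zip(sizes, prices)]
print(round(sum(errors) / len(errors), 2))  # 1148.35
```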
This example shows how a decision tree predicts house prices from their sizes: each split in the walkthrough lowered the weighted MSE, and each leaf predicts the mean price of the houses that fall into it.
![](https://blogger.googleusercontent.com/img/a/AVvXsEiEKLE-WT5vxR7FYWvOAmcKowXv9EkKNlHdBLk2R29mLUdTgAkv4ouRfehyXFEiPZWfZM2mccKOv6h-Es8baZuET2JcRv4ptLnywbrNvyUZNsQKHQJIjZorQmKTKEQ-rPDVnNJaicXPZbi73L3tGbwYa-Fja-oABj2cpBgHEvFtPOFo6T-Ndv8OzOimCPw=w418-h284)
Test Case:
Question: Predict the price of a house with a size of 1700 sq ft.
To predict the value for a house size of 1700 sq ft using the provided decision tree, follow the path of the tree:
Starting at the root:
- Check if the house size (1700) is less than or equal to 1575.
- Since 1700 is greater than 1575, follow the "NO" branch.
Next node:
- Check if the house size (1700) is less than or equal to 2012.5.
- Since 1700 is less than or equal to 2012.5, follow the "YES" branch.
Final prediction:
- A 1700 sq ft house therefore lands in the Right-Left leaf (1575 < size ≤ 2012.5), whose predicted price is 288.5 (in $1000s).
Therefore, the predicted price for a house size of 1700 sq ft is $288,500.
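The same path can be traced in code. The nested-dictionary layout and the `predict_with_path` name are assumptions made for this sketch; the thresholds and leaf values are the ones from the final tree, and "YES" means `size <= threshold`, as in the walkthrough.

```python
# Hypothetical nested-dict encoding of the final tree (prices in $1000s).
tree = {
    "threshold": 1575,
    "yes": {"threshold": 1412.5, "yes": {"value": 222}, "no": {"value": 269}},
    "no": {"threshold": 2012.5, "yes": {"value": 288.5}, "no": {"value": 364.5}},
}


def predict_with_path(node, size):
    """Walk from the root to a leaf, recording each YES/NO decision."""
    path = []
    while "value" not in node:
        answer = "YES" if size <= node["threshold"] else "NO"
        path.append(f"{size} <= {node['threshold']}? {answer}")
        node = node["yes"] if answer == "YES" else node["no"]
    return node["value"], path


price, path = predict_with_path(tree, 1700)
for step in path:
    print(step)
print(f"Predicted price: ${price * 1000:,.0f}")
```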