Random forest
Before moving on with the to-do list, let’s throw a Random Forest at it.
Random Forest
For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with a polynomial OLS as the baseline, simply because the correlations made it so evident that the relationship between temperature and consumption follows a polynomial shape. But let’s go back to the beloved RF.
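A minimal sketch of the vanilla baseline, assuming scikit-learn. The data is synthetic (a quadratic temperature effect plus noise) and the variable names are placeholders, not the actual dataset or columns used here. The random split is also an assumption; for time-ordered data a chronological split may be the better choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): consumption follows
# a quadratic curve in temperature, plus Gaussian noise.
rng = np.random.default_rng(0)
temperature = rng.uniform(-5, 35, size=500)
consumption = 0.05 * (temperature - 18) ** 2 + 10 + rng.normal(0, 1, size=500)

X = temperature.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, consumption, test_size=0.2, random_state=0
)

# "Vanilla" forest: library defaults, only the seed is fixed for reproducibility.
rf = RandomForestRegressor(random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
```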
Metrics
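The exact metrics reported are not listed here, so the helper below is only a sketch that assumes the usual regression trio (MAE, RMSE, R²). The function name `regression_metrics` is made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 as a plain dict (hypothetical helper)."""
    return {
        "mae": float(mean_absolute_error(y_true, y_pred)),
        # RMSE is the square root of the mean squared error.
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "r2": float(r2_score(y_true, y_pred)),
    }


# Toy values, only to show the call.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Calling it once on the training set and once on the test set gives the pair of numbers needed to judge overfitting.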
Scatter plot matrix
Observed vs. Predicted and Residuals vs. Predicted
Check for …
Check the residuals to assess the goodness of fit:
- Are they white noise, or is there a pattern?
- Is there heteroscedasticity?
- Is there non-linearity?
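The plots are the main tool for these checks, but a rough numeric companion can help. The sketch below is an assumption on my part, not the procedure used for the figures: it uses Spearman rank correlation (via SciPy) to flag a monotonic trend in the residuals and a spread that changes with the prediction. It will not catch non-monotonic patterns such as a U-shape, so it complements the plot rather than replacing it.

```python
import numpy as np
from scipy.stats import spearmanr


def residual_checks(y_true, y_pred):
    """Rough numeric hints for residual diagnostics (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Trend hint: residuals should not rise or fall with the prediction.
    trend_rho, trend_p = spearmanr(y_pred, residuals)
    # Heteroscedasticity hint: |residuals| should not grow with the prediction.
    spread_rho, spread_p = spearmanr(y_pred, np.abs(residuals))

    return {
        "mean_residual": float(residuals.mean()),
        "trend_rho": float(trend_rho),
        "trend_p": float(trend_p),
        "spread_rho": float(spread_rho),
        "spread_p": float(spread_p),
    }


# Synthetic fan-shaped residuals: the noise scale grows with the prediction.
rng = np.random.default_rng(0)
pred = rng.uniform(1, 10, size=500)
obs = pred + rng.normal(0, 1, size=500) * pred
checks = residual_checks(obs, pred)
```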
Normality of Residuals:
Check for …
- Are residuals normally distributed?
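As a numeric counterpart to a Q-Q plot or histogram, here is a small sketch using the Shapiro-Wilk test plus skewness and excess kurtosis from SciPy. The choice of test is an assumption, and the skewed sample is synthetic, only there to show what a clear rejection looks like.

```python
import numpy as np
from scipy import stats


def normality_check(residuals):
    """Shapiro-Wilk test plus shape statistics (hypothetical helper)."""
    residuals = np.asarray(residuals, dtype=float)
    w_stat, p_value = stats.shapiro(residuals)
    return {
        "shapiro_w": float(w_stat),
        "p_value": float(p_value),  # small p-value: evidence against normality
        "skew": float(stats.skew(residuals)),
        "excess_kurtosis": float(stats.kurtosis(residuals)),  # 0 for a normal
    }


# Synthetic, clearly right-skewed "residuals" (centred exponential).
rng = np.random.default_rng(0)
skewed = rng.exponential(1.0, size=500) - 1.0
result = normality_check(skewed)
```

With large samples the test rejects even for tiny deviations, so it is worth reading the p-value together with the plot.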
Leverage
Scale-Location plot
Residuals Autocorrelation Plot
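The quantities behind an autocorrelation plot can be computed by hand. The sketch below uses plain NumPy for the lag-k sample autocorrelation and the Durbin-Watson statistic; both helper names are illustrative, and the residuals are assumed to be sorted in time order.

```python
import numpy as np


def autocorrelation(residuals, lag):
    """Sample autocorrelation at a positive integer lag (lag >= 1)."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(np.sum(r[lag:] * r[:-lag]) / np.sum(r ** 2))


def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 means no lag-1 autocorrelation,
    values near 0 mean positive and values near 4 mean negative autocorrelation."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(r) ** 2) / np.sum(r ** 2))


# Toy series that flips sign at every step: strong negative autocorrelation.
alternating = np.tile([1.0, -1.0], 50)
lag1 = autocorrelation(alternating, lag=1)  # -0.99
dw = durbin_watson(alternating)             # 3.96
```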
Residuals vs Time
Well, not that bad, but it is overfitting quite a lot.
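One way to put a number on the overfitting without touching the test set is to compare the training R² with the out-of-bag (OOB) R² that scikit-learn can compute during fitting. This is a sketch on synthetic data, not the comparison behind the statement above; the noise level and `n_estimators=200` are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): quadratic signal with fairly strong noise.
rng = np.random.default_rng(0)
temperature = rng.uniform(-5, 35, size=500)
consumption = 0.05 * (temperature - 18) ** 2 + 10 + rng.normal(0, 3, size=500)
X = temperature.reshape(-1, 1)

# oob_score=True scores each sample using only the trees that did not see it.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, consumption)

train_r2 = rf.score(X, consumption)  # optimistic: the trees have seen these rows
oob_r2 = rf.oob_score_               # closer to out-of-sample performance
gap = train_r2 - oob_r2              # a large gap points to overfitting
```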
Tuning using the same brute-force approach
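A brute-force search can be sketched as an explicit loop over every combination in a parameter grid, scored on a held-out validation set. The grid values, the synthetic data and the RMSE criterion below are all assumptions for illustration, not the settings actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid, train_test_split

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(0)
temperature = rng.uniform(-5, 35, size=400)
consumption = 0.05 * (temperature - 18) ** 2 + 10 + rng.normal(0, 3, size=400)
X = temperature.reshape(-1, 1)
X_train, X_val, y_train, y_val = train_test_split(
    X, consumption, test_size=0.25, random_state=0
)

# Hypothetical grid: 2 * 3 * 3 = 18 combinations.
grid = ParameterGrid({
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 5, 20],
})

results = []
for params in grid:
    model = RandomForestRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
    results.append((rmse, params))

# Lowest validation RMSE wins.
best_rmse, best_params = min(results, key=lambda item: item[0])
```

scikit-learn's `GridSearchCV` does the same exhaustive search with cross-validation built in, which is usually more robust than a single validation split.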
Best model
Compare vanilla vs. tuned
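A side-by-side comparison could look like the sketch below. The "tuned" settings here (`min_samples_leaf=20`) are a hypothetical stand-in, not the best model found above, and the data is again synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(0)
temperature = rng.uniform(-5, 35, size=500)
consumption = 0.05 * (temperature - 18) ** 2 + 10 + rng.normal(0, 3, size=500)
X = temperature.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, consumption, test_size=0.2, random_state=0
)

models = {
    "vanilla": RandomForestRegressor(random_state=0),
    # Hypothetical tuned settings: larger leaves regularise the trees.
    "tuned": RandomForestRegressor(min_samples_leaf=20, random_state=0),
}

comparison = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    comparison[name] = {
        "train_r2": float(r2_score(y_train, model.predict(X_train))),
        "test_r2": float(r2_score(y_test, test_pred)),
        "test_rmse": float(np.sqrt(mean_squared_error(y_test, test_pred))),
    }

for name, row in comparison.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

The number to watch is the gap between `train_r2` and `test_r2`: tuning should shrink it without hurting the test score.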
TODOs
to complete