Random forest

Before moving forward with the to-do list, let’s throw a Random Forest to it.

Random Forest

For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with the polynomial OLS as baseline model, just because it was so evident from the correlations that the relationship between temperature and consumption follows a polynomial shape. But let’s go back to a beloved RF.

Metrics

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

check the residuals to assess the goodness of fit.

  • white noise or is there a pattern?
  • heteroscedasticity?
  • non-linearity?

Normality of Residuals:

Check for …

  • Are residuals normally distributed?

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

Well, not that bad, but it is overfitting quite a lot.

Tuning using the same brute-force approach

Best model

Compare vanilla vs. tuned

TODOs

to complete