Baseline Model

A first look at the correlations suggested that a polynomial model would be a good fit. So let’s fit one here; it will be the model to beat later on.

Polynomial Model

Here we stick to a simple OLS, but with a polynomial specification for the predictors temperature and air humidity, thereby addressing one of the TODOs listed after the naive model.

⏩ stepit 'baseline': Starting execution of `strom.modelling.assess_model()` 2025-11-24 03:25:02

⏩ stepit 'get_single_split_metrics': Starting execution of `strom.modelling.get_single_split_metrics()` 2025-11-24 03:25:02

✅ stepit 'get_single_split_metrics': Successfully completed and cached [exec time 0.0 seconds, cache time 0.0 seconds, size 1.0 KB] `strom.modelling.get_single_split_metrics()` 2025-11-24 03:25:02

♻️  stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-11-24 03:25:02

✅ stepit 'baseline': Successfully completed and cached [exec time 0.1 seconds, cache time 0.0 seconds, size 14.9 KB] `strom.modelling.assess_model()` 2025-11-24 03:25:02

Metrics

Metric	Single split (train)	Single split (test)	CV (test)	CV (train)
MAE - Mean Absolute Error	2.081715	2.370468	1.434274	2.296291
MSE - Mean Squared Error	12.363331	21.082804	3.517579	14.436913
RMSE - Root Mean Squared Error	3.516153	4.591601	1.776864	3.797367
R2 - Coefficient of Determination	0.867354	0.776769	0.296869	0.854141
MAPE - Mean Absolute Percentage Error	0.200527	0.220512	0.220100	0.204433
EVS - Explained Variance Score	0.867354	0.785383	0.490437	0.854141
MeAE - Median Absolute Error	1.314816	1.374856	1.250695	1.483029
D2 - D2 Absolute Error Score	0.699475	0.666753	0.150291	0.676570
Pinball - Mean Pinball Loss	1.040858	1.185234	0.717137	1.148145
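
Most rows of the table map onto standard formulas. As a rough sketch, the snippet below recomputes a few of them with NumPy on made-up numbers (`y_true`, `y_pred` and the helper `regression_metrics` are illustrative, not part of the project code). It assumes the pinball loss was evaluated at the median (alpha = 0.5); under that assumption the pinball loss is exactly half the MAE, which is consistent with the table (for example 1.040858 against 2.081715).

```python
import numpy as np


def regression_metrics(y_true, y_pred, alpha=0.5):
    """Recompute a few of the tabulated metrics from scratch (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred

    mae = np.mean(np.abs(resid))
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    # R2 compares the residual sum of squares with that of a mean-only model
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Pinball loss: under-predictions weighted by alpha, over-predictions by (1 - alpha)
    pinball = np.mean(np.maximum(alpha * resid, (alpha - 1.0) * resid))

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "Pinball": pinball}


# Made-up observations and predictions, only to exercise the formulas
y_true = np.array([10.0, 12.0, 8.0, 20.0, 15.0])
y_pred = np.array([11.0, 11.0, 9.0, 17.0, 15.0])

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In practice the functions in `sklearn.metrics` (such as `mean_absolute_error` or `mean_pinball_loss`) would do the same job; the hand-written version is only meant to make the definitions explicit.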

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit:

Are they white noise, or is there a pattern?
Is there heteroscedasticity?
Is there non-linearity?

Normality of Residuals:

Check for …

Are residuals normally distributed?
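
Besides a visual check, two summary numbers help here: skewness (0 for a normal distribution) and kurtosis (3 for a normal distribution, assuming the non-excess convention). The statsmodels summary further down reports Skew 4.769 and Kurtosis 51.073 for its residuals, which would be far from normal on that scale. A minimal sketch of how these moments can be computed, using simulated residuals rather than the model's own:

```python
import numpy as np


def skew_and_kurtosis(residuals):
    """Sample skewness and (non-excess) kurtosis from central moments."""
    r = np.asarray(residuals, dtype=float)
    centered = r - r.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    return skew, kurt


rng = np.random.default_rng(0)
normal_resid = rng.normal(size=5000)       # well-behaved residuals
skewed_resid = rng.exponential(size=5000)  # right-skewed residuals with a long tail

normal_skew, normal_kurt = skew_and_kurtosis(normal_resid)
skewed_skew, skewed_kurt = skew_and_kurtosis(skewed_resid)

print(f"normal: skew={normal_skew:.2f}, kurtosis={normal_kurt:.2f}")
print(f"skewed: skew={skewed_skew:.2f}, kurtosis={skewed_kurt:.2f}")
```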

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

TODOs

This is a substantial improvement. We still have some items on the to-do list and, given these results, I would add a couple more:

Check the degree of the polynomial.
Despite the much better fit, there is still some heteroscedasticity, although minor, mostly at the lowest temperatures, where the variance of electricity use is higher. Here we can try other models that handle that long tail better, and also try transformations.
There might also be a seasonal pattern, because we are not accounting for weekends, for example. Here we can check whether a seasonal-trend decomposition helps.
Finally, there is still some evidence of autocorrelation, which makes sense: one day with a very low temperature is not the same as a whole week of very low temperatures. The former could be weathered with the heat already produced, as long as not much energy is lost, while the latter would certainly demand much more work from the heat pump (Wärmepumpe) to keep fighting the cold. In that sense, it seems unrealistic to assume that the increase in energy consumption over several days of extreme cold is just the sum of the daily increases. If the one-day increase due to extreme cold is 10, five consecutive days would probably result in an increase greater than 10*5=50, probably much greater. Let’s explore this hypothesis in the data and try to accommodate it in the model using lags or a moving average of the last few days.
Run a PCA on the min, max, mean and so on, and consider a cumulative explained variance plot (https://archive.is/X1wrZ#selection-1059.8-1059.37). PCA seems sensible because several climatic variables are highly correlated but could still carry different signals. Take temperature: the mean seems to capture the process well, but beyond that, how much it varies (min, max, standard deviation) seems to have some predictive power.
Perhaps experiment with feature selection.
Consider log-transforming the target; after all, its distribution is right-skewed, with a relatively long right tail.
Consider models that handle censored data, since consumption can never be negative. Being a linear model, this one can still predict negative values, even though the polynomial shows a rather good fit. Nothing restricts the predictions, and in fact, in the test set, during the warm period with low consumption, there are a few instances in which the model predicts negative values. We need to take that into account and add some sort of constraint.
Handle the outliers. There are really just a couple of points, but at least in this kind of model they end up having high leverage. We could replace them with locally defined averages, or use models that are robust to them.
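
For the autocorrelation item, lags and a moving average could be built along these lines. This is only a sketch with pandas on made-up temperatures: the column name `tt_tu_mean` mirrors the one used in the model output, while the derived column names and the three-day window are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up daily mean temperatures (°C) covering a short cold spell
daily = pd.DataFrame(
    {"tt_tu_mean": [5.0, 3.0, -2.0, -6.0, -7.0, -5.0, 0.0, 4.0]},
    index=pd.date_range("2025-01-01", periods=8, freq="D"),
)

# Temperature of each of the previous one to three days
for lag in (1, 2, 3):
    daily[f"tt_tu_mean_lag{lag}"] = daily["tt_tu_mean"].shift(lag)

# Mean temperature over the three preceding days.
# Shifting first keeps the current day out of its own moving average.
daily["tt_tu_mean_roll3"] = daily["tt_tu_mean"].shift(1).rolling(window=3).mean()

print(daily)
```

The first rows contain missing values because there is no history yet; they would have to be dropped (or imputed) before fitting. A moving average summarises a cold spell in a single feature, whereas separate lags let the model weight each previous day differently.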

Polynomials but using statsmodels …

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     wd   R-squared:                       0.851
Model:                            OLS   Adj. R-squared:                  0.850
Method:                 Least Squares   F-statistic:                     650.5
Date:                Mon, 24 Nov 2025   Prob (F-statistic):               0.00
Time:                        03:25:06   Log-Likelihood:                -3447.1
No. Observations:                1261   AIC:                             6918.
Df Residuals:                    1249   BIC:                             6980.
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept              26.2162      2.714      9.658      0.000      20.891      31.542
tt_tu_mean             -1.9158      0.093    -20.707      0.000      -2.097      -1.734
I(tt_tu_mean ** 2)      0.1323      0.009     14.530      0.000       0.114       0.150
I(tt_tu_mean ** 3)     -0.0091      0.002     -6.039      0.000      -0.012      -0.006
I(tt_tu_mean ** 4)      0.0004      0.000      3.624      0.000       0.000       0.001
I(tt_tu_mean ** 5)  -7.126e-06   2.63e-06     -2.705      0.007   -1.23e-05   -1.96e-06
rf_tu_mean              0.0365      0.034      1.088      0.277      -0.029       0.102
rf_tu_min              -0.0130      0.018     -0.743      0.458      -0.047       0.021
rf_tu_max              -0.0466      0.040     -1.169      0.242      -0.125       0.032
tt_tu_mean.shift(1)    -0.2578      0.075     -3.457      0.001      -0.404      -0.112
tt_tu_mean.shift(2)     0.0374      0.073      0.512      0.609      -0.106       0.181
tt_tu_mean.shift(3)    -0.1064      0.050     -2.142      0.032      -0.204      -0.009
==============================================================================
Omnibus:                     1307.834   Durbin-Watson:                   0.897
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           126205.286
Skew:                           4.769   Prob(JB):                         0.00
Kurtosis:                      51.073   Cond. No.                     4.85e+07
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.85e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
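
One number in this summary ties back to the autocorrelation item in the TODOs: the Durbin-Watson statistic of 0.897. Values near 2 indicate no first-order autocorrelation, and since the statistic is approximately 2(1 - rho), a value of 0.897 would correspond to a lag-1 residual autocorrelation of roughly 0.55, even with three temperature lags in the model. The sketch below computes the statistic from its definition on simulated residuals (not the model's own) to illustrate both cases.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


rng = np.random.default_rng(42)
n = 2000

# Case 1: white-noise residuals, statistic expected near 2
white = rng.normal(size=n)

# Case 2: AR(1) residuals with rho = 0.55, statistic expected near 2 * (1 - 0.55) = 0.9
rho = 0.55
ar1 = np.empty(n)
ar1[0] = white[0]
for t in range(1, n):
    ar1[t] = rho * ar1[t - 1] + white[t]

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
print(f"white noise: {dw_white:.2f}, AR(1): {dw_ar1:.2f}")
```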

Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()
Writing pin:
Name: 'waermestrom'
Version: 20251124T032506Z-4d883

<vetiver.vetiver_model.VetiverModel at 0x7f5b088477c0>

Here’s also the polynomial model, but using scikit-learn.

0.8460040216097842
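
For reference, a polynomial OLS of this kind might be expressed in scikit-learn roughly as follows. This is a sketch under stated assumptions, not the project's actual pipeline: the data are simulated, the degree of 3 and the scaling step are arbitrary choices, and the split simply keeps the last 20% of rows as the test set. Note that `score` on a scikit-learn regressor returns R2, so if the number above comes from `score`, that is how to read it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Simulated data: consumption falls non-linearly with temperature (made-up relationship)
rng = np.random.default_rng(0)
n = 500
temperature = rng.uniform(-10, 30, size=n)
humidity = rng.uniform(40, 100, size=n)
consumption = (
    20 - 1.5 * temperature + 0.03 * temperature ** 2 + rng.normal(scale=1.0, size=n)
)

X = np.column_stack([temperature, humidity])

# shuffle=False keeps the row order, as one would for time-ordered observations
X_train, X_test, y_train, y_test = train_test_split(
    X, consumption, test_size=0.2, shuffle=False
)

# Scaling first keeps the polynomial terms on comparable scales;
# the linear model itself is a plain OLS
pipe = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
)
pipe.fit(X_train, y_train)

r2_test = pipe.score(X_test, y_test)  # R2 on the held-out rows
predictions = pipe.predict(X_test)
print(f"test R2: {r2_test:.3f}")
```

With two predictors, `PolynomialFeatures` also generates interaction terms between them, which differs from the statsmodels specification above, where only temperature enters as a polynomial.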