Naive Model


A simple OLS model using air temperature (tt_tu_mean) and air humidity (rf_tu_mean) as predictors, without any transformation or feature engineering (no thinking at all).
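A minimal sketch of such a naive model in scikit-learn. The data is synthetic (hypothetical stand-in values, not the real dataset). The column names are taken from the statsmodels listing, and the chronological 80/20 hold-out is an assumption, since the split actually used by strom.modelling.assess_model() is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic stand-in data; the real dataset is not shown here.
rng = np.random.default_rng(0)
n = 1405
df = pd.DataFrame({
    "tt_tu_mean": rng.normal(10, 8, n),     # air temperature
    "rf_tu_mean": rng.uniform(40, 100, n),  # air humidity
})
df["wd"] = 28 - 1.1 * df["tt_tu_mean"] - 0.06 * df["rf_tu_mean"] + rng.normal(0, 5, n)

# Raw features, no transformation or feature engineering.
X = df[["tt_tu_mean", "rf_tu_mean"]]
y = df["wd"]

# Assumption: a single chronological split (no shuffling), last 20 % held out.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

naive = LinearRegression().fit(X_train, y_train)
print("coefficients:", naive.coef_, "intercept:", naive.intercept_)
print("test R2:", naive.score(X_test, y_test))
```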

⏩ stepit 'naive': Starting execution of `strom.modelling.assess_model()` 2026-04-12 13:37:03

⏩ stepit 'get_single_split_metrics': Starting execution of `strom.modelling.get_single_split_metrics()` 2026-04-12 13:37:04

✅ stepit 'get_single_split_metrics': Successfully completed and cached [exec time 0.0 seconds, cache time 0.0 seconds, size 1.0 KB] `strom.modelling.get_single_split_metrics()` 2026-04-12 13:37:04

⏩ stepit 'cross_validate_pipe': Starting execution of `strom.modelling.cross_validate_pipe()` 2026-04-12 13:37:04

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.

[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:    2.1s finished

✅ stepit 'cross_validate_pipe': Successfully completed and cached [exec time 2.1 seconds, cache time 0.0 seconds, size 2.2 KB] `strom.modelling.cross_validate_pipe()` 2026-04-12 13:37:06

✅ stepit 'naive': Successfully completed and cached [exec time 2.2 seconds, cache time 0.0 seconds, size 15.7 KB] `strom.modelling.assess_model()` 2026-04-12 13:37:06
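The log shows cross_validate_pipe() running 5 fits through joblib (LokyBackend, n_jobs=-1). A rough equivalent, assuming it wraps scikit-learn's cross_validate with train scores enabled, is sketched below. The fold splitter is not visible in the log, so TimeSeriesSplit(n_splits=5) is an assumption, as is the synthetic data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.pipeline import make_pipeline

# Hypothetical synthetic stand-in data.
rng = np.random.default_rng(0)
n = 1405
X = pd.DataFrame({"tt_tu_mean": rng.normal(10, 8, n), "rf_tu_mean": rng.uniform(40, 100, n)})
y = 28 - 1.1 * X["tt_tu_mean"] - 0.06 * X["rf_tu_mean"] + rng.normal(0, 5, n)

pipe = make_pipeline(LinearRegression())

cv_results = cross_validate(
    pipe, X, y,
    cv=TimeSeriesSplit(n_splits=5),  # assumption: the real splitter is not shown
    scoring=["neg_mean_absolute_error", "r2"],
    return_train_score=True,         # gives the "train" columns next to "test"
    n_jobs=1,                        # the logged run used n_jobs=-1 (all cores)
)

# scikit-learn reports errors negated ("higher is better"), so flip the sign.
print("CV test MAE: ", -cv_results["test_neg_mean_absolute_error"].mean())
print("CV train MAE:", -cv_results["train_neg_mean_absolute_error"].mean())
print("CV test R2 per fold:", cv_results["test_r2"])
```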

Metrics

                                          Single Split               CV
Metric                                     train       test       test      train
MAE - Mean Absolute Error               3.340488   3.162185   2.826249   3.521145
MSE - Mean Squared Error               25.947551  24.493693  12.676356  28.646140
RMSE - Root Mean Squared Error          5.093874   4.949110   3.426344   5.349381
R2 - Coefficient of Determination       0.721589   0.749748  -8.271677   0.710541
MAPE - Mean Absolute Percentage Error   0.385556   0.325290   0.541985   0.374222
EVS - Explained Variance Score          0.721589   0.749876  -2.079990   0.710541
MeAE - Median Absolute Error            2.667506   2.272731   2.553267   2.588375
D2 - D2 Absolute Error Score            0.517747   0.587819  -1.510111   0.504172
Pinball - Mean Pinball Loss             1.670244   1.581092   1.413125   1.760573

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit:

  • Are they white noise, or is there a pattern?
  • Is there heteroscedasticity (residual spread changing with the predicted value)?
  • Is there non-linearity?

Normality of Residuals

Check for …

  • Are residuals normally distributed?
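The statsmodels summary of the naive model (Listing 1) already answers this numerically. The Jarque-Bera statistic combines skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4). Plugging in n = 1405, S = 2.959 and K = 21.026 reproduces the reported JB of about 21072, so the residuals are clearly not normal. The sketch below checks that arithmetic and runs the same test on a residual vector (synthetic, right-skewed stand-ins here). For the visual counterpart, sm.qqplot(resid, line="s") draws a Q-Q plot.

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera


def jb_from_moments(n, skew, kurtosis):
    """Jarque-Bera statistic from skewness and (non-excess) kurtosis."""
    return n / 6 * (skew**2 + (kurtosis - 3) ** 2 / 4)


# Rounded values from the statsmodels summary of the naive model.
print(jb_from_moments(n=1405, skew=2.959, kurtosis=21.026))  # close to the reported 21072.270

# Same test on a residual vector (hypothetical, right-skewed stand-ins).
rng = np.random.default_rng(3)
resid = rng.exponential(scale=2.0, size=1405) - 2.0
jb, jb_pvalue, skew, kurt = jarque_bera(resid)
print(f"JB={jb:.1f}  p={jb_pvalue:.3g}  skew={skew:.2f}  kurtosis={kurt:.2f}")
```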

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

TODOs

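Clearly, the naive model is not a good fit (as expected). The single-split test R² of 0.75 looks acceptable, but the cross-validated test R² is strongly negative (-8.27), as are the CV test EVS (-2.08) and D2 (-1.51). A negative R² means the model predicts worse than simply using the mean of the test data.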

Listing 1
Naive Model, but using statsmodels …

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     wd   R-squared:                       0.737
Model:                            OLS   Adj. R-squared:                  0.737
Method:                 Least Squares   F-statistic:                     1963.
Date:                Sun, 12 Apr 2026   Prob (F-statistic):               0.00
Time:                        13:37:10   Log-Likelihood:                -4259.4
No. Observations:                1405   AIC:                             8525.
Df Residuals:                    1402   BIC:                             8541.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         27.9188      1.219     22.899      0.000      25.527      30.310
tt_tu_mean    -1.1363      0.019    -59.272      0.000      -1.174      -1.099
rf_tu_mean    -0.0612      0.014     -4.300      0.000      -0.089      -0.033
==============================================================================
Omnibus:                      979.383   Durbin-Watson:                   0.631
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            21072.270
Skew:                           2.959   Prob(JB):                         0.00
Kurtosis:                      21.026   Cond. No.                         737.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
R-squared: 0.7368832968577501

Naive model, but using scikit-learn without pipeline …