Outliers

Naive outliers

Z score

date nd wd nt ht obs tt_tu_min tt_tu_max tt_tu_mean tt_tu_median ... rs_ind_mean rs_ind_median rs_ind_std wrtr_min wrtr_max wrtr_mean wrtr_median wrtr_std _merge wd_z
1652 2021-01-11 9.413296 50.236627 20.267084 29.969543 4.0 -10.7 -2.4 -6.797917 -6.80 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.903684
1656 2021-01-15 8.286358 41.735698 12.286358 29.449339 4.0 -11.1 -2.6 -6.541667 -6.15 ... 0.020833 0.0 0.144338 NaN NaN NaN NaN NaN both 3.025781
1683 2021-02-11 8.751573 59.208399 21.817963 37.390437 4.0 -12.4 -4.5 -8.064583 -8.00 ... 0.229167 0.0 0.424744 NaN NaN NaN NaN NaN both 4.830212
1684 2021-02-12 10.626823 64.885871 23.812605 41.073266 3.0 -15.0 -3.0 -8.837500 -8.50 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 5.416532
1685 2021-02-13 12.659750 60.386958 58.232787 2.154171 3.0 -13.4 -2.9 -8.031250 -8.70 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 4.951923
1686 2021-02-14 7.790416 51.393652 51.258327 0.135325 4.0 -14.2 0.7 -7.366667 -7.65 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 4.023172
2352 2022-12-12 10.483619 55.413417 23.588144 31.825273 0.0 -12.2 -0.2 -6.552083 -6.45 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 4.438298
2353 2022-12-13 10.483619 55.413417 23.588144 31.825273 0.0 -15.1 -4.1 -8.937500 -8.80 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 4.438298
2357 2022-12-17 9.732795 44.614500 40.742659 3.871841 1.0 -7.1 -1.3 -3.133333 -2.70 ... 0.083333 0.0 0.279310 NaN NaN NaN NaN NaN both 3.323079
2358 2022-12-18 9.573892 57.289325 56.609472 0.679853 1.0 -8.2 -0.6 -4.979167 -5.35 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 4.632026
2745 2024-01-09 9.708277 42.242300 12.162534 30.079766 5.0 -6.5 -4.3 -5.506250 -5.50 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.078099
2746 2024-01-10 6.645277 41.582167 15.946443 25.635724 4.0 -7.7 -1.2 -5.337500 -5.65 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.009926
2749 2024-01-13 12.093837 41.894842 41.894842 0.000000 2.0 -7.9 -0.5 -4.202083 -4.15 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.042217
2753 2024-01-17 8.293448 78.472398 27.005158 51.467241 2.0 -7.7 3.7 -1.081250 -0.30 ... 0.208333 0.0 0.410414 NaN NaN NaN NaN NaN both 6.819631
2756 2024-01-20 8.135308 43.717538 42.186335 1.531202 3.0 -11.0 -0.2 -6.272917 -7.20 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.230449
3097 2024-12-26 9.066030 44.944476 44.944476 0.000000 6.0 -7.9 -0.5 -4.900000 -5.05 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.357156
3101 2024-12-30 7.049364 45.245536 16.109633 29.135903 4.0 -7.1 -0.2 -4.233333 -4.95 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.388247
3106 2025-01-04 5.903743 46.109890 45.229434 0.880455 1.0 -8.6 -2.2 -5.245833 -4.65 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.477510
3116 2025-01-14 8.003773 42.891896 16.238845 26.653051 3.0 -8.8 -0.1 -4.795833 -5.35 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.145184
3121 2025-01-19 11.454509 42.858147 42.858147 0.000000 2.0 -5.1 7.2 -0.562500 -0.45 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.141698
3413 2025-11-07 10.195547 51.275826 31.870871 19.404954 4.0 -1.0 8.4 2.258333 1.50 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 4.011004
3414 2025-11-08 10.756562 49.397565 48.633630 0.763934 7.0 -1.6 5.9 2.708696 4.20 ... 0.000000 0.0 0.000000 NaN NaN NaN NaN NaN both 3.817033
3415 2025-11-09 11.125413 44.946610 44.946610 0.000000 3.0 4.2 8.4 5.987500 6.00 ... 0.500000 0.5 0.510754 NaN NaN NaN NaN NaN both 3.357377
3416 2025-11-10 11.320897 49.463393 12.114932 37.348460 4.0 2.0 10.2 6.070833 5.95 ... 0.250000 0.0 0.442326 NaN NaN NaN NaN NaN both 3.823831

24 rows × 73 columns
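The table above appears to keep the days whose `wd_z` exceeds 3. Below is a minimal sketch of such a filter. It assumes that `wd_z` is a classic z-score of the `wd` column and that the cut-off is two-sided, |z| > 3; the table only shows positive scores. The original computation is not shown, so it is also unknown whether it used the sample standard deviation (the pandas `std()` default) or the population one (the `scipy.stats.zscore` default).

```python
import numpy as np
import pandas as pd


def zscore_outliers(df: pd.DataFrame, col: str = "wd", threshold: float = 3.0) -> pd.DataFrame:
    """Return the rows whose classic z-score in `col` exceeds `threshold` in absolute value.

    The score is stored in a new `<col>_z` column (e.g. `wd_z`).
    """
    out = df.copy()
    values = out[col]
    # Distance from the mean in units of the sample (ddof=1) standard deviation
    out[f"{col}_z"] = (values - values.mean()) / values.std()
    return out[out[f"{col}_z"].abs() > threshold]


# Synthetic demo data (not the real meter readings): 200 ordinary days plus one spike
rng = np.random.default_rng(0)
demo = pd.DataFrame({"wd": rng.normal(10.0, 1.0, 200)})
demo.loc[50, "wd"] = 40.0
flagged = zscore_outliers(demo)
print(flagged)
```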

Z score

date nd wd nt ht obs tt_tu_min tt_tu_max tt_tu_mean tt_tu_median ... rs_ind_median rs_ind_std wrtr_min wrtr_max wrtr_mean wrtr_median wrtr_std _merge wd_z wd_zr
1652 2021-01-11 9.413296 50.236627 20.267084 29.969543 4.0 -10.7 -2.4 -6.797917 -6.80 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 3.903684 7.283405
1683 2021-02-11 8.751573 59.208399 21.817963 37.390437 4.0 -12.4 -4.5 -8.064583 -8.00 ... 0.0 0.424744 NaN NaN NaN NaN NaN both 4.830212 8.898996
1684 2021-02-12 10.626823 64.885871 23.812605 41.073266 3.0 -15.0 -3.0 -8.837500 -8.50 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 5.416532 9.921366
1685 2021-02-13 12.659750 60.386958 58.232787 2.154171 3.0 -13.4 -2.9 -8.031250 -8.70 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 4.951923 9.111225
1686 2021-02-14 7.790416 51.393652 51.258327 0.135325 4.0 -14.2 0.7 -7.366667 -7.65 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 4.023172 7.491756
2352 2022-12-12 10.483619 55.413417 23.588144 31.825273 0.0 -12.2 -0.2 -6.552083 -6.45 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 4.438298 8.215615
2353 2022-12-13 10.483619 55.413417 23.588144 31.825273 0.0 -15.1 -4.1 -8.937500 -8.80 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 4.438298 8.215615
2358 2022-12-18 9.573892 57.289325 56.609472 0.679853 1.0 -8.2 -0.6 -4.979167 -5.35 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 4.632026 8.553419
2753 2024-01-17 8.293448 78.472398 27.005158 51.467241 2.0 -7.7 3.7 -1.081250 -0.30 ... 0.0 0.410414 NaN NaN NaN NaN NaN both 6.819631 12.367958
3413 2025-11-07 10.195547 51.275826 31.870871 19.404954 4.0 -1.0 8.4 2.258333 1.50 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 4.011004 7.470539
3414 2025-11-08 10.756562 49.397565 48.633630 0.763934 7.0 -1.6 5.9 2.708696 4.20 ... 0.0 0.000000 NaN NaN NaN NaN NaN both 3.817033 7.132311
3416 2025-11-10 11.320897 49.463393 12.114932 37.348460 4.0 2.0 10.2 6.070833 5.95 ... 0.0 0.442326 NaN NaN NaN NaN NaN both 3.823831 7.144165

12 rows × 74 columns
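The second table adds a `wd_zr` column. Its name suggests a robust z-score, which uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation. That reading is an assumption. The cut-off is one as well: 3.5 is a commonly used threshold for this score, and the table above may have used a different one. Here is a sketch under those assumptions:

```python
import numpy as np
import pandas as pd


def robust_zscore(s: pd.Series) -> pd.Series:
    """Robust z-score: (x - median) / (1.4826 * MAD)."""
    median = s.median()
    mad = (s - median).abs().median()  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the robust z-score is undefined")
    # 1.4826 rescales the MAD so it estimates the standard deviation for normal data
    return (s - median) / (1.4826 * mad)


def robust_zscore_outliers(df: pd.DataFrame, col: str = "wd", threshold: float = 3.5) -> pd.DataFrame:
    """Return the rows whose robust z-score (stored as `<col>_zr`) exceeds `threshold`."""
    out = df.copy()
    out[f"{col}_zr"] = robust_zscore(out[col])
    return out[out[f"{col}_zr"].abs() > threshold]


# Synthetic demo data (not the real meter readings): 200 ordinary days plus one spike
rng = np.random.default_rng(0)
demo = pd.DataFrame({"wd": rng.normal(10.0, 1.0, 200)})
demo.loc[50, "wd"] = 40.0
flagged = robust_zscore_outliers(demo)
classic_z = (demo["wd"] - demo["wd"].mean()) / demo["wd"].std()
print(flagged[["wd", "wd_zr"]])
print("classic z of the spike:", round(classic_z[50], 2))
```

The median and the MAD barely move when a single extreme day is added. As a result, the spike's robust score comes out larger than its classic z-score.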

This is, of course, naive, and it flags many observations that are presumably legitimate.

STL-based outlier detection

For Wärmestrom (heating electricity), it is particularly relevant to take the season into account when detecting outliers. So let’s try that, using Multiple Seasonal-Trend decomposition using Loess (MSTL).

MSTL

This seems better. It catches some observations that may not look like outliers in the univariate distribution but seem extreme relative to the typical consumption for their season.

I had already spotted a couple of those points in the correlation matrix. So let’s see how these outliers look there.

Well, it is a bit better than the naive approach, but it still fails to detect a couple of points that show very high consumption at higher temperatures. Perhaps we need to resort to multivariate outlier detection.

Prophet-based outlier detection

Well, rather similar. The band is pretty wide and not sensitive to the seasons, so, unsurprisingly, it only catches extreme values in the cold season. Again, it seems we would necessarily need to include the climate data.
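The Prophet code is not shown either. A common way to use Prophet for outlier detection, and presumably what the band refers to, is to fit the series and flag observations outside the predicted interval `[yhat_lower, yhat_upper]`. The width of that interval is controlled by `interval_width`, which defaults to 0.80. The helper below is a hypothetical sketch of that last step. Prophet itself appears only in comments, so the block runs without it.

Including the climate data could be tried with Prophet's `add_regressor`. That method needs the regressor column in both the fitting and the prediction frame, for example `tt_tu_mean`, assumed here to be the daily mean temperature.

```python
import pandas as pd


def band_outliers(observed: pd.DataFrame, forecast: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `observed` (columns `ds`, `y`) outside [yhat_lower, yhat_upper].

    `forecast` is expected to have Prophet's output columns `ds`, `yhat`,
    `yhat_lower` and `yhat_upper`.
    """
    merged = observed.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
    outside = (merged["y"] < merged["yhat_lower"]) | (merged["y"] > merged["yhat_upper"])
    return merged[outside]


# With Prophet installed, the forecast would typically come from something like:
#   from prophet import Prophet
#   m = Prophet(interval_width=0.95)  # illustrative value; the default is 0.80
#   m.fit(observed)
#   forecast = m.predict(observed[["ds"]])
# Here a hand-made forecast frame with synthetic numbers stands in for it
observed = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=5, freq="D"),
                         "y": [10.0, 11.0, 30.0, 9.0, 2.0]})
forecast = pd.DataFrame({"ds": observed["ds"], "yhat": 10.0,
                         "yhat_lower": 5.0, "yhat_upper": 15.0})
flagged = band_outliers(observed, forecast)
print(flagged)
```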

Isolation forests

Well, it again catches the extreme values, plus a couple of very low values, but it still fails to capture the possible outliers in the warm season.
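Isolation forests score each point by how few random axis-aligned splits are needed to isolate it. One way to bring in the climate data is to feed the forest several columns at once. Below is a minimal sketch with scikit-learn's `IsolationForest`. It assumes `wd` is the daily Wärmestrom consumption and `tt_tu_mean` the daily mean temperature. The feature choice and `contamination=0.02` are assumptions, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def iforest_outliers(df: pd.DataFrame, features=("wd", "tt_tu_mean"),
                     contamination: float = 0.02, seed: int = 0) -> pd.DataFrame:
    """Flag rows with scikit-learn's IsolationForest on several columns at once."""
    X = df[list(features)].to_numpy()
    model = IsolationForest(contamination=contamination, random_state=seed)
    labels = model.fit_predict(X)  # -1 = outlier, 1 = inlier
    out = df.copy()
    out["iforest_score"] = model.score_samples(X)  # lower = more anomalous
    return out[labels == -1].sort_values("iforest_score")


# Synthetic demo data (not the real readings): consumption falls with temperature,
# plus one warm day (22 °C) with winter-like consumption at index 500
rng = np.random.default_rng(0)
temp = np.append(rng.uniform(-10, 25, 500), 22.0)
wd = np.append(40 - 1.5 * temp[:500] + rng.normal(0, 1, 500), 45.0)
demo = pd.DataFrame({"wd": wd, "tt_tu_mean": temp})

flagged = iforest_outliers(demo)
print(flagged)
```

The warm, high-consumption day is unremarkable in each column on its own. It only stands out against the relationship between consumption and temperature, which is the kind of warm-season point described above.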

Local Outlier Factor - LOF

I had higher hopes for this one. But yeah, it is pretty sensitive to its parameters.
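LOF compares each point's local density with that of its `n_neighbors` nearest neighbors. The result therefore depends directly on that parameter, and on how the features are scaled. A small sensitivity check is an easy way to see this. The sketch below uses scikit-learn's `LocalOutlierFactor` on standardized `wd` and `tt_tu_mean` columns. The features, the scaling and the tried values of `n_neighbors` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def lof_outliers(df: pd.DataFrame, features=("wd", "tt_tu_mean"),
                 n_neighbors: int = 20, contamination="auto") -> pd.DataFrame:
    """Flag rows with scikit-learn's LocalOutlierFactor on standardized features."""
    # LOF works on distances, so put the features on a comparable scale first
    X = StandardScaler().fit_transform(df[list(features)])
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
    labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier
    out = df.copy()
    out["lof"] = -lof.negative_outlier_factor_  # around 1 = inlier, larger = more anomalous
    return out[labels == -1]


# Synthetic demo data (not the real readings): consumption falls with temperature,
# plus one warm day (22 °C) with winter-like consumption at index 500
rng = np.random.default_rng(0)
temp = np.append(rng.uniform(-10, 25, 500), 22.0)
wd = np.append(40 - 1.5 * temp[:500] + rng.normal(0, 1, 500), 45.0)
demo = pd.DataFrame({"wd": wd, "tt_tu_mean": temp})

# Sensitivity check: compare the flagged sets for different neighborhood sizes
flagged_by_k = {k: lof_outliers(demo, n_neighbors=k) for k in (5, 20, 50)}
for k, flagged in flagged_by_k.items():
    print(f"n_neighbors={k}: {len(flagged)} flagged")
```

With `contamination="auto"`, scikit-learn applies a fixed threshold to the outlier factor, so the number of flagged days can change with `n_neighbors`. With a numeric contamination, the count stays roughly fixed and only the membership changes.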

TODOs