B. DuckDB

We will be setting-up DuckDB to work in Jupyter-powered Quarto Documents.

Set-up jupysql to be able to write sql queries directly

To do it with DuckDB, basically following this guide, just need to make sure jupysql, SQLAlchemy and duckdb-engine are installed, besides the core libraries (notebook, pandas, duckdb). If any of them mssing, simply pip install them.

Step 1 is then to import extension. It enables SQL cells in Jupyter. It supports inline SQL using %sql and a whole SQL cell starting it with %%sql.

import duckdb
import pandas as pd
import sqlalchemy # No need to import duckdb_engine, 
                  # SQLAlchemy will auto-detect 

%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
Loading configurations from /home/runner/work/strom/strom/pyproject.toml.
Settings changed:
Config value
feedback True
autopandas True
displaycon False
dsn_filename ./connections.ini

I prefer Quarto to edit my notebooks, and the above still works. However, it seems Quarto’s SQL engine is still only for R since it requires knitr and does not seem to support the combo ipython-sql-SQLAlchemy. So you cannot simply use an SQl chunk like this

```{sql}
SELECT * FROM test;
```

But you have to use a standard python chunk and use the %sql or %%sql to be able to write SQL direcly.

Step 2 is to fire-up DuckDB, either in memory or pointing to a file.

%sql duckdb:///:memory:
# %sql duckdb:///path/to/file.db

Test it’s working

%sql SELECT 'Off and flying!' as a_duckdb_column
a_duckdb_column
0 Off and flying!
%sql SELECT * FROM duckdb_settings();
name value description input_type scope aliases
0 Calendar gregorian The current calendar VARCHAR GLOBAL []
1 TimeZone Etc/UTC The current time zone VARCHAR GLOBAL []
2 access_mode automatic Access mode of the database (AUTOMATIC, READ_O... VARCHAR GLOBAL []
3 allocator_background_threads false Whether to enable the allocator background thr... BOOLEAN GLOBAL []
4 allocator_bulk_deallocation_flush_threshold 512.0 MiB If a bulk deallocation larger than this occurs... VARCHAR GLOBAL []
... ... ... ... ... ... ...
131 username None The username to use. Ignored for legacy compat... VARCHAR GLOBAL [user]
132 variant_legacy_encoding false Enables the Parquet reader to identify a Varia... BOOLEAN GLOBAL []
133 wal_autocheckpoint 16.0 MiB The WAL size threshold at which to automatical... VARCHAR GLOBAL []
134 worker_threads 4 The number of total threads used by the system. BIGINT GLOBAL []
135 zstd_min_string_length 4096 The (average) length at which to enable ZSTD c... UBIGINT GLOBAL []

136 rows × 6 columns

We will need to make sure we load the extension for all Quarto documents, as they will all be different jupyter notebooks (TODO: double-check this). So we will use Quarto includes to keep the source files not-so-verbose.

Note that we use an underscore (_) prefix for the included file. You should always use an underscore prefix with included files so that they are automatically ignored (i.e. not treated as standalone files) by a quarto render of a project).

Also, remember the “trick” here

In Quarto projects, an include path that begins with a leading slash will be interpreted as project relative, meaning that you should be able to use an include such as:

"{{< include /_codebit.qmd >}}"

to include from the project root no matter the subdirectory of the file itself.