B. DuckDB

We will be setting-up DuckDB to work in Jupyter-powered Quarto Documents.

Set-up `jupysql` to be able to write sql queries directly

To do it with DuckDB, basically following this guide, just need to make sure jupysql, SQLAlchemy and duckdb-engine are installed, besides the core libraries (notebook, pandas, duckdb). If any of them mssing, simply pip install them.

Step 1 is then to import extension. It enables SQL cells in Jupyter. It supports inline SQL using %sql and a whole SQL cell starting it with %%sql.

import duckdb
import pandas as pd
import sqlalchemy # No need to import duckdb_engine, 
                  # SQLAlchemy will auto-detect 

%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

Loading configurations from /home/runner/work/strom/strom/pyproject.toml.

Settings changed:

Config	value
feedback	True
autopandas	True
displaycon	False
dsn_filename	./connections.ini

I prefer Quarto to edit my notebooks, and the above still works. However, it seems Quarto’s SQL engine is still only for R since it requires knitr and does not seem to support the combo ipython-sql-SQLAlchemy. So you cannot simply use an SQl chunk like this

```{sql}
SELECT * FROM test;
```

But you have to use a standard python chunk and use the %sql or %%sql to be able to write SQL direcly.

Step 2 is to fire-up DuckDB, either in memory or pointing to a file.

%sql duckdb:///:memory:
# %sql duckdb:///path/to/file.db

Test it’s working

%sql SELECT 'Off and flying!' as a_duckdb_column

	a_duckdb_column
0	Off and flying!

%sql SELECT * FROM duckdb_settings();

	name	value	description	input_type	scope	aliases
0	Calendar	gregorian	The current calendar	VARCHAR	GLOBAL	[]
1	TimeZone	Etc/UTC	The current time zone	VARCHAR	GLOBAL	[]
2	access_mode	automatic	Access mode of the database (AUTOMATIC, READ_O...	VARCHAR	GLOBAL	[]
3	allocator_background_threads	false	Whether to enable the allocator background thr...	BOOLEAN	GLOBAL	[]
4	allocator_bulk_deallocation_flush_threshold	512.0 MiB	If a bulk deallocation larger than this occurs...	VARCHAR	GLOBAL	[]
...	...	...	...	...	...	...
131	username	None	The username to use. Ignored for legacy compat...	VARCHAR	GLOBAL	[user]
132	variant_legacy_encoding	false	Enables the Parquet reader to identify a Varia...	BOOLEAN	GLOBAL	[]
133	wal_autocheckpoint	16.0 MiB	The WAL size threshold at which to automatical...	VARCHAR	GLOBAL	[]
134	worker_threads	4	The number of total threads used by the system.	BIGINT	GLOBAL	[]
135	zstd_min_string_length	4096	The (average) length at which to enable ZSTD c...	UBIGINT	GLOBAL	[]

136 rows × 6 columns

We will need to make sure we load the extension for all Quarto documents, as they will all be different jupyter notebooks (TODO: double-check this). So we will use Quarto includes to keep the source files not-so-verbose.

Note that we use an underscore (_) prefix for the included file. You should always use an underscore prefix with included files so that they are automatically ignored (i.e. not treated as standalone files) by a quarto render of a project).

Also, remember the “trick” here

In Quarto projects, an include path that begins with a leading slash will be interpreted as project relative, meaning that you should be able to use an include such as:

"{{< include /_codebit.qmd >}}"

to include from the project root no matter the subdirectory of the file itself.

Set-up jupysql to be able to write sql queries directly

Set-up `jupysql` to be able to write sql queries directly