I am excited about a number of things. First and foremost is DuckDB Web Shell — a web assembly implementation of DuckDB that you can run in your browser. No installation required: go to this url: https://shell.duckdb.org/ and the DuckDB engine is loaded and executed in your browser.
A few seconds after navigating your browser to the shell.duckdb.org url, DuckDB is running inside your browser and you are greeted by the CLI
All subsequent operations take place locally. In your browser. When DuckDB is running, you can go into airplane mode and work on your data in private.
One way to move data into the DuckDB instance is by adding local files. The command to add a file is .files add and when you entered it into the CLI, the browser file dialog window is opened and you can select one or more files (CSV, JSON, Parquet formats are supported).
Add a local file to make it available for SQL operations with DuckDB
Once the file has been added, use .files list to verify if it is now part of the context. If it is, it can now be accessed, for example using the function read_csv_auto:
select *
from read_csv_auto('movies.csv');
Add (upload) the local file, verify it is part of the DuckDB context and query it using rad_csv_auto
It is easy to turn the data in this file into a real table in DuckDB:
create table movies as select *
from read_csv_auto('movies.csv')
;
Create a table from a local CSV file and descrbe the table
If you are still on line, your local DuckDB database can also access remote file over HTTP. Instead of uploading a local file into your (local) browser’s DuckDB, you can also have DuckDB access files for example on GitHub or S3.
Here is an example, using the raw content endpoint for a csv file on GitHub:
SELECT *
FROM read_csv_auto('https://raw.githubusercontent.com/lucasjellema/informatica-databases/main/imdb/movies.csv')
;
Query data straight from a remote HTTP accessible csv file
Of course in this case too the data can be pulled into a local table using create table as select * from … syntax.
CREATE TABLE imdb_movies AS
SELECT *
FROM read_csv_auto('https://raw.githubusercontent.com/lucasjellema/informatica-databases/main/imdb/movies.csv')
;
Apart from extreme rapid access to a full fledged SQL database and analytical query engine, this local DuckDB allows me to perform easy analysis on local data sets. I am not really an Excel wizard and like most people I frequently have to deal with data in CSV or similar format. With this ability to rapidly bring the data to this SQL engine I expect to be able to perform much easier and richer analysis. I am way better at SQL than at Excel.
After manipulating data in DuckDB using SQL, it is straightforward to get the data in CSV file format (or JSON for that matter) once again. A simple COPY statement can create a local DuckDB file from any SELECT statement and with .files download <filename> the file can be downloaded from the browser to your operating system and local hard drive.
Create a local csv file from a Select statement, then download the file from the browser to the local operating system. Note: all data and the file was always completely local, inside the browser
Resources
DuckDB Documentation — https://duckdb.org/docs
Run DuckDB web shell in your browser — https://shell.duckdb.org/