DuckDB

Overview

DuckDB is an in-process SQL OLAP database management system designed for analytical workloads. With its spatial extension, DuckDB provides powerful geospatial capabilities including native support for GeoParquet, making it ideal for cloud-native geospatial analytics.

Key Features

FeatureDescription
In-ProcessRuns embedded in your application
Columnar StorageOptimized for analytical queries
Zero DependenciesSingle binary, no server
Spatial ExtensionFull geometry support
Parquet NativeDirect query of Parquet files
Arrow IntegrationZero-copy data exchange

Spatial Extension

Install and load the spatial extension:

INSTALL spatial;
LOAD spatial;

Supported Formats

FormatReadWrite
GeoParquetYesYes
ShapefileYesNo
GeoJSONYesYes
GeoJSONSeqYesYes
FlatGeobufYesNo

Usage Examples

Query GeoParquet

-- read local geoparquet
SELECT * FROM read_parquet('parcels.parquet');
 
-- read remote geoparquet
SELECT * FROM read_parquet('s3://bucket/data.parquet');
 
-- spatial filter with bbox
SELECT * FROM read_parquet('parcels.parquet')
WHERE ST_Intersects(
  geometry,
  ST_MakeEnvelope(-122.5, 37.7, -122.3, 37.9)
);

Spatial Operations

-- calculate area
SELECT ST_Area(geometry) as area FROM parcels;
 
-- buffer features
SELECT ST_Buffer(geometry, 100) as buffered FROM points;
 
-- spatial join
SELECT a.*, b.flood_zone
FROM parcels a, flood_zones b
WHERE ST_Intersects(a.geometry, b.geometry);

Query Overture Maps

-- query overture buildings
SELECT *
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-*/theme=buildings/*')
WHERE bbox.xmin > -122.5 AND bbox.xmax < -122.3
  AND bbox.ymin > 37.7 AND bbox.ymax < 37.9;

Integration

R

library(duckdb)
library(sf)
 
con <- duckdb::dbConnect(duckdb::duckdb())
duckdb::dbExecute(con, "INSTALL spatial; LOAD spatial;")
 
# query parquet and convert to sf
result <- duckdb::dbGetQuery(con, "
  SELECT * FROM read_parquet('parcels.parquet')
  WHERE ST_Area(geometry) > 10000
")
 
# convert to sf object
sf_result <- sf::st_as_sf(result, wkt = "geometry")

Python

import duckdb
 
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
 
# query to geopandas
gdf = con.execute("""
    SELECT * FROM read_parquet('parcels.parquet')
    WHERE ST_Area(geometry) > 10000
""").fetch_df()

CLI

duckdb -c "INSTALL spatial; LOAD spatial;"
duckdb -c "SELECT COUNT(*) FROM read_parquet('data.parquet');"

Comparison

DatabaseTypeSpatialBest For
DuckDBEmbedded OLAPExtensionAnalytics, Parquet
PostGISServer OLTP/OLAPNativeProduction, CRUD
SQLite/SpatiaLiteEmbeddedExtensionMobile, simple

Appendix

Created: 2024-12-23 | Modified: 2024-12-23

See Also


(c) No Clocks, LLC | 2024