Row vs Columnar Geospatial Formats
The Core Trade-off
Geospatial formats fall into two categories with fundamentally different performance characteristics:
Row-oriented (GeoJSON, Shapefile, GeoPackage):
- Store complete features together
- Fast for reading individual features
- Slow for analytical queries across many features
Columnar (GeoParquet):
- Store each attribute in contiguous blocks
- Fast for analytical queries (read only needed columns)
- Excellent compression (similar values together)
- Slower for single-feature access
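The difference between the two layouts can be sketched in plain Python. This is only an illustration, not how any real format is implemented: the feature names and values are made up, `rows_to_columns` and `feature_at` are hypothetical helpers, and every feature is assumed to carry the same attributes.

```python
# Three hypothetical features; names and values are illustrative only.
rows = [
    {"id": 1, "name": "Park", "area": 12.5},
    {"id": 2, "name": "Lake", "area": 40.0},
    {"id": 3, "name": "Plaza", "area": 3.2},
]


def rows_to_columns(rows):
    """Transpose a list of feature dicts into one list per attribute."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def feature_at(columns, index):
    """Rebuild one feature from the columnar layout; touches every column."""
    return {key: values[index] for key, values in columns.items()}


columns = rows_to_columns(rows)

# Row layout: one lookup returns the whole feature.
print(rows[1])
# Columnar layout: one lookup returns a whole attribute.
print(columns["area"])
# Single-feature access in the columnar layout must visit each column.
print(feature_at(columns, 1))
```

In the row layout a feature is a single lookup, while in the columnar layout it has to be reassembled from every column. The reverse holds for reading one attribute across all features.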
When Row-Oriented Wins
Choose GeoJSON or GeoPackage when:
- Human readability matters - GeoJSON is plain text that can be inspected in any editor
- Small datasets (< 10MB) - the overhead of a columnar format is not worth it
- Real-time feature serving - individual features must be returned quickly
- Web interoperability - GeoJSON is plain JSON, which browsers parse natively
- Desktop GIS workflows - QGIS and ArcGIS work well with these formats
When Columnar Wins
Choose GeoParquet when:
- Large datasets (> 100MB) - compression often saves 50-80% of file size
- Analytical queries - e.g. “average value where X > Y”
- Cloud storage - works well on object stores such as S3 and Azure Blob, where readers fetch only the byte ranges they need
- Data warehousing - integrates with DuckDB, Spark, BigQuery
- Partial reads - load only the columns you need
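An "average value where X > Y" query shows why partial reads matter. The sketch below is a toy stand-in for a columnar table, not a GeoParquet reader: `average_where` is a hypothetical helper and the column names and values are invented. The point is that the query touches only two columns and never looks at `name` or `geometry`.

```python
def average_where(columns, value_col, filter_col, threshold):
    """Average value_col over the rows where filter_col > threshold."""
    selected = [
        value
        for value, test in zip(columns[value_col], columns[filter_col])
        if test > threshold
    ]
    # Return None rather than dividing by zero when nothing matches.
    return sum(selected) / len(selected) if selected else None


# Hypothetical columnar table; geometry is shown as WKT text for readability.
columns = {
    "name": ["A", "B", "C", "D"],
    "population": [500, 1500, 3000, 800],
    "income": [30.0, 40.0, 50.0, 20.0],
    "geometry": ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)", "POINT (3 3)"],
}

# Average income where population > 1000: only two of four columns are read.
result = average_where(columns, "income", "population", 1000)
print(result)  # 45.0
```

In a row-oriented file the same query would have to scan every feature in full, including the geometry, just to reach the two attributes it needs.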
The Hybrid Approach
Many architectures combine both:
- Store in GeoParquet for efficiency
- Serve as GeoJSON for web compatibility
- Cache vector tiles for mapping
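The "store columnar, serve GeoJSON" step can be sketched as a small conversion function. This is an assumption-laden illustration: `columns_to_geojson` is a hypothetical helper, the table is an in-memory dict rather than a real GeoParquet file, and the geometry column is assumed to hold GeoJSON-style dicts already (GeoParquet typically stores geometry as WKB, which would need decoding first).

```python
import json


def columns_to_geojson(columns, geometry_col="geometry"):
    """Turn a columnar table into a GeoJSON FeatureCollection dict."""
    attribute_names = [name for name in columns if name != geometry_col]
    features = []
    for i, geometry in enumerate(columns[geometry_col]):
        features.append({
            "type": "Feature",
            "geometry": geometry,
            # Pull the i-th value out of every attribute column.
            "properties": {name: columns[name][i] for name in attribute_names},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical table with two point features.
columns = {
    "name": ["Park", "Lake"],
    "area": [12.5, 40.0],
    "geometry": [
        {"type": "Point", "coordinates": [13.4, 52.5]},
        {"type": "Point", "coordinates": [13.5, 52.6]},
    ],
}

collection = columns_to_geojson(columns)
payload = json.dumps(collection)  # string ready to send to a web client
```

Only the features requested by a client need to go through this conversion, so the bulk of the data can stay in its compact columnar form.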
Streaming Formats
FlatGeobuf offers a middle ground:
- Binary format (faster to parse than JSON)
- Row-oriented (good for feature access)
- Supports HTTP range requests
- Optional spatial index for filtered reads
Use FlatGeobuf when:
- Binary efficiency is needed
- Clients read the file over HTTP range requests
- Features are streamed in real time
- Datasets are medium-sized (10-500MB)
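HTTP range requests let a client fetch a slice of a remote file instead of downloading all of it. The sketch below only builds such a request with the standard library and does not send it. The URL, offset, and length are hypothetical; a real FlatGeobuf reader would derive the byte offsets from the file's header and spatial index, which this example does not attempt.

```python
from urllib.request import Request


def build_range_request(url, offset, length):
    """Build (but do not send) a GET request for `length` bytes at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical URL; asks for the first 8 bytes of the file.
request = build_range_request("https://example.com/data.fgb", 0, 8)
print(request.get_header("Range"))  # bytes=0-7
```

Passing the request to `urllib.request.urlopen` would perform the fetch. A server that supports ranges answers with status 206 (Partial Content) and only the requested bytes.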
See Also
- Guide - Geospatial File Format Selection
- GeoParquet
- GeoJSON
- Cloud Optimized GeoTIFF (COG)
- MOC - Geospatial