Apache Iceberg connector

FederIQ reads Iceberg tables through DuckDB's native iceberg extension. Declare the source once, query it like any other table.

sources:
  - name: bronze
    type: iceberg
    path: "s3://warehouse/bronze/events"
    # optional (default false): tolerate data files that moved relative to the manifest
    allow_moved_paths: false

Then:

SELECT event_type, COUNT(*)
FROM bronze
GROUP BY event_type;

How it works

FederIQ emits:

INSTALL iceberg;
LOAD iceberg;
CREATE OR REPLACE VIEW bronze AS
  SELECT * FROM iceberg_scan('s3://warehouse/bronze/events');

on attach. DuckDB handles manifest parsing and data file access.
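
When `allow_moved_paths` is enabled in the source declaration, the flag would presumably be passed through as the corresponding named argument to the scan. A sketch of the emitted view under that assumption (the exact SQL FederIQ generates may differ):

```sql
-- Sketch: view emitted for a source declared with allow_moved_paths: true.
-- allow_moved_paths is a named parameter of DuckDB's iceberg_scan.
CREATE OR REPLACE VIEW bronze AS
  SELECT * FROM iceberg_scan(
    's3://warehouse/bronze/events',
    allow_moved_paths => true
  );
```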

Credentials

For S3-backed tables, set AWS credentials via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) or via DuckDB's secrets manager.
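
For the secrets-manager route, an S3 secret can be created in-session with DuckDB's CREATE SECRET. The secret name and values below are placeholders:

```sql
-- Placeholder credentials; substitute real values, or use
-- PROVIDER CREDENTIAL_CHAIN to pick up the ambient AWS credentials.
CREATE SECRET warehouse_s3 (
    TYPE S3,
    KEY_ID 'AKIAXXXXXXXXXXXXXXXX',
    SECRET 'xxxxxxxx',
    REGION 'us-east-1'
);
```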

For local filesystem paths, no credentials are needed.

Limitations

DuckDB's Iceberg extension is read-only and tracks a subset of the spec:

  • Format v2 tables are supported; v1 tables are read best-effort.
  • No writes (inserts, schema evolution, partition changes): author tables with your engine of choice (Spark, Trino, pyiceberg) and use FederIQ to query.
  • Time travel (FOR TIMESTAMP AS OF ...) is not yet surfaced through FederIQ's catalog; query DuckDB directly if you need it.
  • Expect higher first-query latency than plain Parquet: the metadata and manifest tree must be read before any data files are scanned.
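
For the time-travel case, a DuckDB session can address snapshots directly. A sketch, assuming a recent version of the iceberg extension (the snapshot parameters are version-dependent; verify against your DuckDB release):

```sql
-- List the table's snapshots to find a point to read from.
SELECT snapshot_id, timestamp_ms
FROM iceberg_snapshots('s3://warehouse/bronze/events');

-- Scan the table as of a point in time.
SELECT COUNT(*)
FROM iceberg_scan(
  's3://warehouse/bronze/events',
  snapshot_from_timestamp => TIMESTAMP '2024-01-01 00:00:00'
);
```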