FederIQ

Federated SQL over any source, in a single Rust binary.

FederIQ is an open-source federated query engine built on top of DuckDB. One command attaches a Postgres database, a Parquet directory on S3, a CSV on disk, and an HTTP JSON API — and lets you join them in plain SQL.

SELECT u.name, SUM(o.total) AS revenue
FROM postgres_users.public.users u
JOIN events.orders o ON o.user_id = u.id
GROUP BY u.name
ORDER BY revenue DESC;

Why not Trino?

Trino is the incumbent. It's also ~4 GB of JVM, a cluster to deploy, and a DBA to operate. FederIQ is a single ~20 MB binary:

| | Trino | FederIQ |
|---|---|---|
| Deploy unit | JVM cluster | Static binary |
| Minimum memory | ~4 GB | ~100 MB |
| First query latency | ~30 s (cluster warm) | ~200 ms (cold) |
| Embed in Python/Node | No | Yes (pip/npm) |
| Policy engine | Add-on (Ranger) | Built-in |
| License | Apache 2.0 | Apache 2.0 |

Why not DuckDB alone?

DuckDB is incredible but single-node and single-process. FederIQ adds:

  • Catalog-driven federation — declare sources in YAML, query from any client.
  • Policy engine — column masking, row filters, region pinning enforced at query-rewrite time.
  • HTTP service mode — federiq serve with bearer auth + TLS + Prometheus metrics.
  • Cache layer — memory, disk, and Redis backends with hooks for a predictive prefetcher.
  • First-party Python and TypeScript clients.

License

Apache 2.0. The core will always be Apache 2.0 — hosted and enterprise features may be licensed separately in the future.

Install

Pre-built binary

Grab a release archive from the releases page. Extract and put federiq on your PATH.

Cargo

cargo install federiq-cli

Docker

docker run --rm -p 7878:7878 \
  -v "$PWD/federiq.yaml:/var/federiq/federiq.yaml:ro" \
  ghcr.io/dyber-pqc/federiq:latest \
  serve --addr 0.0.0.0:7878 --token "$FEDERIQ_TOKEN"

From source

git clone https://github.com/dyber-pqc/federiq.git
cd federiq
cargo build --release
./target/release/federiq --help

First build takes a few minutes because DuckDB is compiled from source.

Python

pip install federiq
pip install 'federiq[pandas]'   # optional pandas helper

TypeScript / Node

npm install federiq

Your first federated query

Create a catalog file:

# federiq.yaml
sources:
  - name: events
    type: parquet
    path: ./data/events.parquet

  - name: users
    type: csv
    path: ./data/users.csv

Run a query that joins the two:

federiq query "
  SELECT u.name, COUNT(e.id) AS events
  FROM users u LEFT JOIN events e ON e.user_id = u.id
  GROUP BY u.name
  ORDER BY events DESC
  LIMIT 10
"

That's it. No cluster. No ETL. The Parquet file and the CSV stay exactly where they are; FederIQ federates the query across both.

Add a Postgres database in two more lines:

  - name: sales
    type: postgres
    dsn: "host=localhost port=5432 dbname=sales user=postgres password=postgres"

Then:

federiq query "
  SELECT u.name, SUM(o.total) AS revenue
  FROM sales.public.orders o
  JOIN users u ON u.id = o.user_id
  GROUP BY u.name
"

Next steps

The catalog file

federiq.yaml declares your data sources and (optionally) your policies. Initialize one with:

federiq init

Structure

sources:
  - name: <alias>
    type: <source-type>
    # ... type-specific keys ...

policies:
  - name: <policy-name>
    applies_to: <source-alias or "*">
    # ... policy keys ...

Source types

| type | Required keys | Notes |
|---|---|---|
| postgres | dsn | DuckDB postgres_scanner extension |
| mysql | dsn | DuckDB mysql_scanner extension |
| sqlite | path | Local SQLite database file |
| parquet | path | Single file, directory, or glob |
| csv | path | read_csv_auto for schema inference |
| http | url | JSON response, default feature |
| mongo | uri, database, collection | Planned; blocked on upstream |
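The required-keys column above lends itself to a mechanical check. A minimal sketch in Python of what such a validator could look like — the `REQUIRED` mapping mirrors the table, but the function itself is hypothetical, not part of FederIQ (use `federiq config validate` for the real thing):

```python
# Required keys per source type, mirroring the table above.
REQUIRED = {
    "postgres": {"dsn"},
    "mysql": {"dsn"},
    "sqlite": {"path"},
    "parquet": {"path"},
    "csv": {"path"},
    "http": {"url"},
}

def validate_source(source: dict) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    stype = source.get("type")
    if stype not in REQUIRED:
        return [f"unknown source type: {stype!r}"]
    problems = []
    if not source.get("name"):
        problems.append("missing 'name'")
    missing = REQUIRED[stype] - source.keys()
    problems.extend(f"missing '{key}' (required for {stype})" for key in sorted(missing))
    return problems
```

For example, `validate_source({"name": "events", "type": "parquet"})` reports the missing `path` key.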

Managing sources from the CLI

federiq sources list
federiq sources add --name events --type parquet --path ./data/events.parquet
federiq sources remove events
federiq sources test events
federiq sources describe events

Inspecting schemas

federiq schema list                   # every source
federiq schema list events            # one source
federiq schema describe events        # columns of a single table or view

Validating

federiq config validate
federiq doctor

SQL connectors

FederIQ attaches Postgres, MySQL, and SQLite natively via DuckDB's postgres_scanner, mysql_scanner, and sqlite_scanner extensions — meaning queries push down to the remote engine wherever possible.

Postgres

- name: sales
  type: postgres
  dsn: "host=db.internal port=5432 dbname=sales user=app password=secret"

Reference tables with three-part names:

SELECT * FROM sales.public.orders LIMIT 10;

MySQL

- name: catalog
  type: mysql
  dsn: "host=db.internal port=3306 user=app password=secret database=catalog"

SQLite

- name: analytics
  type: sqlite
  path: "./data/analytics.db"

Read-only by design

FederIQ attaches every SQL source READ_ONLY. Writes go through the source's own tools.

File connectors

Parquet

- name: events
  type: parquet
  path: "./data/events.parquet"       # single file
# path: "./data/events/*.parquet"     # glob
# path: "s3://bucket/events/*.parquet" # S3 via DuckDB's httpfs extension

CSV

- name: users
  type: csv
  path: "./data/users.csv"

read_csv_auto handles header detection, delimiter inference, and type inference. For anything non-trivial, convert to Parquet first — it's faster, smaller, and carries schema.

S3 / HTTP

Parquet sources with an S3 or HTTPS path use DuckDB's httpfs extension. Set credentials via the usual AWS env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) or the DuckDB secrets mechanism.

HTTP / REST connector

Materializes a JSON HTTP response as a DuckDB view.

- name: posts
  type: http
  url: "https://api.example.com/posts"
  # optional
  headers:
    Accept: "application/json"
  auth_bearer: "$TOKEN"
  method: GET

Requirements:

  • Response must be JSON.
  • Shape must be either a top-level array of objects, or an object with a single array property under one of data, results, items.
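The two accepted shapes can be normalized in a few lines. An illustrative sketch (not FederIQ's actual code) of that normalization:

```python
def extract_rows(payload):
    """Normalize an HTTP JSON payload into a list of row dicts.

    Accepts either a top-level array of objects, or an object with
    an array under one of the conventional wrapper keys.
    """
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in ("data", "results", "items"):
            if isinstance(payload.get(key), list):
                return payload[key]
    raise ValueError("unsupported JSON shape: expected an array of objects")
```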

Caveats

  • The connector fetches the full response at attach time. Huge responses will hurt. Add pagination support in your own wrapper or use a proper database.
  • No streaming yet. Planned.
  • No response-change detection. Re-attach to refresh.

For anything beyond trivial REST feeds, write your own connector — see connector SDK.

Apache Iceberg connector

FederIQ reads Iceberg tables through DuckDB's native iceberg extension. Declare the source once, query it like any other table.

sources:
  - name: bronze
    type: iceberg
    path: "s3://warehouse/bronze/events"
    # optional — tolerate data files that moved relative to the manifest
    allow_moved_paths: false

Then:

SELECT event_type, COUNT(*)
FROM bronze
GROUP BY event_type;

How it works

FederIQ emits:

INSTALL iceberg;
LOAD iceberg;
CREATE OR REPLACE VIEW bronze AS
  SELECT * FROM iceberg_scan('s3://warehouse/bronze/events');

on attach. DuckDB handles manifest parsing and data file access.

Credentials

For S3-backed tables, set AWS credentials via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) or via DuckDB's secrets manager.

For local filesystem paths, no credentials are needed.

Limitations

DuckDB's Iceberg extension is read-only and tracks a subset of the spec:

  • v2 tables supported; older format versions best-effort.
  • No writes (add partitions, schema evolution, etc.) — use your engine of choice (Spark, Trino, pyiceberg) to author, FederIQ to query.
  • Time travel (FOR TIMESTAMP AS OF ...) is not yet surfaced through FederIQ's catalog — query DuckDB directly if you need it.
  • Expect slower first-query latency than Parquet — Iceberg reads the metadata tree on every attach.

Roadmap: more connectors

Planned connectors, roughly in order of priority:

| Connector | Approach | Status |
|---|---|---|
| MongoDB | mongodb crate → NDJSON tempfile → view | Upstream-blocked on Rust 1.93 compat |
| Apache Iceberg | DuckDB iceberg extension | Shipped (see iceberg) |
| Snowflake | JDBC over snowflake-jdbc bridge or REST | Planned |
| BigQuery | BigQuery Storage API | Planned |
| Elasticsearch | REST search with cursor | Planned |
| Kafka | rdkafka + snapshot materialization | Planned |
| Delta Lake | delta-rs | Planned |
| DynamoDB | AWS SDK scan | Planned |

Open an issue on the repo if your source isn't on the list — we pick based on user demand.

Writing your own connector

See the CONNECTORS.md guide in the repo for the full walkthrough, but here's the shape:

use federiq_core::{AttachCtx, Connector};

pub struct MyConnector {
    /* your config */
}

impl Connector for MyConnector {
    fn kind(&self) -> &'static str { "mysource" }

    fn attach(&self, ctx: &AttachCtx<'_>) -> anyhow::Result<()> {
        // 1. Fetch rows from the remote source.
        // 2. Write them as NDJSON to ctx.tmpdir.
        // 3. CREATE OR REPLACE VIEW <ctx.alias> AS SELECT * FROM read_json_auto('...');
        Ok(())
    }
}

A dynamic plugin system (WASM components loaded from ~/.federiq/plugins) is on the roadmap so third parties can ship connectors without recompiling FederIQ.

Policy model

A FederIQ policy is a YAML record that attaches to a source (or every source via "*") and declares how data must be handled. Policies ship with four capabilities:

policies:
  - name: mask_pii
    applies_to: users
    mask_columns:
      email: "'***@***.com'"
      ssn: "NULL"

  - name: analysts_see_active_only
    applies_to: events
    when_role: analyst            # only applies if FEDERIQ_ROLE=analyst
    row_filter: "status = 'active'"

  - name: pin_to_us_east
    applies_to: "*"
    require_region: us-east-1     # refuses to run if FEDERIQ_REGION != us-east-1

Fields

| Field | Type | Purpose |
|---|---|---|
| name | string | Human-readable identifier |
| applies_to | string or "*" | Source name to match |
| when_role | string (optional) | Only applies when caller role matches |
| require_region | string (optional) | Refuses to run outside this region |
| mask_columns | map of col: expr | Replace the column with the SQL expression |
| row_filter | string (optional) | SQL WHERE predicate AND'd into every SELECT |

Context

PolicyContext is pulled from environment variables at query time:

  • FEDERIQ_ROLE — caller role (e.g., analyst, admin)
  • FEDERIQ_REGION — data residency region (e.g., us-east-1)
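The role and region semantics above can be sketched in a few lines of Python. This is an illustration of the contract (when_role gates silently; require_region fails loudly), not FederIQ's implementation:

```python
import os

def policy_applies(policy: dict, env=os.environ) -> bool:
    """A when_role mismatch simply deactivates the policy for this caller."""
    if "when_role" in policy and policy["when_role"] != env.get("FEDERIQ_ROLE"):
        return False
    return True

def check_region(policy: dict, env=os.environ) -> None:
    """A require_region mismatch is an error, not a no-op: the query is refused."""
    required = policy.get("require_region")
    region = env.get("FEDERIQ_REGION")
    if required is not None and region != required:
        raise RuntimeError(
            f"policy {policy['name']!r} requires region={required!r} "
            f"but current context has {region!r}"
        )
```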

What happens at query time

See enforcement for the rewrite details.

Enforcement

When federiq query or federiq serve sees a SQL statement against a table with active policies, it parses the SQL with sqlparser-rs, applies each policy as an AST transformation, and emits the rewritten SQL for DuckDB to execute.

Row filters

-- User input
SELECT id FROM events WHERE user_id = 42;

-- After policy `row_filter: "status = 'active'"`
SELECT id FROM events WHERE user_id = 42 AND status = 'active';

Column masking

-- User input
SELECT name, email FROM users;

-- After policy `mask_columns: { email: "'***@***.com'" }`
SELECT name, '***@***.com' AS email FROM users;

The masking expression is valid SQL. The column is aliased back to its original name so callers don't see a schema change.
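As a toy illustration of the alias-back behavior — the real engine transforms the sqlparser-rs AST, while this hypothetical helper only handles a projection list of bare column names:

```python
def mask_projection(columns: list[str], masks: dict[str, str]) -> str:
    """Render a projection list, substituting masked columns with their
    expression aliased back to the original name."""
    parts = []
    for col in columns:
        if col in masks:
            # The caller still sees a column named `col` in the result schema.
            parts.append(f"{masks[col]} AS {col}")
        else:
            parts.append(col)
    return ", ".join(parts)
```

For instance, `mask_projection(["name", "email"], {"email": "'***@***.com'"})` yields the rewritten projection shown above.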

SELECT * limitation

Star projection cannot be masked without schema expansion. For now, FederIQ returns an error:

SELECT * cannot be enforced against policy 'mask_pii'
(masks columns on 'users'); list columns explicitly

List the columns explicitly, or upgrade when the schema-aware rewriter ships (tracked on the roadmap).

Region pinning

Region mismatches fail the rewrite:

policy 'pin_to_us_east' requires region='us-east-1'
but current context has eu-west-1

Static checking

Before running a query, preview violations:

federiq policy check "SELECT email, ssn FROM users"

Disabling enforcement

Policies auto-apply when the catalog declares any. To opt out on a per-call basis:

federiq query --enforce-policy=false "SELECT ..."

Use sparingly — this bypass is primarily for development and debugging.

Running federiq serve

federiq serve exposes the engine as an HTTP service.

Endpoints

| Method | Path | Purpose |
|---|---|---|
| GET | /healthz | Liveness probe — always 200 |
| GET | /metrics | Prometheus exposition |
| GET | /sources | List attached source names |
| POST | /query | Execute SQL — {"sql": "..."} → JSON rows |

/healthz and /metrics are always reachable. /sources and /query require the bearer token (see auth).

Running it

federiq serve --addr 0.0.0.0:7878 --token "$FEDERIQ_TOKEN" \
  --tls-cert /etc/federiq/cert.pem --tls-key /etc/federiq/key.pem

Without --tls-cert the server listens on plain HTTP and prints a loud warning — don't expose an unencrypted port to the internet.

Making requests

curl -H "authorization: Bearer $FEDERIQ_TOKEN" \
  -H "content-type: application/json" \
  -d '{"sql":"SELECT 1 AS n"}' \
  https://localhost:7878/query

Configuring via env

| Variable | Effect |
|---|---|
| FEDERIQ_SERVER_TOKEN | Bearer token for inbound requests |
| FEDERIQ_LOG_FORMAT | json switches tracing to structured logs |
| FEDERIQ_ROLE | Caller role for policies |
| FEDERIQ_REGION | Caller region for policies |

Authentication and TLS

Bearer tokens

federiq serve enforces a shared-secret bearer token on every request to /sources and /query.

federiq serve --token "$(openssl rand -hex 32)"

Or via environment:

export FEDERIQ_SERVER_TOKEN="..."
federiq serve

Clients send Authorization: Bearer <token>:

new FederIQ("https://federiq.example.com", { token: process.env.FEDERIQ_TOKEN })

Token comparison is constant-time to avoid timing oracles.
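In Python terms, a constant-time check looks like this — hmac.compare_digest is the stdlib primitive; FederIQ's server does the equivalent in Rust:

```python
import hmac

def token_ok(presented: str, expected: str) -> bool:
    """Compare bearer tokens without leaking prefix-match timing.

    A naive `presented == expected` short-circuits on the first differing
    byte, letting an attacker recover the token byte by byte from timing.
    """
    return hmac.compare_digest(presented.encode(), expected.encode())
```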

TLS

Provide PEM-encoded cert and key:

federiq serve \
  --tls-cert /etc/federiq/cert.pem \
  --tls-key  /etc/federiq/key.pem

Generate a self-signed cert for development:

openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout key.pem -out cert.pem -subj "/CN=localhost"

For production, use certs from Let's Encrypt or your internal CA.

What's missing (tracked)

  • Multiple tokens with distinct scopes / roles
  • mTLS (client certs)
  • OIDC / OAuth integration
  • Rate limiting

Open an issue if any of these are blocking your deployment.

Observability and metrics

Prometheus metrics

federiq serve exposes metrics at /metrics in Prometheus text format.

| Metric | Kind | Labels |
|---|---|---|
| federiq_queries_total | counter | |
| federiq_query_errors_total | counter | |
| federiq_query_duration_seconds | histogram | |
| federiq_http_requests_total | counter | route |
| federiq_auth_failures_total | counter | |

Scrape config:

scrape_configs:
  - job_name: federiq
    static_configs:
      - targets: ["federiq.internal:7878"]

Structured logs

export FEDERIQ_LOG_FORMAT=json
federiq serve

Emits one JSON object per log event (to stdout), with event fields flattened so they're indexable by downstream log shippers.

Log levels are controlled by RUST_LOG:

RUST_LOG=federiq=debug,federiq_core=debug federiq serve

Per-source attach timing

Every source's ATTACH latency is tracked by the engine:

federiq query "SELECT 1" --timing
per-source attach timing:
  events               parquet  2.14ms
  sales                postgres 84.2ms

And from Python:

import json
print(json.dumps(engine.attach_timings(), indent=2))

EXPLAIN ANALYZE

DuckDB's EXPLAIN ANALYZE gives operator-level timing that naturally includes federated source scan costs:

federiq query "SELECT COUNT(*) FROM events" --explain-analyze

Caching

FederIQ has an optional result cache keyed by a content-addressed hash of the query SQL. Three backends ship in the default build:

  • memory — in-process LRU with TTL (fastest, evicted on restart)
  • disk — file-backed store sharded by hash prefix (durable, shareable across processes)
  • redis — enabled with --features cache-redis (shared across nodes)
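A content-addressed key is just a stable hash of the SQL text. A plausible sketch of key derivation and prefix sharding — the exact normalization and layout FederIQ uses are implementation details, so treat this as illustrative:

```python
import hashlib

def cache_key(sql: str) -> str:
    """Derive a content-addressed cache key from the query text.

    Whitespace is collapsed so trivially reformatted queries share a key;
    case is preserved because quoted identifiers are case-sensitive.
    """
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def shard_path(key: str) -> str:
    """Disk layout sketch: shard files by the first two hex chars of the key."""
    return f"{key[:2]}/{key}.json"
```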

From the CLI

federiq query --cache disk --cache-dir ./.cache "SELECT COUNT(*) FROM events"
federiq cache status --backend disk --dir ./.cache
federiq cache clear  --backend disk --dir ./.cache

From Rust

use federiq_core::{DiskCache, Engine, FreqPredictor};
use std::sync::Arc;

let engine = Engine::new()?
    .with_cache(Arc::new(DiskCache::new("./.cache")?))
    .with_predictor(Arc::new(FreqPredictor::new(10)));

engine.attach_all(&catalog.sources)?;

let rows = engine.query_cached("SELECT COUNT(*) FROM events")?;

Predictors

Predictor::observe(sql, hit) gets called on every cache access. The default FreqPredictor counts queries; predict() returns the top-k most-frequent for background warmup.

engine.warmup()?;  // runs predictor.predict() and caches the results
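The default predictor's behavior fits in a few lines of Python — an illustration of the trait's contract, not the Rust implementation:

```python
from collections import Counter

class FreqPredictor:
    """Count observed queries; predict the top-k most frequent for warmup."""

    def __init__(self, k: int):
        self.k = k
        self.counts = Counter()

    def observe(self, sql: str, hit: bool) -> None:
        # Every cache access counts, hit or miss.
        self.counts[sql] += 1

    def predict(self) -> list[str]:
        return [sql for sql, _ in self.counts.most_common(self.k)]
```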

A smarter ML predictor is on the roadmap — implement the Predictor trait to plug in your own.

What's not cached yet

  • Queries whose results depend on the calling role/region (planned for v0.5.1 when PolicyContext is plumbed through query_cached).
  • Negative results (errors are not cached).
  • Schema introspection.

Python client

pip install federiq
pip install 'federiq[pandas]'   # optional pandas helper

Quickstart

import federiq

engine = federiq.connect("./federiq.yaml")

rows = engine.query("SELECT 1 AS n")
print(rows)  # [{'n': 1}]

df = federiq.query_df(engine, "SELECT COUNT(*) AS n FROM events")
print(df)

API

class Engine:
    @staticmethod
    def from_catalog(path: str) -> "Engine": ...
    def attach(self, source: dict) -> None: ...
    def query(self, sql: str) -> list[dict]: ...
    def explain(self, sql: str) -> str: ...
    def explain_analyze(self, sql: str) -> str: ...
    def attach_timings(self) -> list[dict]: ...
    def duckdb_version(self) -> str: ...

Jupyter

See examples/jupyter/federiq_intro.ipynb for an interactive walkthrough.

TypeScript client

npm install federiq

The TypeScript client is a thin HTTP wrapper around federiq serve.

Quickstart

import { FederIQ } from "federiq";

const fq = new FederIQ("https://federiq.example.com", {
  token: process.env.FEDERIQ_TOKEN,
});

const { rows } = await fq.query("SELECT 1 AS n");
console.log(rows);

API

class FederIQ {
  constructor(baseUrl: string, opts?: FederIQOptions);
  query(sql: string): Promise<{ rows: unknown[] }>;
  sources(): Promise<string[]>;
  healthz(): Promise<boolean>;
}

interface FederIQOptions {
  token?: string;
  fetch?: typeof fetch;
  timeoutMs?: number;
  headers?: Record<string, string>;
}

Works in any runtime with global fetch — Node 18+, Bun, Deno, and modern browsers.

CLI reference

federiq [--config PATH] [-v] <subcommand>

Every subcommand accepts --help for detail. A condensed reference:

| Subcommand | Purpose |
|---|---|
| init | Scaffold a new federiq.yaml |
| sources list | List declared sources |
| sources add | Add a source |
| sources remove | Remove a source |
| sources test | Verify one or all sources are reachable |
| sources describe | Show source detail and generated ATTACH SQL |
| schema list | List tables in a source |
| schema describe | Show column types for a table |
| query | Execute SQL (see flags below) |
| shell | Interactive REPL |
| serve | HTTP server with auth + TLS + metrics |
| config show | Print resolved catalog YAML |
| config validate | Lint catalog sources |
| policy list | List policies |
| policy show | Show one policy |
| policy validate | Cross-check policies against sources |
| policy check | Static violation scan for a SQL query |
| cache status | Cache statistics |
| cache clear | Empty the cache |
| doctor | Environment diagnostics |
| completions <shell> | Emit shell completion scripts |

federiq query flags

| Flag | Meaning |
|---|---|
| -f, --file PATH | Read SQL from a file |
| -o, --output PATH | Write results to a file |
| --format FMT | table (default), csv, json, ndjson, parquet |
| --explain | Print DuckDB plan |
| --explain-analyze | Plan plus per-operator timing |
| --timing | Print total and per-source timing |
| --cache MODE | off / memory / disk |
| --cache-dir PATH | Disk cache location |
| --enforce-policy BOOL | Default true when catalog has policies |

Configuration reference

Environment variables

| Variable | Where | Purpose |
|---|---|---|
| FEDERIQ_SERVER_TOKEN | federiq serve | Bearer token for inbound requests |
| FEDERIQ_LOG_FORMAT | all commands | json switches logs to JSON |
| FEDERIQ_ROLE | all commands | Caller role for policies |
| FEDERIQ_REGION | all commands | Caller region for policies |
| RUST_LOG | all commands | tracing env-filter expression |

Catalog reference

See catalog and policy pages for the authoritative schema.

File locations

  • Default catalog: ./federiq.yaml
  • Default disk cache: ./.federiq-cache/
  • Shell history: ./.federiq_history

All configurable via flags — nothing writes to $HOME without your involvement.

FAQ

How is this different from Trino?

Trino is a distributed query engine. FederIQ is a single binary built on DuckDB. For small-to-medium deployments you get most of Trino's federation story without the JVM ops burden. For true petabyte multi-tenant workloads, Trino is still the right answer.

Can I use FederIQ in production?

Today: probably not without caveats. Tier-1 features (auth, TLS, policy enforcement, observability) are shipped but have only been used by early-alpha testers. Wait for v1.0 for load-bearing workloads.

Does it modify my data?

No. Every source is attached READ_ONLY. FederIQ only reads.

What SQL dialect?

DuckDB's, which is broadly Postgres-compatible. Functions like date_trunc, epoch, list_aggregate, and DuckDB's extensions are all available.

Why is the first build slow?

DuckDB's C++ core compiles from source via the bundled feature. First build takes 3–6 minutes. Subsequent incremental builds are fast.

How do I pin data to a region?

Use the require_region field of a policy. See enforcement.

Can I cache results across processes / nodes?

Yes — use the disk cache (shared filesystem) or the Redis cache (cargo install federiq-cli --features cache-redis).

What about joins that span sources?

DuckDB plans them natively. For expensive cross-source joins consider caching the hot side or materializing to Parquet. A cost-aware planner is on the roadmap.

Is there a managed cloud?

Planned. Core will stay Apache 2.0 forever; hosted and enterprise features will be separately licensed.

FederIQ vs. Trino vs. DuckDB

A pragmatic feature comparison. None of these tools is strictly better — they solve different pain points.

| Dimension | DuckDB | FederIQ | Trino |
|---|---|---|---|
| Deploy unit | Single binary | Single binary | JVM cluster |
| Minimum memory | ~50 MB | ~100 MB | ~4 GB |
| Cold start | ~50 ms | ~200 ms | ~30 s |
| Multi-source federation | Limited | Yes | Yes |
| Postgres / MySQL / SQLite | Via extensions | Native catalog | Native |
| Parquet / CSV / JSON | Native | Via DuckDB | Via Hive connector |
| HTTP / REST | No | Yes (default feature) | No |
| Mongo | No | Planned | Yes |
| Iceberg | Via extension | Yes (DuckDB extension) | Yes |
| Declarative policy engine | No | Yes (YAML, SQL rewrite) | Via Ranger add-on |
| Column masking / row filters | Manual views | Built-in, catalog-declared | Via Ranger |
| HTTP service mode | No | Yes (auth + TLS + metrics) | Yes |
| Cache layer (memory/disk/Redis) | Manual | Built-in | Materialized views |
| Python / TS clients | Yes / No | Yes / Yes | Yes / community |
| License | MIT | Apache 2.0 | Apache 2.0 |

When to pick each

  • DuckDB alone: single-node analytics, embedded in a Python/R/Node process, local Parquet/CSV processing.
  • FederIQ: you need federated queries across 2–10 sources, a policy engine, or an HTTP service, but don't want the operational cost of Trino. Works great embedded or as a sidecar.
  • Trino: petabyte-scale, hundreds of concurrent users, existing Hive metastore, or you already run a JVM fleet.

Benchmarks

Rigorous TPC-H numbers against Trino and Dremio are on the roadmap. When we publish, the benchmark harness and raw numbers will live in the repo under bench/ so you can reproduce.