In short

In a sharded database, something has to know "which shard holds this key". That something lives in one of two places, and the choice shapes every query's latency, every deploy's blast radius, and every on-call's Friday night.

Client-aware routing puts the shard map inside the client driver. The driver hashes the key, consults its cached copy of the topology, and opens a TCP connection straight to the node that owns the data. No middle tier. No extra hop. The query travels exactly one network segment from your application process to the storage node. Cassandra's Java driver (with TokenAwarePolicy), ScyllaDB's shard-aware drivers, and Redis Cluster's client libraries (Jedis, redis-py-cluster, ioredis) all work this way. The cost: every client language needs its own implementation, every shard-map change is a fleet-wide rollout, and a misbehaving client can saturate a shard before you notice.

Proxy routing puts the shard map inside a middle-tier service. The application connects to the proxy as if it were a single database; the proxy parses each query, picks the shard, forwards the bytes, and streams the response back. Vitess's vtgate, Citus's coordinator node, ProxySQL, AWS RDS Proxy, and Envoy's SQL filters are the canonical examples. The cost: one extra network hop (typically 0.3–1 ms within a region), the proxy fleet has to scale with query load, and the proxy itself must not be a single point of failure. The benefit: the application does not know sharding exists, and you can reshard the entire cluster without redeploying a line of client code.

Production systems rarely pick one. Large deployments run client-aware inside a region for the latency, a proxy for cross-region traffic or analytics, and a shared source of truth for the shard map — etcd, ZooKeeper, or metadata tables. This chapter walks the two architectures, the hybrids, and the trade-offs you are about to make whether you know it or not.

You have sharded your database. Chapter 91 picked the strategy — hash, range, or directory. Chapter 92 picked the shard key. Now the application opens a connection and issues a query. Connects to what?

That is the routing question. Every sharded system has to answer it, and the answer defines the architecture. Does the application know the topology and connect directly to the right shard? Or does it connect to some intermediary that figures out the shard on its behalf? The first design is client-aware. The second is proxy. Both exist in production; both are correct; neither is free. This chapter is about when to pick each, and about the hybrids that cheat.

Client-aware architecture

A client-aware driver is a library linked into the application process that knows where every shard lives. It holds a copy of the shard map — the list of shards, their addresses, and the rule that maps a key to a shard. When your code calls driver.get(key), the library computes the destination (hash, range lookup, or directory consultation), picks a pooled connection to that destination (or opens one), and writes the query. The request travels exactly one hop. The response comes back the same way. No middle tier observes the traffic.

Why this is fast: the only latency on the wire is the physical round-trip between the application and the storage node. In a single AZ on modern cloud networks that is 0.2–0.5 ms. A proxy in the middle would add another 0.3–1 ms — not catastrophic, but doubled on the hot path. At 100,000 queries per second, a millisecond per query is a hundred seconds of added in-flight time every wall-clock second, capacity that a proxy fleet would have to absorb, which is a fleet you now have to run.
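The arithmetic behind that last claim is Little's law: the extra work held in flight equals the arrival rate times the added latency.

```python
# Little's law: extra in-flight queries = arrival rate x added latency.
qps = 100_000          # queries per second hitting the routing layer
added_latency = 0.001  # ~1 ms of proxy hop per query

extra_in_flight = qps * added_latency
print(extra_in_flight)  # about 100 queries permanently in flight,
                        # connections and buffers the proxy fleet must hold
```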

The canonical example is Cassandra's Java driver with TokenAwarePolicy. Cassandra places rows on the ring by hashing the partition key into a 64-bit token. Each node owns a range of tokens. The driver learns the topology over a control connection at startup: the cluster tells it which node owns which token ranges, and pushes updates over that connection when the topology changes. On every query, the driver hashes the partition key, walks its local token-range table, and picks a replica that owns it. The query goes directly to an owning node. The coordinator-node indirection that Cassandra used in its earliest releases is avoided entirely.

ScyllaDB's drivers extend the idea — they know not just which node owns a partition but which CPU core on that node owns it, and connect to a dedicated port per core. The query lands on the exact CPU that holds the data, with zero cross-core coordination inside the server. Redis Cluster works similarly: the cluster has 16,384 hash slots, each owned by a master; client libraries (Jedis, redis-py-cluster, ioredis) keep a local copy of the slot map, hash each key (CRC16 mod 16384), and connect to the owner. If their cache is stale, the server returns a MOVED redirect with the new location and the client updates.
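The Redis Cluster slot computation is small enough to write out in full. A sketch following the cluster specification's CRC-16/XMODEM variant and hash-tag rule (function names are mine); the hash-tag rule is what lets an application pin related keys to the same slot for multi-key operations:

```python
def crc16(data: bytes) -> int:
    # CRC-16/XMODEM (polynomial 0x1021, initial value 0), as specified
    # by the Redis Cluster key-distribution model.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...} section, only
    # that substring is hashed, so {user:42}.profile and {user:42}.settings
    # land on the same slot (and the same master).
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

A client library keeps a 16,384-entry slot-to-node table next to this function; routing a key is one hash plus one array lookup.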

The cost of client-aware is coupling. Every language the application uses needs its own driver, each reimplementing topology awareness. Every change to the topology has to reach every client; there is always a window where stale clients send queries to nodes that no longer own the data. The protocols include correction signals (Redis Cluster's MOVED redirect, Cassandra's topology-change events) that drivers catch, refresh on, and retry after. The system is eventually consistent in its view of itself.

Proxy architecture

A proxy puts the shard map behind a network interface. The application opens one TCP connection to a proxy process; the proxy is the database as far as the application is concerned. On every query the proxy parses enough to extract the shard key, computes the destination shard, forwards the query, reads the response, and streams it back. The application code is identical to a single-node deployment — it uses its native database driver (mysql-connector-python, psycopg2, the MySQL JDBC driver) and talks to the proxy on the standard port. Sharding is a server-side fact the application never observes.

Why this is operationally simpler: every change to the sharding scheme happens inside the proxy fleet. Resharding, moving a shard, adding replicas — none of it touches the application. You deploy the proxy change, the application keeps working. With client-aware, the equivalent change requires rolling out new driver versions or new topology caches to every application instance in every language — a coordination problem that grows with your engineering organisation.

The canonical example is Vitess. The proxy there is called vtgate, and it speaks the MySQL protocol on the client side. Your Rails, Python, Go, or Java app connects to vtgate:3306 as if it were a MySQL server. The app sends a SQL query; vtgate parses it, uses the VSchema (the directory that maps tables and shard keys to shards) to identify which MySQL shards it touches, forwards the query to each shard's vttablet, merges responses, and returns them. YouTube runs tens of thousands of MySQL shards this way; Slack runs hundreds; GitHub migrated MySQL to Vitess without changing the Rails code paths.

Citus plays the same role inside Postgres. Citus is a Postgres extension that turns one Postgres cluster into a distributed one. The coordinator node receives every client connection — clients speak the standard Postgres wire protocol, so psql and psycopg2 work unchanged. The coordinator parses the query, consults the metadata tables to find which workers hold which shards, pushes the query (or sub-queries) down to the workers, and aggregates. Workers are themselves Postgres instances with a slice of the table. The coordinator is the proxy, but it's also a Postgres server, so the client code is completely unaware that anything distributed is happening.

ProxySQL, AWS RDS Proxy, and Envoy's SQL filters fill different niches: ProxySQL handles MySQL connection pooling and read/write splitting (covered later); RDS Proxy is AWS's managed pool; Envoy's filters bring service-mesh observability to SQL traffic with a sidecar that turns the proxy hop into a localhost loopback.

The cost of proxy routing is the hop. Every query pays 0.3–1 ms of extra latency — the parsing, the routing lookup, the forward, the response aggregation. The proxy fleet has to scale with query load, not data size; at 1 million QPS you are running dozens of proxy instances. And the proxy cannot be a single point of failure, so you deploy many, front them with a load balancer, and engineer graceful degradation when one dies.

Vitess — the gold-standard proxy-based system

Vitess deserves its own section because it is the reference architecture for sharding-as-a-proxy, and because understanding its components clarifies what any such system has to provide.

Vitess separates concerns into three tiers: vtgate, the stateless routing fleet that speaks the MySQL protocol to clients and holds the query-planning logic; vttablet, an agent that sits in front of each shard's mysqld to manage query execution, connection pools, and replication; and the mysqld processes themselves, which store the data. A control-plane daemon, vtctld, drives administrative operations such as resharding.

The glue is the topology service — an etcd or ZooKeeper cluster holding the authoritative shard map. Every vtgate subscribes for changes; when a reshard moves data, the topology is updated, every vtgate picks up the change, and queries route to the new shards. The application sees nothing.
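That subscription contract can be modelled in-process. A sketch, not an etcd API: a versioned shard map that pushes every change to every registered router (the class and callback shapes here are illustrative):

```python
import threading

class ShardMap:
    """In production this state lives in etcd or ZooKeeper and routers
    watch a key prefix; here the watch is modelled as callbacks."""

    def __init__(self, placements):
        self._lock = threading.Lock()
        self.version = 1
        self.placements = dict(placements)  # shard_id -> (host, port)
        self._subscribers = []

    def subscribe(self, callback):
        # Both proxies (vtgate) and client-aware drivers register here
        # and immediately receive the current map.
        self._subscribers.append(callback)
        callback(self.version, dict(self.placements))

    def move_shard(self, shard_id, new_addr):
        # A reshard updates the authoritative map and notifies every
        # router; the application never sees any of this.
        with self._lock:
            self.placements[shard_id] = new_addr
            self.version += 1
            snapshot = (self.version, dict(self.placements))
        for cb in self._subscribers:
            cb(*snapshot)
```

The version number is what makes stale-routing errors detectable: a shard can reject a query stamped with an old map version instead of silently serving data it no longer owns.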

The VSchema is Vitess's directory. It declares which tables are sharded, which columns are shard keys (called vindexes), and what kind of vindex each column uses — hash, lookup, consistent hash, custom. The query flow for SELECT * FROM users WHERE user_id = 42: client connects to vtgate over MySQL protocol; vtgate parses the SQL, applies the user_id vindex to get a keyspace ID, looks up the shard for that keyspace ID, forwards the query to that shard's vttablet, streams rows back. For cross-shard queries vtgate does the scatter-gather and re-aggregation; distributed transactions are supported via 2PC.

YouTube (where Vitess was built), Slack, GitHub, HubSpot, Square, and parts of Shopify all run applications that predate their sharding and were ported to Vitess without rewriting the query layer. That is the main point of the proxy architecture: it fits a sharding strategy to an application that does not know it is sharded.

Citus — the Postgres-embedded proxy

Citus is Vitess's Postgres analogue, with one structural difference: the proxy is itself a Postgres server.

You install Citus as a Postgres extension on every node, designate one as the coordinator and the rest as workers, and run SELECT create_distributed_table('events', 'tenant_id'). Citus splits the table into shards (32 by default), distributes them across workers, and stores placements in metadata tables on the coordinator. Clients connect to the coordinator exactly as they would to a single Postgres instance — psql, the Rails pg gem, SQLAlchemy, anything Postgres-compatible. The coordinator runs the Citus planner, which rewrites queries into sub-queries against the workers and aggregates the results. Point queries that include the distribution column get pruning — the coordinator forwards to a single worker. Queries without the distribution column become scatter-gathers.

Citus shines for multi-tenant SaaS and real-time analytics on Postgres. Tenants get automatic isolation; analytics queries parallelise across workers. The proxy nature shows up in upgrades: adding a worker is SELECT citus_add_node(...) plus a background rebalance — no client change. The classic limitation is the coordinator as a bottleneck, especially for complex cross-shard joins. Newer releases support multi-coordinator deployments, but the core model is one coordinator.

ProxySQL — the MySQL proxy veteran

ProxySQL predates most of the cloud-native sharding proxies. It ships as a standalone daemon that speaks the MySQL protocol on both sides — clients talk to it like a MySQL server, and it talks to real MySQL servers as a client. Its main jobs: connection pooling and multiplexing (thousands of client connections collapse onto a small pool of backend connections), read/write splitting and query routing driven by configurable query rules, query rewriting and result caching, and smoothing over backend failovers so clients keep their connections.

ProxySQL does not natively shard. If your MySQL is sharded, you typically combine ProxySQL with application-level routing: the app picks the shard, then connects to a ProxySQL instance that fronts that shard's master and replicas for read/write splitting and pooling. In a Vitess deployment, vtgate does what ProxySQL would do; in a DIY deployment, ProxySQL does a narrower version of the job.
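A toy version of the read/write split, with the caveat that real ProxySQL matches configurable regex rules against query digests rather than a hard-coded prefix check:

```python
import itertools

_next_reader = itertools.count()  # round-robin cursor over the replicas

def route(query, writer, readers):
    # Reads go to a replica; everything else (DML, DDL, locking reads)
    # goes to the master. A stand-in for ProxySQL's query rules.
    q = query.lstrip().lower()
    if q.startswith("select") and "for update" not in q:
        return readers[next(_next_reader) % len(readers)]
    return writer
```

Even this toy shows why the split lives in a proxy: the classification has to be consistent across every client, and correcting it (say, sending a replication-lag-sensitive SELECT to the master) is one rule change rather than an application deploy.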

Cassandra's token-aware driver — client-aware done right

Cassandra is the reference for client-aware routing.

A Cassandra cluster is a ring. Each node is assigned one or more tokens — 64-bit integers — and owns the range from the previous node's maximum to its own. Rows are placed by hashing their partition key (MurmurHash3, 64-bit). Replication duplicates each row onto the next N nodes clockwise.

When a client connects, the driver opens a control connection to one node and issues SELECT peer, data_center, rack, tokens FROM system.peers. It learns every node, their data centers, racks, and token assignments, and builds a local token map — a sorted array of (token, replicas) entries. On every query the driver extracts the partition key from the prepared statement, hashes it, binary-searches the token map for the owning replicas, and picks the closest one (same rack, then same DC, then remote) via a load balancing policy. The query goes directly. The replica executes locally without consulting other nodes. No coordinator-node indirection. No routing hop. The driver is the router.
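The token map and the replica walk can be modelled in a few lines. This is a sketch, not the driver: the partitioner below is an md5-derived stand-in for MurmurHash3 (only determinism matters for the mechanics), and real drivers layer rack- and DC-aware replica selection on top of the ownership lookup:

```python
import bisect
import hashlib

class TokenRing:
    def __init__(self, node_tokens, rf=2):
        # node_tokens: {node: [token, ...]}; rf: replication factor.
        self.rf = rf
        self.ring = sorted((t, n) for n, ts in node_tokens.items() for t in ts)
        self.tokens = [t for t, _ in self.ring]

    def token_for(self, partition_key):
        # Stand-in partitioner: a stable signed 64-bit hash. Cassandra
        # uses MurmurHash3 here.
        digest = hashlib.md5(str(partition_key).encode()).digest()
        return int.from_bytes(digest[:8], "big", signed=True)

    def replicas_for_token(self, token):
        # The first node clockwise whose token is >= the key's token owns
        # it; replication walks clockwise until rf distinct nodes found.
        i = bisect.bisect_left(self.tokens, token)
        out = []
        for step in range(len(self.ring)):
            _, node = self.ring[(i + step) % len(self.ring)]
            if node not in out:
                out.append(node)
            if len(out) == self.rf:
                break
        return out

    def replicas(self, partition_key):
        return self.replicas_for_token(self.token_for(partition_key))
```

The ownership lookup is a binary search over a sorted array, which is why the per-query routing cost of a client-aware driver is nanoseconds, not a network hop.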

Topology changes propagate via gossip; the driver subscribes to cluster events and refreshes its token map. The contrast with Vitess is stark — in Vitess, a thousand app instances connect to a hundred vtgate proxies that hold the shard map; in Cassandra, a thousand app instances each hold the shard map in-process and there is no proxy tier at all. Opposite ends of the spectrum, both work.

MongoDB — mongos proxy with driver shortcuts

MongoDB spans both worlds. A classic sharded MongoDB cluster has three tiers: the config servers (a replica set holding the shard map), the shards (each shard is itself a replica set), and mongos — stateless router processes that clients connect to. Applications connect to mongos; mongos consults the config servers (cached in-process), routes the query, merges results. This is the proxy architecture.

Modern MongoDB drivers add client-side intelligence on top of the proxy tier. The driver bootstraps from a DNS SRV record (the mongodb+srv:// scheme) that lists the mongos routers, monitors each router's health and round-trip time, and load-balances across them, preferring low-latency routers. mongos, which caches the chunk map from the config servers, routes targeted queries (those whose filter includes the shard key) to the single owning shard; non-targeted queries fan out to every shard and are merged at the router.

The division of labour keeps routing server-side: a sharded MongoDB deployment without mongos is not supported, because the router owns query parsing, shard targeting, and result merging. The driver's contribution is picking a good mongos, not bypassing it.

The shared source of truth — shard map storage

Both architectures need to agree on "which shard owns key X". Where that agreement lives defines the failure modes. The common homes: a dedicated coordination service (etcd or ZooKeeper, as in Vitess's topology service); the cluster's own gossip, where the data nodes disseminate the map among themselves (Cassandra, Redis Cluster); and metadata tables inside the database itself (Citus's coordinator, MongoDB's config server replica set).

Your choice constrains the routing layer. If the shard map lives in etcd, both proxies and clients can subscribe. If it lives in gossip, clients have to join the protocol. If it lives in a coordinator, clients have to query it to refresh — which is why Citus's coordinator does so much routing work itself.

Python — a minimal client-aware sharded DB client

Enough prose. Here is a sketch of a client-aware router in about thirty lines. It is not production-grade — no retries, no connection health checks, no topology refresh — but it shows what a client-aware driver is, at its core.

import hashlib
import psycopg2

class ShardClient:
    def __init__(self, shard_map):
        # shard_map: list of (host, port, database) tuples, indexed by shard id.
        self.shard_map = shard_map
        self.connections = {}  # shard_id -> psycopg2 connection

    def _shard_id(self, key):
        # Hash mod N: deterministic, but adding a shard remaps most keys.
        h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
        return h % len(self.shard_map)

    def _get_conn(self, shard_id):
        # One lazily opened connection per shard, reused across calls.
        if shard_id not in self.connections:
            host, port, db = self.shard_map[shard_id]
            conn = psycopg2.connect(host=host, port=port, dbname=db)
            conn.autocommit = True  # psycopg2 otherwise leaves writes in an open transaction
            self.connections[shard_id] = conn
        return self.connections[shard_id]

    def execute(self, key, sql, params=None):
        shard_id = self._shard_id(key)
        conn = self._get_conn(shard_id)
        with conn.cursor() as cur:
            cur.execute(sql, params or ())
            return cur.fetchall() if cur.description else None

    def close(self):
        for conn in self.connections.values():
            conn.close()

Thirty-odd lines. The calling code does client.execute(user_id, "SELECT ... WHERE user_id = %s", (user_id,)) and the router hashes user_id, picks the shard, opens (or reuses) a connection, runs the query. No proxy, no extra hop.

What the sketch omits: retries on MOVED errors, topology refresh (every production client-aware driver subscribes to some source of truth and refreshes the shard map), connection-pool-per-shard with health checks, circuit breakers for dead shards, cross-shard fan-out for non-sharded queries, and all the observability (timing histograms per shard, per-shard error rates) that is how you actually operate this thing. Every one of those concerns is real work, and every client library in every language in your stack has to reimplement them. That is the hidden cost of client-aware.
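The first omission, catch-refresh-retry, is the correction loop every client-aware driver shares, and it is worth sketching against the ShardClient above. RoutingError is a hypothetical stand-in for the system-specific misroute signal (Redis's MOVED, a topology-change event), and fetch_shard_map stands in for a read from the source of truth:

```python
class RoutingError(Exception):
    """A shard reports it no longer owns the key (stand-in for MOVED)."""

def execute_with_refresh(client, fetch_shard_map, key, sql, params=None,
                         max_attempts=3):
    # Try; on a misroute, re-fetch the authoritative shard map, drop the
    # cached connections (they may point at former owners), and retry.
    for _ in range(max_attempts):
        try:
            return client.execute(key, sql, params)
        except RoutingError:
            client.shard_map = fetch_shard_map()
            client.connections.clear()
    raise RuntimeError("shard map still stale after %d refreshes" % max_attempts)
```

Bounding the attempts matters: if the source of truth itself is behind, an unbounded loop turns one stale map into a refresh storm.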

Proxy + client-aware combined — best of both

Large deployments rarely pick one. The recurring pattern: client-aware routing in-region for latency-sensitive OLTP; a proxy tier for cross-region traffic, analytics, and languages without a mature topology-aware driver; and a shared source of truth for the shard map that both layers subscribe to.

A concrete example: a Cassandra OLTP application uses the DataStax Java driver with token-awareness for direct routing; the analytics pipeline uses Spark with the Cassandra connector, which is itself client-aware at the partition level but with a Spark driver process that acts as a proxy for query planning. Two different access patterns, two different routing layers, one cluster. In Vitess deployments, some teams run vtgate as a sidecar inside the application pod — the "proxy hop" becomes a localhost hop, indistinguishable from a local library call. Nominally proxy-based; client-aware in latency profile.

Slack's MySQL-to-Vitess migration

Slack's datastore was a monolithic MySQL in the early days, then a sharded MySQL with application-level routing, and finally Vitess. The transition is instructive.

Pre-Vitess. Slack's messages table was too big for one MySQL. The team sharded by team_id (workspace ID) — each workspace's messages lived on one of the MySQL shards. An application-level helper library computed the shard, looked up the host from a config file, and opened a connection. Application code had to remember to call shard_for(team_id) before every query; miss the call and you hit the wrong shard. Adding a shard was a multi-week effort.

Post-Vitess. The application connects to vtgate over the MySQL protocol; the data-access code looks identical to single-MySQL code. vtgate holds the VSchema — messages sharded by team_id, users by user_id. Each query is parsed, the shard key extracted, the query forwarded to the right vttablet. Vitess runs around 200 MySQL shards in the messages cluster; the application has no idea.

Latency overhead. About 0.5 ms per query through vtgate. On a point lookup that used to take 1.2 ms end to end, Vitess makes it 1.7 ms. For p50 user-visible latency (hundreds of milliseconds dominated by API fan-out) this is negligible.

Operational benefit. Zero application-code changes when shards move, split, or the cluster expands. Resharding from 100 to 200 shards — once a multi-quarter migration — is now a background vtctld command with online VReplication.

[Figure: client-aware versus proxy routing, two architectures side by side. Left, client-aware: the application process embeds a driver that holds the shard map and connects directly to one of three shards; one hop, no middle tier, every language needs a driver (examples: Cassandra, Redis Cluster). Right, proxy: the application's native DB driver connects to a proxy tier (vtgate) that holds the shard map and forwards the query to the owning shard; two hops, roughly 0.5 ms added, the app unchanged across reshards.]
Client-aware routing (left) collapses the query into a single hop from the application to the owning shard — the driver holds the shard map and picks the destination. Proxy routing (right) routes every query through a middle tier that holds the shard map and forwards the query. Production deployments frequently run both — client-aware for in-region OLTP, proxy for cross-region traffic and analytics.

Common confusions

Going deeper

Vitess internals — vtgate query planning

vtgate's query planner is a compiler. It takes a SQL AST, consults the VSchema, and produces a primitive tree — small operators (route, join, aggregate, subquery) each running on one or more shards. A point query is a single route primitive. A cross-shard join becomes a route-scatter-gather-hash-join tree. The planner is cached per-query-shape. The vindex abstraction is the extension point: a Go struct implementing Map(ids) -> keyspace_ids. Built-ins include Hash, Numeric, Lookup (two-table indirection), ConsistentLookup, and Region; custom vindexes let applications define their own functions, including directory-style lookups against a separate Vitess table.
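That interface is easy to model. A sketch in which the names and hashing are illustrative (Vitess's built-in Hash vindex does not use md5, and the real Go interface carries more context), but the shapes match: a functional vindex computes keyspace ids, a lookup vindex reads them from a directory:

```python
import hashlib

class HashVindex:
    # Functional vindex: the keyspace id is computed from the shard key
    # alone. (Illustrative hash; only determinism matters for routing.)
    def map(self, ids):
        return [hashlib.md5(str(i).encode()).digest()[:8] for i in ids]

class LookupVindex:
    # Lookup vindex: the keyspace id is read from a directory table
    # (two-table indirection); here the "table" is a dict.
    def __init__(self, rows):
        self.rows = dict(rows)

    def map(self, ids):
        return [self.rows.get(i) for i in ids]

def shard_for(keyspace_id, shard_count):
    # Route a keyspace id to a shard by its leading byte, modelling equal
    # key ranges carving up the keyspace.
    return keyspace_id[0] * shard_count // 256
```

The point of the abstraction is that the planner never cares which kind it is calling: Map yields keyspace ids, the keyspace ids yield shards, and a point query becomes a single route primitive either way.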

Service-mesh-aware database proxies

Envoy's MySQL and Postgres filters apply service-mesh treatment to SQL traffic — traces, rate limits, mTLS, retries. An Envoy sidecar next to every pod intercepts 3306/5432, decorates queries with tracing headers, enforces RBAC per SQL statement, and feeds query patterns to the observability stack. For large organisations the uniformity with HTTP handling is the selling point.

Latency-based routing across regions

The router can pick a replica not just by ownership but by network distance. Cassandra's DCAwareRoundRobinPolicy prefers local-DC replicas; LatencyAwarePolicy tracks real-time latencies and down-ranks slow replicas. Vitess's cell concept groups shards by region. MongoDB's readPreference: "nearest" picks the replica with the lowest measured latency. Same idea, different surfaces: prefer the shard that will answer fastest, subject to consistency constraints.
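The core of any latency-aware policy fits in a dozen lines: a decaying latency estimate per replica, lowest wins. A sketch (the smoothing constant and tie-breaking are illustrative; real drivers add exclusion thresholds and periodic re-probing of down-ranked replicas):

```python
class LatencyAwarePolicy:
    def __init__(self, replicas, alpha=0.2):
        self.alpha = alpha                    # weight of the newest sample
        self.ewma = {r: None for r in replicas}

    def record(self, replica, latency_ms):
        # Exponentially weighted moving average: old samples decay away,
        # so a replica that recovers is forgiven.
        prev = self.ewma[replica]
        self.ewma[replica] = (latency_ms if prev is None
                              else (1 - self.alpha) * prev
                                   + self.alpha * latency_ms)

    def pick(self):
        # Unmeasured replicas sort first so each gets probed at least
        # once; after that, the lowest moving average wins.
        return min(self.ewma, key=lambda r: (self.ewma[r] is not None,
                                             self.ewma[r] or 0.0))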

Where this leads next

Routing is stage three of sharding. Chapter 94 covers resharding without downtime — the protocols that move data between shards while the cluster keeps serving traffic (Vitess's VReplication, MongoDB's chunk migration, online-schema tools). Chapter 95 covers global secondary indexes — how you query by a non-shard-key column without fanning out everywhere, with the lookup-vindex and materialised-index patterns Vitess and DynamoDB implement. After Build 12 you can read the architecture pages of any sharded system and know exactly what each layer is doing.

References

  1. CNCF Vitess, Vitess Architecture — the canonical description of vtgate, vttablet, vtctld, and the topology service. The VSchema and vindex guides show how directory sharding is expressed declaratively, and the operator documentation describes how a Vitess cluster is deployed and scaled in production.
  2. Microsoft / Citus Data, Citus Distributed Tables and Query Processing — describes Citus's coordinator-worker model, the metadata tables that hold shard placements, and the planner that rewrites queries across shards. The co-location and reference-table sections explain how Citus handles cross-shard joins.
  3. ProxySQL Project, ProxySQL Documentation — covers ProxySQL's query routing rules, connection pooling, read/write splitting, and high-availability deployment patterns. The architecture overview clarifies where ProxySQL sits in a sharded MySQL deployment versus where Vitess sits.
  4. Apache Cassandra Project, Java Driver — Load Balancing and Token Awareness — the DataStax driver's token-aware policy is the reference implementation of client-aware routing. The document explains the token-ring subscription, the replica-selection logic, and the failure-recovery retries.
  5. MongoDB Inc., Sharded Cluster Components — describes the mongos router, the config server replica set, and the driver-side SRV bootstrap. The targeted-query section shows exactly when modern drivers bypass mongos and when they still need it.
  6. Amazon Web Services, RDS Proxy Concepts — AWS's managed MySQL/Postgres proxy for connection pooling and failover smoothing. The documentation is a useful contrast to Vitess: RDS Proxy does not shard; it proxies. Reading the two side by side clarifies what a "sharding proxy" adds over a "connection-pooling proxy".