Column-level access and row-level security

A new junior analyst joins the risk team at a Bengaluru fintech on a Monday. Her dbt project queries a 200-million-row merchants table that holds GSTIN, PAN, contact email, monthly GMV, and a fraud-score column. By the rules she has been told, she should be able to read GMV and fraud-score but not PAN, and she should only see merchants in the four southern states her team handles. By Tuesday afternoon her notebook is happily printing PANs for merchants in Maharashtra, because the only thing standing between her and the full table was a GRANT SELECT at the table level and a comment in the dbt repo that said "please filter by region". Column-level access and row-level security exist because that comment is not a control. The warehouse has to enforce the filter before it returns the rows.

Column-level access (CLS) hides specific columns from specific principals; row-level security (RLS) hides specific rows. Both are enforced by the query planner, not by application code, so a curious analyst cannot bypass them by writing a different query. The hard part is not the syntax — it is keeping the policies consistent across views, materialized views, joins, and copy-on-write table snapshots so that nothing leaks through a back door.

What CLS and RLS actually do, and why they live in the planner

CLS and RLS are predicates the query planner inserts into your query before it runs. When the analyst writes SELECT * FROM merchants, the warehouse rewrites it — invisibly to her — into something like SELECT gstin, gmv_monthly, fraud_score FROM merchants WHERE region IN ('KA','TN','KL','AP'). The PAN column is gone because her role lacks the column-level grant; the rows from Maharashtra are gone because her role's RLS policy says region IN (your-team's-states). The rewrite happens before the optimizer picks an execution plan, so the unauthorized column never gets read from storage and the unauthorized rows never get scanned.

This matters because the alternative — filter the result after the query runs, in the application — has been broken every way it can break. An analyst who connects with the BI tool's service account inherits the application's filter; the analyst connecting directly with psql does not. A SaaS vendor who uses your warehouse credentials to run their own query does not see your application filter at all. CLS and RLS attach the predicate to the role, not the application, so the analyst sees the same filtered view whether she queries through Tableau, dbt, a Jupyter notebook, or psql.

[Figure: CLS and RLS as planner-level rewrites. A user query enters the planner, which injects the role's column projection and row predicate based on its policies before the optimizer picks a plan and the executor scans only the allowed data; a contrasting application-side path shows how a direct psql connection with the same credentials bypasses a filter that does not live in the planner and leaks all rows. The role-attached predicate travels with the principal; the application filter does not.]

Rule of thumb: If a policy is enforceable only when the query goes through your specific application, it is not a security control — it is a UX hint.
The planner is the only place the rewrite catches every client. A WHERE clause inside the BI tool's SQL template does not survive an analyst opening DBeaver with the same credentials.

Why the planner and not the storage layer: the storage layer (Parquet files in S3, micro-partitions in Snowflake) does not know which role is reading. It only knows what bytes were requested. By the time the read reaches storage, the policy has either been applied (the planner narrowed the scan) or it has been bypassed. There is no "CLS at the storage layer" mechanism in any major warehouse — the entire mechanism lives in the planner.

CLS in practice — masking, projection, and the column-grant ladder

Column-level access shows up in three flavours, ordered by how strict they are.

Column-level GRANTs. The simplest form: GRANT SELECT(gstin, gmv_monthly) ON merchants TO ROLE risk_jr;. The role can SELECT gstin, gmv_monthly FROM merchants. Try SELECT * FROM merchants and the planner errors with "permission denied for column pan". Postgres, Snowflake, BigQuery, and Redshift all support this directly. The cost: every analyst who joins or leaves a team triggers a maintenance task on every grant, every column. Forty teams × twenty tables × fifteen sensitive columns is 12,000 grant rows that have to stay in sync with HR.
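The combinatorics are easy to sanity-check with a few lines of Python (the team, table, and column names below are placeholders, not a real schema):

```python
# Sketch: why per-column GRANTs become a synchronisation problem.
# Names are illustrative, not from any real warehouse.
teams = [f"team_{i:02d}" for i in range(40)]
tables = [f"table_{i:02d}" for i in range(20)]
sensitive_cols = [f"col_{i:02d}" for i in range(15)]

grants = [
    f"GRANT SELECT({col}) ON {table} TO ROLE {team};"
    for team in teams
    for table in tables
    for col in sensitive_cols
]
print(len(grants))  # 12000 grant statements that must track every HR change
```

Every join, leave, or transfer invalidates some slice of that list, which is why teams end up generating grants from an HR feed rather than writing them by hand.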

Dynamic data masking. Instead of denying access, the planner returns a redacted value. CREATE MASKING POLICY pan_mask AS (val STRING) RETURNS STRING -> CASE WHEN CURRENT_ROLE() IN ('compliance') THEN val ELSE 'XXXXX' || RIGHT(val, 4) END; then ALTER TABLE merchants MODIFY COLUMN pan SET MASKING POLICY pan_mask;. Now SELECT pan FROM merchants returns XXXXX1234 for everyone except the compliance role. Why masking instead of denial: queries that join, group, or filter on the column keep running instead of erroring. Be careful with aggregates, though — whether COUNT(DISTINCT pan) returns the true cardinality or the cardinality of the masked values depends on whether the warehouse applies the mask before or after the aggregate, and that ordering differs across products, so verify the behaviour on your warehouse rather than assuming it. Masking is the form CLS takes when the analyst legitimately needs to use the column for analysis but should never see the raw value.
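A minimal Python model of the masking logic above makes the role branch explicit (the function is illustrative, mirroring the SQL policy rather than any warehouse's implementation):

```python
def pan_mask(current_role: str, val: str) -> str:
    """Toy equivalent of the SQL masking policy: the compliance role sees
    the raw value, everyone else sees 'XXXXX' plus the last four characters."""
    if current_role == "compliance":
        return val
    return "XXXXX" + val[-4:]

print(pan_mask("compliance", "AAAPP1234L"))  # AAAPP1234L
print(pan_mask("risk_jr", "AAAPP1234L"))     # XXXXX234L
```

The key property is that the branch keys on the principal's role, not on which client sent the query.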

View-based projection. The legacy form: create a view merchants_safe that selects only the allowed columns and grant on the view, not the table. CREATE VIEW merchants_safe AS SELECT merchant_id, region, gmv_monthly, fraud_score FROM merchants;. The view inherits the underlying table's RLS but applies its own column projection. The catch is that the analyst must remember to query merchants_safe and not merchants — and any junior who is allowed to see any column of merchants can write a join through the table directly. View-based projection is enforceable only if you also revoke base-table access.

In practice the production pattern most Indian fintechs settle on is: column-level GRANTs for the deny-by-default sensitive columns (PAN, Aadhaar, mobile, email), masking policies for columns where partial visibility is useful (last-4-digits of cards, hashed mobile), and views for legacy BI tools that cannot deal with masked columns. The three layers stack: a view that selects a subset of columns from a table whose sensitive columns are individually masked.

Building a tiny RLS engine in 80 lines

Row-level security feels harder than CLS because the predicate depends on the current user's attributes — their team, their region, their security clearance — and those attributes change. Build a tiny engine to see how it composes.

# rls_engine.py — minimal row-level security predicate injector
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Principal:
    name: str
    role: str
    attrs: Dict[str, str]   # e.g. {"region": "south", "states": "KA,TN,KL,AP"}

# Policies are functions: principal -> SQL predicate fragment
PolicyFn = Callable[[Principal], str]

class RLSPlanner:
    def __init__(self):
        self.policies: Dict[str, PolicyFn] = {}

    def add_policy(self, table: str, fn: PolicyFn):
        self.policies[table] = fn

    def rewrite(self, table: str, base_sql: str, principal: Principal) -> str:
        if table not in self.policies:
            # default-deny: no policy means no access (fail closed)
            raise PermissionError(f"no RLS policy for {table}")
        predicate = self.policies[table](principal)
        # naive injection — production parsers use AST rewriting, not string concat
        if " WHERE " in base_sql.upper():
            return base_sql + f" AND ({predicate})"
        return base_sql + f" WHERE ({predicate})"

# Define policies
def merchants_policy(p: Principal) -> str:
    if p.role == "compliance":
        return "1=1"                                 # full access
    if p.role == "risk_jr":
        states = p.attrs.get("states", "").split(",")
        in_clause = ",".join(f"'{s.strip()}'" for s in states)
        return f"region IN ({in_clause})"
    if p.role == "vendor_external":
        return "is_public = 1"
    return "1=0"                                     # fail-closed for unknown roles

# Setup
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE merchants(merchant_id INT, region TEXT, gmv_monthly INT, is_public INT);
INSERT INTO merchants VALUES (1,'KA',45000,1),(2,'MH',82000,0),(3,'TN',15000,1),
                             (4,'DL',91000,0),(5,'KL',8000,1);
""")

planner = RLSPlanner()
planner.add_policy("merchants", merchants_policy)

riya = Principal("riya", "risk_jr", {"states": "KA,TN,KL,AP"})
asha = Principal("asha", "compliance", {})
karan = Principal("karan", "vendor_external", {})

for who in (riya, asha, karan):
    sql = planner.rewrite("merchants", "SELECT merchant_id, region FROM merchants", who)
    rows = db.execute(sql).fetchall()
    print(f"{who.name:8s} ({who.role:16s})  rows={rows}  sql={sql}")
# Output:
# riya     (risk_jr         )  rows=[(1, 'KA'), (3, 'TN'), (5, 'KL')]  sql=SELECT merchant_id, region FROM merchants WHERE (region IN ('KA','TN','KL','AP'))
# asha     (compliance      )  rows=[(1, 'KA'), (2, 'MH'), (3, 'TN'), (4, 'DL'), (5, 'KL')]  sql=SELECT merchant_id, region FROM merchants WHERE (1=1)
# karan    (vendor_external )  rows=[(1, 'KA'), (3, 'TN'), (5, 'KL')]  sql=SELECT merchant_id, region FROM merchants WHERE (is_public = 1)

Walk through what makes this an engine rather than a hand-applied filter. PolicyFn = Callable[[Principal], str] is the policy abstraction — a policy is a pure function from principal to predicate. Why a function and not a static string: the predicate depends on attributes the principal carries (team, region, clearance), which are runtime values. A static WHERE region = 'KA' would only protect one specific user; a policy function generates the right predicate for any principal in the role.

The default-deny clause — raise PermissionError(...) if no policy is registered — is the fail-closed default: the absence of a policy means no access, not full access. Why fail-closed instead of fail-open: a forgotten policy on a sensitive table would otherwise leak silently. Fail-closed surfaces the missing policy at first query, loud, with a stack trace; fail-open surfaces it months later in a compliance audit. The merchants_policy function with a fall-through 1=0 is the same pattern at the per-policy level — unknown roles get an always-false predicate so their queries return zero rows rather than error or full table.

The string-concat injection is the part you replace in production: real warehouse planners rewrite the AST, not the SQL text, because string concat breaks on subqueries, CTEs, and complex predicates. Why AST rewriting matters: a query like SELECT * FROM (SELECT * FROM merchants) m WHERE m.gmv > 1000 has the outer WHERE in a different scope from the inner FROM. String concat would inject the policy in the wrong place. AST rewriting walks the tree and inserts the predicate at every reference to the protected table — including inside CTEs and subqueries.

The Principal dataclass with attrs is the attribute-based part of attribute-based access control (ABAC) — the policy can branch on any attribute the principal carries, not just role. This is what lets you write "south-region team in Bengaluru office on weekdays" as one policy.

The toy is single-table, single-predicate, no caching. Production engines (Snowflake's RAP, BigQuery's row access policies, Postgres RLS via CREATE POLICY) add: predicate caching by role for query-plan reuse, hierarchical roles (risk_sr inherits risk_jr's predicate plus extra access), policy versioning so changes can be audited, and security barrier views that prevent the optimizer from leaking data through side-channel inference. The shape — fail-closed default, principal-to-predicate function, planner-level injection — does not change.

Snowflake, BigQuery, Postgres — same idea, different syntax

Every warehouse implements CLS and RLS, with mostly-overlapping but slightly-different vocabulary.

Snowflake uses masking policies for CLS and row access policies for RLS. A masking policy is attached to a column with ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY; a row access policy is attached to a table with ALTER TABLE ... ADD ROW ACCESS POLICY. Both policies are SQL functions that branch on CURRENT_ROLE(). Snowflake's distinguishing feature is that the policies follow the data through secure views and secure UDFs — a secure view does not expose its query plan to the user, so an analyst cannot deduce the policy by reading EXPLAIN output.

BigQuery uses policy tags (Data Catalog) for CLS and row access policies for RLS. Policy tags are taxonomy-style: you create a taxonomy "PII" with child tags "PAN", "Aadhaar", "Email", attach the tag to columns, then grant roles/bigquerydatapolicy.maskedReader on the tag. A user with the maskedReader role on the PAN tag sees masked PANs everywhere — across every table, every dataset. Row access policies are SQL WHERE predicates attached to a table for a given user/group: CREATE ROW ACCESS POLICY south_team ON merchants GRANT TO ('group:south-team@company.com') FILTER USING (region IN ('KA','TN','KL','AP')).

Postgres uses GRANTs for column-level access and CREATE POLICY for row-level security with ALTER TABLE ... ENABLE ROW LEVEL SECURITY. The policy syntax: CREATE POLICY south_only ON merchants FOR SELECT USING (region IN (SELECT state FROM user_states WHERE username = current_user));. A security barrier view (CREATE VIEW ... WITH (security_barrier = true)) prevents the planner from pushing predicates through the view in ways that would let the user infer hidden rows. Without it, a query like SELECT * FROM merchants_view WHERE leaky_function(pan) could call leaky_function on rows the user is not supposed to see, leaking the existence (and content) of those rows through side effects.

Databricks Unity Catalog uses column masks and row filters — both are SQL functions registered on the catalog that the planner injects when the protected table is queried. Unity Catalog's distinguishing feature is the centralised catalog — one policy applies across every workspace and every cluster, not per-table.

The lesson for a platform team is not "which warehouse has the best RLS" — they are all roughly equivalent in expressiveness — but "which model fits how your teams actually classify sensitive data?" If your sensitivity classification lives in a central data catalog and applies across many tables (PAN is PAN whether it appears in merchants, users, or kyc), BigQuery's tag-based model is the cleanest fit. If your classification is table-specific and varies by business context, Snowflake's per-column policies are easier. If you are running open-source Postgres at the edge of your platform, native RLS via CREATE POLICY is what you have, and the rest of the world's syntactical comforts are not available.

[Figure: CLS/RLS attachment model across warehouses. Snowflake: per-column masks and per-table row policies branching on CURRENT_ROLE(), with secure views hiding the query plan; best when classification is per-table. BigQuery: central policy tags (a Data Catalog taxonomy) whose grants apply cross-table, plus row access policies with SQL FILTER USING; best when there is a central PII taxonomy. Postgres: column GRANTs and CREATE POLICY (USING / WITH CHECK on SELECT, UPDATE, ...) on tables, with security_barrier views stopping side-channels; best for an OLTP source of truth. Unity Catalog: column masks and row filters registered once at the catalog, applying across every workspace; best for a multi-workspace lake.]
The differences are about *where the policy is attached* — column, table, tag, or catalog. The deeper the attachment point, the broader the blast radius when the policy changes.

The leaks — where a poorly-applied policy still leaks data

CLS and RLS prevent the obvious "select all" leak. They do not prevent inference, side-channels, or the back-door joins that production teams discover only in incident reviews.

Joins through unprotected tables. merchants has RLS on region. The payments table does not, but it has a merchant_id. An analyst writes SELECT p.amount, m.gstin FROM payments p LEFT JOIN merchants m USING (merchant_id) WHERE p.amount > 1e7. The RLS on merchants filters out non-south merchants — m.gstin is NULL for them — but the analyst now has a complete list of which merchant_ids are non-south because their gstins came back NULL. The fix is RLS on payments too, or a security-barrier view that joins both tables and applies the filter at the view level.
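The leak is easy to reproduce in SQLite, simulating RLS on merchants as a filtered view (the schema and values are illustrative):

```python
import sqlite3

# Demonstrates the NULL-join inference leak: RLS protects merchants,
# payments is unprotected, and the join shape reveals the hidden rows.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE merchants(merchant_id INT, region TEXT, gstin TEXT);
INSERT INTO merchants VALUES (1,'KA','GST1'),(2,'MH','GST2'),(3,'TN','GST3'),(4,'DL','GST4');
CREATE TABLE payments(merchant_id INT, amount INT);
INSERT INTO payments VALUES (1,100),(2,200),(3,300),(4,400);
-- RLS on merchants only, simulated as a filtered view
CREATE VIEW merchants_rls AS SELECT * FROM merchants WHERE region IN ('KA','TN');
""")
rows = db.execute("""
    SELECT p.merchant_id, m.gstin
    FROM payments p LEFT JOIN merchants_rls m USING (merchant_id)
""").fetchall()
leaked_non_south = {mid for mid, gstin in rows if gstin is None}
print(leaked_non_south)  # {2, 4}: the analyst now knows which merchants are non-south
```

No protected column or row was ever returned, yet the NULL pattern in the join output is itself information the policy was supposed to hide.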

Aggregate inference. A user has access to COUNT(*) over merchants but not to individual rows. They write SELECT COUNT(*) FROM merchants WHERE pan = 'AAAPP1234L'. If the count is 1, that PAN exists; if 0, it does not. Differential privacy and k-anonymity floors are the formal defence; the practical defence is to disallow predicates on sensitive columns in non-privileged roles.
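A sketch of the inference, again in SQLite with made-up values:

```python
import sqlite3

# Existence inference through an allowed aggregate: the analyst may only
# run COUNT(*), but a predicate on the sensitive column still answers
# yes/no questions about individual values.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE merchants(merchant_id INT, pan TEXT)")
db.executemany("INSERT INTO merchants VALUES (?, ?)",
               [(1, "AAAPP1234L"), (2, "BBBPP5678M")])

(count,) = db.execute(
    "SELECT COUNT(*) FROM merchants WHERE pan = 'AAAPP1234L'").fetchone()
print(count)  # 1 -> that PAN exists, though the analyst never read the column
```

One query leaks one bit; a loop over candidate PANs leaks the whole column, which is why blocking predicates on sensitive columns matters as much as blocking their projection.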

Privilege escalation through DDL. An analyst with CREATE privileges in a downstream schema can run CREATE TABLE leak AS SELECT * FROM merchants_view. If the materialisation runs as the analyst's role, RLS applies and only their rows land in leak. If it runs as a service account with broader access (a common dbt setup), the materialisation contains rows the analyst never had access to — and now they have SELECT on leak. The fix is forcing CTAS / materialised views to inherit the invoking role's policies, which Snowflake does by default, BigQuery does not, and Postgres needs an explicit security_invoker setting.
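A toy version of the escalation, with the invoker's RLS simulated as a predicate that the service-account path simply skips (roles and schema are illustrative):

```python
import sqlite3

# Sketch of the CTAS privilege escalation: when the materialisation runs
# as a broad service account instead of the requesting analyst, the new
# table captures rows the analyst's own RLS would have hidden.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE merchants(merchant_id INT, region TEXT);
INSERT INTO merchants VALUES (1,'KA'),(2,'MH'),(3,'TN'),(4,'DL'),(5,'KL');
""")

def ctas_as(role: str, target: str):
    # SECURITY INVOKER behaviour would apply the invoker's predicate;
    # the service-account path gets 1=1 -- that is the bug.
    predicate = "region IN ('KA','TN','KL')" if role == "risk_jr" else "1=1"
    db.execute(f"CREATE TABLE {target} AS SELECT * FROM merchants WHERE {predicate}")

ctas_as("service_account", "leak")   # runs with the service account's full access
rows = db.execute("SELECT COUNT(*) FROM leak").fetchone()[0]
print(rows)  # 5 -- every region, now readable by anyone with SELECT on 'leak'
```

Running the same materialisation as "risk_jr" would have produced a three-row table, which is the behaviour invoker-inheriting warehouses give you by default.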

Column-mask bypass via CASE statements. A masking policy returns XXXXX1234 for non-privileged roles. The analyst writes SELECT pan, CASE WHEN LENGTH(pan) > 5 THEN 'long' ELSE 'short' END FROM merchants. If the mask is applied after expression evaluation rather than when the column is first read (the wrong order), the CASE sees the unmasked PAN and the analyst can reconstruct it predicate by predicate. Snowflake applies masks at projection time so this fails; BigQuery applies them via Data Catalog and the same trick has been demonstrated and patched twice in 2023–24.
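The evaluation-order problem can be modelled in a few lines of Python (mask and the probe are toy stand-ins, not any warehouse's internals):

```python
def mask(val: str) -> str:
    """Toy mask: redact everything but the last four characters."""
    return "XXXXX" + val[-4:]

pan = "AAAPP1234L"

# Wrong order: the probe runs on the raw value, the mask is applied afterwards.
leak_bit = pan.startswith("AAA")        # True -- one reconstructed fact per query
# Right order: mask at column read time; the same probe sees only masked data.
safe_bit = mask(pan).startswith("AAA")  # False -- probe reveals nothing
print(leak_bit, safe_bit)  # True False
```

Each probe in the wrong-order world leaks one boolean about the raw value, so an analyst who can issue many queries can reconstruct the column piecewise.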

The pattern is: a leak-prevention mechanism only prevents what it explicitly checks. Anything the policy author did not anticipate — joins, aggregates, DDL, masked-column-as-input — is a back door until someone notices.

Common confusions

Going deeper

Attribute-based access control (ABAC) vs role-based access control (RBAC)

RBAC says "role X can read column Y". ABAC says "principal with attributes (team, region, clearance) can read column Y if (their team owns this data, their region matches, their clearance is at least Y's classification)". RBAC is simpler — fewer rules, fewer attributes — but explodes combinatorially: 40 teams × 4 regions × 3 clearance levels is 480 roles. ABAC scales to that combinatorial space because the policy branches on attributes at evaluation time. The hard part of ABAC is keeping the attribute store fresh — when an engineer transfers from the south team to the north team, every query they run after that transfer has to see the new attribute. Snowflake supports both via tag-based RBAC and CURRENT_ROLE-branched policies; BigQuery leans into ABAC via policy tags and IAM conditions; Postgres natively supports only RBAC, with ABAC layered on via custom policy SQL.
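Extending the toy engine's policy-function idea, an ABAC policy branches on several attributes at evaluation time instead of minting a role per combination (the attribute names here are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Principal:
    name: str
    role: str
    attrs: Dict[str, str] = field(default_factory=dict)

def abac_policy(p: Principal) -> str:
    """ABAC sketch: one policy covers the whole (team x region x clearance)
    space that RBAC would need hundreds of roles to enumerate."""
    if p.attrs.get("clearance") == "high":
        return "1=1"
    team = p.attrs.get("team", "")
    region = p.attrs.get("region", "")
    if team and region:
        return f"owning_team = '{team}' AND region = '{region}'"
    return "1=0"   # fail closed when attributes are missing or stale

riya = Principal("riya", "analyst", {"team": "risk", "region": "south"})
print(abac_policy(riya))  # owning_team = 'risk' AND region = 'south'
```

The freshness problem from the text shows up directly: when riya transfers teams, the attrs dict must change before her next query, or the policy keeps granting her old team's rows.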

Why CDC pipelines need separate policy thinking

Change data capture (CDC) sends every changed row downstream. If the source table has RLS but the CDC target does not, the analyst on the target sees rows their RLS would have hidden in the source. Razorpay's data team learned this the hard way in 2023 when their merchant CDC stream was consumed by a sandbox warehouse that had no RLS — every region's data was visible to every team's BI dashboard for six weeks before someone noticed. The right pattern is to apply the same RLS at the CDC sink, or to filter out the policy-protected rows in the connector itself. Debezium can be configured with row-level filters; Fivetran's column-block lists handle CLS but not RLS, which is why teams running Fivetran on PII-heavy tables typically build a downstream view layer to re-apply the protection.
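A sketch of re-applying the source predicate in the consumer; the event shape below is illustrative, not Debezium's actual change-event envelope:

```python
# Re-apply the source table's RLS predicate inside a CDC consumer so that
# rows the sink warehouse cannot protect never land there at all.
ALLOWED_REGIONS = {"KA", "TN", "KL", "AP"}

def rls_filter(event: dict) -> bool:
    """Keep only change events whose row passes the source RLS predicate."""
    row = event.get("after") or {}
    return row.get("region") in ALLOWED_REGIONS

events = [
    {"op": "c", "after": {"merchant_id": 1, "region": "KA"}},
    {"op": "u", "after": {"merchant_id": 2, "region": "MH"}},
    {"op": "c", "after": {"merchant_id": 3, "region": "TN"}},
]
sink = [e for e in events if rls_filter(e)]
print([e["after"]["merchant_id"] for e in sink])  # [1, 3]
```

Filtering in the connector trades completeness for safety: the sink can never leak what it never received, at the cost of not being able to serve broader roles later.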

Differential privacy as a CLS extension

When the analyst legitimately needs aggregate statistics over a sensitive column but should not be able to infer individual rows, plain CLS is not enough — COUNT(*) WHERE pan = ? leaks individual existence. Differential privacy adds calibrated noise to query results so individual rows cannot be inferred from aggregates. Aadhaar's research-data sharing programme uses a (ε, δ)-differential privacy mechanism on demographic queries, and Apple's iOS keyboard suggestions use it at the device level. Differential privacy is the right model when "release this aggregate to the analyst, but no exact-row inference" is the requirement; for plain CLS/RLS where the requirement is "they cannot read this column or these rows at all", it is overkill.
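A minimal Laplace mechanism for a counting query, assuming sensitivity 1 so the noise scale is 1/ε (the parameters are illustrative):

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Inverse-CDF sample from Laplace(0, scale)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Counting query with sensitivity 1: add Laplace(1/epsilon) noise so
    any single row's presence changes the output distribution only slightly."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
print(dp_count(1, epsilon=0.5, rng=rng))  # roughly 1, plus noise of scale 2
```

With ε = 0.5 the noisy answer to COUNT(*) WHERE pan = ? is useless for inferring whether the true count was 0 or 1, which is exactly the existence-inference attack the text describes.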

Policy testing — yes, you can unit-test RLS

The mistake teams make is treating policies as configuration. Policies are code: they have edge cases, they regress, and a missing edge case is a security incident. The best teams treat the policy as a function and write unit tests: assert query_as("riya", "SELECT COUNT(*) FROM merchants").row_count() == expected_for_south_team; assert query_as("karan-vendor", "SELECT * FROM merchants").row_count() == 2. The dbt community has been pushing this pattern with the dbt-row-level-security package; Snowflake users with terraform-managed policies can test by spinning up a sandbox account with synthetic test data and running the policies against principal stand-ins. It is more work than treating the policy as a one-time SQL file, but the alternative is finding a regression six months later when the audit catches it.
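With the policy written as a function, the tests are ordinary assertions. This sketch reuses the shape of merchants_policy from the toy engine above, with an added empty-attribute edge case:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Principal:
    name: str
    role: str
    attrs: Dict[str, str]

def merchants_policy(p: Principal) -> str:
    if p.role == "compliance":
        return "1=1"
    if p.role == "risk_jr":
        states = [s.strip() for s in p.attrs.get("states", "").split(",") if s.strip()]
        if not states:
            return "1=0"                 # edge case: empty attribute fails closed
        return "region IN (%s)" % ",".join(f"'{s}'" for s in states)
    return "1=0"

# Unit tests: one assertion per edge case the policy must not regress on.
assert merchants_policy(Principal("asha", "compliance", {})) == "1=1"
assert merchants_policy(Principal("riya", "risk_jr", {"states": "KA,TN"})) == "region IN ('KA','TN')"
assert merchants_policy(Principal("riya", "risk_jr", {})) == "1=0"        # missing attrs
assert merchants_policy(Principal("karan", "intern", {})) == "1=0"        # unknown role
```

Each assertion pins one behaviour that, if it regressed silently, would be a data leak rather than a test failure; that is the whole argument for policies-as-code.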

Where this leads next

References