The feature store as a materialised view

In 2019 a team at Swiggy built their first ML model for delivery-time prediction. The data scientist computed features in a Jupyter notebook over three months of orders, trained a model with delivery_time_p50_last_7_days per restaurant, and got an offline RMSE of 4.2 minutes. They shipped it. In production the model was off by 11 minutes. The bug was not the model. The bug was that in the notebook, delivery_time_p50_last_7_days had been computed from a fully materialised order table; in production, the same column was being read from an OLTP replica that lagged by 6 hours and excluded orders still in transit. Two different definitions of one column were each correct, and the model was trained on one and served on the other. That gap — between a feature's offline definition and its online value — is the entire reason feature stores exist.

A feature store is a system that gives a machine-learning model the same view of a feature in training (offline, historical) and in production (online, real-time). It is structurally a pair of materialised views over your existing warehouse and event streams — one optimised for point-in-time historical lookups, one optimised for sub-10ms key-value reads. The feature store is not a new source of truth; it is a discipline that says "the same SQL produced both stores".

What problem actually creates a feature store

A machine-learning model needs the same column to mean the same thing in two different places. In training, the data scientist asks "for every order between Jan and March, what was the restaurant's avg_prep_time_last_30_days as of the moment that order was placed?" In production, the prediction service asks "for the order being placed right now, what is avg_prep_time_last_30_days?" These are the same column. Each one is read by a different system — Spark over a warehouse for training, a key-value store for serving. If the two systems disagree on the column's value for the same (restaurant_id, timestamp) pair, the model breaks in production even when offline metrics look pristine.

This is training-serving skew, and it is not solved by "writing better SQL". It is solved by lifting the feature definition out of both notebooks and microservice code, putting it in one place, and compiling it into two physical pipelines that are guaranteed to compute the same thing. That single place is the feature store.

Figure: A feature store as a pair of materialised views. One feature definition (SQL/DSL: avg_prep_time_30d) is compiled by a planner into a Spark plan and a Flink plan, both reading from the same warehouse table (orders, on Iceberg/BigQuery/Redshift) and Kafka topic (order_events: place/dispatch/deliver), and writing to two stores: an offline Iceberg store that is point-in-time joinable and used by training (Spark), and an online Redis/DynamoDB/ScyllaDB store used by prediction (~5 ms). If both stores agree on value(restaurant, ts), training-serving skew is zero by construction.
The feature store is the planner+two-pipelines abstraction sitting between your raw data and your model. Feast, Tecton, and Hopsworks differ in which of these boxes they own, not in the architecture.

"Materialised view" is the right mental model

A materialised view in a database is the result of a query, persisted to disk, kept in sync with its source tables as they change. CREATE MATERIALIZED VIEW restaurant_avg_prep_30d AS SELECT restaurant_id, AVG(prep_time) FROM orders WHERE event_ts > NOW() - INTERVAL '30 day' GROUP BY restaurant_id. The database guarantees: every read against the view returns a value consistent with the source tables as of some defined freshness.

A feature store is the same idea, except the "view" lives in two physical places at once — a column-oriented analytical store for training (Iceberg/Parquet on S3) and a key-value store for serving (Redis/DynamoDB) — and the freshness contract is per-feature rather than per-view. Why two physical stores rather than one: training reads "give me the value of avg_prep_time_30d for these 12 million (restaurant_id, ts) pairs at once" — across a few hundred features per row, that is a 4-billion-cell scan, perfect for Parquet column scans on S3. Serving reads "give me the value of avg_prep_time_30d for restaurant 8472, right now" — that is one key lookup, p99 needs to be under 10 ms, perfect for Redis. No single store gives both shapes within budget.

The mental shift this enables: stop thinking of the feature store as a special ML database. Think of it as a materialised-view system where the "view" happens to be served twice with different read shapes. Every architectural confusion about feature stores ("why two stores?", "why a Flink job AND a Spark job?", "why do I need a registry?") dissolves under this view.

The registry — the one place where feature definitions live — is the materialised view's CREATE statement. The Spark backfill job and the Flink streaming job are the two REFRESH strategies the planner emits. The offline and online stores are the two physical materialisations. The point-in-time training join is SELECT FROM mv AS OF event_ts. Once you see the database analogy, every box in a feature-store diagram has a name you already know.

A minimal feature store you can run on your laptop

The fastest way to internalise the materialised-view model is to build a roughly 60-line one. The script below defines one feature, materialises it from a synthetic order table, computes the offline materialisation as a point-in-time joined DataFrame (standing in for Parquet on S3), writes the online materialisation to a Python dict (standing in for Redis), and serves a point-in-time training query and an online query against the same definition.

# feature_store_mvp.py — feature store as a materialised view
import pandas as pd
from datetime import datetime, timedelta

# ---- Source: warehouse table (would be Iceberg/BigQuery in production)
orders = pd.DataFrame([
    {"order_id": i, "restaurant_id": (i % 4) + 1,
     "prep_time_min": 10 + (i % 7) * 2,
     "event_ts": datetime(2026, 4, 25, 9, 0) + timedelta(minutes=i*5)}
    for i in range(60)
])

# ---- The feature definition (the registry entry)
FEATURE = {
    "name": "avg_prep_time_30min",
    "entity": "restaurant_id",
    "ttl": timedelta(minutes=30),
    "agg": "mean",
    "source_col": "prep_time_min",
}

# ---- Materialisation: same logic, two sinks
def materialise(orders, feature, as_of):
    cutoff = as_of - feature["ttl"]
    window = orders[(orders.event_ts > cutoff) & (orders.event_ts <= as_of)]
    return (window.groupby(feature["entity"])[feature["source_col"]]
                  .agg(feature["agg"]).reset_index()
                  .rename(columns={feature["source_col"]: feature["name"]}))

# ---- OFFLINE store: snapshot per training row's event_ts (point-in-time)
def offline_pit_join(orders, feature, training_rows):
    out = []
    for _, row in training_rows.iterrows():
        mv = materialise(orders, feature, row.event_ts)
        match = mv[mv[feature["entity"]] == row.restaurant_id]
        v = float(match[feature["name"]].iloc[0]) if len(match) else None
        out.append({**row.to_dict(), feature["name"]: v})
    return pd.DataFrame(out)

# ---- ONLINE store: dict, refreshed by a streaming job in production (here: one manual call)
ONLINE = {}
def online_refresh(orders, feature, now):
    mv = materialise(orders, feature, now)
    for _, r in mv.iterrows():
        ONLINE[(feature["name"], r[feature["entity"]])] = r[feature["name"]]

def online_get(feature_name, entity_id):
    return ONLINE.get((feature_name, entity_id))

# ---- Use it
training_rows = orders.loc[[12, 37, 8, 51, 25], ["order_id", "restaurant_id", "event_ts"]]  # a handful of training rows
print("Offline (point-in-time):")
print(offline_pit_join(orders, FEATURE, training_rows))

online_refresh(orders, FEATURE, datetime(2026, 4, 25, 13, 0))
print("\nOnline at 13:00:", online_get("avg_prep_time_30min", 1))
# Sample run:
Offline (point-in-time):
   order_id  restaurant_id            event_ts  avg_prep_time_30min
0        12              1 2026-04-25 10:00:00                 16.0
1        37              2 2026-04-25 12:05:00                 17.0
2         8              1 2026-04-25 09:40:00                 15.0
3        51              4 2026-04-25 13:15:00                 17.0
4        25              2 2026-04-25 11:05:00                 14.0

Online at 13:00: 18.0

Walk through what just happened.

FEATURE = {...} is the registry. It's a single Python dict, not a database table, but it represents the only place in the system where the feature's name, entity, time-to-live, and aggregation logic are defined. Every downstream pipeline reads from this.

materialise(orders, feature, as_of) is the shared SQL — the one compute kernel that both pipelines call. Why share the kernel rather than write two implementations: this is exactly the bug Swiggy hit in the opening story. If materialise lives in one function and both sinks call it, training-serving skew is structurally impossible. If the offline pipeline reimplements the logic in Spark SQL and the online pipeline reimplements it in Flink Java, drift is inevitable on the first edit.

offline_pit_join is the point-in-time join that training uses. For each training row, it asks "what would the feature have been at that row's event_ts?" — not now, not at warehouse-load time, but at the row's own moment in history. This is what production feature stores call the PIT correctness guarantee, and it is the operation that batch SQL cannot easily express without window-function gymnastics.

online_refresh is the streaming-side materialisation. In production this would be a Flink job triggered by Kafka events; here it is a manual function call. The shape is the same: write the feature value into a key-value store keyed by (feature_name, entity_id).

online_get is the 5-ms read path the prediction service uses. Why a tuple key (feature_name, entity_id): a single feature store typically holds 1000+ features per entity. Keying by (feature, entity) lets the online store retrieve a vector of features for one restaurant in a single multi-key fetch (Redis MGET, DynamoDB BatchGetItem). Keying by entity alone would force every refresh to read-modify-write the entity's full vector, which is a write-amplification disaster.
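To make that multi-key fetch concrete, here is a minimal sketch of the serving-side read against Redis, assuming redis-py, a locally running Redis, and an illustrative "feature_name:entity_id" key format; the feature names in the list are made up for the example.

# Sketch: fetch one restaurant's whole feature vector in a single round trip.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

FEATURE_NAMES = ["avg_prep_time_30min", "order_count_30min", "cancel_rate_7d"]  # illustrative

def online_get_vector(entity_id: int) -> dict:
    keys = [f"{name}:{entity_id}" for name in FEATURE_NAMES]
    values = r.mget(keys)  # one MGET instead of len(FEATURE_NAMES) separate GETs
    return dict(zip(FEATURE_NAMES, values))

# e.g. online_get_vector(8472) -> {"avg_prep_time_30min": "16.0", ...}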

The toy is single-process Python and 60 lines. Production is the same six pieces, distributed: the registry is a Postgres table, materialise is a compiled Spark or Flink plan, the offline store is Iceberg, the online store is Redis or DynamoDB, the PIT join is AS OF SYSTEM TIME SQL, and the refresh runs continuously. The shape does not change.

Where each feature-store platform sits in the picture

The seven boxes — registry, source warehouse, source streams, planner, offline pipeline, online pipeline, and the two stores — are the same in every system. What differs is which boxes the vendor owns and which you supply yourself. Drawing them in one frame makes the choice mechanical instead of philosophical.

Figure: Three feature-store platforms compared by which boxes each owns. A grid with the boxes as rows (registry, planner/DSL, source warehouse, source streams, offline pipeline, online pipeline, offline store, online store) and columns for Feast, Tecton, Hopsworks, and build-it-yourself; each cell is shaded to show whether the vendor owns that box or you supply it. Pick the column whose dark cells match the boxes you do not want to operate yourself.
The platform choice as ownership lines, not feature checklists. Razorpay's fraud team picks Tecton (middle column) because they have warehouse and Kafka in-house but don't want to operate the pipelines. PhonePe's risk team picks Hopsworks because RBI rules require everything inside their Mumbai DC, even the stores.

Once you have the materialised-view abstraction, each commercial platform is a different choice about which boxes it owns versus borrows.

Feast is registry-only. It stores feature definitions and generates the query plans, but expects you to bring your own offline store (BigQuery, Snowflake, Redshift), your own online store (Redis, DynamoDB), and your own materialisation engine (Spark, Bytewax). Feast does the planner; you operate the pipelines. The point-in-time join is generated as Spark or DuckDB SQL that you run yourself. This is the right choice if your data team is already running Spark and Iceberg — Feast just adds the registry discipline without bringing a new database.
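For a feel of what "registry-only" means in practice, here is roughly what the toy's FEATURE dict becomes as a Feast definition. This is a sketch written against the post-0.26 Python API (Entity, FeatureView, Field, FileSource); parameter names have shifted across Feast releases, so treat it as illustrative rather than copy-paste.

from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

# The entity the feature is keyed by.
restaurant = Entity(name="restaurant", join_keys=["restaurant_id"])

# The source Feast materialises from (a Parquet file here; BigQuery/Snowflake
# sources are the warehouse-backed equivalents).
orders_source = FileSource(path="data/orders.parquet", timestamp_field="event_ts")

# The registry entry: Feast generates the point-in-time join and the
# materialisation plans from this single definition.
prep_time_view = FeatureView(
    name="restaurant_prep_time",
    entities=[restaurant],
    ttl=timedelta(days=30),
    schema=[Field(name="avg_prep_time_30d", dtype=Float32)],
    source=orders_source,
)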

Tecton owns everything end-to-end. The registry is theirs; they run the Spark cluster that does offline backfills; they run the Flink cluster that does streaming materialisation; they manage the Redis-style online store; they bill you per feature-second of compute. The cost model is high but the operational surface is one vendor. Razorpay's fraud team uses Tecton for the same reason a startup uses Stripe instead of building a payment gateway: the per-feature cost is high but the team-month savings are higher.

Hopsworks is open-core, self-hosted. Like Tecton in scope but you run the cluster — Hopsworks bundles its own offline store (HopsFS, an HDFS variant), online store (RonDB, a NewSQL key-value system), and Spark/Flink for materialisation. Indian banks under RBI data-localisation rules pick Hopsworks because the entire stack runs inside their Mumbai data centre, not on a vendor's US infrastructure.

Build-it-yourself is what teams below 50 ML engineers usually settle on: Iceberg as the offline store, Redis as the online store, dbt + Airflow for the offline materialisation, a Flink job for the online materialisation, a YAML file in git for the registry. ₹0 in licensing, three engineers' time. The Swiggy delivery-prediction team rebuilt their stack this way after the 2019 incident. Why build-it-yourself remains viable even at scale: the seven boxes are all infrastructure your data platform team already operates for non-ML workloads. The "feature store" overhead is a registry table and a discipline of routing both pipelines through one definition file. If the team is already running Iceberg for the lakehouse and Redis for product caches, building the planner in 2,000 lines of Python is cheaper than the ₹2.5 crore/year a managed feature-store contract typically costs at 100M-events/day scale.
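As a sketch of what the home-grown "planner" amounts to, the snippet below renders one invented registry entry into a single SQL string that both the batch backfill and the streaming refresh would submit. The registry layout and the :run_ts placeholder are assumptions for illustration; a real planner has to emit engine-specific window syntax for Spark and Flink.

# Toy planner: one registry entry in, one SQL text out, handed to both engines.
REGISTRY = {
    "avg_prep_time_30d": {
        "entity": "restaurant_id",
        "source": "orders",
        "expr": "AVG(prep_time_min)",
        "window_days": 30,
    },
}

def plan_sql(feature_name: str) -> str:
    f = REGISTRY[feature_name]
    return (
        f"SELECT {f['entity']}, {f['expr']} AS {feature_name}\n"
        f"FROM {f['source']}\n"
        f"WHERE event_ts > :run_ts - INTERVAL '{f['window_days']}' DAY\n"
        f"GROUP BY {f['entity']}"
    )

# The Spark backfill binds :run_ts to each historical snapshot; the streaming
# job binds it to "now" on every refresh. One definition, two sinks.
print(plan_sql("avg_prep_time_30d"))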

The choice is not "which is best" but "which boxes do we have the staff to operate". The materialised-view abstraction makes the choice tractable: list the seven boxes (registry, source warehouse, source streams, planner, offline pipeline, online pipeline, two stores), figure out which ones you already run, and pick the platform that fills the rest.

Common confusions

Going deeper

Point-in-time correctness as the deepest contract

The hardest property a feature store provides is point-in-time (PIT) correctness on the offline join: training row r with event_ts = t must see exactly the feature value the online store would have served at time t, not the value as of any later moment. A naive batch join — "join training rows to today's snapshot of the feature table" — leaks future information into the training set and inflates offline metrics. Tecton, Feast, and Hopsworks all implement PIT joins as ASOF joins under the hood: for each training row, find the most-recent feature row with feature.event_ts <= training_row.event_ts. The implementation is non-trivial because doing this naively is O(N×M) — for every training row, scan the feature history. Production implementations sort both sides by timestamp and do a merge-join in O(N+M). Iceberg's hidden partitioning on event_ts plus a sort-by-event_ts inside each partition is what makes the merge-join I/O-efficient at lakehouse scale.
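The same merge-join is available off the shelf in pandas as merge_asof, which is a convenient way to sanity-check a PIT pipeline on a small sample; the frames below are invented for illustration, and merge_asof requires both sides to be sorted on the join timestamp.

import pandas as pd

# Illustrative frames; in production both sides come from Iceberg scans.
feature_history = pd.DataFrame({
    "restaurant_id": [1, 1, 2],
    "event_ts": pd.to_datetime(["2026-04-25 09:00", "2026-04-25 12:00", "2026-04-25 10:00"]),
    "avg_prep_time_30d": [15.0, 18.0, 14.0],
})
training_rows = pd.DataFrame({
    "restaurant_id": [1, 2],
    "event_ts": pd.to_datetime(["2026-04-25 11:30", "2026-04-25 10:05"]),
})

# For each training row, take the most recent feature row with
# feature.event_ts <= training_row.event_ts, within the same entity.
pit = pd.merge_asof(
    training_rows.sort_values("event_ts"),
    feature_history.sort_values("event_ts"),
    on="event_ts",
    by="restaurant_id",
    direction="backward",
)
print(pit)  # restaurant 1 at 11:30 sees the 09:00 value (15.0), never the 12:00 one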

Why every team eventually splits "transformations" from "features"

Beyond a few hundred features, the registry develops a structural problem: a "feature" like avg_prep_time_30d is built on a "transformation" like prep_time = (delivered_ts - placed_ts).total_seconds() / 60. Two features (avg_prep_time_30d, p95_prep_time_30d) want to share the same transformation. Naive feature stores duplicate the transformation in every feature definition, and a fix to the transformation logic must be replicated everywhere. Tecton introduced BatchSource + Aggregation as separate concepts to factor this; Feast 0.30+ has on-demand transformations as first-class objects. The lesson is that a feature definition is really a pipeline of transformations terminating in an aggregation, and the registry should model both layers, not collapse them. Razorpay's internal feature store has 4,200 features but only 380 underlying transformations.
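A minimal sketch of the two-layer registry, with invented dict shapes: the prep-time arithmetic is defined once as a transformation, and both aggregations reference it by name, so a fix to the formula propagates on the next materialisation run.

from datetime import timedelta

# Layer 1: transformations, row-level expressions defined exactly once.
TRANSFORMATIONS = {
    "prep_time_min": lambda row: (row["delivered_ts"] - row["placed_ts"]).total_seconds() / 60,
}

# Layer 2: features, aggregations over a named transformation.
FEATURES = {
    "avg_prep_time_30d": {"transform": "prep_time_min", "agg": "mean",
                          "window": timedelta(days=30)},
    "p95_prep_time_30d": {"transform": "prep_time_min",
                          "agg": lambda s: s.quantile(0.95),
                          "window": timedelta(days=30)},
}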

When the materialised-view metaphor breaks: feature embeddings

Vector embeddings — say, a 512-dimensional vector representation of a user's recent search history — fit the feature-store model awkwardly. The "value" is not a scalar but a 2KB blob; the online store's key-value semantics still work, but the offline store, now holding array cells instead of scalars, starts to look more like a vector database than a feature table. Production teams either (a) treat embeddings as opaque blobs in the existing feature store and accept that PIT joins still work, or (b) split embeddings into a dedicated vector store (pgvector, Pinecone, Milvus) and accept that they no longer share PIT machinery with scalar features. Most teams choose (a) until they need similarity search on the embeddings, at which point (b) becomes inevitable. Flipkart's recommendation team manages roughly 200 scalar features in their feature store and 6 embedding tables in a separate Milvus cluster — the discipline of "what goes where" is a senior-engineer judgement, not a vendor's choice.
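Option (a) in code, as a sketch using the toy's online dict: the embedding is stored as an opaque byte blob under the same (feature_name, entity_id) key scheme, so writes and PIT lookups are unchanged, but the store cannot answer similarity queries. The feature name and dimensions are made up for the example.

import numpy as np

ONLINE = {}  # same (feature_name, entity_id) -> value layout as the toy store above

def write_embedding(entity_id: int, vector: np.ndarray) -> None:
    # 512 float32s -> a 2 KB opaque blob; the store does not interpret it.
    ONLINE[("search_history_emb_v1", entity_id)] = vector.astype(np.float32).tobytes()

def read_embedding(entity_id: int) -> np.ndarray:
    return np.frombuffer(ONLINE[("search_history_emb_v1", entity_id)], dtype=np.float32)

write_embedding(8472, np.random.rand(512))
print(read_embedding(8472).shape)  # (512,), retrievable by key but not searchable by similarity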

The cost shape: why feature stores are the second-most expensive line in an ML platform

After GPU compute, feature-store infrastructure is typically the largest line item in an Indian fintech's ML budget — ₹40-80 lakh/month for a top-of-funnel team. The cost is dominated by the online store (Redis cluster sized for write throughput, not read throughput) and the streaming materialisation (Flink TaskManagers running 24/7, sized for the worst feature's state-store needs). Optimisation lessons learned the hard way: (1) do not stream features that don't need it — most features tolerate hourly batch and cost 10× less; (2) co-locate the Redis cluster with the prediction service to avoid cross-AZ network charges; (3) compact and TTL the online store aggressively — feature values older than the longest serving-time-to-live are pure waste. Cred publishes that 70% of their feature-store cost reduction came from lever (1) alone — auditing which features actually needed sub-second freshness.
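Lever (1) is essentially an audit over the registry. A sketch, assuming each entry carries a declared freshness SLA and a flag for which pipeline currently serves it; both fields are invented for illustration.

from datetime import timedelta

REGISTRY = {
    "txn_count_last_60s":   {"freshness_sla": timedelta(seconds=5), "pipeline": "streaming"},
    "avg_prep_time_30d":    {"freshness_sla": timedelta(hours=1),   "pipeline": "streaming"},
    "lifetime_order_count": {"freshness_sla": timedelta(days=1),    "pipeline": "streaming"},
}

# Anything whose SLA tolerates an hourly batch run should not be paying for Flink.
for name, f in REGISTRY.items():
    if f["pipeline"] == "streaming" and f["freshness_sla"] >= timedelta(hours=1):
        print(f"{name}: demote to hourly batch (SLA {f['freshness_sla']})")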

Where this leads next

References