Feast, Tecton, Hopsworks: architectures compared

A platform engineer at Meesho is comparing three feature-store products on a Tuesday afternoon, with a deadline of Friday to recommend one to the head of ML. Feast is open-source and free; Tecton is a managed SaaS that costs ₹1.6 crore per year for their tier; Hopsworks ships an on-prem appliance that the security team likes because the data never leaves the VPC. All three claim to solve the same "feature store" problem — point-in-time correctness, online/offline parity, elimination of training-serving skew. So why are they priced an order of magnitude apart, and why does picking the wrong one cost 18 months of platform-team time? Because the three vendors drew the line between platform and customer code in three different places. That line — what the product owns versus what your team owns — is the entire architectural difference.

Feast is a thin metadata-and-SDK layer on top of storage you bring; the customer owns materialisation, infra, and SLAs. Tecton owns the entire feature lifecycle — definition language, materialisation engine, online + offline stores — as a managed service. Hopsworks ships a vertically-integrated platform you run yourself, with its own filesystem, online store (RonDB), and training pipeline. The right pick is the one where the boundary matches your team's capacity.

What "feature store" actually means as a product

A feature store is not one thing; it is six things stitched together. Feature definitions (the SQL or Python that computes a feature). A registry (which features exist, what they depend on, who owns them). An offline store (point-in-time-correct historical data for training). An online store (low-latency current data for serving). A materialisation pipeline (the job that keeps both stores in sync). An SDK (the library data scientists use to fetch features for training and inference). The architectural question is which of these six the vendor owns and which your team owns.

Feast owns the registry and the SDK. Everything else — the offline store (you bring BigQuery / Snowflake / Iceberg), the online store (you bring Redis / DynamoDB), the materialisation job (you write Spark / Flink) — is your team's problem. The product is essentially a YAML schema and a Python client. You get cheap and flexible; you give up turnkey.

Tecton owns all six. Feature definitions are written in Tecton's Python DSL, registered with the Tecton control plane, materialised by Tecton-operated Spark/Flink clusters, written to a Tecton-managed online store (DynamoDB or their own KV), queried via the Tecton SDK. You write feature definitions and SLA targets; Tecton runs the rest. You get turnkey; you give up control over the storage layer and pay SaaS prices.

Hopsworks also owns all six but is self-hosted. Their control plane, their HopsFS distributed filesystem (the offline store), their RonDB online store, their feature engineering jobs run on their Hopsworks cluster. You install it on your own Kubernetes — typically inside a regulated VPC where data sovereignty matters — and you own the operations. You get turnkey-but-on-prem; you give up the elasticity that managed SaaS provides.

Feast vs Tecton vs Hopsworks — who owns each layer of the feature store: six layers, three boundary lines.

Layer                Feast               Tecton                 Hopsworks
Feature definitions  vendor (DSL)        vendor (DSL)           vendor (DSL)
Registry             vendor              vendor                 vendor
Offline store        customer (BQ/SF)    vendor                 vendor (HopsFS)
Online store         customer (Redis)    vendor (DDB)           vendor (RonDB)
Materialisation      customer (Spark)    vendor (Spark/Flink)   vendor (Flink/Spark)
SDK                  vendor              vendor                 vendor

Bottom line: Feast is free OSS with roughly six cells owned or co-owned by you; Tecton is ~₹1.6 cr/yr SaaS with zero ops on your side; Hopsworks is ~₹70 lakh/yr on-prem where you run the ops.
The vendor-owned cells are what the licence buys; the customer-owned cells are what your team builds and runs. Feast's customer-owned (or co-owned) cells are the price of being free; Tecton and Hopsworks pay for vendor ownership by taking SaaS revenue or charging an enterprise licence.

How a feature definition flows in each system

The cleanest way to see the architectural difference is to follow one feature — failed_txn_rate_24h — from definition to inference in each platform. Pick a Razorpay-style fraud feature: for each card_id, the rate of failed transactions in the last 24 hours.

In Feast, you define the feature in a Python file:

# feature_repo/features.py — Feast feature definition
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

card = Entity(name="card", join_keys=["card_id"])

card_txn_source = FileSource(
    path="s3://razorpay-features/card_txn_24h.parquet",
    timestamp_field="event_ts",
    created_timestamp_column="materialized_at",
)

failed_rate_view = FeatureView(
    name="card_failed_rate_24h",
    entities=[card],
    ttl=timedelta(days=7),
    schema=[
        Field(name="failed_txn_rate_24h", dtype=Float32),
        Field(name="failed_txn_count_24h", dtype=Int64),
    ],
    source=card_txn_source,
    online=True,
)
# What Feast does when you run `feast apply` and `feast materialize`:
$ feast apply
Created entity card
Created feature view card_failed_rate_24h
Registry updated: data/registry.db

$ feast materialize 2026-04-24T00:00:00 2026-04-25T00:00:00
Materializing feature view card_failed_rate_24h
Reading from FileSource: s3://razorpay-features/card_txn_24h.parquet
Writing to online store: redis://features.online.razorpay.internal:6379
Written 4,832,194 rows in 312s

Walk through what Feast actually does.

FileSource(path="s3://...") — Feast does not own the offline data; it points at a file you produced with your own pipeline. The Parquet file at that path was written by a Spark job you wrote, scheduled by Airflow you operate. Feast is reading, not writing, the offline data.

online=True — this flag toggles whether feast materialize will copy current values from the offline source into Redis. The materialisation step is a Python loop that reads Parquet rows and writes Redis keys — it is your serving-time consistency story, but the job that creates the Parquet in the first place is still entirely your responsibility.

feast materialize — runs as a Python process. For 4.8 million rows it took 312 seconds, single-threaded; at PhonePe scale (2 billion cards) you would shard this across workers manually, because Feast has no built-in distributed materialisation engine.

Why this matters: the bottleneck for Feast at scale is always the materialisation step, because it is single-process Python by default. Teams either write their own Spark/Flink job that mirrors Feast's logic and writes both Parquet and Redis in one pass — bypassing feast materialize entirely, as in the sketch below — or they outgrow Feast. The materialisation Python script is the friction point that pushes serious users to either Tecton or Hopsworks.
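What that workaround looks like in practice — a minimal PySpark sketch of the shared-materialisation job, computing the feature once and writing both sinks in one pass. The table name, Redis host, and key layout are illustrative assumptions; in particular, the key encoding here is not Feast's internal format, so a team that keeps Feast's serving SDK would have to mirror Feast's own key schema instead.

# shared_materialisation.py — sketch of a Spark job that bypasses `feast materialize`
import json
import redis
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("card_failed_rate_24h").getOrCreate()

# Compute the feature once (illustrative source table)
agg = (spark.table("card_txn_events")
       .where(F.col("event_ts") >= F.current_timestamp() - F.expr("INTERVAL 24 HOURS"))
       .groupBy("card_id")
       .agg(F.sum(F.when(F.col("status") == "FAILED", 1).otherwise(0))
              .alias("failed_txn_count_24h"),
            F.avg(F.when(F.col("status") == "FAILED", 1.0).otherwise(0.0))
              .alias("failed_txn_rate_24h"),
            F.max("event_ts").alias("event_ts")))

# Offline sink: the Parquet file Feast's FileSource points at
agg.write.mode("overwrite").parquet("s3://razorpay-features/card_txn_24h.parquet")

# Online sink: each executor writes its partition straight to Redis
def write_partition(rows):
    r = redis.Redis(host="features.online.razorpay.internal", port=6379)
    pipe = r.pipeline()
    for row in rows:
        pipe.set(f"card_failed_rate_24h:{row.card_id}",
                 json.dumps({"failed_txn_rate_24h": row.failed_txn_rate_24h,
                             "failed_txn_count_24h": row.failed_txn_count_24h}))
    pipe.execute()

agg.foreachPartition(write_partition)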

In Tecton, you define the feature in Tecton's DSL, and Tecton's control plane runs the job:

# fraud/features.py — Tecton feature definition
from tecton import batch_feature_view, Entity, FilteredSource
from tecton.types import Field, Float32, Int64
from datetime import datetime, timedelta

card = Entity(name="card", join_keys=["card_id"])

@batch_feature_view(
    sources=[FilteredSource(card_txn_source)],  # card_txn_source: a BatchSource registered elsewhere in the repo
    entities=[card],
    mode="spark_sql",
    online=True,
    offline=True,
    feature_start_time=datetime(2024, 1, 1),
    batch_schedule=timedelta(hours=1),
    ttl=timedelta(days=7),
    schema=[Field("failed_txn_rate_24h", Float32),
            Field("failed_txn_count_24h", Int64)],
)
def card_failed_rate_24h(card_txn_source):
    return f"""
    SELECT card_id,
           SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END) AS failed_txn_count_24h,
           SUM(CASE WHEN status = 'FAILED' THEN 1.0 ELSE 0.0 END)
             / COUNT(*) AS failed_txn_rate_24h,
           MAX(event_ts) AS event_ts
    FROM {card_txn_source}
    WHERE event_ts >= current_timestamp() - INTERVAL 24 HOURS
    GROUP BY card_id
    """

You run tecton apply against the Tecton control plane and it does everything else — provisions a Databricks Spark cluster (or its own Spark-on-Kubernetes), runs the SQL on the schedule, writes both an offline store (Iceberg or Delta) and an online store (DynamoDB), keeps the registry, exposes a Python SDK that returns features for training (point-in-time-correct join) or serving (sub-10-ms KV lookup). When the cluster fails, Tecton retries. When the schema drifts, Tecton catches it. When materialisation is slow, Tecton scales the cluster.
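At inference time, the same ownership shows up in how little the caller sees. A minimal sketch, assuming a feature service named fraud_scoring has been registered in the workspace (the service name and card id are illustrative):

# serving.py — sketch of a Tecton online read (names illustrative)
from tecton import get_workspace

ws = get_workspace("razorpay-prod")
svc = ws.get_feature_service("fraud_scoring")

features = svc.get_online_features(join_keys={"card_id": "card_9f3a"}).to_dict()
# One call against the Tecton-managed online store: no Redis client, no
# connection pool, no key schema — those all live on Tecton's side of the line.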

In Hopsworks, the shape is similar to Tecton but the runtime lives on your Kubernetes:

# fraud_features.py — Hopsworks feature definition
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

card_fg = fs.create_feature_group(
    name="card_failed_rate_24h",
    version=1,
    primary_key=["card_id"],
    event_time="event_ts",
    online_enabled=True,           # writes to RonDB online store
    statistics_config=True,        # auto-collect statistics for drift detection
)

# Compute the feature in PySpark (running on the Hopsworks cluster)
df = spark.sql("""
    SELECT card_id,
           SUM(CASE WHEN status='FAILED' THEN 1 ELSE 0 END) AS failed_txn_count_24h,
           SUM(CASE WHEN status='FAILED' THEN 1.0 ELSE 0.0 END)
             / COUNT(*) AS failed_txn_rate_24h,
           MAX(event_ts) AS event_ts
    FROM card_txn_events
    WHERE event_ts >= current_timestamp() - INTERVAL 24 HOURS
    GROUP BY card_id
""")

card_fg.insert(df, write_options={"start_offline_materialization": True})

The card_fg.insert(df) call writes to both HopsFS (offline) and RonDB (online) atomically, with the offline write committed via Hudi or Iceberg copy-on-write, the online write via RonDB's NDB protocol. Why one call writes to both: Hopsworks's FeatureGroup.insert() opens a single transaction that the platform fans out to two sinks. Failure on either rolls back. This is the platform doing what a hand-rolled Flink "shared materialisation" job (covered in the previous chapter) does, but as a built-in. The cost: you have to be running the Hopsworks cluster, with its own scheduler, executor, and storage. The benefit: you don't write that Flink job yourself. Compare this with Feast's feast materialize (Python loop, single-machine) and Tecton's managed Spark cluster (vendor's infra). The same logical operation has three very different implementations.
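The serving side is symmetric. A minimal sketch of the online read, assuming an hsfs feature view built over the feature group (the view name and card id are illustrative):

# Online read from RonDB via a feature view — a sketch, names illustrative
serving_fv = fs.get_or_create_feature_view(
    name="card_fraud_serving",
    version=1,
    query=card_fg.select_all(),
)
row = serving_fv.get_feature_vector({"card_id": "card_9f3a"})
# Single-row lookup answered by RonDB on your own cluster — the same API shape
# as Tecton's managed read, but the store behind it is yours to operate.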

What you give up to get each one

The pricing differences (₹0 / ₹70 lakh / ₹1.6 crore per year for a Meesho-scale workload) are not arbitrary. They reflect what the vendor is taking off your plate and how operationally invested they are.

Feast (₹0 SaaS, but ~₹40-80 lakh/year of platform-team effort). You get a feature registry, a clean SDK, point-in-time correctness for training (via get_historical_features). You give up: distributed materialisation, online-store operations, schema drift detection, monitoring. A typical Feast user at 50M-card scale runs a 3-engineer platform team to keep Feast healthy — building the Spark jobs that actually populate the offline store, the Flink jobs that maintain the online store, the alerts when materialisation drifts, the Helm charts for the Feast server. The total cost is real; it is just paid in headcount instead of SaaS bills. Meesho ran Feast for 18 months before migrating off because the platform-team cost was higher than Tecton's licence at their scale.

Tecton (₹1.6 crore SaaS for ~50M-entity scale, 0 platform engineers). You get end-to-end materialisation, monitoring, alerting, a managed online store, point-in-time-correct training joins, transformation freshness SLAs. You give up: control over which storage primitives are used (Tecton picks DynamoDB or its KV; you can't use your existing Redis), a hard dependency on the Tecton control plane (if their AWS account has an outage, your serving stops), and the SaaS bill. Tecton's pitch is straightforward — at scale, ₹1.6 crore/year is cheaper than three platform engineers (₹2-3 crore loaded) plus the operational risk.

Hopsworks (₹70 lakh-1.2 crore/year for the licence, plus your own ops). You get the integrated platform but on your VPC. You give up: the elasticity that managed SaaS provides — you size your HopsFS cluster, your RonDB cluster, your Spark cluster up-front and re-provision when traffic grows. Hopsworks wins when data sovereignty is a hard requirement: regulated lenders (Bajaj Finserv, IDFC First), insurance (HDFC Life), telecoms (Jio) where sending feature data to a US-based SaaS is non-negotiable. The licence covers vendor-supported software; the operational cost is yours.

How a feature flows through each platform from definition to serving — one feature definition, three runtime paths (same input on the left, same offline + online stores on the right):

Feast:     YAML def + Python SDK → YOUR Spark job (you write + run) → YOUR Parquet (BQ / Snowflake) → feast materialize → Redis (Python loop, single-proc)
Tecton:    Python DSL → tecton apply → Tecton-managed Spark + Flink (vendor runs both batch + streaming) → Iceberg + DynamoDB (both vendor-managed)
Hopsworks: Python SDK → FeatureGroup.insert → Hopsworks Spark on YOUR k8s (vendor software, your ops) → HopsFS + RonDB (both vendor primitives)
Same input on the left (a feature definition); same output on the right (offline + online stores). The middle step — who runs the materialisation engine — is the architectural divide. Feast hands you an empty box; Tecton runs it as SaaS; Hopsworks ships it on-prem.

The training-time API: where they diverge most

The single biggest API difference between the three is how a data scientist asks "give me the feature values that were valid at the moment each label was generated, for 50 million labels." This is the point-in-time-correct training join from the previous chapters.

# Feast — get_historical_features
from feast import FeatureStore
store = FeatureStore(repo_path=".")
training_df = store.get_historical_features(
    entity_df=label_df,           # has card_id and event_timestamp columns
    features=["card_failed_rate_24h:failed_txn_rate_24h",
              "card_failed_rate_24h:failed_txn_count_24h"],
).to_df()
# Under the hood: AS OF JOIN written to BigQuery / Snowflake / DuckDB.
# At 50M labels × 2 features, a 4-node BigQuery slot pool runs this in 14 minutes.

# Tecton — get_features_for_events
from tecton import get_workspace
ws = get_workspace("razorpay-prod")
training_df = ws.get_feature_view("card_failed_rate_24h") \
    .get_features_for_events(events=label_df).to_pandas()
# Under the hood: AS OF JOIN executed by Tecton's managed Spark cluster.
# At 50M labels, runs in ~7 minutes — Tecton has pre-bucketed the offline store.

# Hopsworks — FeatureView.training_data()
# (a sketch: the label feature group and label column are illustrative names)
card_fg = fs.get_feature_group("card_failed_rate_24h", version=1)
label_fg = fs.get_feature_group("card_txn_labels", version=1)  # card_id, event_ts, is_fraud

query = label_fg.select_all().join(card_fg.select_all())
fv = fs.create_feature_view(name="fraud_training", version=1,
                            query=query, labels=["is_fraud"])
training_df, labels_df = fv.training_data()  # AS OF join on each label's event_ts
# Under the hood: AS OF JOIN executed by Hopsworks's PySpark on HopsFS.
# At 50M labels, runs in ~9 minutes on a 12-node cluster.

The latency numbers are real and they matter. At 50M labels, Tecton at 7 minutes vs Feast at 14 minutes is the difference between a data scientist running 8 experiments per day vs 4. Over a year of model development, that is a 2× productivity gap that in practice costs more than the Tecton licence. Why Tecton is faster: Tecton pre-buckets the offline store by entity_id and pre-sorts by event_ts, so the AS OF JOIN can skip 90% of partitions. Feast leaves bucketing to whatever you wrote your Parquet job to do — typically nothing. Hopsworks bucketing is configurable but not on by default. The performance gap is not a vendor mystery; it is a consequence of the offline-store engineering investment Tecton has made and Feast has not.
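What closing that gap takes on the Feast side — a hedged sketch of writing the offline Parquet partitioned by a hash bucket of the entity key and pre-sorted by event_ts, so the AS OF JOIN can prune files by bucket instead of scanning everything. Bucket count, column names, and path are illustrative:

# bucketed_write.py — sketch: pre-bucket the Feast offline store by hand
from pyspark.sql import functions as F

(df.withColumn("bucket", F.abs(F.hash("card_id")) % 256)   # stable entity bucket
   .repartition("bucket")
   .sortWithinPartitions("card_id", "event_ts")            # pre-sort for the AS OF JOIN
   .write.mode("overwrite")
   .partitionBy("bucket")
   .parquet("s3://razorpay-features/card_txn_24h_bucketed/"))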

Common confusions

Going deeper

Tecton's two-mode materialisation: batch and streaming as one DAG

Tecton's killer architectural feature is that a single feature definition can compile to both a batch Spark job (refresh every hour) and a streaming Flink job (refresh every second), with the platform handling backfill from batch and forward-fill from streaming. The customer writes one SQL definition; Tecton emits two physical pipelines that share state — at the feature-group level, the streaming job writes to the same online store as the batch job, and the batch job's offline-store writes serve as the historical record the streaming job's checkpoints can recover from. This is hard to build in Feast (you would write the Spark job and the Flink job separately and manage their consistency yourself) and Hopsworks supports it but with manual configuration. The lesson: when feature freshness varies across features (some 24-hour, some 30-second), the unified materialisation engine pays for itself.
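A minimal sketch of the dual-config source that makes this possible — assuming Tecton's StreamSource API, where one source carries both a stream config (forward fill from Kafka) and a batch config (backfill from the warehouse). All names and parameters here are illustrative, not verified against a specific SDK version:

# sources.py — sketch of a Tecton dual-mode source (illustrative parameters)
from tecton import StreamSource, KafkaConfig, HiveConfig

card_txn_source = StreamSource(
    name="card_txn_events",
    stream_config=KafkaConfig(            # forward fill: second-level freshness
        kafka_bootstrap_servers="kafka.razorpay.internal:9092",
        topics="card_txn_events",
        timestamp_field="event_ts",
    ),
    batch_config=HiveConfig(              # backfill + historical record
        database="payments",
        table="card_txn_events",
        timestamp_field="event_ts",
    ),
)
# A feature view defined over this one source can materialise as both the
# hourly batch pipeline and the streaming pipeline, sharing one online store.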

Hopsworks's as_of semantics and the timeline view

Hopsworks's offline store is built on Apache Hudi, which gives it true time-travel: any feature value at any timestamp in the past can be retrieved with fg.as_of(ts).read(). This matters during training-serving skew incidents — when the model performs differently in production than in training, you can replay the exact feature values the inference path would have seen at any historical timestamp, rather than reconstructing from logs. Tecton has a similar capability via the offline store's snapshot isolation; Feast leaves this to whatever your offline store does (BigQuery time travel is 7 days; Snowflake's is configurable; Iceberg's is whatever you set retention to). The "time-travel for training-serving skew investigation" use case is a strong argument for Hopsworks at regulated institutions where audit replay is a compliance requirement.
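What a skew investigation looks like with that capability — a small sketch using hsfs as_of time travel (the timestamps and card id are illustrative):

# replay.py — sketch of a training-serving skew replay via Hudi time travel
fg = fs.get_feature_group("card_failed_rate_24h", version=1)

as_trained = fg.select_all().as_of("2026-04-20 09:00:00").read()  # training-time snapshot
as_served  = fg.select_all().as_of("2026-04-24 18:30:00").read()  # incident-time snapshot

suspect = "card_9f3a"
print(as_trained.loc[as_trained.card_id == suspect, "failed_txn_rate_24h"])
print(as_served.loc[as_served.card_id == suspect, "failed_txn_rate_24h"])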

The real cost driver: who tunes the online store under load

Online-store latency is "good enough" by default in all three; what differs is who tunes it when traffic patterns change. At Razorpay, when Diwali week brought 4× normal traffic, the team using Tecton phoned a Tecton solutions engineer who scaled the DynamoDB tables and DAX nodes from a control panel; the team using Feast had a 3 a.m. on-call where the platform engineer manually re-sharded their Redis cluster. The on-call cost was real — over the year, the Feast team's pages-per-week ran 6× higher than the Tecton team's. Why this gap is structural: Tecton's SLA is contractual — they have a paid pager-rotation that cares about your store's tail latency. Feast's SLA is whatever your platform team can deliver. The pager difference is the visible part of an invisible difference in operational maturity that you only see under load.

Why neither Feast nor Hopsworks won the embedded-vector market

Vector embeddings as features are growing fast (recommendation models, LLM RAG, fraud anomaly scores). Tecton added vector support in 2025; Feast 0.40+ supports it via pluggable backends. Hopsworks supports it via a separate Vector Database product, not its main feature store. The architectural reason is that vector search is not a key-value lookup — it requires similarity indexes (HNSW, IVF) that don't fit naturally into the row-oriented online stores all three platforms started with. Tecton's response was to integrate Pinecone and Weaviate as backend options; Hopsworks built a separate vector index alongside RonDB. This is the next architectural frontier — feature stores that natively understand both scalar and vector features without bolt-ons.

Where this leads next
