Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Edge compute: serverless at the edge
It is 19:42 IST on a Friday and CricStream is forty-three minutes into the second innings of the IPL final. The viewer count crossed 28 million two overs ago. Anuradha, the platform engineer who owns the personalisation service, is watching a graph she has never liked: the p99 latency for the "Up Next" recommendation panel, served from a single ap-south-1 origin in Mumbai. For viewers in Delhi the number is 38ms — a single back-haul hop. For viewers in Chennai it is 71ms. For the 4 million viewers connected from Singapore, Dubai, London, and the Bay Area diaspora, the p99 is 312ms — and 312ms means a visible blank panel during the over break, which means a visible drop in the engagement metric the product team measures her on. Anuradha has spent the last quarter migrating that panel from her Mumbai monolith into 312 edge points of presence, and tonight is the first IPL final on the new architecture. The p99 from Singapore is 19ms. From London, 22ms. From the Bay Area, 31ms. The Mumbai origin is barely warm. This is the promise of edge compute, and it is genuinely real — but the architecture Anuradha shipped is not a smaller version of her Mumbai service. It is a different machine with a different programming model, and the parts she had to rewrite were not the parts she expected.
Edge compute means running your code in hundreds of geographically scattered points of presence (PoPs) — typically the same physical infrastructure your CDN already operates — so a user's request is processed within tens of kilometres of their device, not after a 200ms RTT to your origin region. The "serverless" part is that you do not provision, scale, or even particularly think about individual edge servers; you ship a function and the platform places, replicates, and invokes it for you. The trade-off is that "your code" no longer means a regular server process. It means an isolate, a Wasm module, or a tightly-sandboxed worker — with no local disk, no persistent memory, a strict CPU-time budget per request, and a runtime that is not the one you used in production yesterday.
Edge compute runs your code in 200–500 globally distributed PoPs so user requests are served within ~10ms instead of after a cross-ocean RTT. The mechanism is a shared-runtime sandbox (V8 isolates, Wasm) that boots in microseconds, has no local disk, and enforces strict CPU/memory caps per request. The wins — latency, automatic geo-distribution, no cold-start tax — are real for read-heavy and transformation-heavy workloads. The losses — no persistent state, no long-running connections, no full Node/Python runtime, and a fundamentally different consistency model with your origin database — make edge a poor fit for stateful business logic. The question is never "can I move this to the edge" but "which slice of this request can run at the edge, and which has to stay in the origin region".
What "the edge" actually is
When a vendor says "we run your code at the edge", the picture in your head is probably "a smaller copy of my server, in many cities". That picture is wrong on three axes simultaneously, and the rewrites you will face downstream all stem from those three differences.
The three axes the picture-in-your-head misses:
- Compute primitive. The platform runs hundreds of customer functions on each PoP. To keep the per-tenant cost trivial, the runtime is almost never a Linux process per request. It is a V8 isolate (Cloudflare Workers, Deno Deploy, Vercel Edge Functions), a Wasm module (Fastly Compute@Edge, Shopify Oxygen), or a heavily-cropped container (AWS Lambda@Edge, which is the slowest of the three for exactly this reason). Boot time is microseconds for isolates, low milliseconds for Wasm, hundreds of milliseconds for cropped Lambdas. Your Node-with-200-npm-deps server does not fit. Why: a Linux process has ~10ms boot cost and ~50MB minimum RSS. Multiply by 300 PoPs × thousands of tenants × a cold start on every fan-out, and the platform's cost per request becomes uneconomic. Isolates share a single V8 process, snapshot heaps, and cold-start in roughly 5ms — that is what makes "function in 300 places" affordable.
- Storage primitive. A PoP has no durable disk you can write to. It has a CPU and a small RAM allowance per request. State lives in three other places: the platform's edge KV (eventually consistent, ~1ms local read, writes replicate asynchronously), the platform's edge cache (per-PoP, evicted under memory pressure), or your origin region (one cross-ocean RTT away). Every variable in your function that survives the request is one of those three.
- Connection primitive. Edge functions are short-lived. They cannot hold a long-running TCP connection, cannot subscribe to a Kafka topic, and cannot maintain a websocket as the long-lived end. The "serverless at the edge" model is fundamentally request/response — you compute one response, return, and your isolate may not be the one that handles the next request from the same user.
Anuradha's "Up Next" panel maps cleanly onto all three: the panel is read-heavy (KV fits), the personalisation logic is a single-function transformation (isolate fits), and the response is one HTTP body (request/response fits). The panel is a poster child for edge. Her transaction-history page, by contrast, is none of those things — it is read-from-origin, requires a session, and is irrelevant to the 250 viewers in São Paulo. She left it on her Mumbai origin.
How the request actually gets there
The latency win does not come from "the function is closer". It comes from the entire chain of resolution — DNS → IP → TCP → TLS → HTTP — all completing inside the user's continent before any application code runs. Cutting any one link of that chain saves nothing; cutting all five saves the whole back-haul.
The two failure modes for the chain are both common in the first edge migration:
- DNS still points home. A surprising number of "edge" deployments leave the production DNS A-record pointing at a single origin IP. The CDN does its part; the function exists at the edge; but the user's resolver hands back the origin IP and the request goes straight to Mumbai. The fix is anycast or geo-DNS — the edge platform's nameservers usually do this for you, but only if you actually move the apex record to them.
- Function fetches from origin on every request. The function exists at the edge, but its first line is await fetch('https://origin.example.com/api/recommendations'). Each request now pays the cross-ocean RTT it was supposed to avoid. The fix is to move the read either into edge KV (replicated, async-converging) or into the edge cache with a TTL (stale-while-revalidate, sketched just after this list). If your function can't avoid the origin call, it should not be at the edge.
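The second fix deserves a sketch. This is the stale-while-revalidate pattern in roughly the form it takes inside an edge function: serve from the PoP-local cache immediately, refresh from origin off the request path once the entry is past its TTL. The Python below is illustrative only; fetch_from_origin() is a hypothetical stand-in for the cross-ocean call, and a real platform would give you this behaviour through its cache API rather than a hand-rolled class.

# swr_cache.py: illustrative stale-while-revalidate sketch. fetch_from_origin() is a
# hypothetical stand-in for the cross-ocean origin call, not a real platform API.
import time, threading

class SWRCache:
    def __init__(self, ttl_seconds, fetch_from_origin):
        self.ttl = ttl_seconds
        self.fetch = fetch_from_origin
        self._entries = {}   # key -> (value, fetched_at), PoP-local only

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            # Cold miss: the only case that pays the origin RTT on the request path.
            value = self.fetch(key)
            self._entries[key] = (value, time.time())
            return value
        value, fetched_at = entry
        if time.time() - fetched_at > self.ttl:
            # Stale: serve the old value now, refresh in the background.
            threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
        return value

    def _refresh(self, key):
        self._entries[key] = (self.fetch(key), time.time())

# demo: the origin call costs 300ms; only the cold miss and background refreshes pay it
def fetch_from_origin(key):
    time.sleep(0.3)
    return {"recommendations": ["a", "b", "c"], "key": key}

cache = SWRCache(ttl_seconds=5, fetch_from_origin=fetch_from_origin)
t0 = time.time(); cache.get("upnext:default"); print(f"cold miss: {time.time() - t0:.3f}s")
t0 = time.time(); cache.get("upnext:default"); print(f"warm hit:  {time.time() - t0:.3f}s")

The cost of the pattern is bounded staleness: every viewer gets an answer at PoP speed, and the answer may be up to one TTL old.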
What you can and cannot put at the edge
The single most useful framing is: the edge is good at transformation, decent at eventually-consistent reads, and terrible at transactional state. Map your service onto those three buckets.
The honest test for any endpoint: can you write down its behaviour as a function of (request, edge-KV-snapshot, origin-cache-snapshot) and have the answer be acceptable to product? If yes, it goes to the edge. If "acceptable" requires consulting the live origin database for every request, it stays at origin. Why: the moment a function calls origin synchronously on every request, it imports the cross-ocean RTT into its own latency floor. Edge functions that do this are slower than their origin counterparts — they pay the user-to-PoP RTT plus the PoP-to-origin RTT, where the origin handler would have paid only the user-to-origin RTT.
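The arithmetic behind that "why" is short enough to write out. The RTTs below are assumptions chosen only to show the shape: a viewer close to a PoP, an origin an ocean away.

# latency_floor.py: back-of-the-envelope comparison under assumed RTTs (hypothetical numbers).
rtt_user_to_pop = 8       # ms, viewer to nearest PoP
rtt_pop_to_origin = 140   # ms, PoP to origin region
rtt_user_to_origin = 140  # ms, viewer straight to origin (roughly the same ocean crossing)

origin_only       = rtt_user_to_origin                    # handler at origin
edge_with_kv      = rtt_user_to_pop                       # handler at edge, PoP-local read
edge_calls_origin = rtt_user_to_pop + rtt_pop_to_origin   # handler at edge, sync origin fetch

print(f"origin only:              {origin_only} ms")
print(f"edge + local KV:          {edge_with_kv} ms")
print(f"edge + sync origin fetch: {edge_calls_origin} ms  <- slower than skipping the edge entirely")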
A worked example: edge-side rate limiting
Most production teams meet edge compute through a small, useful task: rate limiting. The naive picture is "count requests per IP and reject after N". The naive picture has a bug at the edge: each PoP sees only the slice of traffic that lands on it, so a single attacker hitting M PoPs with limit N each gets M × N requests through. The production fix is a token-bucket whose state lives in edge KV with an explicit acceptance of the small under-counting that comes from async replication.
# edge_rate_limiter.py — illustrative reference impl in Python.
# In production this runs as a JS/TS Worker; the algorithm is identical.
import time, threading

class EdgeKV:
    """Eventually-consistent edge KV. Reads are local (fast), writes
    replicate async with ~50–500ms global propagation."""
    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._store.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._store[key] = value  # in real KV this returns before global replication

class TokenBucketLimiter:
    def __init__(self, kv, capacity=100, refill_per_sec=10):
        self.kv = kv
        self.capacity = capacity
        self.refill = refill_per_sec

    def allow(self, ip):
        key = f"rl:{ip}"
        now = time.time()
        state = self.kv.get(key, {"tokens": self.capacity, "ts": now})
        # refill since last write
        elapsed = max(0.0, now - state["ts"])
        tokens = min(self.capacity, state["tokens"] + elapsed * self.refill)
        if tokens < 1.0:
            return False, 0
        tokens -= 1.0
        self.kv.put(key, {"tokens": tokens, "ts": now})
        return True, tokens

# --- demo at one PoP ---
kv = EdgeKV()
lim = TokenBucketLimiter(kv, capacity=5, refill_per_sec=1)
for i in range(8):
    ok, remaining = lim.allow("203.0.113.7")
    print(f"req {i+1}: allowed={ok} remaining={remaining:.2f}")
    time.sleep(0.1)
Realistic output:
req 1: allowed=True remaining=4.00
req 2: allowed=True remaining=3.10
req 3: allowed=True remaining=2.20
req 4: allowed=True remaining=1.30
req 5: allowed=True remaining=0.40
req 6: allowed=False remaining=0.00
req 7: allowed=False remaining=0.00
req 8: allowed=False remaining=0.00
Walkthrough, line by line: EdgeKV simulates the platform-provided KV — a read is always local to the PoP, a write returns before global replication. The limiter stores {tokens, last-write-timestamp} per IP. On each request: read state, refill the bucket by elapsed × refill_rate, deduct one token if any remain, write back. The realistic output shows the bucket draining over five requests, then rejecting. Why no global lock: a global lock across 312 PoPs would re-introduce the cross-ocean RTT this whole architecture is meant to eliminate. The async-KV approach accepts that two PoPs reading the same key 50ms apart may both see "5 tokens left" and both deduct — letting one extra request through. For abuse mitigation that under-counts by a few percent at fan-out boundaries, this is a vastly better trade than synchronous coordination. If you genuinely need exact global counting (financial limits, per-account quotas), you do not put rate limiting at the edge — you put a coarse limit at the edge and a precise limit at origin.
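That under-count is easy to reproduce. The sketch below continues edge_rate_limiter.py and reuses its TokenBucketLimiter; LaggyReplicaKV is a contrived stand-in for replication lag, not any platform's real API. Two PoPs each hold their own replica of the key, an attacker alternates between them faster than the replicas converge, and an intended global limit of 3 admits 6 requests.

# multi_pop_undercount.py: why per-PoP views over-admit. Continues edge_rate_limiter.py
# above (reuses its TokenBucketLimiter); LaggyReplicaKV is contrived, not a real platform API.
class PoPView:
    """What the limiter at one PoP sees: local reads, writes queued for async replication."""
    def __init__(self, store, pending, pop):
        self.store, self.pending, self.pop = store, pending, pop
    def get(self, key, default=None):
        return self.store.get(key, default)
    def put(self, key, value):
        self.store[key] = value
        self.pending.append((self.pop, key, value))

class LaggyReplicaKV:
    """Two PoP-local replicas of one keyspace; writes reach the peer only on replicate()."""
    def __init__(self):
        self.replicas = [{}, {}]
        self.pending = []
    def view(self, pop):
        return PoPView(self.replicas[pop], self.pending, pop)
    def replicate(self):
        for pop, key, value in self.pending:
            self.replicas[1 - pop][key] = value
        self.pending.clear()

kv = LaggyReplicaKV()
pop_sin = TokenBucketLimiter(kv.view(0), capacity=3, refill_per_sec=0)   # Singapore PoP
pop_dxb = TokenBucketLimiter(kv.view(1), capacity=3, refill_per_sec=0)   # Dubai PoP

allowed = 0
for _ in range(3):
    allowed += pop_sin.allow("203.0.113.7")[0]   # attacker alternates PoPs faster than
    allowed += pop_dxb.allow("203.0.113.7")[0]   # the replicas converge
kv.replicate()
print(f"intended global limit: 3, requests actually allowed: {allowed}")   # prints 6

The coarse-edge-limit-plus-precise-origin-limit split described above bounds this gap rather than eliminating it, which is usually the right trade for abuse mitigation.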
Common confusions
- "Edge compute is just CDN with code" — it is not. A CDN caches origin responses; edge compute runs your code per request. The PoP is shared infrastructure but the execution model is a different machine — isolates instead of Squid/Varnish, no disk, strict CPU budgets per request, no long-lived connections.
- "Cold start is the same problem as Lambda" — it is not. V8 isolates and Wasm modules boot in microseconds-to-milliseconds; Lambda containers boot in hundreds of milliseconds. The "no cold start tax" claim made by isolate-based platforms is mostly true — the tail is 5ms, not 500ms. Cropped-Lambda edge offerings have full cold-start pain.
- "Edge KV is a global database" — it is an eventually-consistent replicated cache with global reads and async writes. It is not a database. There are no transactions, no foreign keys, no consistent secondary indexes, and a read at a remote PoP may miss writes made in the last ~500ms.
- "My origin is no longer needed if I move to the edge" — origin is still where your durable state lives. Edge compute does not eliminate origin; it offloads the read path and request transformation. Writes still flow back to origin. A team that "deletes the origin" usually means they moved their writes into edge KV and are about to discover its consistency model the hard way.
- "Wasm and isolates are interchangeable" — they are not. Isolates run JS/TS natively, can compile Wasm, and share heap snapshots between requests. Wasm-only platforms (Compute@Edge) run any language that compiles to Wasm but have stricter memory and capability sandboxing. Choose by language, not by marketing.
- "Latency is the only reason to go edge" — sovereignty (data-residency rules), DDoS absorption (hundreds of PoPs vs one origin region), and regional cost arbitrage are real secondary motivations, and sometimes they are the primary motivation.
Going deeper
V8 isolates vs containers — the cost model that makes edge possible
Cloudflare's 2018 write-up of Workers, and Deno's later isolate-cloud work, both rest on the same mechanism: a single V8 process hosts thousands of customer isolates, each with its own heap, sharing the same compiled JIT code. Per-tenant memory is ~3MB at idle vs ~50MB for a Linux process; per-request CPU overhead is ~50µs vs ~10ms; cold start is ~5ms vs ~200–1000ms. Multiplied across 300 PoPs and millions of tenants, the difference is between "uneconomic" and "free tier exists". The catch is the runtime: the Workers runtime is V8 plus a curated subset of WHATWG and Node-compat APIs, not full Node. Most npm packages with native binaries do not run; many packages with synchronous filesystem calls do not run; and the per-request CPU budget (50ms by default) prohibits heavy work. The architecture is "many tiny things in many places", not "fewer big things in many places".
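Plugging those per-tenant numbers into a fleet-sized multiplication makes the cost-model argument concrete. The PoP and tenant counts below are illustrative assumptions, not any vendor's real figures.

# isolate_vs_process_cost.py: back-of-the-envelope memory math using the figures quoted above.
# PoP and tenant counts are illustrative assumptions, not any vendor's real numbers.
pops = 300
tenants_per_pop = 10_000        # functions a busy PoP might host

process_rss_mb = 50             # idle Linux process per tenant
isolate_heap_mb = 3             # idle V8 isolate per tenant

process_total_gb = pops * tenants_per_pop * process_rss_mb / 1024
isolate_total_gb = pops * tenants_per_pop * isolate_heap_mb / 1024

print(f"process model: {process_total_gb:,.0f} GB of idle RAM across the fleet")
print(f"isolate model: {isolate_total_gb:,.0f} GB of idle RAM across the fleet")
# roughly 146,000 GB vs 9,000 GB of idle memory, before counting cold-start CPU at all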
Edge KV consistency — what "eventually" means in practice
Edge-KV products (Cloudflare KV, Vercel Edge Config, Deno KV, Upstash) all advertise eventual consistency with latency in the tens to hundreds of milliseconds. The numbers in production typically look like: ~50ms within a continent, ~250ms global, ~500ms tail, with occasional minute-scale lags during PoP rebalancing. The implication for application logic: if a user writes to one PoP and immediately reads from another (e.g. mobile network handoff), the read may miss the write. The fix is read-your-writes via a stickiness header (route the same user back to the write PoP for ~30s), or eventual semantics that the application UI tolerates ("the like was registered, the count updates in 2 seconds"). CRDT-based edge stores tighten this from "last-writer-wins" to "merge-converges", at the cost of metadata growth.
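The stickiness fix needs very little machinery, which is worth seeing. The sketch below is illustrative Python; the header names and the 30-second window are assumptions, and a real deployment would carry the hint in a signed cookie or a platform-provided header rather than bare response headers.

# sticky_reads.py: illustrative read-your-writes routing; header names are hypothetical.
import time

STICKY_WINDOW_S = 30   # route the user back to the write PoP for this long after a write

def on_write(response_headers, write_pop):
    """Attach a stickiness hint to the response that performed the write."""
    response_headers["x-sticky-pop"] = write_pop
    response_headers["x-sticky-until"] = str(time.time() + STICKY_WINDOW_S)
    return response_headers

def choose_pop(request_headers, nearest_pop):
    """Routing decision on the next request: honour the hint while it is fresh."""
    until = float(request_headers.get("x-sticky-until", 0))
    if time.time() < until:
        return request_headers.get("x-sticky-pop", nearest_pop)
    return nearest_pop   # hint expired; replication has almost certainly converged by now

# demo: a write lands at the Singapore PoP, then a network handoff makes Dubai the nearest PoP
hdrs = on_write({}, write_pop="SIN")
print(choose_pop(hdrs, nearest_pop="DXB"))   # SIN: the read goes back to where the write is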
Cloudflare's Durable Objects — the escape hatch for stateful edge
The big admitted gap in pure edge-as-isolates is "what if I genuinely need a single coordination point per key". Cloudflare's Durable Objects answer: pin one object instance to one PoP per key, route all writes for that key to that PoP, give it a small persistent SQLite, and you have linearisable per-key state at edge latency for users near that PoP. The trade-off is that users far from the chosen PoP pay full cross-ocean RTT to reach their object's home — which is exactly the active-active sticky-routing trade-off, just at PoP granularity instead of region granularity. CricStream uses Durable Objects for live-match state (one object per match, pinned near the cricket stadium); KapitalKite would not, because order-matching needs to be in their certified colocated DC.
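A toy sketch makes the shape of that trade-off visible. This is not Cloudflare's placement logic (Durable Objects choose a home near first use) and the latency numbers are invented; the sketch only shows what per-key pinning buys and costs: a single writer per key, and full-distance round trips for callers far from the key's home.

# per_key_home.py: toy sketch of per-key pinning in the Durable Objects style; placement
# and latency numbers are invented for illustration, not Cloudflare's real behaviour.
class MatchStateObject:
    """One instance per match key, pinned to one home PoP; all writes serialise here."""
    def __init__(self, home_pop):
        self.home_pop = home_pop
        self.state = {"score": 0, "over": 0.0}

    def apply(self, event):
        # Single writer per key: no merge logic, no lost updates.
        self.state["score"] += event.get("runs", 0)
        self.state["over"] = event.get("over", self.state["over"])
        return dict(self.state)

HOME = {}     # key -> pinned object (placement simplified to a fixed home PoP)
RTT_MS = {("BOM", "BOM"): 2, ("SIN", "BOM"): 60, ("LHR", "BOM"): 120}   # caller PoP -> home

def route_write(key, caller_pop, event, home_pop="BOM"):
    obj = HOME.setdefault(key, MatchStateObject(home_pop))
    cost_ms = RTT_MS.get((caller_pop, obj.home_pop), 200)
    return obj.apply(event), cost_ms

state, cost = route_write("ipl-final", "BOM", {"runs": 6, "over": 14.3})
print(state, f"{cost}ms")    # caller near the pinned PoP: edge-speed write
state, cost = route_write("ipl-final", "LHR", {"runs": 1, "over": 14.4})
print(state, f"{cost}ms")    # caller far from it: the full cross-ocean RTT, the stated trade-off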
Operational nightmares — observability across 300 PoPs
The single most common surprise for teams shipping their first edge service is observability. A logline emitted from a PoP in Singapore does not magically arrive in your Mumbai log aggregator — and even if it does, the volume is now multiplied by number_of_PoPs, the timestamps come from different clocks (subject to NTP skew, see clocks and NTP), and PoP-correlated incidents (one bad PoP serving 5xx) can hide inside global p99 averages. Production teams typically ship structured logs to a vendor-side collector (Cloudflare Logpush, Vercel Drains), aggregate by PoP and by isolate-version, and build per-PoP dashboards before they consider an edge deployment "live". Without that, your incident response is permanently flying half-blind.
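How a single bad PoP hides inside a global number is worth demonstrating with a few lines of aggregation. The data below is synthetic and the pop field name is an assumption about your log schema; the point is the difference between the two groupings.

# pop_breakdown.py: why global error rates hide a single bad PoP; the data is synthetic.
from collections import defaultdict

# synthetic structured logs: 20 PoPs, 1,000 requests each, one PoP serving 40% 5xx
logs = []
for pop in [f"pop-{i:02d}" for i in range(20)]:
    bad = 400 if pop == "pop-13" else 5
    logs += [{"pop": pop, "status": 500}] * bad
    logs += [{"pop": pop, "status": 200}] * (1000 - bad)

global_rate = sum(l["status"] >= 500 for l in logs) / len(logs)
print(f"global 5xx rate: {global_rate:.1%}")    # ~2.5%, easy to read as background noise

by_pop = defaultdict(lambda: [0, 0])
for l in logs:
    by_pop[l["pop"]][0] += l["status"] >= 500
    by_pop[l["pop"]][1] += 1
worst_pop, (errs, total) = max(by_pop.items(), key=lambda kv: kv[1][0] / kv[1][1])
print(f"worst PoP: {worst_pop} at {errs / total:.1%} 5xx")   # pop-13 at 40.0%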
Where this leads next
Edge compute is the third leg of the geo-distribution stool, alongside active-active across regions (origin replication) and conflict-free geo-replication (the consistency model the edge KV layer rests on). A mature multi-region architecture in 2026 typically uses all three: edge compute for the read path and request transformation, regional active-active for stateful business logic, and a CRDT or sticky-routed convergence model underneath both.
The next chapter — observability in distributed systems is a data problem — picks up exactly where the operational-nightmare subsection leaves off: once you have hundreds of PoPs and dozens of regions, you do not have a logging problem, you have a streaming-data problem.
References
- Kenton Varda, "How Workers works", Cloudflare blog (2018) — the original explanation of the V8-isolate architecture.
- Mahmoud Hashemi, "Cloudflare Workers Durable Objects: Easy, Fast, Correct" (2020) — the canonical Durable Objects design rationale.
- Tyler McMullen, "Compute@Edge: a Wasm-based edge platform", Fastly engineering (2020) — the Wasm-as-isolation argument.
- Alex Snaps & Ryan Dahl, "Why we built Deno Deploy" (2021) — second-generation isolate cloud.
- Lin Clark, "Standardising Wasm components and WASI", Bytecode Alliance (2023) — the runtime model edge platforms are converging on.
- Cloudflare, Workers KV documentation — public consistency-and-latency contract.
- AWS Lambda@Edge documentation and pricing — the cropped-container alternative, useful as the negative example for cold start.
- Adya, Howell, Theimer, et al., "FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment" (OSDI 2002) — antecedent of edge-storage thinking. See also: active-active across regions.