Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Confidential computing and attestation
PaySetu's compliance team is reviewing a new feature: a fraud-detection model that needs to score every UPI transaction in under 30 ms. The model was trained on customer balance histories, location traces, and device fingerprints — material that the data-protection officer has flagged as "must never leave a controlled environment". The hosting plan was straightforward until somebody asked the awkward question: the cloud provider's hypervisor can read every byte of memory in our VM, so what stops a malicious provider engineer from copying the model and the input features as they fly through? The honest answer is "nothing, except a contract". Confidential computing is the architectural answer to that question — a hardware-enforced boundary inside the CPU that even the hypervisor cannot cross. Attestation is the cryptographic protocol that lets a remote verifier convince itself the boundary is real, the firmware is patched, and the code running inside is the code that was compiled.
Confidential computing uses CPU-level memory encryption (Intel SGX/TDX, AMD SEV-SNP, AWS Nitro Enclaves, Arm CCA) to keep a workload's RAM unreadable to the host OS and hypervisor. Attestation is the signed measurement chain — hardware → firmware → kernel → application — that a remote party verifies before sharing secrets. Without attestation, the enclave is just a sealed box; with attestation, you can prove to a customer in Mumbai that their key is being used by exactly the binary you built, on a CPU whose microcode is patched, on a host you do not own.
The threat model: who can read your VM's memory
Standard cloud isolation protects you from the other tenants on the same machine. The hypervisor enforces page-table isolation, the network stack enforces tenant tagging, the storage layer enforces volume ownership. What standard isolation does not protect against is the provider itself — the hypervisor, the host kernel, the firmware, and anyone with privileged access to the physical machine. A rogue datacentre operator with kernel access can dump your VM's memory; a cloud-provider security engineer responding to a (real or fabricated) law-enforcement request can do the same; a supply-chain compromise of the hypervisor binary can siphon every guest's RAM into an exfiltration channel. Why standard cryptography does not solve this: TLS protects data in transit, disk encryption protects data at rest, but data in use — the plaintext sitting in CPU registers and DRAM while your code is running — is unprotected. A workload doing fraud scoring on customer features must, at some point, have those features in cleartext in memory to compute on. Without confidential computing, that cleartext is visible to the hypervisor.
The result is a trust boundary that fits inside the CPU package. The hypervisor still schedules the VM, allocates pages, and routes interrupts — it is still the operating system of the cloud — but every cache line that leaves the CPU and lands in DRAM is encrypted with a key only the memory controller knows. A hypervisor-level memory dump captures ciphertext. A cold-boot attack (snapshot DRAM after pulling the power) captures ciphertext. A bus-tapping attack on the DDR pins captures ciphertext. The plaintext exists only inside the CPU's caches and registers, where it is protected by the same physical packaging that protects the rest of the silicon.
Four families of products implement this in 2026. Intel TDX (Trust Domain Extensions) provides VM-level confidentiality on recent Xeon parts; its predecessor, SGX, was process-level (an "enclave" of a few hundred MB inside a regular process) and has been deprecated for new server workloads. AMD SEV-SNP (Secure Encrypted Virtualization with Secure Nested Paging) provides VM-level confidentiality on EPYC parts, with hardware-enforced page-table integrity to defeat the well-known "VM-replay" and "memory-remapping" attacks against earlier SEV. AWS Nitro Enclaves carve a vCPU-and-memory partition out of an EC2 instance, isolated even from the parent instance's root user — a different point in the design space, since the trust boundary is the Nitro hypervisor itself rather than CPU memory encryption. Arm CCA (Confidential Compute Architecture) is the equivalent on Armv9 server parts. The mechanisms differ in detail, but the API the application sees is the same: launch a "confidential VM" or "enclave", get a measurement of what was launched, and use that measurement to convince a remote party to trust this instance.
The attestation handshake — proving the box is real
Memory encryption alone is not enough. A confidential VM that you trust because the cloud provider says so is no improvement on a regular VM you trust because the cloud provider says so. The actual security property comes from remote attestation: the workload generates a cryptographic statement signed by the CPU hardware that says "I am running on a real Intel TDX / AMD SEV-SNP / AWS Nitro chip with patched firmware, and the launch measurement of the code running inside is <hash>". A remote verifier — typically a key-management service — checks the signature against the manufacturer's root certificate, verifies the firmware is on the patched-CVE list, compares the launch measurement against a known-good value, and only then releases secrets to the workload.
The measurement chain is the load-bearing part. Why a chain and not a single hash: at boot, the platform firmware (e.g. OVMF, the open-source UEFI used by KVM) measures the kernel image and extends a hash register; the kernel measures init and extends; init measures the application binary and extends; the application can extend with any application-level measurement (e.g. a hash of its config file, or a hash of a TLS leaf certificate it just generated). Each extension folds the new hash into a running register so that the final value pins down every executable byte from CPU reset to "ready to serve". Tampering with any layer changes the final hash, and the verifier rejects the attestation.
The handshake's load-bearing security property is that the wrapped key the KMS releases at the end of the exchange is encrypted to a public key that only exists inside the attested enclave. The hypervisor sees the wrapped key go past on the wire but cannot decrypt it. Even if the hypervisor takes a memory snapshot of the confidential VM right after the key arrives, it captures ciphertext (the enclave's RAM is encrypted) — the unwrapped plaintext key only exists transiently inside the CPU caches and registers. The trust chain is: manufacturer's root certificate (pinned in the KMS trust store) → manufacturer-signed CPU attestation key → quote signature → measurements → application identity → released secret. Break any link and the verifier rejects.
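To make that property concrete, here is a minimal sketch of the wrap step: a plain ECDH envelope with an HKDF-derived AES-GCM key, using illustrative helper names rather than any vendor's API. The enclave's ephemeral public key is the one the verifier in the next section extracts from the quote.
# kms_wrap.py — sketch: encrypt a secret to the enclave's attested ephemeral key.
# Illustrative envelope, not a vendor API; real KMSes typically use HPKE or similar.
import os
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def wrap_key(enclave_ephemeral_pub: bytes, secret: bytes) -> dict:
    """KMS side: only the holder of the matching private key, which exists
    solely inside the attested enclave's encrypted RAM, can unwrap."""
    peer = ec.EllipticCurvePublicKey.from_encoded_point(
        ec.SECP384R1(), enclave_ephemeral_pub)
    kms_eph = ec.generate_private_key(ec.SECP384R1())     # KMS's own ephemeral key
    shared = kms_eph.exchange(ec.ECDH(), peer)            # ECDH shared secret
    wrap_k = HKDF(algorithm=hashes.SHA384(), length=32,
                  salt=None, info=b"kms-key-wrap-v1").derive(shared)
    nonce = os.urandom(12)
    return {"kms_pub": kms_eph.public_key().public_bytes(
                serialization.Encoding.X962,
                serialization.PublicFormat.UncompressedPoint),
            "nonce": nonce,
            "wrapped": AESGCM(wrap_k).encrypt(nonce, secret, None)}

def unwrap_key(enclave_priv, blob: dict) -> bytes:
    """Enclave side: mirror-image derivation, run entirely inside encrypted RAM."""
    peer = ec.EllipticCurvePublicKey.from_encoded_point(
        ec.SECP384R1(), blob["kms_pub"])
    wrap_k = HKDF(algorithm=hashes.SHA384(), length=32,
                  salt=None, info=b"kms-key-wrap-v1").derive(
        enclave_priv.exchange(ec.ECDH(), peer))
    return AESGCM(wrap_k).decrypt(blob["nonce"], blob["wrapped"], None)

enclave_eph = ec.generate_private_key(ec.SECP384R1())     # lives inside the enclave
blob = wrap_key(enclave_eph.public_key().public_bytes(
    serialization.Encoding.X962, serialization.PublicFormat.UncompressedPoint),
    b"payment-tokenisation-key")
assert unwrap_key(enclave_eph, blob) == b"payment-tokenisation-key"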
A working attestation flow in Python
Here is a realistic verifier — the kind a payments KMS would run when a confidential workload asks for a key. The CPU-quote signing is mocked (real implementations call into Intel's DCAP library or AMD's SEV-SNP guest device), but the verification logic and the PCR comparison are exactly what production code does, and the ephemeral public key it returns is what the KMS wraps the secret to, as sketched above.
# attestation_verifier.py — KMS-side verification of a confidential VM quote.
# pip install cryptography
import hashlib, json, secrets, time
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.exceptions import InvalidSignature
# Manufacturer (Intel/AMD) root key, baked into KMS trust store.
mfr_priv = ec.generate_private_key(ec.SECP384R1())
mfr_pub = mfr_priv.public_key()
# Per-CPU attestation key, signed by manufacturer at fab time.
cpu_priv = ec.generate_private_key(ec.SECP384R1())
cpu_pub_bytes = cpu_priv.public_key().public_bytes(
serialization.Encoding.X962, serialization.PublicFormat.UncompressedPoint)
mfr_endorsement = mfr_priv.sign(cpu_pub_bytes, ec.ECDSA(hashes.SHA384()))
# Known-good launch measurement of the binary the customer compiled.
KNOWN_GOOD_PCR = hashlib.sha384(b"paysetu-fraud-scorer-v1.4.2-tdx").digest()
PATCHED_FIRMWARE = {"tdx-module-v1.5.4", "ovmf-2026.04"} # current CVE-clean set
def cpu_quote(pcrs: dict, nonce: bytes, ephemeral_pub: bytes) -> dict:
"""Inside the enclave: ask the CPU to sign a measurement bundle."""
body = json.dumps({"pcrs": {k: v.hex() for k, v in pcrs.items()},
"nonce": nonce.hex(),
"ephemeral_pub": ephemeral_pub.hex(),
"fw": "tdx-module-v1.5.4",
"ts": int(time.time())}, sort_keys=True).encode()
sig = cpu_priv.sign(body, ec.ECDSA(hashes.SHA384()))
return {"body": body, "sig": sig, "cpu_pub": cpu_pub_bytes,
"endorsement": mfr_endorsement}
def verify(quote: dict, expected_nonce: bytes) -> dict:
"""KMS-side verification. Returns {ok, reason, ephemeral_pub} ."""
# 1. Manufacturer endorsed this CPU's attestation key.
try:
mfr_pub.verify(quote["endorsement"], quote["cpu_pub"],
ec.ECDSA(hashes.SHA384()))
except InvalidSignature:
return {"ok": False, "reason": "endorsement_invalid"}
# 2. CPU signed the quote body.
cpu_pub = ec.EllipticCurvePublicKey.from_encoded_point(
ec.SECP384R1(), quote["cpu_pub"])
try:
cpu_pub.verify(quote["sig"], quote["body"], ec.ECDSA(hashes.SHA384()))
except InvalidSignature:
return {"ok": False, "reason": "quote_signature_invalid"}
body = json.loads(quote["body"])
# 3. Nonce freshness — defeats replay.
if bytes.fromhex(body["nonce"]) != expected_nonce:
return {"ok": False, "reason": "nonce_mismatch"}
# 4. Firmware on the patched-CVE list.
if body["fw"] not in PATCHED_FIRMWARE:
return {"ok": False, "reason": f"firmware_not_patched:{body['fw']}"}
# 5. Launch measurement matches the known-good binary hash.
if bytes.fromhex(body["pcrs"]["app"]) != KNOWN_GOOD_PCR:
return {"ok": False, "reason": "pcr_mismatch"}
return {"ok": True, "reason": "verified",
"ephemeral_pub": bytes.fromhex(body["ephemeral_pub"])}
# --- demo ---
nonce = secrets.token_bytes(32)
pcrs = {"app": KNOWN_GOOD_PCR}
ephemeral = ec.generate_private_key(ec.SECP384R1())
ephemeral_pub = ephemeral.public_key().public_bytes(
serialization.Encoding.X962, serialization.PublicFormat.UncompressedPoint)
q = cpu_quote(pcrs, nonce, ephemeral_pub)
print("verify (good binary):", verify(q, nonce))
# Tamper: pretend the operator swapped in a different binary.
bad_pcrs = {"app": hashlib.sha384(b"paysetu-fraud-scorer-v1.4.2-EVIL").digest()}
q_bad = cpu_quote(bad_pcrs, nonce, ephemeral_pub)
print("verify (tampered binary):", verify(q_bad, nonce))
# Tamper: replay an old quote with a stale nonce.
print("verify (replay):", verify(q, secrets.token_bytes(32)))
Sample run:
verify (good binary): {'ok': True, 'reason': 'verified', 'ephemeral_pub': b'\x04...'}
verify (tampered binary): {'ok': False, 'reason': 'pcr_mismatch'}
verify (replay): {'ok': False, 'reason': 'nonce_mismatch'}
Walkthrough of the load-bearing lines:
- mfr_priv.sign(cpu_pub_bytes, ...) — the per-CPU attestation key is endorsed by the manufacturer at fab time. The KMS only needs to trust the manufacturer's root; the CPU certificate cascades from there. Why a per-CPU key and not just the manufacturer root: a leaked manufacturer root would compromise every CPU ever made; a leaked per-CPU key compromises one CPU. The endorsement-by-root pattern is the same as TLS's intermediate-CA model — bound damage radius.
- if bytes.fromhex(body["nonce"]) != expected_nonce — the nonce check defeats replay. Without it, an attacker could capture a valid attestation, run a different binary, and present last week's quote. The KMS issues a fresh nonce per request and the enclave bakes it into the quote, so a quote is only valid for one handshake.
- if body["fw"] not in PATCHED_FIRMWARE — firmware hygiene matters. CVEs against TDX module / SEV-SNP firmware are published quarterly; the KMS rejects quotes from unpatched hosts so a known side-channel attack cannot be used against the workload.
- if bytes.fromhex(body["pcrs"]["app"]) != KNOWN_GOOD_PCR — the application identity check. This pins the released key to the exact binary the customer compiled and registered. A different version, a debug build, an attacker-substituted binary — all produce a different PCR and the key stays sealed.
The whole flow is around 60 lines of Python. Real production integrations are a few hundred lines once you wire in DCAP, certificate caching, revocation lists, and policy evaluation — but the conceptual surface is small. The hard part is operational, not cryptographic.
What breaks in practice — operational reality of confidential workloads
The cryptography works. The day-to-day operations are where confidential computing teams trip up, and the failure modes are all rooted in the trust-boundary shrinking by an order of magnitude.
The KMS becomes the central nervous system, and its availability dominates. Every confidential workload boot needs a KMS round-trip to unlock its secrets. CricStream, planning a confidential payment-tokenisation service for in-app purchases during a cricket-final stream, would discover that a 5-second KMS outage during the post-toss spike means 5 seconds of payment failures across the entire fleet, because no new pod can finish booting without the attestation handshake. The mitigation is a regional KMS with ≥3-replica consensus and a local cache of unwrapped keys with TTLs measured in minutes — but every minute of cached plaintext is a minute the workload holds the key, which means a workload compromise extends the blast radius. The trade-off between KMS availability and key-cache TTL is the central operational lever.
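A minimal sketch of that lever, with a hypothetical cache class (kms_fetch stands in for the full attest-and-unwrap round-trip): the ttl_seconds dial trades KMS-outage tolerance against how long plaintext key material lives in the workload.
# key_cache.py — sketch of the KMS-availability vs. key-cache-TTL trade-off.
import time

class UnwrappedKeyCache:
    """Hypothetical local cache of KMS-released plaintext keys."""
    def __init__(self, kms_fetch, ttl_seconds=120):
        self.kms_fetch = kms_fetch    # callable: key_id -> plaintext key bytes
        self.ttl = ttl_seconds        # longer TTL rides out longer KMS outages,
        self._cache = {}              # but extends how long plaintext lives here

    def get(self, key_id: str) -> bytes:
        hit = self._cache.get(key_id)
        if hit is not None and time.monotonic() - hit[1] < self.ttl:
            return hit[0]             # fresh hit: no KMS round-trip needed
        key = self.kms_fetch(key_id)  # miss or stale: full attestation handshake
        self._cache[key_id] = (key, time.monotonic())
        return key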
Patch management gets a new failure mode. When Intel publishes a CVE against the TDX module, the KMS verifier list of accepted firmware shrinks. Any host still running the vulnerable firmware suddenly cannot get keys for any workload, even ones with no relationship to the CVE. The cloud provider has to roll out the firmware patch in lockstep with the KMS verifier update, or the attestation handshake starts failing en masse. This is a coordination problem the cloud team is not used to: the security team's CVE response timeline now drives the platform team's patch deployment SLO. KapitalKite, evaluating confidential computing for trader-position handling, would need to allocate an on-call rotation for KMS attestation policy, separate from the existing cloud on-call.
Debugging gets harder by design. A workload running on a confidential VM cannot be gdb-attached from the host; the host cannot read the workload's memory, so neither can a debugger running on the host. strace, tcpdump on the loopback interface, perf record with kernel-mode samples — all of these stop being useful. What replaces them is structured in-enclave logging, exported through carefully audited channels (often a dedicated stream that the workload encrypts before emitting). Teams new to confidential computing often try to keep their existing observability stack and discover a quarter into the project that half of it doesn't work; the rebuild is non-trivial and is one of the main reasons confidential-computing projects slip.
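As a sketch of what "encrypts before emitting" can look like (hypothetical scheme; in practice the log key would itself be KMS-released after attestation):
# enclave_log.py — sketch: structured events encrypted inside the enclave before emit.
import json, os, time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

LOG_KEY = os.urandom(32)   # mock; really released by the KMS after attestation

def emit(event: dict) -> bytes:
    """Encrypt one structured log record; the host only ever sees ciphertext."""
    record = json.dumps({"ts": time.time(), **event}).encode()
    nonce = os.urandom(12)
    return nonce + AESGCM(LOG_KEY).encrypt(nonce, record, None)

# An audited off-box collector holding LOG_KEY decrypts for the operators.
blob = emit({"evt": "score", "txn": "upi-123", "latency_ms": 14})
assert b"upi-123" in AESGCM(LOG_KEY).decrypt(blob[:12], blob[12:], None)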
The other lurking class of issues is side-channel attacks. Confidential computing protects against an attacker who reads memory, but it does not by itself protect against attackers who measure timing, cache pressure, branch prediction, or memory-controller bandwidth. The literature on SGX side channels — Foreshadow, ZombieLoad, MDS, the long Spectre/Meltdown family — is extensive, and TDX/SEV-SNP have inherited some of the same surface. Production deployments routinely require microcode updates, disabled hyperthreading, and constant-time cryptographic libraries inside the enclave to mitigate. The security model is "encrypted memory plus careful coding"; "encrypted memory alone" is not enough for adversaries with physical or co-tenant access.
Common confusions
- "Confidential computing is the same as full-disk encryption" — it is not. Full-disk encryption protects data at rest; TLS protects data in transit; confidential computing protects data in use — the plaintext sitting in CPU registers and DRAM while the workload is computing on it. Without confidential computing, "data in use" is the gap in the defence-in-depth story, and that gap is exactly where a malicious hypervisor or a kernel-level attacker reads.
- "Attestation just means signing the binary" — code-signing proves the binary came from the publisher; attestation proves the binary is currently running on a trusted CPU with patched firmware and unmodified kernel, and binds an ephemeral public key to that runtime identity. Code-signing is a deploy-time artefact; attestation is a runtime cryptographic statement. They solve different problems.
- "Nitro Enclaves and SGX are the same thing" — Nitro Enclaves rely on the Nitro hypervisor for isolation; the trust boundary is "AWS controls the hypervisor and you trust AWS". SGX/TDX/SEV-SNP rely on CPU-level memory encryption; the trust boundary is "Intel/AMD made the CPU and the cloud provider cannot read enclave memory". Both have legitimate use cases, but the threat model is different — Nitro defends against a compromised parent EC2 instance, TDX/SEV-SNP defend against a compromised hypervisor.
- "If I use a confidential VM I no longer need to trust the cloud provider" — you trust them less, but not zero. You still trust them to schedule your VM, to not partition it off the network, to not mount denial-of-service attacks via memory-bandwidth starvation, and to honour the attestation infrastructure (revocation lists, firmware updates). What changes is that confidentiality of your data no longer depends on provider trust; availability still does.
- "Side channels are a theoretical concern" — they are not. Foreshadow extracted SGX sealing keys; SGAxe extracted attestation keys; LVI bypassed the SGX boundary entirely on some hardware. Real-world deployments that handle high-value secrets — payments, military, certificate authorities — invest heavily in side-channel hardening (constant-time crypto, disabled SMT, microcode pinning). Treating confidential computing as a magic encryption box ignores the layer where the actual research adversaries operate.
Going deeper
The PCR extension protocol and why it must be one-way
Each PCR (Platform Configuration Register) is a fixed-size hash register that supports one operation: PCR_new = H(PCR_old || measurement). This is intentionally one-way and append-only — there is no API to reset or reverse a PCR mid-boot. Why one-way matters: if an attacker who runs after the kernel could modify earlier PCR values, they could pretend the kernel is the one the verifier expects while actually running a different kernel. The append-only chain means the only way to produce a given final PCR is to have measured the exact chain of inputs in the exact order; any substitution at any step changes the final hash. This is the same Merkle-style integrity construction that Git uses for commit chains. In TPMs and CPU attestation modules, the PCR machinery is in silicon — there is no software path to forge it.
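A minimal sketch of the extend operation (stage names are illustrative) and the property it buys:
# pcr_extend.py — sketch of the append-only PCR extension chain.
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR_new = H(PCR_old || measurement): one-way, no reset, no reverse."""
    return hashlib.sha384(pcr + measurement).digest()

def boot(stages):
    pcr = bytes(48)   # register value at CPU reset: all zeros
    for blob in stages:
        pcr = extend(pcr, hashlib.sha384(blob).digest())
    return pcr

good = boot([b"ovmf-2026.04", b"kernel-6.9", b"init-v3", b"fraud-scorer-v1.4.2"])
evil = boot([b"ovmf-2026.04", b"EVIL-kernel", b"init-v3", b"fraud-scorer-v1.4.2"])
assert good != evil   # substituting any stage changes the final value; verifier rejects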
Sealing — the disk-bound counterpart to attestation
Attestation proves liveness ("right now, this enclave is running this binary"). Sealing is the persistent counterpart: the enclave can ask the CPU to encrypt a secret with a key derived from (enclave identity, CPU master key), and the resulting ciphertext can only be decrypted by an enclave with the same identity on the same CPU. Sealing is how a confidential database stores its encryption key on a regular disk — the disk only holds sealed ciphertext, and only the correctly-attested binary on the correct CPU can unseal it. The trade-off is mobility: a sealed blob is bound to one CPU, so migrating the workload to a new host requires a re-seal, which requires unsealing on the original host first. Cloud-scale deployments either avoid sealing in favour of central KMS-released keys, or accept the migration friction.
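A sketch of the derivation, assuming an HKDF-style key-derivation step (in real hardware the master key is fused into the die and the derivation happens in silicon; os.urandom mocks it here):
# sealing.py — sketch: seal a secret to (enclave identity, CPU master key).
import hashlib, os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CPU_MASTER_KEY = os.urandom(32)   # mock of the per-CPU key fused at fab time

def sealing_key(enclave_identity: bytes) -> bytes:
    """Same identity on the same CPU gives the same key; change either, it differs."""
    return HKDF(algorithm=hashes.SHA384(), length=32, salt=None,
                info=b"seal:" + enclave_identity).derive(CPU_MASTER_KEY)

identity = hashlib.sha384(b"fraud-scorer-v1.4.2").digest()
nonce = os.urandom(12)
sealed = AESGCM(sealing_key(identity)).encrypt(nonce, b"db-encryption-key", None)
# The sealed blob can sit on ordinary disk: a different binary (different identity)
# or a different CPU (different master key) derives a different key, so cannot unseal.
assert AESGCM(sealing_key(identity)).decrypt(nonce, sealed, None) == b"db-encryption-key"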
Attestation policies and how PaySetu would actually deploy this
A production attestation policy is not "this binary hash". It is a policy expression evaluated against the quote: (firmware in patched_set) AND (binary in approved_set) AND (geographic_region in allowed_regions) AND (NOT debug_mode) AND (cpu_microcode >= min_version). PaySetu's compliance team would author such policies in a policy language (Open Policy Agent / Rego is common) and deploy them to the KMS as code — versioned, code-reviewed, with rollback. The KMS then evaluates each attestation against the active policy, not a static hash, which lets the team roll out new binary versions without manually editing trust stores. The policy file becomes the real security boundary; a permissive policy with strong cryptography is no better than weak cryptography.
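In Python terms, the evaluation is a conjunction of checks against the parsed quote (a sketch with hypothetical field names and values, standing in for what would be Rego in production):
# policy_eval.py — sketch: KMS evaluates a versioned policy against a parsed quote.
POLICY_V7 = {
    "patched_firmware": {"tdx-module-v1.5.4", "ovmf-2026.04"},
    "approved_binaries": {"sha384:1f3a9c", "sha384:88d0e2"},   # hypothetical digests
    "allowed_regions": {"ap-south-1", "ap-south-2"},
    "min_microcode": 0x570,                                    # hypothetical revision
}

def evaluate(policy: dict, quote: dict) -> tuple[bool, str]:
    checks = [
        (quote["fw"] in policy["patched_firmware"], "firmware_not_patched"),
        (quote["app_pcr"] in policy["approved_binaries"], "binary_not_approved"),
        (quote["region"] in policy["allowed_regions"], "region_not_allowed"),
        (not quote["debug_mode"], "debug_mode_forbidden"),
        (quote["microcode"] >= policy["min_microcode"], "microcode_too_old"),
    ]
    for ok, reason in checks:
        if not ok:
            return False, reason      # first failing clause names the rejection
    return True, "policy_v7_satisfied"

quote = {"fw": "tdx-module-v1.5.4", "app_pcr": "sha384:1f3a9c",
         "region": "ap-south-1", "debug_mode": False, "microcode": 0x571}
print(evaluate(POLICY_V7, quote))     # (True, 'policy_v7_satisfied')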
When confidential computing is the wrong answer
If the threat model does not include the cloud provider — e.g. a startup running a public-facing API where the only adversaries are external attackers and other tenants — confidential computing adds operational complexity for no security gain. If the workload is dominated by data-in-transit and data-at-rest concerns, well-deployed TLS plus disk encryption covers the threat. The cases where confidential computing genuinely pays for itself are: regulated industries where the customer contractually requires the cloud provider to be outside the trust boundary (banking, healthcare, government), workloads handling cryptographic root material (HSMs in software, CA signing keys, attestation root keys themselves), and multi-party computation across organisations that mutually distrust each other and the provider. For everything else, the operational tax outweighs the security benefit.
Reproduce this on your laptop
# Reproduce the attestation flow above (cryptographic logic; CPU quote is mocked).
python3 -m venv .venv && source .venv/bin/activate
pip install cryptography
python3 attestation_verifier.py
# Expected: 'verified', 'pcr_mismatch', 'nonce_mismatch'
For real hardware exploration on a recent Linux box, dmesg | grep -i 'tdx\|sev' reveals whether the CPU advertises confidential-computing extensions; on AMD EPYC parts, /sys/module/kvm_amd/parameters/sev_snp shows whether SEV-SNP is enabled in the kernel.
Where this leads next
Confidential computing is the architectural answer to "the cloud provider is in your trust boundary"; the next chapter on decentralised systems takes the same question further by asking what happens when no single party — provider, customer, or operator — is allowed inside the boundary at all. The two chapters share a thread: trust is no longer the social contract between organisations, it is a cryptographic property the system can prove.
The thread also connects backwards: the Netflix resilience culture chapter showed that resilience is a property you must verify by breaking, and the attestation handshake here shows that confidentiality is a property you must verify with cryptography. In both, "trust the platform" is replaced by "prove the platform" — a shift that recurs across every Part 20 chapter.
A reader interested in the cryptographic foundations should branch out to the "observability is a data problem" chapter, since attestation logs are themselves a structured-event stream that needs the same retention, retrieval, and correlation infrastructure as any other distributed-systems telemetry.
References
- Costan and Devadas, "Intel SGX Explained" (IACR ePrint 2016/086) — the canonical academic exposition of SGX, including the attestation protocol.
- AMD, "AMD SEV-SNP: Strengthening VM Isolation with Integrity Protection and More" (white paper, 2020) — the SEV-SNP architecture and threat model.
- Intel, "Intel Trust Domain Extensions (Intel TDX) Module Specification" (2023) — the TDX module interface and PCR semantics.
- AWS, "AWS Nitro Enclaves: Isolated Compute Environments" (re:Invent 2020 talk + docs) — the Nitro design point.
- Van Bulck et al., "Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution" (USENIX Security 2018) — the side-channel attack that reshaped the SGX threat model.
- Confidential Computing Consortium, "Confidential Computing: Hardware-Based Trusted Execution for Applications and Data" (white paper, 2021) — vendor-neutral terminology and use cases.
- Microsoft, "Azure Confidential Computing — Guest Attestation" (docs) — a real KMS-side attestation policy example.
- See also: Netflix resilience culture, decentralized systems (not just crypto), observability in distributed systems is a data problem.