In short

Four major SDKs dominate quantum programming in 2026. Qiskit (IBM, open source, Python) is the most widely used and has the tightest integration with IBM's cloud hardware — if you want to run on a Heron chip, Qiskit is the default. Cirq (Google, open source, Python) is engineered around Google's Sycamore and Willow devices and around noisy-simulation workflows. PennyLane (Xanadu, open source, Python) is built for variational quantum computing and quantum machine learning — its killer feature is differentiable quantum circuits that play directly with PyTorch, TensorFlow, and JAX. CUDA-Q (NVIDIA, open source, C++ and Python) is the newest of the four: a heterogeneous GPU-plus-QPU SDK that treats quantum devices as accelerators inside a classical program, with state-of-the-art GPU simulators for circuits up to ~40 qubits. Pick by your hardware target (IBM → Qiskit, Google → Cirq, Xanadu photonic → PennyLane), by your workload (VQE/QML → PennyLane or CUDA-Q, general quantum research → Qiskit), or by your compute environment (dense GPU cluster → CUDA-Q). All four speak the same circuit-model language underneath — once you know one, learning a second takes days, not months.

You have read the textbook. You have watched the videos. You have worked through exercises with pencil and paper. And now you want to actually run a circuit — on hardware, or on a simulator that behaves like hardware. You open Python, type import qiskit, and something real executes. This is the moment quantum computing stops being abstract.

The question is: which SDK. There is not one. There are four serious ones, plus a handful of niche contenders, and the choice is not obvious because each one solves a different problem well. Qiskit has the best hardware story. Cirq has the cleanest circuit-construction API. PennyLane has automatic differentiation. CUDA-Q has GPU-accelerated simulators and a heterogeneous compute model. If you try to learn all four in your first month you will learn none; if you try to learn the wrong one for your project you will waste three months discovering its limits.

This chapter maps the four SDKs onto the decisions you actually have to make. You will see what each SDK looks like in code (a three-qubit GHZ state in all four, side by side), which hardware each one targets, how installation works on a typical Indian laptop or college cluster, and which problems suit which tool. By the end you will know which one to pip install first.

The four SDKs — shape and purpose

Before the comparison, a sketch of each.

Qiskit (originally "Quantum Information Science Kit"). Started by IBM Research in 2017, open-sourced under the Apache 2.0 licence, Python-first. Version 1.0 arrived in early 2024 and version 2.0 in 2025; the 2.x line is the stable current release. Qiskit has a larger contributor base than any of the others — over 600 contributors, a full-time IBM team, and a tight coupling to the IBM Quantum cloud. It is the most-taught SDK in university courses and the most cited in arXiv papers.

Cirq. Google started Cirq in 2018 as a Python framework designed around the constraints of their Sycamore and (now) Willow chips: hardware-efficient gates, explicit qubit connectivity, and native support for parameterised circuits. Open source under Apache 2.0. Smaller contributor community than Qiskit, but the tightest integration with Google's hardware and the cleanest circuit API.

PennyLane. Xanadu, a Canadian photonic-quantum-computing startup with significant Indian research ties, released PennyLane in 2018 as the first SDK designed for differentiable quantum programming. Its unique feature: any quantum circuit can be differentiated with respect to its parameters using the same automatic-differentiation machinery that powers neural networks. PennyLane integrates directly with PyTorch, TensorFlow, and JAX, so you can put a variational quantum circuit inside a classical neural network and train the whole thing end to end.

CUDA-Q. NVIDIA released CUDA-Q (originally "CUDA Quantum") in 2023. It is a heterogeneous-computing framework, available in both C++ and Python, that treats a quantum processor as one kind of accelerator alongside a GPU. CUDA-Q's distinguishing feature is state-of-the-art GPU-accelerated simulators — using tensor-network and state-vector methods that scale to ~40 qubits on a single A100 and more on multi-GPU clusters — together with a compiler stack that emits hardware instructions for multiple QPU vendors.

Figure: The four SDKs — hardware target and workflow focus. A 2×2 grid, one box per SDK: Qiskit (IBM; Apache 2.0; IBM Heron cloud hardware; primitive-based execution with Sampler and Estimator; largest ecosystem; Python), Cirq (Google; Sycamore and Willow; explicit qubit topology and noisy simulation; cleanest circuit API; Python), PennyLane (Xanadu; photonic hardware plus plugins; differentiable circuits, VQE and QML; PyTorch, TensorFlow, and JAX integration; Python), CUDA-Q (NVIDIA; multi-vendor QPUs plus GPU simulators to ~40 qubits; heterogeneous kernel model; C++ and Python). Pick by hardware target and workload — all four share the same underlying circuit model.
The four SDKs differ in hardware target (IBM, Google, Xanadu plus plugins, multi-vendor plus GPU), in workflow (primitive-based execution, explicit circuit construction, differentiable circuits, heterogeneous kernels), and in ecosystem size (Qiskit largest, CUDA-Q newest and growing fastest). Underneath, they all express the same mathematical circuit model — porting between them is a notational exercise.

Why this 2×2 framing is useful: the hardware axis (IBM, Google, photonic, multi-vendor/GPU) tells you which QPU your circuits will end up running on. The workflow axis (general-purpose, topology-aware, autodiff, heterogeneous kernels) tells you which programming style the SDK optimises for. Picking the wrong cell costs months.

Qiskit — the default choice

Qiskit is the largest quantum SDK by installed base, by contributor count, and by university-course adoption. Since version 1.0 (2024), and continuing through 2.0 (2025), the API has been organised around two "primitives" — the Sampler and the Estimator — which together cover almost every quantum-computing workflow at a higher level than raw circuit-execution code.

What Qiskit gets right

What Qiskit gets wrong (or less right)

Installing Qiskit

pip install qiskit qiskit-ibm-runtime

For chemistry: pip install qiskit-nature pyscf. For ML: pip install qiskit-machine-learning. The core package is ~40 MB; the full ecosystem under 200 MB. Works on Windows, macOS, and Linux. Works on any recent Python (3.9+). No GPU required.

On a typical Indian college machine (4–8 GB RAM, Python 3.10, no GPU), Qiskit runs fine. Simulator circuits up to ~25 qubits are tractable; beyond that you start swapping to disk.
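The ~25-qubit ceiling follows directly from state-vector memory: n qubits need 2^n complex amplitudes, each 16 bytes as complex128. A quick back-of-envelope check, pure Python with no SDK required (the function name statevector_bytes is ours, for illustration):

```python
# Memory for a dense state-vector simulation:
# n qubits -> 2**n amplitudes, each a complex128 (16 bytes).
def statevector_bytes(n_qubits: int) -> int:
    return (2 ** n_qubits) * 16

for n in (20, 25, 28, 30, 40):
    gib = statevector_bytes(n) / 2**30
    print(f"{n} qubits -> {gib:,.1f} GiB")
# 25 qubits -> 0.5 GiB, 28 -> 4 GiB, 30 -> 16 GiB, 40 -> 16,384 GiB
```

The jump from 0.5 GiB at 25 qubits to 16 GiB at 30 is why a 4–8 GB laptop starts swapping just past 25, and why 40-qubit dense simulation is GPU-cluster territory.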

Cirq — Google's native SDK

Cirq was designed by Google's Quantum AI team to express circuits the way Google's hardware actually executes them: explicit qubits placed on a specific chip topology, native gate sets, exposed hardware constraints. The aesthetic is closer to "electrical engineering" than to "mathematical circuits."

What Cirq gets right

What Cirq gets wrong (or less right)

Installing Cirq

pip install cirq

~30 MB. Same cross-platform, same Python version requirements as Qiskit. No GPU requirement.

PennyLane — the variational-first SDK

Xanadu's PennyLane took a different design choice from Qiskit and Cirq: rather than optimising for hardware execution, PennyLane optimised for differentiable quantum programming. A PennyLane quantum function behaves like a black box that maps parameters (usually a NumPy array, PyTorch tensor, TensorFlow variable, or JAX array) to an expected value, and the gradient of that value with respect to the parameters is computed automatically — using the parameter-shift rule for real hardware and automatic differentiation for simulators.

This makes PennyLane the right tool whenever your workflow contains a classical optimisation loop over a quantum circuit: VQE, QAOA, variational quantum classifiers, quantum neural networks, QGANs, barren-plateau studies, hybrid ML models. If that describes your project, start here.

What PennyLane gets right

What PennyLane gets wrong (or less right)

Installing PennyLane

pip install pennylane

For PyTorch integration: pip install pennylane torch. For GPU simulation: pip install pennylane-lightning-gpu (requires CUDA). For IBM hardware: pip install pennylane-qiskit.

CUDA-Q — NVIDIA's heterogeneous SDK

CUDA-Q is the newest of the four major SDKs, released in 2023 and still evolving rapidly. Its core premise is that quantum computers will not replace classical computers — they will sit inside classical workflows as accelerators, the way GPUs do today. A CUDA-Q program runs on a CPU, calls into GPU kernels and QPU kernels transparently, and orchestrates the hybrid computation.

CUDA-Q has two characteristic strengths: GPU-accelerated simulators that push single-node simulation to ~40 qubits of dense state vector or several hundred qubits of tensor-network states, and multi-QPU vendor support through its compiler back-ends — the same kernel can be compiled for IonQ hardware, Quantinuum hardware, IBM hardware (experimentally), or simulator.

What CUDA-Q gets right

What CUDA-Q gets wrong (or less right)

Installing CUDA-Q

pip install cudaq

Linux is the first-class platform; macOS and Windows are reached indirectly (Docker containers and WSL, respectively). GPU simulators require an NVIDIA GPU with CUDA 11.8+. On a GPU-less machine CUDA-Q still runs (CPU simulators are available), but you are not using the SDK for its real purpose.

On a typical Indian college GPU cluster (often NVIDIA Tesla T4 or RTX 3080-era cards), CUDA-Q simulators handle 25–30 qubits comfortably. The GPU-QPU Stack chapter has a deeper account of the hardware story.

Example: three-qubit GHZ state in all four SDKs

A GHZ state — named for Greenberger, Horne, and Zeilinger — is the canonical three-qubit entangled state:

|\text{GHZ}\rangle = \frac{1}{\sqrt{2}}(|000\rangle + |111\rangle)

Measuring all three qubits in the computational basis gives 000 half the time and 111 half the time, with essentially zero probability for the six other outcomes (001 through 110). It is produced by a Hadamard on qubit 0 followed by two CNOTs — a three-gate circuit, short enough that each SDK's flavour shows through.
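That H-then-two-CNOTs claim can be checked without any SDK at all. A minimal pure-Python state-vector sketch (the helpers apply_h and apply_cnot are ours, not from any of the four SDKs; qubit 0 is the most significant bit of the basis-state index, so index 0 is |000⟩ and index 7 is |111⟩):

```python
import math

def apply_h(state, q, n):
    """Hadamard on qubit q of an n-qubit state vector."""
    s = 1 / math.sqrt(2)
    bit = 1 << (n - 1 - q)
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        if i & bit:                       # qubit q is |1>: -> (|0> - |1>)/sqrt(2)
            new[i ^ bit] += s * amp
            new[i] -= s * amp
        else:                             # qubit q is |0>: -> (|0> + |1>)/sqrt(2)
            new[i] += s * amp
            new[i ^ bit] += s * amp
    return new

def apply_cnot(state, ctrl, tgt, n):
    """CNOT: flip qubit tgt wherever qubit ctrl is |1>."""
    cbit, tbit = 1 << (n - 1 - ctrl), 1 << (n - 1 - tgt)
    new = list(state)
    for i in range(len(state)):
        if (i & cbit) and not (i & tbit):  # visit each swapped pair once
            new[i | tbit], new[i] = state[i], state[i | tbit]
    return new

n = 3
state = [0.0] * (1 << n)
state[0] = 1.0                         # start in |000>
state = apply_h(state, 0, n)           # H on qubit 0
state = apply_cnot(state, 0, 1, n)     # CNOT 0 -> 1
state = apply_cnot(state, 1, 2, n)     # CNOT 1 -> 2

print(state[0], state[7])  # both 1/sqrt(2) ~ 0.7071; every other amplitude is zero
```

Twenty lines of bit-twiddling reproduce exactly what all four SDK snippets below compute — which is the point: the SDKs differ in ergonomics and backends, not in mathematics.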

Example 1: GHZ state written in all four SDKs

Let's write the same circuit four ways.

Qiskit.

from qiskit import QuantumCircuit
from qiskit.primitives import StatevectorSampler

qc = QuantumCircuit(3, 3)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)
qc.measure([0, 1, 2], [0, 1, 2])

sampler = StatevectorSampler()
result = sampler.run([qc], shots=1000).result()
counts = result[0].data.c.get_counts()
print(counts)  # ~{'000': 500, '111': 500}

Why this reads naturally: Qiskit mirrors the textbook — you build a circuit step by step, measure, then execute via a primitive (here a statevector sampler for simulation). The API is verbose but each piece is explicit.

Cirq.

import cirq

q0, q1, q2 = cirq.LineQubit.range(3)
circuit = cirq.Circuit(
    cirq.H(q0),
    cirq.CNOT(q0, q1),
    cirq.CNOT(q1, q2),
    cirq.measure(q0, q1, q2, key='result'),
)

sim = cirq.Simulator()
result = sim.run(circuit, repetitions=1000)
print(result.histogram(key='result'))  # ~{0: 500, 7: 500}

Why Cirq looks different: Cirq puts qubits first — you declare the qubits explicitly, then build the circuit as a sequence of operations on them. The histogram returns integers (0 = 000, 7 = 111) rather than bit strings, reflecting Cirq's engineering-leaning conventions.
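Converting between Cirq's integer keys and the bitstring convention the other three SDKs use is a one-liner each way (pure Python; the function names are ours, and the first measured qubit maps to the leftmost bit):

```python
def int_to_bitstring(value: int, n_qubits: int) -> str:
    # Cirq histogram key -> zero-padded bitstring, first measured qubit leftmost.
    return format(value, f'0{n_qubits}b')

def bitstring_to_int(bits: str) -> int:
    # Inverse: bitstring -> Cirq-style integer key.
    return int(bits, 2)

print(int_to_bitstring(0, 3))   # '000'
print(int_to_bitstring(7, 3))   # '111'
print(bitstring_to_int('101'))  # 5
```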

PennyLane.

import pennylane as qml
import numpy as np

dev = qml.device('default.qubit', wires=3, shots=1000)

@qml.qnode(dev)
def ghz():
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    return qml.counts(wires=[0, 1, 2])

print(ghz())  # ~{'000': 500, '111': 500}

Why PennyLane uses decorators: a QNode is a quantum function, tagged with @qml.qnode(device). This decorator is what allows PennyLane to differentiate the function later; for a non-parameterised circuit like GHZ, the decorator is pure overhead, but it is the same pattern you will use for VQE where differentiation becomes essential.

CUDA-Q.

import cudaq

@cudaq.kernel
def ghz():
    q = cudaq.qvector(3)
    h(q[0])
    x.ctrl(q[0], q[1])
    x.ctrl(q[1], q[2])
    mz(q)

result = cudaq.sample(ghz, shots_count=1000)
print(result)  # ~{'000': 500, '111': 500}

Why CUDA-Q's syntax is terse: @cudaq.kernel compiles the function to a quantum-program representation. cudaq.qvector(3) allocates three qubits; h, x.ctrl, mz are the native gate primitives. The kernel model is designed to compile cleanly for multiple QPU backends without Python-level overhead.

Result. All four programs produce the same distribution: roughly 500 counts for 000, 500 for 111, near-zero elsewhere. The differences are stylistic — Qiskit is textbook-like, Cirq is qubit-first, PennyLane wraps a decorator for differentiability, CUDA-Q uses a kernel pattern for multi-backend compilation.

Figure: Measured counts for a 1000-shot GHZ state — identical across all four SDKs on a simulator. A histogram over the eight three-bit outcomes: the |000⟩ and |111⟩ bars sit near 500 counts each (502 and 498 in this run); the six intermediate bars are essentially zero. On real hardware the middle six bars rise to roughly 1–5% of the total each — the noise signature.
On a simulator, all four SDKs produce identical histograms — roughly 500 counts on |000⟩, 500 on |111⟩, zero on the six intermediate states. On a real quantum computer the intermediate bars rise to a few percent each, and that rise is the hardware-fidelity measurement. The mathematics of the GHZ state is the same in all four SDKs; the difference is only in how you type it.
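A common quick check on a hardware run is the GHZ population ratio — the fraction of shots landing on the two ideal outcomes. A minimal sketch (the counts dictionaries are illustrative, not from a real device; note this population term is a necessary condition, not the full GHZ fidelity, which also requires a coherence measurement):

```python
def ghz_population(counts: dict) -> float:
    # Fraction of shots on the two ideal GHZ outcomes |000> and |111>.
    shots = sum(counts.values())
    return (counts.get('000', 0) + counts.get('111', 0)) / shots

sim_counts = {'000': 502, '111': 498}          # noiseless simulator run
noisy_counts = {'000': 471, '111': 463,         # hypothetical hardware run
                '001': 14, '010': 11, '011': 9,
                '100': 12, '101': 8, '110': 12}

print(ghz_population(sim_counts))    # 1.0
print(ghz_population(noisy_counts))  # 0.934
```

The drop from 1.0 to ~0.93 is exactly the "rise of the middle six bars" the figure describes, compressed into one number.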

Interpretation. The four SDKs are dialects of the same language. A Bell state, a GHZ state, a Deutsch–Jozsa oracle, a Grover iteration — all appear in roughly the same shape across all four, with minor syntactic differences. Learning your second SDK takes a weekend, not a month, once you know your first well.

Example: VQE for H2 in PennyLane

The second worked example showcases PennyLane's killer feature — automatic differentiation through a quantum circuit — by running a small variational quantum eigensolver (VQE) for the ground-state energy of the hydrogen molecule. Compare with the Qiskit VQE implementation in the IBM Quantum Learning chapter.

Example 2: VQE for H2 ground-state energy in PennyLane

Step 1. The Hamiltonian. PennyLane's built-in chemistry module produces the H2 Hamiltonian directly:

import pennylane as qml
from pennylane import numpy as np

symbols = ["H", "H"]
coordinates = np.array([0.0, 0.0, -0.6614, 0.0, 0.0, 0.6614])  # atomic units (Bohr)
H, qubits = qml.qchem.molecular_hamiltonian(symbols, coordinates)
print(f"Number of qubits: {qubits}")  # 4

Why this one-liner replaces 30 lines of Qiskit boilerplate: PennyLane's chemistry module wraps PySCF and handles the Jordan–Wigner mapping internally. The output Hamiltonian is a qml.Hamiltonian object that PennyLane knows how to measure on any backend.

Step 2. The ansatz. Use a single-parameter ansatz — Hartree-Fock reference plus one double excitation:

dev = qml.device("default.qubit", wires=qubits)

@qml.qnode(dev)
def circuit(theta):
    qml.BasisState(np.array([1, 1, 0, 0]), wires=[0, 1, 2, 3])
    qml.DoubleExcitation(theta, wires=[0, 1, 2, 3])
    return qml.expval(H)

Why this captures H2 correlation: BasisState([1,1,0,0]) prepares the Hartree-Fock reference (the first two spin-orbitals occupied). DoubleExcitation(theta) mixes in the doubly excited configuration with amplitude governed by θ. For H2 in the minimal STO-3G basis, this one parameter captures essentially all the electron correlation.
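Why one angle suffices can be seen in a toy model: restricted to the span of |HF⟩ = |1100⟩ and the double excitation |D⟩ = |0011⟩, the Hamiltonian is a 2×2 matrix, and (up to sign conventions) the ansatz ψ(θ) = cos(θ/2)|HF⟩ + sin(θ/2)|D⟩ sweeps exactly the states needed to reach its lowest eigenvalue. A pure-Python sketch with made-up matrix elements (e_hf, e_d, g are illustrative numbers, not the actual H2 values):

```python
import math

# Illustrative 2x2 Hamiltonian in the {|HF>, |D>} subspace:
# diagonal energies plus an off-diagonal coupling (made-up values).
e_hf, e_d, g = -1.117, -0.471, -0.181

def energy(theta: float) -> float:
    # <psi(theta)| H |psi(theta)> for psi = cos(t/2)|HF> + sin(t/2)|D>
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return c * c * e_hf + s * s * e_d + 2 * c * s * g

# Exact ground energy of the 2x2 matrix, for comparison.
mean, half_gap = (e_hf + e_d) / 2, (e_hf - e_d) / 2
exact = mean - math.sqrt(half_gap ** 2 + g ** 2)

best = min(energy(k * 2 * math.pi / 2000) for k in range(2000))
print(f"min over theta grid: {best:.6f}")
print(f"exact eigenvalue:    {exact:.6f}")  # agrees to grid resolution
```

Minimising over the single angle θ lands on the exact lower eigenvalue — which is why, in the minimal basis, the one-parameter VQE below converges to the FCI energy rather than an approximation of it.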

Step 3. Optimise — with a PennyLane autodiff optimiser.

from pennylane import GradientDescentOptimizer

theta = np.array(0.0, requires_grad=True)
opt = GradientDescentOptimizer(stepsize=0.4)

for i in range(30):
    theta, energy = opt.step_and_cost(circuit, theta)
    if i % 5 == 0:
        print(f"Step {i}: E = {energy:.6f} Ha, θ = {theta:.4f}")

Why this is the payoff of PennyLane: opt.step_and_cost(circuit, theta) automatically computes the gradient of the circuit with respect to θ using the parameter-shift rule, then takes a step downhill. You did not write a single line of gradient code. In Qiskit you would call scipy.optimize.minimize with a finite-difference gradient, which is slower and less accurate; in PennyLane, gradients are built in.
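The parameter-shift rule itself is simple enough to verify by hand. For a gate generated by a Pauli operator, the exact gradient is a difference of two circuit evaluations at parameters shifted by ±π/2 — no finite differences involved. A pure-Python check on the textbook case where ⟨Z⟩ after RY(θ) on |0⟩ is exactly cos θ (function names are ours, for illustration):

```python
import math

def expval(theta: float) -> float:
    # <Z> after RY(theta) applied to |0> is exactly cos(theta).
    return math.cos(theta)

def parameter_shift_grad(f, theta: float) -> float:
    # Exact gradient from just two evaluations at +/- pi/2 shifts.
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2

theta = 0.37
print(parameter_shift_grad(expval, theta))  # equals -sin(0.37), exactly
print(-math.sin(theta))
```

Because each evaluation is itself a circuit run, the same two-point recipe works on real hardware, where backpropagation through the device is impossible — that is what PennyLane executes under the hood of step_and_cost.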

Step 4. Result.

Step 0:  E = -1.116735 Ha, θ = 0.1152
Step 5:  E = -1.136176 Ha, θ = 0.2068
Step 10: E = -1.137271 Ha, θ = 0.2277
Step 15: E = -1.137283 Ha, θ = 0.2288
Step 20: E = -1.137283 Ha, θ = 0.2288

The VQE converges in about 15 steps to the exact ground-state energy of H2 (-1.137283 Hartree, matching Full Configuration Interaction to six decimal places).

Step 5. Swap backends. To run the same optimisation on real IBM hardware, swap the device definition — in current pennylane-qiskit this is the qiskit.remote device wrapping an IBM Runtime backend:

from qiskit_ibm_runtime import QiskitRuntimeService

backend = QiskitRuntimeService().backend("ibm_brisbane")
dev = qml.device("qiskit.remote", wires=qubits, backend=backend)

The rest of the code is unchanged. PennyLane's plugin layer handles the hardware submission, and the parameter-shift gradient rule (which PennyLane uses by default on hardware, since backprop is not available there) does the right thing automatically.

Figure: VQE convergence of the H2 ground-state energy over 15 optimisation steps (PennyLane autodiff). The energy starts near -1.117 Ha at step 0 (θ=0, the Hartree-Fock reference), drops sharply over the first five steps, and levels off at the exact FCI value of -1.137283 Ha (dashed reference line) by step 15.
VQE convergence for H2 in PennyLane. The curve shows the expected energy ⟨ψ(θ)|H|ψ(θ)⟩ as a function of optimiser step; it drops from the Hartree-Fock energy (-1.117 Ha, θ=0) and asymptotes to the exact ground-state energy (-1.137283 Ha) after about 15 steps. PennyLane's parameter-shift gradient rule is what makes this convergence straightforward — you did not write any gradient code, and the same code runs unchanged on a real IBM or Xanadu device.

Interpretation. PennyLane's advantage is most visible in this VQE example: the gradient is automatic, the optimiser is built in, and the same code runs on four different hardware backends with only the device definition swapped. For any workflow that combines a parameterised quantum circuit with a classical optimiser — and that is most of NISQ-era quantum computing — PennyLane is the shortest path from idea to running code.

Common confusions about SDKs

Going deeper

You have the four SDKs, one GHZ example, and one VQE walk-through. The going-deeper below assumes you are thinking about which SDK to commit to for a research project and need a more detailed decision tree — including pivot points for when your initial SDK choice turns out to be wrong — plus notes on less-common SDKs, Indian-specific deployment, and the shape of the SDK ecosystem in 2026.

The decision tree, sharpened

When you start a project, walk this tree:

  1. Do you have a specific hardware target? If yes — pick the native SDK. IBM → Qiskit. Google → Cirq. Xanadu photonic → PennyLane (PennyLane is Xanadu's native SDK, not just an ML tool). Quantinuum → Quantinuum's own H-Series SDK (a distant fifth option not covered above, but worth knowing). IonQ → IonQ's SDK or Qiskit via the IonQ provider.

  2. Is your workflow variational? If yes — PennyLane is usually the right choice regardless of hardware, because its autodiff story cuts through all hardware-specific boilerplate.

  3. Are you simulating large circuits (30+ qubits)? If yes and you have NVIDIA GPUs available — CUDA-Q. If yes and you only have CPUs — Qiskit's AerSimulator, or PennyLane's lightning.qubit, both of which push to ~28 qubits comfortably on a 32GB laptop.

  4. Are you doing theoretical research, not running on hardware? For complexity theory, for abstract algorithm design, Qiskit is fine because it is the lingua franca. Use whatever your lab uses.

  5. Are you reproducing results from a paper? Use the SDK the paper used. Papers typically publish Qiskit or Cirq code; some recent papers publish CUDA-Q.

The less-common SDKs

Beyond the big four, several smaller SDKs are worth knowing about.

Running from India — deployment notes

A few practical notes for running SDKs from India:

  1. PyPI mirrors. Indian academic networks sometimes block or rate-limit pypi.org. The NIC mirror (pypi.nic.in at some institutions) and Sonatype mirrors work. For pip install failures, try adding -i https://pypi.org/simple/ explicitly, or use conda for the core packages.

  2. GPU availability. CUDA-Q becomes useful when you have NVIDIA GPUs. Major Indian HPC resources — PARAM Siddhi-AI, the National Supercomputing Mission clusters, cloud credits from AWS/Azure/Google educational programs — all support CUDA-Q workloads.

  3. Latency to IBM cloud. India to IBM's US-based QPU cloud is ~250 ms round-trip. Irrelevant for job submissions (they queue for minutes anyway) but can make interactive debugging feel slow. Run simulators locally; submit jobs in batches to real hardware.

  4. Access policies. IBM's free tier is open to anyone with an email. Google's hardware is invitation-based and most Indian researchers access it only through collaborations. Xanadu's photonic hardware is accessible with a Xanadu cloud account (free tier exists). CUDA-Q's GPU simulators run locally — no external access needed.

The 2026 ecosystem shape

As of early 2026, the SDK ecosystem has stabilised from the 2021–2023 churn. A rough picture:

The expected consolidation has not happened. Instead, the SDKs have settled into complementary niches, and the most productive researchers use several in parallel. Classical scientific computing went the same way (NumPy + SciPy + PyTorch + JAX + CuPy, different tools for different purposes), and quantum computing is following that path.

When your initial SDK choice was wrong

Signs you picked the wrong SDK:

Switching is cheap. Porting a typical research project's code takes an afternoon. Do not stay trapped in the wrong SDK out of sunk cost.

The Indian SDK future

India has no dominant SDK of its own yet, and probably will not produce one — the network effects around the existing four are too strong. But Indian contributions to the ecosystem are real and growing:

The short answer is: you can contribute meaningfully to any of the four SDKs from any Indian institution today. The long answer is that by 2030 there may also be a distinct Indian stack alongside them, built on Qiskit or CUDA-Q foundations but adapted for domestic hardware.

Where this leads next

References

  1. IBM, Qiskit documentation — the canonical reference for Qiskit 2.0, primitives, and the transpiler.
  2. Google Quantum AI, Cirq documentation — the Cirq user guide and Sycamore integration notes.
  3. Xanadu, PennyLane documentation and QML tutorials — the PennyLane reference plus a curated library of variational-workflow notebooks.
  4. NVIDIA, CUDA-Q documentation — the CUDA-Q user guide, kernel model, and simulator benchmarks.
  5. Bergholm et al., PennyLane: Automatic differentiation of hybrid quantum-classical computations (2018) — arXiv:1811.04968. The foundational PennyLane paper.
  6. Wikipedia, OpenQASM — the open circuit description format shared between all four SDKs.