In short
Every graph query — no matter how baroque the syntax — is built from a small set of traversal primitives. Memorise the primitives and the query language stops mattering.

- BFS (breadth-first search) explores level by level: all 1-hop neighbours, then all 2-hop, then all 3-hop, using a FIFO queue and a visited set. It runs in O(V+E) and is the right tool for shortest paths in unweighted graphs, level-order analysis ("everyone within three hops of Asha"), and bounded-distance neighbourhood queries.
- DFS (depth-first search) goes deep along one path before backtracking, using a stack (or recursion). Same O(V+E). Use it for path-finding, cycle detection, topological sort, and strongly connected components.
- Dijkstra generalises BFS to weighted edges with non-negative weights, replacing the FIFO queue with a min-priority heap; cost is O(E log V). Use it whenever edges have weights — distance, money flow, latency, similarity score.
- Floyd-Warshall is the cubic hammer: an O(V³) triple loop that fills in every pairwise shortest distance. Acceptable up to a few hundred nodes; useless beyond.
- PageRank and the centrality family (betweenness, eigenvector, Katz) are iterative fixed-point algorithms — repeat "every node's rank is a weighted sum of its neighbours' ranks" until convergence. They answer "who matters" rather than "how do I get from A to B".
- Connected components runs one BFS or DFS per cluster and labels it, in linear time overall.
- Pattern matching (subgraph isomorphism) is the NP-hard primitive that pattern DSLs like Cypher implement with constraint propagation and clever join orders.

Most real queries are compositions of these primitives plus set operations: "friends of friends who like the same products" = BFS-2-hops on the social graph, BFS-1-hop on the purchase graph, intersection. "Find the shortest fraud trail from a flagged UPI account to a known mule" = Dijkstra with edge weight −log(transaction amount).
The skill of writing fast graph queries is recognising which primitive each clause of your problem is asking for, then composing them so that the most selective primitive runs first. The next chapter shows how Cypher's MATCH (a)-[*1..3]-(b) and Gremlin's g.V().repeat(out()).times(3) both compile down to bounded BFS over the index-free adjacency layout of the previous chapter.
The previous chapter ended with the on-disk layout: every node record holds a pointer to its first edge, and traversing the graph is pointer-chasing. That gave us a microsecond per hop. This chapter asks the next question — what do you do with those hops? Pointer-chasing is a primitive operation. Real questions like "find me everyone within three handshakes of Sundar Pichai" or "what is the shortest money trail from this flagged account to a known mule" are not single hops; they are patterns of hops. Those patterns have names, they have textbook implementations, and they have well-understood costs. Before learning Cypher or Gremlin, learn the primitives that Cypher and Gremlin compile into. Once you know the primitives, every query language is a thin syntactic wrapper.
The thesis: seven primitives, infinite queries
There are roughly seven traversal primitives that account for almost every graph query in production. They are not a recent invention — Tarjan and Hopcroft were publishing O(V+E) algorithms for most of them in the early 1970s, decades before "graph database" was a product category. What graph databases did was take these textbook algorithms and run them efficiently against on-disk adjacency lists. The algorithms themselves remain unchanged.
Primitive 1: BFS, the workhorse
Breadth-first search is the primitive a graph engineer reaches for first. The shape: explore everything 1 hop from the source, then everything 2 hops away, then 3, and so on. The data structures are unforgivingly simple — a FIFO queue holding the frontier (nodes discovered but not yet expanded), a visited set holding everything seen so far, and an optional level/distance map. Why a FIFO queue: pop-from-front + push-to-back guarantees that nodes are dequeued in the order they were discovered. Combined with the fact that all edges have unit weight, that ordering is the shortest-path ordering. Switch to a LIFO stack and you get DFS; switch to a min-heap keyed by distance and you get Dijkstra. The data structure is the algorithm.
```python
from collections import deque

def bfs(graph, source):
    """graph: dict[node] -> list[neighbour]. Returns dict[node] -> distance."""
    visited = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()               # dequeue the frontier
        for neighbour in graph[node]:        # one pointer-chase per neighbour
            if neighbour not in visited:
                visited[neighbour] = visited[node] + 1
                queue.append(neighbour)
    return visited
```
That is the entire algorithm. Twelve lines, including the docstring. It runs in O(V+E) because every node is enqueued at most once and every edge is examined at most twice (once from each endpoint, in an undirected graph). Memory is O(V) for the visited set and the queue. Now layer on a bound — "stop after k hops" — and you have the most common production graph query in existence:
```python
def bfs_bounded(graph, source, max_depth):
    visited = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if visited[node] >= max_depth:
            continue                         # do not expand past the bound
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited[neighbour] = visited[node] + 1
                queue.append(neighbour)
    return visited
```
This is what powers "find everyone within 3 handshakes", "show me all accounts within 4 hops of a flagged transaction", and "expand the topic graph two levels around quantum-entanglement". The Cypher equivalent is MATCH (s)-[*1..3]-(x). The Gremlin equivalent is g.V(s).repeat(both()).times(3). Both compile to a bounded BFS exactly like the snippet above.
A subtle but important point: the cost of one BFS hop in a native graph engine (Neo4j, Memgraph) is roughly 1 µs of pointer-chase, not 100 µs of B-tree probe. So a 3-hop BFS over a graph where the average node has 30 neighbours touches about 30³ = 27,000 nodes and runs in roughly 27 ms — interactively fast. The same query on a Postgres-backed graph schema would take 2.7 seconds. This is why graph databases exist as a separate category: BFS is the most common graph operation, and BFS over native adjacency is two orders of magnitude faster than BFS over a relational schema.
Primitive 2: DFS, the path-finder
Switch the queue to a stack and BFS becomes DFS. The Python differs by a single method call — popleft() becomes pop():
```python
def dfs(graph, source):
    visited = {source}
    stack = [source]
    while stack:
        node = stack.pop()                   # LIFO — most recently discovered first
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                stack.append(neighbour)
    return visited
```
Same O(V+E). Different exploration order. DFS plunges down one branch as far as it can go, then backtracks. That is the right shape for problems where you care about paths rather than distances: "is there a path from A to B?", "find me any cycle", "compute a topological sort", "find all strongly connected components" (Tarjan's 1972 algorithm — one DFS, three pointers per node, magic). DFS is also the natural fit for recursive descent, which is why most textbook implementations are recursive — though in production you almost always want the iterative stack version, because Python's default recursion limit is 1000 and many real graphs are deeper than that.
When to choose DFS over BFS: cycle detection, topological sort, finding any path (not necessarily shortest), depth-limited tree-like traversals, and any recursive structure where the "go deep then come back" pattern is the natural one. When to choose BFS: shortest distance, level-bounded queries, anything where "how far?" is the question.
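As a concrete instance of the "paths, not distances" use case, here is a minimal sketch of directed-cycle detection via DFS three-colouring, using the same dict-of-lists graph encoding as the snippets above (written recursively for brevity, though the iterative form is preferable at real-world depths):

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph with DFS colouring.
    graph: dict[node] -> list[neighbour].
    WHITE = unvisited, GREY = on the current DFS path, BLACK = done."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nbr in graph[node]:
            if colour[nbr] == GREY:          # back edge to the active path: cycle
                return True
            if colour[nbr] == WHITE and visit(nbr):
                return True
        colour[node] = BLACK                 # fully explored, cannot be on a cycle
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)
```

The GREY/BLACK distinction is the whole trick: a back edge to a GREY node closes a cycle, while an edge to a BLACK node is merely a cross edge into already-finished territory.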
Primitive 3: Dijkstra, the weighted generalisation
BFS works because every edge has weight 1, so the FIFO order is the order of increasing distance. The moment edges have weights — distance in kilometres, money flow in rupees, latency in milliseconds, similarity score from 0 to 1 — that property breaks. BFS no longer finds the shortest weighted path. The fix: replace the FIFO queue with a min-priority heap keyed by accumulated distance. Pop the closest unsettled node, relax its outgoing edges (update distances if you found a shorter route), repeat.
```python
import heapq

def dijkstra(graph, source):
    """graph: dict[node] -> list[(neighbour, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:                   # stale heap entry, skip
            continue
        for neighbour, weight in graph[node]:
            new_dist = d + weight
            if new_dist < dist.get(neighbour, float('inf')):
                dist[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))
    return dist
```
Cost: O((V+E) log V), dominated by the heap operations. Why log V on the heap: every edge can trigger one heap push, and every push and pop costs O(log V) — the heap can hold up to E entries under lazy deletion, and log E is O(log V) for a simple graph. The heap can contain stale entries (we lazy-update rather than decrease-key, since Python's heapq does not support decrease-key), which is why the `if d > dist[node]: continue` check exists — a node can be popped multiple times, but only the first pop counts.
The strict requirement: edge weights must be non-negative. With negative weights, Dijkstra is wrong (use Bellman-Ford in O(VE), or Johnson's algorithm for all-pairs). Almost every real-world weight is non-negative — distance, time, money, latency, count — so Dijkstra wins in practice. Variants worth knowing: A* (Dijkstra plus a heuristic, used by every routing engine on the planet), bidirectional Dijkstra (search from both endpoints, meet in the middle, roughly halving the explored area), and contraction hierarchies (preprocess the graph once so that individual queries touch only a tiny fraction of it — what real navigation systems use).
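For the negative-weight case just mentioned, Bellman-Ford is the textbook fallback. A minimal sketch, using the same dict-of-weighted-lists encoding as the Dijkstra snippet (raising an exception on a negative cycle is my choice of failure mode, not a fixed convention):

```python
def bellman_ford(graph, source):
    """Single-source shortest paths tolerating negative edge weights.
    graph: dict[node] -> list[(neighbour, weight)]. O(V*E)."""
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    for _ in range(len(graph) - 1):          # V-1 full relaxation passes suffice
        changed = False
        for u in graph:
            for v, w in graph[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
        if not changed:                      # nothing moved: already converged
            break
    for u in graph:                          # one extra pass: any improvement now
        for v, w in graph[u]:                # means a reachable negative cycle
            if dist[u] + w < dist[v]:
                raise ValueError("negative cycle reachable from source")
    return dist
```

The price of tolerating negative weights is the loss of the greedy "settle the closest node" shortcut: Bellman-Ford must sweep every edge up to V−1 times, hence O(VE) instead of O((V+E) log V).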
Primitive 4: Floyd-Warshall, the all-pairs hammer
Sometimes you need every pairwise distance. Network analysis, small-world coefficient calculations, the diameter of a friendship cluster, transitive closure of a reachability graph. Run Dijkstra V times and you pay O(V × E log V). There is a simpler alternative: Floyd-Warshall, a triple for loop that fills in a V × V distance matrix.
```python
def floyd_warshall(nodes, edges):
    INF = float('inf')
    dist = {(u, v): INF for u in nodes for v in nodes}
    for u in nodes:
        dist[(u, u)] = 0
    for u, v, w in edges:
        dist[(u, v)] = w
    for k in nodes:                          # try every node as intermediate
        for i in nodes:
            for j in nodes:
                if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                    dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    return dist
```
The cost is O(V³). That sounds awful, but the constant factor is tiny — three nested loops with one comparison and one assignment in the inner body. For V = 500 the inner loop body runs 125 million times, which pure Python grinds through in tens of seconds and a tight C implementation finishes in well under a second. So Floyd-Warshall is the right call up to a few hundred nodes; above that, run Dijkstra V times in parallel.
Primitive 5: PageRank and the centrality family
The questions BFS, DFS, and Dijkstra answer are all path questions — how do I get from A to B. A separate family asks importance questions — who matters in this graph. The canonical answer is PageRank, the algorithm that built Google: each page's rank is a weighted sum of the ranks of pages linking to it, divided by their out-degree, plus a damping term. Iterate until convergence:
```python
def pagerank(graph, damping=0.85, iterations=50):
    N = len(graph)
    rank = {node: 1.0 / N for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / N for node in graph}
        for node in graph:
            out_degree = len(graph[node]) or 1
            share = damping * rank[node] / out_degree
            for neighbour in graph[node]:
                new_rank[neighbour] += share
        rank = new_rank
    return rank
```
Same shape applies to betweenness centrality (how often does a node lie on shortest paths between other pairs), eigenvector centrality (a node is important if its neighbours are important — PageRank without the damping), and Katz centrality (PageRank with attenuation by path length). All of them are "iterate a local rule until the global structure stabilises" and all of them are bread-and-butter for fraud detection ("which accounts have suspiciously high betweenness in the money-flow graph?"), recommendation systems ("personalised PageRank rooted at the user"), and influence analysis on social graphs.
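The "personalised PageRank rooted at the user" variant mentioned above is a small change to the snippet: send the (1 − damping) teleport mass back to a single root node instead of spreading it uniformly, which biases the ranks towards the root's neighbourhood. A hedged sketch (function name is illustrative, not a standard API):

```python
def personalised_pagerank(graph, root, damping=0.85, iterations=50):
    """PageRank where all teleport mass returns to `root`.
    graph: dict[node] -> list[neighbour]."""
    rank = {node: 0.0 for node in graph}
    rank[root] = 1.0                          # all mass starts at the root
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in graph}
        new_rank[root] = 1 - damping          # teleport to the root only
        for node in graph:
            out_degree = len(graph[node]) or 1
            share = damping * rank[node] / out_degree
            for nbr in graph[node]:
                new_rank[nbr] += share
        rank = new_rank
    return rank
```

Nodes unreachable from the root end up with rank zero — which is exactly the behaviour you want in "recommend things near this user" queries.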
Primitive 6: Connected components
The simplest of all. Pick any unvisited node, run a BFS or DFS, label every node you reach with the same component ID. Pick the next unvisited node, increment the ID, repeat. Linear time, O(V+E). Used everywhere — clustering, dead-code analysis (a "garbage" object in a heap dump is one not in the same component as a GC root), connectivity sanity checks, partitioning a graph for parallel processing. The Union-Find variant is even faster for incremental updates: as edges arrive, union the endpoints, near-constant amortised time per operation.
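The description above is short enough to be code. A sketch of the BFS-labelling version, using the same dict-of-lists convention as the earlier snippets:

```python
from collections import deque

def connected_components(graph):
    """Label each node with a component id via repeated BFS flood-fill.
    graph: dict[node] -> list[neighbour], undirected. O(V+E)."""
    component = {}
    current_id = 0
    for start in graph:
        if start in component:
            continue                          # already labelled by an earlier BFS
        component[start] = current_id
        queue = deque([start])
        while queue:                          # plain BFS, labels instead of distances
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in component:
                    component[nbr] = current_id
                    queue.append(nbr)
        current_id += 1
    return component
```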
Primitive 7: Pattern matching
The hardest primitive — and the one Cypher's MATCH clause is built around. Given a small "pattern" graph (e.g. "a person who knows a person who works at a company") and a large data graph, find all occurrences of the pattern. This is *subgraph isomorphism*, and in the general case it is NP-complete. Practical implementations cheat: they pick the most selective node in the pattern (lowest cardinality), look it up in an index, then expand outward via constrained BFS, pruning candidates that fail the pattern's structural or property constraints. This is what Neo4j's [query planner](https://neo4j.com/docs/cypher-manual/current/planning-and-tuning/) does, what TigerGraph's GSQL does, and what Apache TinkerPop's pattern-matching steps do under the hood.
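To make the "expand outward, prune candidates" idea concrete, here is a deliberately naive backtracking matcher — exponential in the worst case, exactly as the NP-completeness result predicts. Real planners layer index lookups and cost-based candidate ordering on top of this skeleton; this sketch only keeps the "most-constrained node first" heuristic:

```python
def match_pattern(data, pattern):
    """Find every injective mapping of pattern nodes onto data nodes
    that preserves the pattern's edges (non-induced, Cypher-style).
    data, pattern: dict[node] -> set[neighbour], undirected."""
    p_nodes = sorted(pattern, key=lambda n: -len(pattern[n]))  # most-constrained first
    matches = []

    def extend(assignment):
        if len(assignment) == len(p_nodes):
            matches.append(dict(assignment))
            return
        p = p_nodes[len(assignment)]
        for d in data:
            if d in assignment.values():
                continue                       # injective: no data node reused
            # every already-assigned pattern neighbour must map to a data neighbour
            if all(assignment[q] in data[d]
                   for q in pattern[p] if q in assignment):
                assignment[p] = d
                extend(assignment)
                del assignment[p]              # backtrack

    extend({})
    return matches
```

Note that a triangle pattern matches a triangle in the data six times — once per assignment of pattern variables to the three nodes — which is also why Cypher queries routinely need DISTINCT.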
Composing primitives: the real-world skill
Here is the punchline: most real graph queries are compositions of two or three primitives plus set operations. Three examples:
- "Friends of friends who like the same products": BFS-2-hops on the social graph from user u, BFS-1-hop on the purchase graph from u, intersect the two result sets on product nodes. Three primitive calls, one set intersection.
- "Recommend articles like this one": personalised PageRank rooted at the article you are reading, take the top 10 by rank. One primitive call.
- "Find suspicious accounts": Connected components on the transaction graph, filter to clusters where average transaction amount exceeds threshold X, then PageRank within each suspect cluster to find the central nodes. Two primitive calls plus filter.
The skill of writing fast graph queries is recognising which primitive each part of your problem is asking for, then ordering the primitives so the most selective one runs first. If your filter is "users who paid more than ₹1 lakh in a single transaction" and you have an index on transaction amount, run the filter first and BFS outward from the small candidate set — do not run BFS from every user and then filter.
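The first composition above can be sketched end to end in a few lines (function and variable names are illustrative, not a real API):

```python
from collections import deque

def k_hop(graph, source, k):
    """Every node within k hops of source, excluding source. Bounded BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] >= k:
            continue
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist) - {source}

def shared_product_friends(social, likes, user):
    """Friends-of-friends who like at least one product the user likes:
    BFS-2 on the social graph, 1-hop on the likes graph, intersect."""
    candidates = k_hop(social, user, 2)            # primitive 1: bounded BFS
    my_products = set(likes.get(user, []))         # primitive 2: 1-hop expansion
    return {f for f in candidates
            if my_products & set(likes.get(f, []))}  # set intersection
```

Running the cheap, selective primitive first (the user's own small LIKES set) and intersecting against it is precisely the ordering discipline the paragraph above describes.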
A fraud detection example
Problem. A real-time payments operator (think a mid-tier UPI processor) has flagged account A77 as suspicious. A fraud analyst at 11 pm needs two answers, fast:
- Who else might be involved? — list every account within three hops of A77 in the past 24 hours of transactions.
- How does the money flow? — find the cheapest "money trail" from A77 to a known mule account M03, where edge cost is −log(transaction amount), so that high-value transfers are cheap.
The transaction graph for the last 24 hours has 4 million nodes and 18 million edges, stored in Neo4j with a B-tree index on Account.id.
Question 1 in Cypher. The natural Cypher is MATCH (s:Account {id: 'A77'})-[:TRANSFERRED*1..3]-(x:Account) RETURN DISTINCT x. The query planner translates this into: index-lookup on id = 'A77' (one B-tree probe), then a bounded BFS to depth 3 over the index-free adjacency. With average account degree 8, the BFS visits roughly 8³ = 512 accounts, costs about 512 × 1 µs = 0.5 ms of pointer-chasing, plus a few microseconds of de-duplication. Total: well under a millisecond. The same query on a Postgres schema with (src_id, dst_id) indexed would be 512 × 100 µs = 51 ms — still fast, but a hundred times slower, and the gap widens at four hops.
Question 2 in Python. Drop down to the procedure API and run Dijkstra with custom edge weights. Pseudo-code:
```python
import math, heapq

def cheapest_trail(graph, source, target):
    dist = {source: 0.0}
    heap = [(0.0, source, [source])]
    while heap:
        d, node, path = heapq.heappop(heap)
        if node == target:
            return path, d
        if d > dist[node]:
            continue
        for nbr, amount in graph[node]:      # edges with rupee amounts
            # caveat: scale amounts into (0, 1] first (e.g. as a fraction of
            # the largest transaction) — raw amounts above 1 make -log(amount)
            # negative, violating Dijkstra's non-negative-weight requirement
            cost = d + (-math.log(amount))
            if cost < dist.get(nbr, float('inf')):
                dist[nbr] = cost
                heapq.heappush(heap, (cost, nbr, path + [nbr]))
    return None
```
For 4M nodes and 18M edges, Dijkstra is O((V + E) log V) ≈ 22M × 22 ≈ 500M operations, around 5-10 seconds in pure Python or 200 ms in a C++ implementation. The Neo4j Graph Data Science library's built-in `gds.shortestPath.dijkstra.stream(...)` runs it in roughly that 200 ms.
The result. The analyst sees both answers within seconds: 47 accounts within 3 hops (Question 1), and a four-hop path A77 → A39 → A12 → M03 with total transferred amount ₹4.2 lakh (Question 2). Two queries, two primitives — bounded BFS and weighted Dijkstra — and the entire investigation took less than 30 seconds end to end. The same workflow on a Postgres schema would have taken 5-10 seconds per query, slow enough that the analyst would have stopped using the tool.
Cost cheat-sheet
Internalise the following costs and you will pick the right primitive instinctively.

| Primitive | Cost | Typical question |
|---|---|---|
| BFS (plain or bounded) | O(V+E) | shortest unweighted path, k-hop neighbourhood |
| DFS | O(V+E) | any path, cycles, topological sort, SCCs |
| Dijkstra | O((V+E) log V) | shortest weighted path, non-negative weights |
| Floyd-Warshall | O(V³) | all-pairs distances, up to a few hundred nodes |
| PageRank / centrality | O(iterations × E) | "who matters" |
| Connected components | O(V+E) | clustering, reachability partitions |
| Pattern matching | NP-hard in general | Cypher MATCH, motif search |
Where this is going
You now have the vocabulary. A query like Cypher's MATCH (u:User {id:42})-[:FOLLOWS*1..2]-(f)-[:LIKES]->(p:Product)<-[:LIKES]-(u) RETURN p is no longer mysterious — it is a bounded BFS to depth 2 from user 42, then a 1-hop expansion along the LIKES edge, then a set intersection with user 42's own LIKES set. Three primitives, one set operation. The next chapter takes the Cypher and Gremlin query languages apart line by line and shows exactly how each clause compiles down to the primitives in this chapter, executed against the index-free adjacency layout of the previous one. After that, the rest of part 20 covers transactions on graphs, distributed graph storage with TigerGraph and JanusGraph, and the property-graph extensions (GQL, the new ISO standard) that are landing in 2026.
References
- Tarjan, R. Data Structures and Network Algorithms. SIAM, 1983. The canonical text on graph algorithms and their analysis.
- Robinson, I., Webber, J., Eifrem, E. Graph Databases (2nd edition), O'Reilly, 2015. Chapter 4 covers traversal patterns and Cypher decomposition in detail.
- Neo4j. Traversal Framework Reference — the procedural API for custom BFS/DFS over the native store.
- Apache TinkerPop. Gremlin Traversal Steps — the reference for graph traversal as a fluent DSL.
- Boost. Boost Graph Library — the C++ implementation reference for every primitive in this chapter.
- Page, L., Brin, S. The PageRank Citation Ranking, Stanford technical report, 1999. The original derivation.