Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

Cypher is declarative — you draw the pattern you want as ASCII art and the engine plans the traversal. Gremlin is imperative — you chain traversal steps that walk the graph one hop at a time. Same answers, different mental models: Cypher describes the shape of the result, Gremlin describes the walk that produces it.

Cypher and Gremlin were invented within two years of each other by people who agreed completely on the data model and disagreed completely on what a good query language should look like. More than a decade later, both are still standing, both are backed by open specifications (openCypher, which fed into ISO GQL, and Apache TinkerPop), and between them they cover nearly every serious graph engine on the market.

The disagreement is the same one that runs through programming-language design generally: declarative versus imperative. SQL is declarative — you say what you want, the planner figures out how. Iterating over a result set in application code is imperative — you say how to walk through the data step by step. Cypher is the SQL of graphs; Gremlin is the iterator. Both produce the same answers; they put the cognitive load in different places.

This chapter walks both languages from scratch. By the end, you should be able to read a Cypher query and understand what it asks, write a basic Gremlin traversal in three steps, translate between them on a simple pattern, and pick the right one for a new project.

Cypher: declarative ASCII-art patterns

Cypher started inside Neo4j in 2011 as an internal query language. Andrés Taylor and a small team wanted something that did for graphs what SQL had done for tables — let a developer who had never seen the language read a query and understand it within a minute. The insight that made Cypher work was that engineers already knew how to draw graphs on whiteboards: circles for nodes, arrows for edges, labels next to both. So the language let you write that picture in ASCII and call it a query.

The fundamental Cypher pattern looks like this:

(node)-[edge]->(node)

Nodes are written in parentheses. Edges are written in square brackets, with arrows on either side indicating direction. A node can carry a variable name (a, b), a label after a colon (:Person), and a property map ({name: 'Riya'}). An edge carries a variable name, a type after a colon (:KNOWS), and properties. Put together, (a:Person {name:'Riya'})-[r:KNOWS]->(b:Person) says "match a Person node bound to the variable a, whose name is Riya, that has an outgoing KNOWS edge bound to r reaching another Person node bound to b."
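To make the binding idea concrete, here is a minimal Python sketch of what matching that pattern means. The toy graph, the dictionary layout, and the function name match_knows are assumptions made for illustration; a real engine would use indexes and a query planner rather than scanning every edge.

```python
# Toy property graph: nodes carry a label and properties, edges carry a type.
# All ids, names, and the storage layout are illustrative assumptions.
nodes = {
    1: {"label": "Person", "props": {"name": "Riya"}},
    2: {"label": "Person", "props": {"name": "Rahul"}},
    3: {"label": "City", "props": {"name": "Pune"}},
}
edges = [
    {"src": 1, "type": "KNOWS", "dst": 2},
    {"src": 1, "type": "LIVES_IN", "dst": 3},
    {"src": 2, "type": "KNOWS", "dst": 1},
]

def match_knows(nodes, edges, name):
    """Yield one (a, r, b) binding per place where the pattern
    (a:Person {name: name})-[r:KNOWS]->(b:Person) fits the graph."""
    for r in edges:
        a, b = nodes[r["src"]], nodes[r["dst"]]
        if (r["type"] == "KNOWS"
                and a["label"] == "Person"
                and a["props"].get("name") == name
                and b["label"] == "Person"):
            yield a, r, b

bindings = list(match_knows(nodes, edges, "Riya"))
for a, r, b in bindings:
    print(a["props"]["name"], "-[:" + r["type"] + "]->", b["props"]["name"])
```

Each tuple plays the role of one result row: the variables a, r, and b are bound together for every place the picture fits.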

A complete Cypher query wraps that pattern in clauses borrowed from SQL: MATCH for the pattern to find, WHERE for filters, RETURN for what to project out, optionally ORDER BY and LIMIT and SKIP for shaping the result. The full vocabulary also includes CREATE for inserting nodes and edges, MERGE for upsert (find-or-create), SET for property updates, DELETE, DETACH DELETE for nodes plus their edges, and aggregations (count, collect, sum, avg) that work essentially like SQL's.

[Figure: Cypher's ASCII-art pattern syntax matched against a graph. Title: "Cypher: the query is a picture of the result". The top half shows the query below; the bottom half draws the same pattern as three Person nodes (a with name 'Riya', b the friend, c with city 'Bengaluru') joined by two KNOWS arrows, with dotted lines linking each ASCII piece to its graph counterpart.]

MATCH (a:Person {name:'Riya'})-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
WHERE c.city = 'Bengaluru'
RETURN c.name

Each ASCII piece in the MATCH clause maps directly to a piece of the graph being matched. Read the pattern aloud and you have read the query.
The Cypher `MATCH` clause is a literal ASCII drawing of the graph shape you want to find. The dotted lines show each piece of the syntax aligned with its graph counterpart — `(a:Person {name:'Riya'})` is the leftmost circle, `-[:KNOWS]->` is the first arrow, and so on. The engine takes this pattern and finds every place in the graph where the picture fits.

Why the ASCII picture matters: a SQL self-join expressing the same friend-of-friend pattern requires one reference to the friendship table per hop plus joins back to the person table, a join condition for each of them, careful aliasing to disambiguate the copies, and a WHERE that filters out walks back to the starting node. A reader has to assemble all of those pieces mentally before understanding the intent. The Cypher version compresses them into a single line of pattern syntax that mirrors the picture you would draw on a whiteboard. Reading speed translates to maintenance speed; in this article's hypothetical teams, adopting Cypher cuts query-review times by a factor of two to three.

Three Cypher idioms are worth memorising up front. Variable-length paths with *min..max let you traverse an unknown number of hops: (riya)-[:KNOWS*1..3]->(person) matches anyone reachable from Riya via one to three KNOWS edges. Optional matches with OPTIONAL MATCH are like SQL's LEFT JOIN: the rest of the pattern still binds even if the optional piece is absent, and the optional variables come back as null. Aggregation by implicit grouping: RETURN person.city, count(*) groups by person.city automatically; there is no GROUP BY keyword because Cypher infers grouping from which expressions are aggregated and which are not.
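As a rough illustration of the variable-length idiom, the sketch below enumerates paths of one to three KNOWS hops in plain Python. The edge list and the helper var_length are hypothetical; the rule that a path may not reuse an edge is meant to mirror Cypher's behaviour of matching each relationship at most once per pattern.

```python
from collections import defaultdict

# Hypothetical directed KNOWS edges.
knows = [("Riya", "Rahul"), ("Rahul", "Priya"), ("Priya", "Meera"),
         ("Meera", "Arjun"), ("Rahul", "Riya")]

out_edges = defaultdict(list)
for src, dst in knows:
    out_edges[src].append((src, dst))

def var_length(start, min_hops, max_hops):
    """Return every node that ends a path of min_hops..max_hops edges from start.
    A path never reuses an edge, but it may revisit a node."""
    found = set()

    def walk(node, used, depth):
        if depth >= min_hops:
            found.add(node)
        if depth == max_hops:
            return
        for edge in out_edges[node]:
            if edge not in used:
                walk(edge[1], used | {edge}, depth + 1)

    for edge in out_edges[start]:
        walk(edge[1], {edge}, 1)
    return found

reachable = var_length("Riya", 1, 3)
print(sorted(reachable))
```

Note that Riya shows up in her own result because of the Rahul-to-Riya edge, which is why recommendation queries usually add an explicit filter excluding the starting node. Arjun is four hops away and stays out.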

Cypher is now an open specification through openCypher, and in April 2024 the ISO ratified GQL (ISO/IEC 39075), a graph query language whose pattern syntax is essentially Cypher with a few additions for SQL-style projection, schema, and set operations. The first wave of GQL implementations is appearing in 2025 and 2026; Neo4j, TigerGraph, and Memgraph have all committed to compatibility. So the Cypher you learn today will read almost identically as GQL tomorrow.

Gremlin: imperative traversal pipeline

Gremlin came from a different intellectual tradition. Marko Rodriguez started TinkerPop in 2009 with the goal of building a vendor-neutral framework for graph engines — a "JDBC for graphs" — and Gremlin emerged as the query language that sat on top. The design rejected the idea that a graph query should look like a static pattern. Instead, Rodriguez argued, a graph query should be what it actually is: a walk through the graph, expressed as a chain of steps that a virtual traverser executes.

The Gremlin equivalent of "find friends-of-friends of Riya in Bengaluru" reads like a method chain in any modern programming language:

g.V()
 .has('Person', 'name', 'Riya')
 .out('KNOWS')
 .out('KNOWS')
 .has('city', 'Bengaluru')
 .values('name')

Read it top to bottom as a sequence of instructions. g is the graph traversal source. V() starts at all vertices. has('Person', 'name', 'Riya') filters to the vertices labelled Person whose name is Riya (here, exactly one). out('KNOWS') walks one hop along outgoing KNOWS edges, leaving the traverser at every friend. out('KNOWS') again walks one more hop, leaving the traverser at every friend-of-friend. has('city', 'Bengaluru') filters that set. values('name') projects the name property out of each surviving vertex.

Each step takes a stream of traversers in and emits a stream of traversers out. This is not metaphor: Gremlin's execution model is literally a streaming pipeline of traverser objects, each carrying a current location in the graph plus accumulated state (path history, loop counters, sack values). A single Gremlin query can fan out, fan in, branch on conditions, repeat steps with repeat().times() or repeat().until(), accumulate side effects with aggregate() (or the older store()), and project arbitrary structured results with project() and by().
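The streaming model can be imitated with Python generators. This is only a sketch of the idea, not how TinkerPop is implemented; the step functions V, has, out, and values and the four-vertex graph are assumptions chosen to echo the query above.

```python
# Hypothetical in-memory graph; ids, names, and cities are made up.
vertices = {
    1: {"label": "Person", "name": "Riya", "city": "Pune"},
    2: {"label": "Person", "name": "Rahul", "city": "Pune"},
    3: {"label": "Person", "name": "Priya", "city": "Bengaluru"},
    4: {"label": "Person", "name": "Arjun", "city": "Mumbai"},
}
knows = {1: [2], 2: [3, 4], 3: [], 4: []}

def V():
    yield from vertices                      # one traverser per vertex id

def has(stream, key, value):
    return (v for v in stream if vertices[v].get(key) == value)

def out(stream, adjacency):
    return (w for v in stream for w in adjacency[v])

def values(stream, key):
    return (vertices[v][key] for v in stream)

# Built one step at a time, the same way you would grow a Gremlin chain.
stream = V()
stream = has(stream, "label", "Person")
stream = has(stream, "name", "Riya")
stream = out(stream, knows)                  # friends
stream = out(stream, knows)                  # friends of friends
stream = has(stream, "city", "Bengaluru")
stream = values(stream, "name")

result = list(stream)                        # nothing runs until this point
print(result)
```

Because every stage is lazy, no vertex is visited until list() pulls on the final stage, which loosely parallels a Gremlin traversal doing no work until a terminal step is called.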

[Figure: Gremlin's traversal pipeline. Title: "Gremlin: the query is a pipeline of traversal steps". The left side shows the chain g.V() (all vertices), .has('Person','name','Riya') (filter), .out('KNOWS') (hop +1), .out('KNOWS') (hop +1), .has('city','Bengaluru') (filter), .values('name') (project). The right side tabulates traverser counts on a sample graph.]

stage                      traversers out
V()                        10,000
has(Person,name,Riya)      1
out(KNOWS)                 120
out(KNOWS)                 8,400
has(city,Bengaluru)        540
values(name)               540

Each step's output stream feeds the next. Cardinality changes per step are visible, which is useful for spotting where a query blows up. .profile() shows this table for real.
A Gremlin query is a literal pipeline. Each method call is a stage that takes a stream of traversers in and emits a stream out. The right column shows traverser counts on a sample 10,000-vertex graph: starting from all vertices, the first filter narrows to one (Riya), the first hop fans out to her 120 friends, the second hop produces 8,400 friend-of-friend traversers, the city filter cuts that to 540, and the projection extracts names. The `.profile()` step in TinkerPop produces this table at runtime so you can see exactly where the query expands.

Why the imperative pipeline matters: in a programming language you already know how to build a chain of steps incrementally — start with one step, run it, look at the output, add the next step. Gremlin gives you that workflow directly. You can also generate the chain programmatically: a recommendation service that takes a user-supplied filter set can splice extra has() calls into the traversal at runtime without any string concatenation, because the traversal is just a Java/Python/JS object. Cypher requires either parameterised queries (which work but are less expressive) or string templating (which is fragile). Gremlin's home turf is the application code that builds and runs the query, not the SQL-style console session.
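A minimal sketch of that programmatic style, using a toy step list rather than a real Gremlin client: the filters are appended with ordinary conditionals and no query text is ever concatenated. The user records and the function names build_traversal and run are hypothetical.

```python
# Toy data; names, cities, and tags are illustrative assumptions.
users = [
    {"name": "Priya", "city": "Bengaluru", "tags": {"books", "music"}},
    {"name": "Arjun", "city": "Mumbai", "tags": {"books"}},
    {"name": "Meera", "city": "Bengaluru", "tags": {"cricket"}},
]

def build_traversal(cities=None, tags=None):
    """Return a list of filter steps; each optional filter is spliced in
    only when the caller supplied it."""
    steps = []
    if cities:
        steps.append(lambda u: u["city"] in cities)
    if tags:
        steps.append(lambda u: tags <= u["tags"])   # user has every requested tag
    return steps

def run(steps, stream):
    """Apply the steps in order and project the name of each survivor."""
    for step in steps:
        stream = [u for u in stream if step(u)]
    return [u["name"] for u in stream]

print(run(build_traversal(cities={"Bengaluru"}), users))
print(run(build_traversal(cities={"Bengaluru"}, tags={"books"}), users))
print(run(build_traversal(), users))
```

With a real Gremlin client the list of lambdas would be replaced by has() and where() calls on the traversal object, but the control flow around them would look much the same.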

Gremlin's two killer features beyond the basic chain are repeat() and project(). repeat(out('KNOWS')).times(3) walks exactly three KNOWS hops; repeat(out('KNOWS')).until(has('city', 'Bengaluru')) keeps walking until it lands on someone in Bengaluru. Combined with emit() (return intermediate traversers as well as final ones) and path() (carry the full path history along), repeat is how you express anything from variable-length paths to graph-search algorithms (BFS, shortest path, connected components) directly in the query language. project('name','degree').by('name').by(out().count()) builds a structured result per traverser — like a tiny SELECT clause inside the pipeline — and is how you produce JSON-shaped output without post-processing.
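The loop that repeat() exposes can be sketched in a few lines of Python. The adjacency map and the helper repeat_out are assumptions for illustration; the emit flag imitates the effect of emit() by also returning the traversers from intermediate hops, and each traverser carries its path the way path() would.

```python
# Hypothetical KNOWS adjacency, outgoing direction only.
knows = {"Riya": ["Rahul"], "Rahul": ["Priya", "Arjun"],
         "Priya": ["Meera"], "Arjun": [], "Meera": []}

def repeat_out(start, times, emit=False):
    """Walk `times` hops along outgoing edges, keeping the full path per traverser.
    With emit=True, traversers are also returned after every intermediate hop."""
    frontier = [[start]]
    results = []
    for hop in range(1, times + 1):
        frontier = [path + [nxt] for path in frontier for nxt in knows[path[-1]]]
        if emit or hop == times:
            results.extend(frontier)
    return results

exactly_three = repeat_out("Riya", 3)
up_to_three = repeat_out("Riya", 3, emit=True)
print(exactly_three)
print(up_to_three)
```

Without emit only the single three-hop path survives; with it, the one-hop and two-hop paths are returned as well.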

Gremlin runs on the Apache TinkerPop framework, which provides a reference implementation (TinkerGraph for in-memory) and a wire protocol (Gremlin Server) that any back-end can implement. Production back-ends include JanusGraph (Apache, multi-datacentre), Riverone Neptune, Azure Cosmos DB (Gremlin API), OrientDB, Compustar Graph, DataStax Graph, and a long tail of others. Bindings exist for Java (the native one), Python (gremlin-python), JavaScript, Go, .NET, and even Scala. The polyglot story is a real differentiator — a Java service team and a Python data-science team can both query the same JanusGraph cluster using their respective Gremlin clients, with identical semantics.

The same query in both languages

Time to put the two side by side. The example below is the canonical "people you may know" recommendation: friends of your friends whom you do not already know. The graph is a small Indian e-commerce social layer — users (Riya, Rahul, Priya, Arjun, Meera, ...) who have follow/friend relationships and product purchase histories, on a BharatBazaar-style platform.

[Figure: the same friend-of-friend query in Cypher and Gremlin, titled "Friends of Riya's friends she does not already know". The left panel, "Cypher (declarative)", is annotated "reads as a picture + filter; engine plans the traversal". The right panel, "Gremlin (imperative)", is annotated "reads as a walk; you control the traversal". The two queries shown are reproduced below.]

MATCH (riya:User {name:'Riya'})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof:User)
WHERE fof <> riya AND NOT (riya)-[:FOLLOWS]->(fof)
RETURN fof.name, count(friend) AS shared
ORDER BY shared DESC
LIMIT 10

g.V().has('User','name','Riya').as('riya')
 .out('FOLLOWS').as('friend')
 .out('FOLLOWS')
 .where(neq('riya'))
 .where(not(__.in('FOLLOWS').as('riya')))
 .groupCount().by('name')
 .order(local).by(values, desc)
 .limit(local, 10)
Identical recommendation query in both languages. Cypher reads top to bottom as "match this picture, filter, project, sort"; the engine decides how to find the matches. Gremlin reads top to bottom as "start at Riya, hop, hop, exclude self, exclude direct follows, group by name, sort, take 10"; you direct each step. Same answer; the cognitive load lands in different places.

Worked: friend-of-friend recommendations on a BharatBazaar social graph

Set up a tiny social commerce graph with five users: Riya in Bengaluru, Rahul in Pune, Priya in Bengaluru, Arjun in Mumbai, Meera in Bengaluru. Riya follows Rahul; Rahul follows Priya and Arjun; Priya follows Meera. Product purchases, which would give the recommendations extra weight on a real platform, are left out of the load scripts below to keep the example small.

Loading in Cypher (Neo4j):

CREATE (riya:User  {id:'u1', name:'Riya',  city:'Bengaluru'})
CREATE (rahul:User {id:'u2', name:'Rahul', city:'Pune'})
CREATE (priya:User {id:'u3', name:'Priya', city:'Bengaluru'})
CREATE (arjun:User {id:'u4', name:'Arjun', city:'Mumbai'})
CREATE (meera:User {id:'u5', name:'Meera', city:'Bengaluru'})
CREATE (riya)-[:FOLLOWS]->(rahul)
CREATE (rahul)-[:FOLLOWS]->(priya)
CREATE (rahul)-[:FOLLOWS]->(arjun)
CREATE (priya)-[:FOLLOWS]->(meera)

Recommendation query in Cypher. "Find users Riya does not follow yet, ranked by how many of her friends follow them, optionally filtered to her own city for stronger relevance":

MATCH (riya:User {name:'Riya'})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(rec:User)
WHERE rec <> riya
  AND NOT (riya)-[:FOLLOWS]->(rec)
  AND rec.city = riya.city
RETURN rec.name AS recommended,
       count(friend) AS sharedFollows,
       collect(friend.name) AS via
ORDER BY sharedFollows DESC, recommended
LIMIT 10

The query reads almost like English: match the friend-of-friend pattern starting at Riya, exclude Riya herself, exclude users she already follows, restrict to her city, return the recommendation along with the count of mutual friends and the names of those friends, sort and cap. The engine plans the actual traversal — likely starting at Riya (because the pattern is anchored there), expanding two hops, applying the filters, and aggregating.
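That likely plan (anchor at Riya, expand two hops, filter, aggregate) can be traced by hand in Python over the same five users. This is a sketch of the logic only, with the function name recommend and the dictionary layout as assumptions; it says nothing about how Neo4j actually executes the query.

```python
from collections import defaultdict

city = {"Riya": "Bengaluru", "Rahul": "Pune", "Priya": "Bengaluru",
        "Arjun": "Mumbai", "Meera": "Bengaluru"}
follows = {"Riya": ["Rahul"], "Rahul": ["Priya", "Arjun"],
           "Priya": ["Meera"], "Arjun": [], "Meera": []}

def recommend(me, limit=10):
    via = defaultdict(list)
    for friend in follows[me]:                 # first hop
        for rec in follows[friend]:            # second hop
            if rec == me:                      # rec <> riya
                continue
            if rec in follows[me]:             # NOT (riya)-[:FOLLOWS]->(rec)
                continue
            if city[rec] != city[me]:          # rec.city = riya.city
                continue
            via[rec].append(friend)            # one row per matched path
    rows = [(rec, len(friends), friends) for rec, friends in via.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))   # sharedFollows DESC, then name
    return rows[:limit]

rows = recommend("Riya")
print(rows)   # [('Priya', 1, ['Rahul'])]
```

Arjun is reached by the two-hop expansion but fails the city filter, and Meera sits three hops from Riya, so the two-hop pattern never binds her.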

Loading in Gremlin (TinkerPop / JanusGraph):

g.addV('User').property('id','u1').property('name','Riya').property('city','Bengaluru').as('r')
 .addV('User').property('id','u2').property('name','Rahul').property('city','Pune').as('ra')
 .addV('User').property('id','u3').property('name','Priya').property('city','Bengaluru').as('p')
 .addV('User').property('id','u4').property('name','Arjun').property('city','Mumbai').as('a')
 .addV('User').property('id','u5').property('name','Meera').property('city','Bengaluru').as('m')
 .addE('FOLLOWS').from('r').to('ra')
 .addE('FOLLOWS').from('ra').to('p')
 .addE('FOLLOWS').from('ra').to('a')
 .addE('FOLLOWS').from('p').to('m').iterate()

Recommendation query in Gremlin:

g.V().has('User','name','Riya').as('riya')
 .out('FOLLOWS').as('friend')
 .out('FOLLOWS')
 .where(neq('riya'))
 .where(__.not(__.in('FOLLOWS').as('riya')))
 .where(values('city').as('rec_city')
        .select('riya').values('city').where(eq('rec_city')))
 .group().by('name').by(select('friend').values('name').fold())
 .order(local).by(select(values).count(local), desc)
 .limit(local, 10)

Both queries return Priya as the only recommendation: she lives in Bengaluru like Riya, Riya does not follow her yet, and Riya's friend Rahul follows her (a two-hop path from Riya through Rahul). Arjun is also two hops away but is dropped by the city filter, and Meera, although she lives in Bengaluru, is three hops away (through Rahul and then Priya), so the two-hop pattern never reaches her. On a real BharatBazaar-style graph with millions of users you would add weighting by recency of follow, mutual purchase categories, and a graph-embedding signal, but the topology remains exactly this shape.

The readability gap is real and is the single most common reason teams pick Cypher when they have a free choice. As an illustrative figure, eight engineers shown both queries cold and asked "what does this do" might answer correctly about Cypher in under 30 seconds and about Gremlin in 90 to 120 seconds; the difference compounds across hundreds of queries in a code base. Where Gremlin shines is the opposite case, when the query is itself generated. Imagine a recommendation service whose endpoints accept "give me people who match these N tags, in these M cities, who are followed by at least K mutual friends" with N, M, K supplied at request time. In Gremlin you splice .has() and .where() calls into the traversal object directly using language-native conditionals; in Cypher you either pre-build a parameterised query with optional clauses (more verbose) or generate the query string from a template (more error-prone). BharatBazaar's friend-of-friend recommendation service runs both: Cypher for the analyst-written ad hoc queries, Gremlin for the programmatic real-time path inside the API service. They describe the same graph; each lives where it is strongest.

Which engines speak which

The engine landscape sorts into three groups.

  • Cypher native: Neo4j, Memgraph, Apache AGE (Cypher inside PostgreSQL), RedisGraph (deprecated 2023 but still deployed).

  • Gremlin native: JanusGraph, Riverone Neptune, Azure Cosmos DB (Gremlin API), OrientDB, Compustar Graph, DataStax Graph, TinkerGraph (the in-memory reference).

  • Both: Neo4j ships Cypher natively and has been reachable from Gremlin through a community plugin; Riverone Neptune accepts both Gremlin and openCypher on the same database; TinkerPop back-ends such as JanusGraph can run openCypher through the Cypher for Gremlin translation project.

In practice, the choice of language is usually decided by the choice of engine and only rarely the other way around. Picking Neo4j gets you Cypher; picking JanusGraph gets you Gremlin; picking Neptune lets you mix. The interesting cross-engine case is Apache AGE, which lets you write Cypher inside a regular PostgreSQL database — the same instance can hold relational tables and graph data with one connection, one transaction, one backup story. Teams that already run Postgres at scale find AGE attractive precisely because it eliminates the operational cost of running a second database.

GQL: the new ISO standard

In April 2024 the ISO published GQL (ISO/IEC 39075:2024), the first ISO-standard graph query language, after a five-year drafting process driven mainly by the openCypher community, Neo4j, Oracle (PGQL), and TigerGraph (GSQL). GQL is essentially Cypher for matching, with explicit additions for SQL-style projection, set operations (UNION, INTERSECT, EXCEPT), schema definition, and a standardised type system. The specification is extensive, at over 600 pages, but the headline pattern syntax reads like Cypher with minor cosmetic differences.

MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE b.city = 'Bengaluru'
RETURN b.name

This snippet is valid in both languages. The differences appear in advanced features — GQL adds explicit RETURN ... AS table renaming, parameterised graph references (MATCH ... ON GRAPH socialGraph), and a stricter schema regime — but day-one Cypher knowledge transfers directly. The first wave of GQL-compliant engines is appearing in 2025 and 2026; Neo4j 5.x announced full GQL compatibility in late 2025, and Memgraph, TigerGraph, and BizSuite HANA Graph have committed to releases through 2026.

The political significance of GQL is larger than the technical one. ISO standards open doors in regulated industries (banking, telecom, healthcare, government) that demand a vendor-neutral query language with a formal specification. SQL got this in 1986 and the relational ecosystem benefitted enormously; for the first time, graph databases have the equivalent.

Common confusions

  • "Cypher and Gremlin solve different problems." They do not. They solve the same problem — querying a property graph — from opposite ends of the language-design spectrum. Any query expressible in one is expressible in the other; the openCypher-on-TinkerPop and Gremlin-on-Neo4j translation layers are proof. The choice is about cognitive ergonomics and the surrounding tooling, not capability.

  • "Cypher is just SQL for graphs." Surface-level yes (it has MATCH, WHERE, RETURN, ORDER BY, LIMIT); semantically no. SQL works on rows and joins; Cypher works on patterns and bindings. A Cypher MATCH returns one row per pattern instance found in the graph: if Riya has 120 friends and each of them has 70 friends of their own, a two-hop pattern can produce 120 × 70 = 8,400 rows from a single anchor, with the same person appearing once per path that reaches them. Treating that like a SQL SELECT and forgetting to apply DISTINCT or aggregate is the most common cause of "why is my Cypher query slow" on Stack Overflow.

  • "Gremlin's out('KNOWS') is symmetric." It is not. out('KNOWS') walks only outgoing edges from the current vertex; in('KNOWS') walks incoming; both('KNOWS') walks either direction. On BharatBazaar's FOLLOWS graph this matters, because Riya following Rahul does not mean Rahul follows Riya. Many beginners write g.V().has('name','Riya').out('FOLLOWS').out('FOLLOWS') and are surprised by an empty or tiny result: the second hop only sees people that Riya's follows themselves follow, not the people who follow them. If they wanted the undirected neighbourhood, they meant both('FOLLOWS').

  • "openCypher and Cypher are the same language." Mostly, but not entirely. openCypher is the open specification; Neo4j ships its own Cypher implementation that adds proprietary extensions (e.g. CALL { ... } IN TRANSACTIONS, apoc.* procedures, native vector indexes). Code written against Neo4j Cypher will frequently not run on Memgraph, RedisGraph, or Apache AGE without changes. The portable subset is smaller than it looks.

  • "GQL replaces Cypher." It does not. GQL is the ISO standard whose pattern syntax is essentially Cypher with stricter typing and SQL-style additions; vendors implement GQL by extending their existing Cypher parser. From a working developer's standpoint, learning Cypher today is learning the matching half of GQL. Neo4j 5.x runs both side by side; you do not rewrite working Cypher to gain GQL compatibility.

  • "Gremlin runs faster because it is closer to the metal." Sometimes, often not. Gremlin's step-by-step pipeline gives the developer fine-grained control, which is a double-edged sword: a hand-tuned Gremlin traversal can beat a naive Cypher equivalent on a specific shape, but the Cypher planner is free to re-order joins, push down predicates, and pick indexes, whereas an imperative Gremlin chain has largely committed to a traversal order already. On Neo4j, Cypher is usually the faster bet for declarative pattern queries; on JanusGraph, Gremlin is faster because Cypher there is translated to Gremlin under the hood. Engine matters more than language.
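The row-multiplication point from the "SQL for graphs" confusion above is easy to reproduce on a toy graph. The sketch below is plain Python with a hypothetical follows map; it only illustrates why a two-hop match returns one row per path rather than one row per person.

```python
# Hypothetical FOLLOWS adjacency: two of Riya's follows both follow Meera.
follows = {"Riya": ["Rahul", "Priya"], "Rahul": ["Meera", "Arjun"],
           "Priya": ["Meera"], "Meera": [], "Arjun": []}

# One row per matched path, as a two-hop MATCH anchored at Riya would bind c.
rows = [c for b in follows["Riya"] for c in follows[b]]
distinct_rows = sorted(set(rows))

print(rows)            # Meera appears once per path that reaches her
print(distinct_rows)   # what DISTINCT would leave
print(120 * 70)        # 8400 rows in the scenario from the text
```

The same arithmetic is what turns a harmless-looking pattern into thousands of rows on a dense graph, which is where DISTINCT or an aggregate earns its place.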

Why these confusions cluster: the surface syntax of both languages looks borrowed (Cypher from SQL, Gremlin from method-chaining APIs in Java/Python), so newcomers project the semantics of the borrowed language onto the graph language. SQL's row model and Java's iterator model are both poor mental models for a graph traversal engine that materialises pattern instances or streams traversers. Most production query bugs in graph databases trace back to one of these mismatches — a missing DISTINCT, a wrong-direction out, an assumption that count(*) means the same thing it does in SQL.
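The wrong-direction mistake in particular is worth seeing once in isolation. The following sketch models out, in, and both over a hypothetical edge list; the function names are chosen to echo the Gremlin steps (in_ has a trailing underscore only because in is a Python keyword).

```python
# Hypothetical directed FOLLOWS edges: (follower, followed).
follows = [("Riya", "Rahul"), ("Rahul", "Priya"), ("Meera", "Riya")]

def out(v):
    """People v follows (outgoing edges)."""
    return sorted(dst for src, dst in follows if src == v)

def in_(v):
    """People who follow v (incoming edges)."""
    return sorted(src for src, dst in follows if dst == v)

def both(v):
    """Neighbours in either direction."""
    return sorted(set(out(v)) | set(in_(v)))

print(out("Riya"), in_("Riya"), both("Riya"))
print(out("Priya"))   # Priya follows no one, so an out() hop from her is empty
```

The three calls give three different answers for the same vertex, which is the whole point: direction is part of the query, not a detail.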

Going deeper

The two languages diverge most sharply on three axes: how they handle paths, how they integrate with application code, and how they perform on cluster-scale graphs. Each is worth a paragraph because each is a real choice you will make.

Paths and recursion

Cypher's variable-length syntax [*1..5] is shorthand for "between one and five hops." It is concise and readable, but the engine reserves the right to compute paths in any way it chooses, and on dense graphs expanding variable-length paths beyond a few hops quickly becomes impractical because the result set explodes. Cypher does have built-in shortestPath() and allShortestPaths() functions for hop-count searches; to force a specific algorithm beyond that (weighted Dijkstra, a particular BFS or DFS strategy), you reach for the Graph Data Science library and its procedural calls (gds.shortestPath.dijkstra.stream(...)), which sit alongside Cypher rather than inside it.

Gremlin's repeat() is the exact opposite: it exposes the loop directly. repeat(out('FOLLOWS')).times(3).emit().path() walks up to three hops, emits the traversers after every hop rather than only the final ones, and returns the path each one took. repeat(out('FOLLOWS')).until(has('city','Bengaluru')) is a do-until loop. Combined with simplePath() (drop traversers that revisit a vertex), cyclicPath(), dedup(), and barrier() (force materialisation between stages), you can express BFS, DFS, shortest path, and arbitrary graph search algorithms inside the query language itself. JanusGraph's documentation includes a 30-line Gremlin shortest-path traversal that runs across a billion-edge graph; expressing a comparable weighted or custom search in Cypher usually means calling into the GDS library.
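A hedged sketch of the search that a repeat().until() loop with path tracking amounts to: breadth-first expansion where each traverser carries its path and the first hit wins. The function shortest_path and the sample data are assumptions; the visited set used here is stricter than simplePath(), which only forbids revisits within a single path, but it is sufficient for finding a shortest path by hop count.

```python
from collections import deque

def shortest_path(adjacency, start, is_target):
    """Breadth-first search carrying the path with each traverser.
    The start vertex itself is not tested, matching do-until semantics."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in adjacency.get(path[-1], []):
            if nxt in seen:
                continue                     # never revisit a vertex
            if is_target(nxt):
                return path + [nxt]          # first hit is a shortest path
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

# Hypothetical FOLLOWS graph with a cycle back to Riya.
follows = {"Riya": ["Rahul"], "Rahul": ["Arjun", "Priya"],
           "Arjun": ["Riya"], "Priya": ["Meera"], "Meera": []}
city = {"Riya": "Bengaluru", "Rahul": "Pune", "Arjun": "Mumbai",
        "Priya": "Bengaluru", "Meera": "Bengaluru"}

found = shortest_path(follows, "Riya", lambda v: city[v] == "Bengaluru")
missing = shortest_path(follows, "Riya", lambda v: v == "Nobody")
print(found, missing)
```

The cycle through Arjun does not cause an endless loop because visited vertices are skipped, which is the job simplePath() or dedup() does in the Gremlin version.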

For a 15-year-old building a recommendation engine on a laptop graph of 10,000 BharatRail users, this difference is academic. For a PaisaBridge fraud-detection pipeline running on a 100-million-edge transaction graph, it decides which engine you can use.

Embedding inside application code

Cypher is a string. You write it, parameterise it ({name: $userName}), send it over Bolt or HTTP, and parse the response. The string is opaque to your IDE — refactoring a property name is a grep-and-replace that may miss a query embedded in a YAML config. Tools like Neo4j's OGM and Spring Data Neo4j help, but the fundamental shape is "compile a string, send it, decode the response."
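One common way to keep the string approach safe is to let optional clauses change the query text while user-supplied values travel only as parameters. The sketch below assumes a hypothetical helper build_recommendation_query and does not call any driver; the returned pair would be handed to whichever client library the project uses.

```python
def build_recommendation_query(user_name, city=None, limit=10):
    """Return (cypher_text, params). Optional clauses alter the text;
    values are never interpolated into it."""
    clauses = [
        "MATCH (me:User {name: $userName})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(rec:User)",
        "WHERE rec <> me AND NOT (me)-[:FOLLOWS]->(rec)",
    ]
    params = {"userName": user_name, "limit": int(limit)}
    if city is not None:                       # optional filter
        clauses.append("  AND rec.city = $city")
        params["city"] = city
    clauses += [
        "RETURN rec.name AS recommended, count(friend) AS sharedFollows",
        "ORDER BY sharedFollows DESC, recommended",
        "LIMIT $limit",
    ]
    return "\n".join(clauses), params

query, params = build_recommendation_query("Riya", city="Bengaluru")
print(query)
print(params)
```

Even in this small form the cost the paragraph describes is visible: the query is assembled from fragments that no IDE can check, and each new optional filter adds another branch.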

Gremlin is an object graph in your application's language. In Java, g.V().has("User","name","Riya").out("FOLLOWS") returns a GraphTraversal<Vertex,Vertex> instance — a real object the IDE can autocomplete, refactor, and statically check. The traversal is sent to the server only when you call a terminal step (.toList(), .next(), .iterate()). This is why Gremlin dominates the polyglot world: the same traversal object exists in Java, Python (via gremlin-python), JavaScript (via gremlin-javascript), Go, and .NET, with byte-identical semantics, because TinkerPop's Gremlin Bytecode protocol serialises the object before sending.

A concrete example. A BhojanBox fraud service receives a request "find users connected to user 42 within 2 hops who placed an order in the last hour." In Gremlin (Java):

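// Fragment: assumes g, suspectId, now and timeWindow are in scope, with TinkerPop's
// GraphTraversal, Vertex, P and __ imported.
// emit() keeps the 1-hop neighbours as well as the 2-hop ones ("within 2 hops").
GraphTraversal<Vertex,Vertex> t = g.V(suspectId).repeat(__.both()).emit().times(2).dedup();
if (timeWindow != null) {
  // Optional filter, added with ordinary Java control flow.
  t = t.where(__.outE("PLACED").has("ts", P.gt(now - timeWindow)));
}
List<Object> results = t.values("name").toList();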

The if is regular Java; the traversal is mutable until you terminate it. Doing the equivalent in Cypher means generating a query string with conditional fragments — a templating exercise that every graph team eventually writes a homegrown DSL for.

Cluster scale and OLAP

Neither language was designed for OLAP-scale graph analytics out of the box, but Gremlin has a longer history here through Apache TinkerPop's GraphComputer API (g.withComputer()), which routes a traversal through a graph compute engine such as SparkGraphComputer on Apache Spark (older TinkerPop releases also shipped a Giraph-based computer for Hadoop). JanusGraph supports this for global PageRank, weakly-connected-components, and label-propagation jobs over billion-edge graphs. The same traversal-source switch, calling withComputer() on the traversal source, flips the execution model from one-traverser-at-a-time online to bulk-synchronous-parallel offline.

Neo4j's answer is Aura DS and the Graph Data Science library, which provides tuned native implementations of PageRank, betweenness, Louvain, Leiden, Node2Vec, and graph embeddings. These run inside the Neo4j process on a projected in-memory graph; they do not extend Cypher itself but are callable from it via CALL gds.pageRank.stream(graph). The boundary between query language and analytics library is sharper than in TinkerPop, with consequences both ways — easier to reason about for transactional queries, harder to compose with custom traversals.

The benchmark you should run yourself

If you are choosing between Cypher and Gremlin for a real project, do not trust a vendor benchmark. Spin up a TinkerGraph in-memory engine and a Neo4j docker container, load a copy of your real graph (or SNAP's Chirpline-2010 dump), write the five queries that matter most to your application in both languages, and run them under your real concurrency. Measure: query latency at p50/p95/p99, lines of code per query, time to write the query (start a stopwatch), and time for a teammate who has not seen the query before to read and explain it. The last metric is the one that wins production arguments — in three years of maintenance, readability dwarfs raw latency.
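The latency part of that measurement can be sketched with the standard library alone. The harness below is an assumption-laden starting point: measure and fake_query are hypothetical names, the stand-in workload does no graph work, and the loop is single-threaded, so it does not model real concurrency.

```python
import statistics
import time

def measure(run_query, repetitions=200):
    """Time run_query() repeatedly and return p50/p95/p99 latency in milliseconds."""
    samples = []
    for _ in range(repetitions):
        started = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - started) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)    # 99 percentile cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

def fake_query():
    # Stand-in workload; replace with a real Cypher or Gremlin driver call.
    sum(range(1000))

report = measure(fake_query)
print(report)
```

Running the same harness against both languages on the same data gives comparable numbers; the readability metrics still have to be collected with a stopwatch and a teammate.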

The LDBC Social Network Benchmark provides a standard workload (the SNB Interactive workload defines 14 complex read queries, alongside short reads and updates, that mimic a social network's read/write mix) and reference implementations for both Cypher and Gremlin. It is the closest thing to an apples-to-apples comparison and is the dataset most published graph-database papers use.

Picking one in production

Three rules cover most decisions. Pick Cypher when your team writes ad hoc analytical queries by hand, when query readability matters for code review and onboarding, when you have a Neo4j-class engine available, or when you anticipate moving to GQL in the next two years. Pick Gremlin when you need polyglot bindings (Python data team plus Java service team plus JavaScript front end), when queries are generated programmatically inside application code, when you are running on JanusGraph, Neptune, or Cosmos DB, or when the workload is heavy on traversal-style algorithms (shortest path, connected components, BFS) that map naturally onto repeat() and path(). Pick both, knowingly, when you have one team writing analyst dashboards and another building a real-time API on the same engine; engines that accept both languages, such as Neptune, let both run against the same data without a separate translation layer.

What you should not do is pick one and ban the other. Both are mature, both are well-supported, both will be around for the next decade, and the cross-translation tooling has improved to the point where openCypher-on-Gremlin and Gremlin-on-Neo4j both work in production at single-digit-millisecond overhead. The cost of letting different teams use different languages on the same graph is small; the cost of forcing a Java service team to write declarative pattern queries because the analyst team chose Cypher is much larger.

The next chapter, why relational graph queries need N self-joins, explains the structural reason graph languages exist at all — what relational databases cannot do efficiently when the data is fundamentally a graph.

References

  1. Neo4j Cypher Manual — the canonical reference for Cypher syntax, semantics, and operational features.
  2. Apache TinkerPop Gremlin Reference — the authoritative Gremlin documentation including all step types and language bindings.
  3. Francis, Green, Guagliardo, Libkin, Lindaaker, Marsault, Plantikow, Rydberg, Selmer, Taylor, Cypher: An Evolving Query Language for Property Graphs (SIGMOD 2018) — the design paper covering Cypher's pattern syntax and its formal semantics.
  4. JanusGraph Gremlin Query Language documentation — practical Gremlin against a distributed property-graph back-end.
  5. openCypher project — the open specification of Cypher and the foundation of ISO GQL 2024.
  6. GQL Standards (ISO/IEC 39075:2024) — the ratified ISO graph query language specification.
  7. LDBC Social Network Benchmark — the standard cross-engine workload for property-graph databases; reference implementations exist in both Cypher and Gremlin.
  8. Property graphs vs RDF triples — the data-model fork that comes one level above the query-language fork covered here.
  9. Native adjacency storage and index-free adjacency — the storage trick that makes a Cypher pattern or a Gremlin traversal cheap regardless of graph size.