In short

Property graphs store nodes and edges; a query language turns that raw model into something you can use. Two languages dominate. Cypher (Neo4j, 2011; now an open specification via openCypher and the basis of ISO GQL 2024) is declarative: you write the pattern you want as ASCII art, and the engine plans the traversal. MATCH (a:Person {name:'Riya'})-[:KNOWS]->(b)-[:KNOWS]->(c) WHERE c.city='Bengaluru' RETURN c.name reads almost like the picture you would draw on a whiteboard. Gremlin (TinkerPop, 2009; now Apache TinkerPop) is imperative: you chain traversal steps that walk the graph one hop at a time. The same query becomes g.V().has('Person','name','Riya').out('KNOWS').out('KNOWS').has('city','Bengaluru').values('name'). Same answer, different mental model: Cypher describes the shape of the result, Gremlin describes the walk that produces it.

Cypher wins for complex multi-pattern queries that read like a picture; Gremlin wins when the query has to be assembled programmatically inside Java/Python/JavaScript code, fed by streams, or run on a graph engine that does not speak Cypher. Several engines speak both. Amazon Neptune accepts Gremlin and openCypher against the same data. Neo4j is Cypher-native, with Gremlin available through a community plugin. JanusGraph is Gremlin-native, with Cypher available only through translation projects such as Cypher for Gremlin. Azure Cosmos DB's graph API is Gremlin-based. The newest entrant, GQL (ISO/IEC 39075:2024), is the first standalone ISO graph query language; it borrows Cypher's ASCII-art patterns and adds SQL-style projection, schema, and set operations.

For the next decade you will write Cypher for declarative pattern queries, Gremlin for programmatic traversals embedded in application code, and increasingly GQL where vendor neutrality matters. This chapter walks both languages on the same Indian e-commerce friend-of-friend recommendation graph, shows where each shines, and gives you a checklist for picking one in production.

You spent the previous chapter learning that property graphs and RDF are two genuinely different data models. Inside the property-graph half of that fork there is a second fork, smaller but no less consequential: the choice of query language. The two contenders, Cypher and Gremlin, were invented within two years of each other (Gremlin in 2009 at TinkerPop, Cypher in 2011 at Neo4j) by people who agreed completely on the data model and disagreed completely on what a good query language should look like. More than a decade later both languages are still standing, Cypher's pattern syntax has been carried into the ISO GQL standard, and most major graph engines support at least one of them natively, often with the other available through a translation layer.

The disagreement is the same one that runs through programming-language design generally: declarative versus imperative. SQL is declarative — you say what you want, the planner figures out how. Iterating over a result set in application code is imperative — you say how to walk through the data step by step. Cypher is the SQL of graphs; Gremlin is the iterator. Both produce the same answers; they put the cognitive load in different places.

This chapter walks both languages from scratch. By the end, you should be able to read a Cypher query and understand what it asks, write a basic Gremlin traversal in three steps, translate between them on a simple pattern, and pick the right one for a new project.

Cypher: declarative ASCII-art patterns

Cypher started inside Neo4j in 2011 as an internal query language. Andrés Taylor and a small team wanted something that did for graphs what SQL had done for tables — let a developer who had never seen the language read a query and understand it within a minute. The insight that made Cypher work was that engineers already knew how to draw graphs on whiteboards: circles for nodes, arrows for edges, labels next to both. So the language let you write that picture in ASCII and call it a query.

The fundamental Cypher pattern looks like this:

(node)-[edge]->(node)

Nodes are written in parentheses. Edges are written in square brackets, with arrows on either side indicating direction. A node can carry a variable name (a, b), a label after a colon (:Person), and a property map ({name: 'Riya'}). An edge carries a variable name, a type after a colon (:KNOWS), and properties. Put together, (a:Person {name:'Riya'})-[r:KNOWS]->(b:Person) says "match a Person node bound to the variable a, whose name is Riya, that has an outgoing KNOWS edge bound to r reaching another Person node bound to b."
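To make the variable-binding idea concrete, here is a minimal plain-Python sketch of matching a single-edge pattern such as (a:Person {name:'Riya'})-[r:KNOWS]->(b:Person). The toy graph, its dictionary layout, and the helper names (node_matches, match_edge_pattern) are illustrative assumptions; a real engine uses indexes and a query planner rather than scanning every edge.

```python
# Toy property graph: nodes carry labels and properties; edges carry an id,
# a type, and a direction (src -> dst).
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Riya"}},
    2: {"labels": {"Person"}, "props": {"name": "Rahul"}},
    3: {"labels": {"Company"}, "props": {"name": "Flipkart"}},
}
edges = [
    {"id": "e1", "type": "KNOWS", "src": 1, "dst": 2},
    {"id": "e2", "type": "WORKS_AT", "src": 1, "dst": 3},
]


def node_matches(node_id, label, props):
    """True if the node carries the label (when given) and every listed property."""
    node = nodes[node_id]
    if label is not None and label not in node["labels"]:
        return False
    return all(node["props"].get(k) == v for k, v in props.items())


def match_edge_pattern(a_label, a_props, rel_type, b_label, b_props=None):
    """Yield one binding {a, r, b} per edge fitting (a:A {..})-[r:TYPE]->(b:B {..})."""
    b_props = b_props or {}
    for e in edges:
        if e["type"] != rel_type:
            continue
        if node_matches(e["src"], a_label, a_props) and node_matches(e["dst"], b_label, b_props):
            yield {"a": e["src"], "r": e["id"], "b": e["dst"]}


# (a:Person {name:'Riya'})-[r:KNOWS]->(b:Person)
bindings = list(match_edge_pattern("Person", {"name": "Riya"}, "KNOWS", "Person"))
print(bindings)  # [{'a': 1, 'r': 'e1', 'b': 2}]
```

Each result is one row of variable bindings, which is exactly what the rest of a Cypher query (WHERE, RETURN) then operates on.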

A complete Cypher query wraps that pattern in clauses borrowed from SQL: MATCH for the pattern to find, WHERE for filters, RETURN for what to project out, and optionally ORDER BY, SKIP, and LIMIT for shaping the result. The full vocabulary also includes CREATE for inserting nodes and edges, MERGE for upsert (find-or-create), SET for property updates, DELETE for removing nodes and edges, DETACH DELETE for removing a node together with all of its edges, and aggregations (count, collect, sum, avg) that work essentially like SQL's.
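MERGE is the clause whose semantics most often trip people up, so a small model helps. The in-memory store and the create/merge helpers below are hypothetical and assume a node matches when it has the label and every listed property. Real Neo4j MERGE also supports ON CREATE SET / ON MATCH SET and needs a uniqueness constraint to behave safely under concurrent writes.

```python
# Hypothetical in-memory node store: node id -> (label, properties)
store = {}


def create(label, **props):
    """CREATE semantics: always add a new node, even if an identical one exists."""
    node_id = len(store) + 1
    store[node_id] = (label, dict(props))
    return node_id


def merge(label, **props):
    """MERGE semantics: return an existing node matching label + props, else create one."""
    for node_id, (existing_label, existing_props) in store.items():
        if existing_label == label and all(existing_props.get(k) == v for k, v in props.items()):
            return node_id
    return create(label, **props)


riya_1 = merge("User", name="Riya")    # nothing matches yet, so a node is created
riya_2 = merge("User", name="Riya")    # matches the node above, nothing created
riya_3 = create("User", name="Riya")   # CREATE happily makes a duplicate
print(riya_1, riya_2, riya_3, len(store))  # 1 1 2 2
```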

[Figure: "Cypher: the query is a picture of the result." The clause MATCH (a:Person {name:'Riya'})-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person) WHERE c.city = 'Bengaluru' RETURN c.name is drawn above the three-node, two-edge graph it matches, with dotted lines linking each ASCII piece to its graph counterpart.]
The Cypher `MATCH` clause is a literal ASCII drawing of the graph shape you want to find. The dotted lines show each piece of the syntax aligned with its graph counterpart — `(a:Person {name:'Riya'})` is the leftmost circle, `-[:KNOWS]->` is the first arrow, and so on. The engine takes this pattern and finds every place in the graph where the picture fits.

Why the ASCII picture matters: a SQL query expressing the same friend-of-friend pattern needs two references to the friendship table (one per hop), two references to the person table (one to anchor Riya, one to filter on city), three join conditions, careful aliasing to keep the copies apart, and often a filter that excludes walks back to the starting node. That is a lot for a reader to assemble mentally before understanding the intent. The Cypher version compresses all of it into a single line of pattern syntax that mirrors the picture you would draw on a whiteboard. Reading speed translates to maintenance speed, and faster query review is one of the benefits teams most often cite after adopting Cypher.
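For comparison, the SQL self-join version can be run with Python's built-in sqlite3 module. The person/knows schema and the rows are assumptions made for illustration. Note the one knows alias per hop and the explicit filter that stops the walk Riya → Rahul → Riya from returning Riya herself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE knows  (src INTEGER REFERENCES person(id), dst INTEGER REFERENCES person(id));
INSERT INTO person VALUES (1, 'Riya', 'Bengaluru'), (2, 'Rahul', 'Pune'),
                          (3, 'Priya', 'Bengaluru'), (4, 'Arjun', 'Mumbai');
-- Rahul knows Riya back, so a naive two-hop join would return Riya herself.
INSERT INTO knows VALUES (1, 2), (2, 3), (2, 4), (2, 1);
""")

# Friend-of-friend of Riya living in Bengaluru, written as self-joins:
# one `knows` alias per hop, two `person` aliases, three join conditions.
FOF_SQL = """
SELECT DISTINCT c.name
FROM person AS a
JOIN knows  AS k1 ON k1.src = a.id
JOIN knows  AS k2 ON k2.src = k1.dst
JOIN person AS c  ON c.id  = k2.dst
WHERE a.name = 'Riya'
  AND c.city = 'Bengaluru'
  AND c.id <> a.id
"""
rows = [name for (name,) in conn.execute(FOF_SQL)]
print(rows)  # ['Priya']
```

Every extra hop adds another knows alias and another join condition, which is the structural cost the Cypher pattern hides.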

Three Cypher idioms are worth memorising up front. Variable-length paths with *min..max let you traverse an unknown number of hops: (riya)-[:KNOWS*1..3]->(person) matches every path of one to three KNOWS edges starting at Riya. Someone reachable along several paths appears once per path, so add DISTINCT when you want each person once. Optional matches with OPTIONAL MATCH are like SQL's LEFT JOIN: the rest of the pattern still binds even if the optional piece is absent, and the missing variables come back as null. Aggregation uses implicit grouping: in RETURN person.city, count(*), rows are grouped by person.city automatically. There is no GROUP BY keyword because Cypher infers the grouping key from which returned expressions are aggregated and which are not.
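The first and third idioms can be modelled in a few lines of plain Python. This sketch assumes a hypothetical five-edge KNOWS graph that contains a cycle, and it follows Cypher's default rule that a relationship may not repeat within one matched path while nodes may. It illustrates the semantics, not how an engine executes them.

```python
from collections import Counter

# Hypothetical KNOWS edges as (edge_id, src, dst); names double as node ids.
KNOWS = [
    ("k1", "Riya", "Rahul"),
    ("k2", "Rahul", "Priya"),
    ("k3", "Priya", "Meera"),
    ("k4", "Rahul", "Meera"),
    ("k5", "Meera", "Riya"),   # closes a cycle back to Riya
]
CITY = {"Riya": "Bengaluru", "Rahul": "Pune", "Priya": "Bengaluru", "Meera": "Bengaluru"}


def var_length_paths(start, min_hops, max_hops):
    """Enumerate node paths for (start)-[:KNOWS*min_hops..max_hops]->(x).

    A relationship may not repeat within one path (nodes may), and every
    distinct path is its own row, even when two paths end at the same node.
    """
    results = []

    def walk(node, used, path):
        if len(used) >= min_hops:
            results.append(path)
        if len(used) == max_hops:
            return
        for edge_id, src, dst in KNOWS:
            if src == node and edge_id not in used:
                walk(dst, used | {edge_id}, path + [dst])

    walk(start, frozenset(), [start])
    return results


paths = var_length_paths("Riya", 1, 3)
endpoints = [p[-1] for p in paths]
print(endpoints)  # ['Rahul', 'Priya', 'Meera', 'Meera', 'Riya']

# RETURN person.city, count(*): the non-aggregated expression (city) becomes
# the grouping key automatically, so no GROUP BY is needed.
by_city = Counter(CITY[person] for person in endpoints)
print(sorted(by_city.items()))  # [('Bengaluru', 4), ('Pune', 1)]
```

Meera shows up twice because two distinct paths reach her, and Riya shows up because the cycle leads back to her; both are reasons real queries often reach for DISTINCT or an explicit person <> riya filter.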

Cypher is now an open specification through openCypher, and in April 2024 ISO published GQL (ISO/IEC 39075:2024), a graph query language whose pattern syntax is essentially Cypher with a few additions for SQL-style projection, schema, and set operations. GQL implementations are arriving gradually, and Cypher vendors, Neo4j among them, have said they will converge on it. So the Cypher you learn today will read almost identically as GQL tomorrow.

Gremlin: imperative traversal pipeline

Gremlin came from a different intellectual tradition. Marko Rodriguez started TinkerPop in 2009 with the goal of building a vendor-neutral framework for graph engines — a "JDBC for graphs" — and Gremlin emerged as the query language that sat on top. The design rejected the idea that a graph query should look like a static pattern. Instead, Rodriguez argued, a graph query should be what it actually is: a walk through the graph, expressed as a chain of steps that a virtual traverser executes.

The Gremlin equivalent of "find friends-of-friends of Riya in Bengaluru" reads like a method chain in any modern programming language:

g.V()
 .has('Person', 'name', 'Riya')
 .out('KNOWS')
 .out('KNOWS')
 .has('city', 'Bengaluru')
 .values('name')

Read it left to right as a sequence of instructions. g is the graph traversal source. V() starts at all vertices. has('Person', 'name', 'Riya') filters to vertices labelled Person whose name is Riya (a single vertex, if names are unique). out('KNOWS') walks one hop along outgoing KNOWS edges, leaving one traverser at each friend. out('KNOWS') again walks one more hop, leaving one traverser per friend-of-friend path. A person reachable through two friends therefore appears twice, and Riya herself can reappear if a friend knows her back, which is why real queries often add dedup() or a where(neq(...)) filter. has('city', 'Bengaluru') filters that set. values('name') projects the name property out of each surviving vertex.

Each step takes a stream of traversers in and emits a stream of traversers out. This is not metaphor — Gremlin's execution model is literally a streaming pipeline of immutable traverser objects, each carrying a current location in the graph plus accumulated state (path history, side-effect counters, sack values). A single Gremlin query can fan out, fan in, branch on conditions, repeat steps with repeat().times() or repeat().until(), accumulate side effects with aggregate() and store(), and project arbitrary structured results with project() and by().
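The streaming model maps naturally onto Python generators. The sketch below is a hypothetical, heavily simplified model rather than TinkerPop's implementation: traversers carry only a current object and a path, the vertex label is stored as an ordinary property, and the counted() wrapper is a crude stand-in for the per-step counts that .profile() reports.

```python
from collections import namedtuple

# A traverser here is just (current object, path so far). Real Gremlin
# traversers also carry bulk, loop counters, sacks and side-effect access.
Traverser = namedtuple("Traverser", ["obj", "path"])

VERTICES = {
    1: {"label": "Person", "name": "Riya", "city": "Bengaluru"},
    2: {"label": "Person", "name": "Rahul", "city": "Pune"},
    3: {"label": "Person", "name": "Priya", "city": "Bengaluru"},
    4: {"label": "Person", "name": "Arjun", "city": "Mumbai"},
}
OUT_EDGES = {1: [("KNOWS", 2)], 2: [("KNOWS", 3), ("KNOWS", 4)], 3: [], 4: []}


def V():
    """Start step: emit one traverser per vertex."""
    for vid in VERTICES:
        yield Traverser(vid, (vid,))


def has(stream, key, value):
    """Filter step: keep traversers whose vertex has key == value."""
    for t in stream:
        if VERTICES[t.obj].get(key) == value:
            yield t


def out(stream, edge_label):
    """Flat-map step: one new traverser per matching outgoing edge."""
    for t in stream:
        for label, dst in OUT_EDGES[t.obj]:
            if label == edge_label:
                yield Traverser(dst, t.path + (dst,))


def values(stream, key):
    """Map step: replace each vertex with one of its property values."""
    for t in stream:
        yield Traverser(VERTICES[t.obj][key], t.path)


def counted(stream, name, counts):
    """Pass-through that tallies traversers leaving a stage (a crude profile())."""
    for t in stream:
        counts[name] = counts.get(name, 0) + 1
        yield t


# Generators are lazy: nothing runs until the final list comprehension pulls results.
counts = {}
s = counted(V(), "V()", counts)
s = counted(has(has(s, "label", "Person"), "name", "Riya"), "has(Person,name,Riya)", counts)
s = counted(out(s, "KNOWS"), "out(KNOWS)#1", counts)
s = counted(out(s, "KNOWS"), "out(KNOWS)#2", counts)
s = counted(has(s, "city", "Bengaluru"), "has(city,Bengaluru)", counts)
names = [t.obj for t in values(s, "name")]
print(names)   # ['Priya']
print(counts)  # {'V()': 4, 'has(Person,name,Riya)': 1, 'out(KNOWS)#1': 1, 'out(KNOWS)#2': 2, 'has(city,Bengaluru)': 1}
```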

[Figure: "Gremlin: the query is a pipeline of traversal steps." The chain g.V(), .has('Person','name','Riya'), .out('KNOWS'), .out('KNOWS'), .has('city','Bengaluru'), .values('name') is shown beside a table of traverser counts per stage on a sample graph.]
A Gremlin query is a literal pipeline. Each method call is a stage that takes a stream of traversers in and emits a stream out. The right column shows traverser counts on a sample 10,000-vertex graph: starting from all vertices, the first filter narrows to one (Riya), the first hop fans out to her 120 friends, the second hop produces 8,400 friend-of-friend traversers, the city filter cuts that to 540, and the projection extracts names. The `.profile()` step in TinkerPop produces this table at runtime so you can see exactly where the query expands.

Why the imperative pipeline matters: in a programming language you already know how to build a chain of steps incrementally. Start with one step, run it, look at the output, add the next step. Gremlin gives you that workflow directly. You can also generate the chain programmatically: a recommendation service that takes a user-supplied filter set can splice extra has() calls into the traversal at runtime without any string concatenation, because the traversal is just a Java/Python/JS object. Cypher requires either parameterised queries (safe, but the query text is fixed in advance, so every optional filter has to be written into it up front) or string templating (which is fragile). Gremlin's home turf is the application code that builds and runs the query, not the SQL-style console session.
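The "traversal is just an object" point can be sketched without any Gremlin client. The Traversal class below is a hypothetical, immutable, fluent builder, not the gremlin-python API: each method returns a new traversal with one more step, so request-time filters are spliced in with an ordinary loop instead of string concatenation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traversal:
    """Hypothetical fluent traversal: an immutable tuple of (step, *args)."""
    steps: tuple = ()

    def _add(self, *step):
        # Return a NEW traversal; the receiver is never modified.
        return Traversal(self.steps + (step,))

    def has(self, key, value):
        return self._add("has", key, value)

    def out(self, label):
        return self._add("out", label)

    def values(self, key):
        return self._add("values", key)

    def run(self, vertices, out_edges):
        """Interpret the steps, starting from every vertex id (like g.V())."""
        current = list(vertices)
        for op, *args in self.steps:
            if op == "has":
                key, value = args
                current = [v for v in current if vertices[v].get(key) == value]
            elif op == "out":
                (label,) = args
                current = [dst for v in current
                           for lbl, dst in out_edges.get(v, []) if lbl == label]
            elif op == "values":
                (key,) = args
                current = [vertices[v][key] for v in current]
        return current


VERTICES = {
    1: {"name": "Riya", "city": "Bengaluru"},
    2: {"name": "Rahul", "city": "Pune"},
    3: {"name": "Priya", "city": "Bengaluru"},
    4: {"name": "Arjun", "city": "Mumbai"},
}
OUT_EDGES = {1: [("FOLLOWS", 2)], 2: [("FOLLOWS", 3), ("FOLLOWS", 4)]}


def build_recommendation(filters):
    """Splice one has() per request-time filter using a plain Python loop."""
    t = Traversal().has("name", "Riya").out("FOLLOWS").out("FOLLOWS")
    for key, value in filters.items():
        t = t.has(key, value)
    return t.values("name")


print(build_recommendation({}).run(VERTICES, OUT_EDGES))                  # ['Priya', 'Arjun']
print(build_recommendation({"city": "Mumbai"}).run(VERTICES, OUT_EDGES))  # ['Arjun']
```

Because each call returns a new object, a shared base traversal can be branched per request without one request's filters leaking into another's.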

Gremlin's two killer features beyond the basic chain are repeat() and project(). repeat(out('KNOWS')).times(3) walks exactly three KNOWS hops; repeat(out('KNOWS')).until(has('city', 'Bengaluru')) keeps walking until it lands on someone in Bengaluru. Combined with emit() (return intermediate traversers as well as final ones) and path() (carry the full path history along), repeat is how you express anything from variable-length paths to graph-search algorithms (BFS, shortest path, connected components) directly in the query language. project('name','degree').by('name').by(out().count()) builds a structured result per traverser — like a tiny SELECT clause inside the pipeline — and is how you produce JSON-shaped output without post-processing.
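Here is a plain-Python model of repeat() with times(), until(), and emit(), plus the project() example, on a hypothetical five-user graph. It assumes the modulators are placed after repeat(), so the body runs at least once before the exit check. It also adds a max_loops guard, which is not a Gremlin parameter, because until() on a cyclic graph can otherwise walk forever.

```python
OUT = {
    "Riya": ["Rahul"],
    "Rahul": ["Priya", "Arjun"],
    "Priya": ["Meera"],
    "Arjun": [],
    "Meera": [],
}
CITY = {"Riya": "Bengaluru", "Rahul": "Pune", "Priya": "Bengaluru",
        "Arjun": "Mumbai", "Meera": "Bengaluru"}


def repeat_out(starts, times=None, until=None, emit=False, max_loops=10):
    """Model repeat(out()).times(n) / repeat(out()).until(cond), optionally .emit().

    Traversers are (vertex, path) pairs. The body runs before the exit check,
    as when times()/until() follow repeat(). max_loops is a safety guard for
    this sketch only; it is not a Gremlin parameter.
    """
    frontier = [(v, (v,)) for v in starts]
    done = []
    loops = 0
    while frontier and loops < max_loops:
        # Loop body: every live traverser takes one out() hop.
        frontier = [(w, path + (w,)) for v, path in frontier for w in OUT[v]]
        loops += 1
        still_walking = []
        for v, path in frontier:
            finished = (times is not None and loops >= times) or (
                until is not None and until(v))
            if finished:
                done.append((v, path))          # leaves the loop as a result
            else:
                if emit:
                    done.append((v, path))      # emit(): also output intermediates
                still_walking.append((v, path))
        frontier = still_walking
    return done


def project(vertices):
    """Model project('name','degree').by('name').by(out().count())."""
    return [{"name": v, "degree": len(OUT[v])} for v in vertices]


print([v for v, _ in repeat_out(["Riya"], times=2)])          # ['Priya', 'Arjun']
print(repeat_out(["Riya"], until=lambda v: CITY[v] == "Bengaluru"))
# [('Priya', ('Riya', 'Rahul', 'Priya'))]
print(project(["Rahul", "Meera"]))
# [{'name': 'Rahul', 'degree': 2}, {'name': 'Meera', 'degree': 0}]
```

Arjun drops out of the until() walk silently: he is not in Bengaluru and has no outgoing edges, so his traverser simply has nowhere left to go.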

Gremlin runs on the Apache TinkerPop framework, which provides a reference in-memory implementation (TinkerGraph) and a server, Gremlin Server, whose network protocol other back-ends also accept. Production back-ends include JanusGraph (an Apache-licensed Linux Foundation project built for distributed, multi-datacentre deployments), Amazon Neptune, Azure Cosmos DB (Gremlin API), OrientDB, DataStax Graph, and a long tail of others. Bindings exist for Java (the native one), Python (gremlin-python), JavaScript, Go, .NET, and, through a community project, Scala. The polyglot story is a real differentiator: a Java service team and a Python data-science team can both query the same JanusGraph cluster using their respective Gremlin clients, with the same step semantics.

The same query in both languages

Time to put the two side by side. The example below is the canonical "people you may know" recommendation: friends of your friends whom you do not already know. The graph is a small Indian e-commerce social layer — users (Riya, Rahul, Priya, Arjun, Meera, ...) who have follow/friend relationships and product purchase histories, on a Flipkart-style platform.

[Figure: "Friends of Riya's friends she does not already know", written side by side as a declarative Cypher query (MATCH, WHERE, RETURN, ORDER BY, LIMIT) and an imperative Gremlin method chain starting from g.V(), both producing the same result.]
Identical recommendation query in both languages. Cypher reads top to bottom as "match this picture, filter, project, sort"; the engine decides how to find the matches. Gremlin reads top to bottom as "start at Riya, hop, hop, exclude self, exclude direct follows, group by name, sort, take 10"; you direct each step. Same answer; the cognitive load lands in different places.

Worked: friend-of-friend recommendations on a Flipkart social graph

Set up a tiny social commerce graph. Five users: Riya in Bengaluru, Rahul in Pune, Priya in Bengaluru, Arjun in Mumbai, Meera in Bengaluru. Riya follows Rahul; Rahul follows Priya and Arjun; Priya follows Meera. A production graph would also carry product purchases to give the recommendations extra weight; this toy version keeps only FOLLOWS edges so the topology stays easy to trace.

Loading in Cypher (Neo4j):

CREATE (riya:User  {id:'u1', name:'Riya',  city:'Bengaluru'})
CREATE (rahul:User {id:'u2', name:'Rahul', city:'Pune'})
CREATE (priya:User {id:'u3', name:'Priya', city:'Bengaluru'})
CREATE (arjun:User {id:'u4', name:'Arjun', city:'Mumbai'})
CREATE (meera:User {id:'u5', name:'Meera', city:'Bengaluru'})
CREATE (riya)-[:FOLLOWS]->(rahul)
CREATE (rahul)-[:FOLLOWS]->(priya)
CREATE (rahul)-[:FOLLOWS]->(arjun)
CREATE (priya)-[:FOLLOWS]->(meera)

Recommendation query in Cypher. "Find users Riya does not follow yet, ranked by how many of her friends follow them, restricted to her own city for stronger relevance":

MATCH (riya:User {name:'Riya'})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(rec:User)
WHERE rec <> riya
  AND NOT (riya)-[:FOLLOWS]->(rec)
  AND rec.city = riya.city
RETURN rec.name AS recommended,
       count(friend) AS sharedFollows,
       collect(friend.name) AS via
ORDER BY sharedFollows DESC, recommended
LIMIT 10

The query reads almost like English: match the friend-of-friend pattern starting at Riya, exclude Riya herself, exclude users she already follows, restrict to her city, return the recommendation along with the count of mutual friends and the names of those friends, sort and cap. The engine plans the actual traversal — likely starting at Riya (because the pattern is anchored there), expanding two hops, applying the filters, and aggregating.

Loading in Gremlin (TinkerPop / JanusGraph):

g.addV('User').property('id','u1').property('name','Riya').property('city','Bengaluru').as('r')
 .addV('User').property('id','u2').property('name','Rahul').property('city','Pune').as('ra')
 .addV('User').property('id','u3').property('name','Priya').property('city','Bengaluru').as('p')
 .addV('User').property('id','u4').property('name','Arjun').property('city','Mumbai').as('a')
 .addV('User').property('id','u5').property('name','Meera').property('city','Bengaluru').as('m')
 .addE('FOLLOWS').from('r').to('ra')
 .addE('FOLLOWS').from('ra').to('p')
 .addE('FOLLOWS').from('ra').to('a')
 .addE('FOLLOWS').from('p').to('m').iterate()

Recommendation query in Gremlin:

g.V().has('User','name','Riya').as('riya')
 .out('FOLLOWS').as('friend')
 .out('FOLLOWS')
 .where(neq('riya'))
 .where(__.not(__.in('FOLLOWS').as('riya')))
 .where(eq('riya')).by('city')
 .group().by('name').by(select('friend').values('name').fold())
 .order(local).by(select(values).count(local), desc)
 .limit(local, 10)

Both queries return Priya as the only recommendation: she lives in Bengaluru like Riya, Riya does not follow her yet, and Riya's friend Rahul follows her, giving one shared follow. Arjun is also a friend-of-friend but is filtered out because he lives in Mumbai. Meera does not appear at all: she is three hops from Riya (Riya → Rahul → Priya → Meera), outside the two-hop pattern. On a real Flipkart-style graph with millions of users you would add weighting by recency of follow, mutual purchase categories, and a graph-embedding signal, but the topology remains exactly this shape.
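To check the result by hand, the sketch below replays the Cypher query's logic in plain Python over the same five users and four FOLLOWS edges. The recommend() helper is hypothetical. It counts one shared follow per two-hop path, mirroring count(friend), and sorts the way the ORDER BY clause does.

```python
USERS = {
    "u1": ("Riya", "Bengaluru"),
    "u2": ("Rahul", "Pune"),
    "u3": ("Priya", "Bengaluru"),
    "u4": ("Arjun", "Mumbai"),
    "u5": ("Meera", "Bengaluru"),
}
FOLLOWS = [("u1", "u2"), ("u2", "u3"), ("u2", "u4"), ("u3", "u5")]


def recommend(user_id, limit=10):
    """Two-hop follow recommendations in the user's own city."""
    following = {dst for src, dst in FOLLOWS if src == user_id}
    city = USERS[user_id][1]
    via = {}  # rec id -> friend names, one entry per two-hop path
    for friend in sorted(following):
        for src, rec in FOLLOWS:
            if src != friend:
                continue
            if rec == user_id or rec in following:   # rec <> riya AND NOT (riya)-[:FOLLOWS]->(rec)
                continue
            if USERS[rec][1] != city:                # rec.city = riya.city
                continue
            via.setdefault(rec, []).append(USERS[friend][0])
    rows = [(USERS[r][0], len(names), names) for r, names in via.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))     # ORDER BY sharedFollows DESC, recommended
    return rows[:limit]


print(recommend("u1"))  # [('Priya', 1, ['Rahul'])]
```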

The readability gap is real and is one of the most common reasons teams pick Cypher when they have a free choice. Engineers shown both queries cold and asked "what does this do" typically explain the Cypher version much faster than the Gremlin one, and that difference compounds across hundreds of queries in a code base. Where Gremlin shines is the opposite case: when the query is itself generated. Imagine a recommendation service whose endpoints accept "give me people who match these N tags, in these M cities, who share at least K mutual friends" with N, M, K supplied at request time. In Gremlin you splice .has() and .where() calls into the traversal object directly using language-native conditionals; in Cypher you either pre-build a parameterised query with optional clauses (more verbose) or generate the query string from a template (more error-prone). A Flipkart-style recommendation service could reasonably run both: Cypher for the analyst-written ad hoc queries, Gremlin for the programmatic real-time path inside the API service. They describe the same graph; each lives where it is strongest.
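On the Cypher side, the "parameterised query with optional clauses" approach usually means one fixed query text in which each optional filter is disabled by passing null. The query and the recommendation_params() helper below are a hypothetical sketch; rec.tags is an assumed list-valued property that the toy graph does not have. The resulting text and map could be handed to a driver call such as the Neo4j Python driver's session.run(query, parameters).

```python
# One fixed query text; each optional filter is switched off by passing null.
RECOMMEND_CYPHER = """
MATCH (me:User {id: $userId})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(rec:User)
WHERE rec <> me
  AND NOT (me)-[:FOLLOWS]->(rec)
  AND ($cities IS NULL OR rec.city IN $cities)
  AND ($tags IS NULL OR all(t IN $tags WHERE t IN rec.tags))
WITH rec, count(DISTINCT friend) AS mutual
WHERE mutual >= $minMutual
RETURN rec.name AS recommended, mutual
ORDER BY mutual DESC, recommended
LIMIT 10
"""


def recommendation_params(user_id, tags=None, cities=None, min_mutual=1):
    """Build the parameter map; absent filters become None (Cypher null)."""
    return {
        "userId": user_id,
        "tags": list(tags) if tags else None,
        "cities": list(cities) if cities else None,
        "minMutual": min_mutual,
    }


print(recommendation_params("u1", cities=["Bengaluru"]))
# {'userId': 'u1', 'tags': None, 'cities': ['Bengaluru'], 'minMutual': 1}
```

The query text never changes, which keeps it safe from injection and cacheable by the planner; the price is that every possible filter must be spelled out in the text ahead of time.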

Which engines speak which

The engine landscape sorts fairly cleanly. Cypher native: Neo4j, Memgraph, Apache AGE (Cypher inside PostgreSQL), and RedisGraph (deprecated in 2023 but still deployed). Gremlin native: JanusGraph, Azure Cosmos DB (Gremlin API), OrientDB, DataStax Graph, and TinkerGraph (the in-memory reference). Both: Amazon Neptune accepts Gremlin and openCypher on the same database; Neo4j ships Cypher natively and Gremlin through a community plugin; Gremlin Server back-ends such as JanusGraph can run Cypher through the Cypher for Gremlin translation project.

In practice, the choice of language is usually decided by the choice of engine and only rarely the other way around. Picking Neo4j gets you Cypher; picking JanusGraph gets you Gremlin; picking Neptune lets you mix. The interesting cross-engine case is Apache AGE, which lets you write Cypher inside a regular PostgreSQL database — the same instance can hold relational tables and graph data with one connection, one transaction, one backup story. Teams that already run Postgres at scale find AGE attractive precisely because it eliminates the operational cost of running a second database.

GQL: the new ISO standard

In April 2024 ISO published GQL (ISO/IEC 39075:2024), the first standalone ISO graph query language, after a drafting process of roughly five years driven mainly by the openCypher community, Neo4j, Oracle (PGQL), and TigerGraph (GSQL). GQL is essentially Cypher for matching, with explicit additions for SQL-style projection, set operations (UNION, INTERSECT, EXCEPT), schema definition, and a standardised type system. The specification is long, running to several hundred pages, but the headline pattern syntax reads like Cypher with minor cosmetic differences.

MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE b.city = 'Bengaluru'
RETURN b.name

This snippet is valid in both languages. The differences appear in advanced features (GQL adds, for example, graph types for declaring a schema, explicit path modes such as TRAIL and ACYCLIC, and a stricter type regime), but day-one Cypher knowledge transfers directly. Implementations are still arriving: Neo4j has been aligning Cypher with GQL, and other vendors have announced GQL support on their roadmaps.

The political significance of GQL is larger than the technical one. ISO standards open doors in regulated industries (banking, telecom, healthcare, government) that demand a vendor-neutral query language with a formal specification. SQL got this in 1986 and the relational ecosystem benefitted enormously; for the first time, graph databases have the equivalent.

Picking one in production

Three rules cover most decisions. Pick Cypher when your team writes ad hoc analytical queries by hand, when query readability matters for code review and onboarding, when you have a Neo4j-class engine available, or when you anticipate moving to GQL in the next two years. Pick Gremlin when you need polyglot bindings (Python data team plus Java service team plus JavaScript front end), when queries are generated programmatically inside application code, when you are running on JanusGraph, Neptune, or Cosmos DB, or when the workload is heavy on traversal-style algorithms (shortest path, connected components, BFS) that map naturally onto repeat() and path(). Pick both, knowingly, when you have one team writing analyst dashboards and another building a real-time API on the same engine; engines such as Amazon Neptune let both languages run natively against the same data, with no translation layer in between.

What you should not do is pick one and ban the other. Both are mature, both are well supported, both will be around for the next decade, and translation tooling (Cypher for Gremlin, for example, compiles Cypher into Gremlin traversals) lets one language reach engines built for the other. The cost of letting different teams use different languages on the same graph is small; the cost of forcing a Java service team to write declarative pattern queries because the analyst team chose Cypher is much larger.

The next chapter, why relational graph queries need N self-joins, explains the structural reason graph languages exist at all — what relational databases cannot do efficiently when the data is fundamentally a graph.
