In short
Graph databases are not one product category — they are two. Property graphs (Neo4j, JanusGraph, TigerGraph, Memgraph, Amazon Neptune in property-graph mode) model the world as nodes with labels (:Person, :Movie) and key-value properties ({name: "Riya", age: 27}), connected by edges with types (:KNOWS, :ACTED_IN), direction, and their own properties ({since: 2019}). The query language is Cypher (Neo4j's, now standardised as GQL) or Gremlin (Apache TinkerPop's traversal language). The mental model is "an object database that knows about relationships" — pragmatic, app-focused, no ontology required, schema-on-read.
RDF (Resource Description Framework) graph databases — Apache Jena, Stardog, Virtuoso, GraphDB, Blazegraph, Amazon Neptune in RDF mode — model the world as a flat sea of triples (subject, predicate, object), where each subject and predicate is a URI like <http://example.org/riya> and the object is a URI or a literal. Everything is a triple — including type assertions (riya rdf:type Person) and properties (riya foaf:name "Riya"). The query language is SPARQL, a W3C standard. The mental model is "a fact store with formal semantics" — rigorous, ontology-driven (OWL, RDFS), federation-friendly, designed for the semantic web.
The two communities barely talk to each other. Pick property graphs when you are building a product feature — recommendations on Flipkart, fraud detection on PhonePe, "people you may know" on a social app, a Neo4j-backed knowledge graph for an enterprise search box; the developer experience is faster, the queries are more readable, and you do not need formal semantics. Pick RDF when your data crosses organisational boundaries (DBpedia, Wikidata, government open data, biomedical ontologies like SNOMED CT and Gene Ontology), when you need to merge data from many sources whose schemas you do not control, when reasoners and inference matter, or when you must publish data the rest of the world can query.
Wikidata holds 1.5 billion RDF triples and powers Wikipedia infoboxes; LinkedIn's Economic Graph and Meta's social graph are property graphs; Stardog and Anzo sell RDF-plus-reasoning into pharma and finance. Same primitive shapes — nodes and edges — radically different developer experience and ecosystems.
You have spent nineteen Builds learning databases that store rows, documents, columns, log records, and vectors. The last category in this curriculum stores something different: relationships as first-class citizens. A graph database is a database whose primary data structure is a graph — vertices and edges — and whose query language is built around traversing edges, not joining tables. The reason this matters is mechanical. To answer "find all friends-of-friends-of-friends of user 42 who live in Bengaluru and have purchased shoes in the last month" in a relational database, you write a self-join on the friendships table three times and watch the query planner produce a Cartesian-product-shaped plan; in a graph database, you walk three hops out from user 42 and filter at each hop, with cost proportional to the actual number of friends-of-friends, not the size of the friendship table.
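That cost argument can be made concrete. The sketch below uses plain Python adjacency sets (with invented data) to show the hop-by-hop expansion a graph engine performs: the work done is proportional to the edges actually touched, not to the total number of friendships stored.

```python
# Sketch: traversal cost tracks the neighbourhood, not the table size.
# Plain-Python adjacency sets stand in for a graph store; data is invented.

friends = {
    42: {1, 2},          # user 42 knows users 1 and 2
    1: {42, 3},
    2: {42, 3, 4},
    3: {1, 2, 5},
    4: {2},
    5: {3},
}
city = {1: "Mumbai", 2: "Delhi", 3: "Bengaluru", 4: "Bengaluru", 5: "Bengaluru"}

def hops(start, k):
    """Expand exactly k hops out from start, excluding anyone reached sooner.
    Each iteration only touches the edges of the current frontier."""
    frontier, seen = {start}, {start}
    for _ in range(k):
        frontier = {nbr for node in frontier for nbr in friends.get(node, set())} - seen
        seen |= frontier
    return frontier

# Friends-of-friends-of-friends of user 42 who live in Bengaluru:
result = {u for u in hops(42, 3) if city.get(u) == "Bengaluru"}
print(result)
```

A relational self-join would instead scan (or index-probe) the whole friendships table once per hop; here the third hop inspects only the handful of users on the second-hop frontier.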
But the moment you decide to use a graph database, you walk into a fork in the road that almost no introductory tutorial mentions clearly. There are two fundamentally different graph data models on the market, with two different histories, two different query languages, two different developer ecosystems, and two different mental models. They are called property graphs and RDF. Choosing between them up front, before you write your first query, is more important than choosing between Neo4j and JanusGraph, or between Apache Jena and Stardog — because once you have committed to one model, the other is effectively unreachable without rewriting your application.
This chapter derives both models from first principles, walks the same Indian e-commerce example through each one, and gives you a decision tree.
The property graph model: nodes and edges that carry data
The property graph model is the simpler of the two to grasp, partly because it matches how most engineers already draw graphs on whiteboards and partly because it borrows naturally from object-oriented programming. The model has exactly four primitive concepts.
Nodes. A node is an entity. Riya is a node. Rahul is a node. The product "Nike Pegasus 41" is a node. Nodes have an internal identifier (Neo4j gives them numeric IDs, but this is an implementation detail you rarely touch).
Labels. A node can carry one or more labels that classify it. Riya is labelled :Person; the Nike Pegasus 41 is labelled :Product. A node can have multiple labels (:Person:Customer:PrimeMember), which makes labels feel like a mix of "type" and "tag". Labels are how you say "give me all the Persons" without scanning every node.
Properties. A node carries a map of key-value pairs — its properties. Riya's properties might be {name: "Riya Sharma", age: 27, city: "Bengaluru", joined: 2024-03-15}. Properties are typed (string, integer, boolean, date, list, point) but the schema is flexible — two :Person nodes are not required to have identical property sets. This is the "NoSQL feel" of property graphs.
Edges (relationships). An edge connects two nodes and itself carries a type and properties. Riya [:KNOWS {since: 2019, on: "Instagram"}] Rahul. The edge has a single type (:KNOWS), is directed (Riya → Rahul, distinct from Rahul → Riya, though the query language can ignore direction when you want), and has its own property map. The fact that edges carry properties is the headline feature — the since of a friendship lives on the friendship itself, not on a separate "friendship metadata" table.
That is the whole model. Four primitives: nodes, labels, properties, edges with types and properties. Everything else — indexes, constraints, schema validation — is operational sugar layered on top.
Why properties on edges matter so much: in a relational world, "Riya KNOWS Rahul since 2019" requires a friendships table with (person_a_id, person_b_id, since_date) columns, and every query about the friendship has to join it. In RDF (which we will see next), it requires either reification or a named graph trick — both verbose. In a property graph, it is one edge with one property. This single design decision is why teams that have done graph modelling in both worlds tend to find property graphs faster to iterate on.
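The four primitives fit in a few lines of plain Python. This is an illustrative sketch, not any real driver's API; all names and values are made up:

```python
# Sketch: nodes with labels and properties, edges with a type and their own
# properties. Illustrative only -- not a real graph database client.
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set      # e.g. {"Person", "Customer"}
    props: dict      # flexible key-value map, no fixed schema

@dataclass
class Edge:
    type: str        # exactly one type, e.g. "KNOWS"
    src: Node        # edges are directed: src -> dst
    dst: Node
    props: dict = field(default_factory=dict)   # properties live ON the edge

riya = Node({"Person"}, {"name": "Riya Sharma", "age": 27, "city": "Bengaluru"})
rahul = Node({"Person"}, {"name": "Rahul Verma"})

# "Riya KNOWS Rahul since 2019" is one edge with one property --
# no separate friendship-metadata table, no reification.
knows = Edge("KNOWS", riya, rahul, {"since": 2019, "on": "Instagram"})

print(knows.props["since"])   # the since lives on the relationship itself
```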
The query language is Cypher, originally invented at Neo4j and now standardised as ISO/IEC GQL (Graph Query Language, ratified 2024). Cypher's syntax draws ASCII pictures of the patterns you want to match. To find friends-of-friends of Riya:
MATCH (riya:Person {name: "Riya"})-[:KNOWS]->(friend)-[:KNOWS]->(fof)
WHERE fof.city = "Bengaluru"
RETURN fof.name, fof.age
Read it left to right: start at a :Person node named Riya, follow a :KNOWS edge to a friend, follow another :KNOWS edge to a friend-of-friend, filter to those in Bengaluru, return the name and age. The pattern in the MATCH clause looks like the picture you would draw on a whiteboard, and that resemblance is the entire reason Cypher took off — Cypher is the first graph query language that beginners can read aloud.
The other major property-graph query language is Gremlin, part of the Apache TinkerPop project and supported by JanusGraph, Amazon Neptune, OrientDB, and others. Gremlin is a traversal language — you write g.V().has('name','Riya').out('KNOWS').out('KNOWS').has('city','Bengaluru').values('name','age') — which is more imperative and chains better in code, but harder to read for declarative pattern queries. Cypher and Gremlin coexist; chapter 161 walks both in detail.
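To make the chaining style concrete, here is a toy fluent traversal in Python that mimics the shape of the Gremlin one-liner above. It is purely illustrative and has nothing to do with TinkerPop's actual implementation; the data is invented:

```python
# Sketch: the method-chaining traversal style Gremlin encourages,
# mimicked over in-memory data. Toy code, not Apache TinkerPop.

VERTICES = {
    "riya":  {"name": "Riya",  "city": "Mumbai"},
    "rahul": {"name": "Rahul", "city": "Delhi"},
    "meera": {"name": "Meera", "city": "Bengaluru"},
}
EDGES = {  # edge label -> list of (src, dst) pairs
    "KNOWS": [("riya", "rahul"), ("rahul", "meera")],
}

class Traversal:
    def __init__(self, ids):
        self.ids = list(ids)

    def has(self, key, value):
        # keep only vertices whose property matches
        return Traversal(v for v in self.ids if VERTICES[v].get(key) == value)

    def out(self, label):
        # step along outgoing edges with the given label
        return Traversal(d for v in self.ids for (s, d) in EDGES[label] if s == v)

    def values(self, *keys):
        # project the requested properties as tuples
        return [tuple(VERTICES[v][k] for k in keys) for v in self.ids]

class g:
    @staticmethod
    def V():
        return Traversal(VERTICES)

# Same shape as the Gremlin in the text: start at Riya, two KNOWS hops,
# filter by city, project a property.
names = g.V().has("name", "Riya").out("KNOWS").out("KNOWS").has("city", "Bengaluru").values("name")
print(names)   # [('Meera',)]
```

Each step consumes the previous step's set of vertices and emits a new one, which is why Gremlin composes so naturally inside application code.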
The RDF model: everything is a triple
RDF (Resource Description Framework) was born in a different world. While the property graph model evolved organically from "let's add types to graph theory and make it useful for apps", RDF was designed top-down by the W3C between 1999 and 2004 as the foundation of the semantic web — Tim Berners-Lee's vision of a web where machines could read and reason about data the way humans read HTML. The design priorities were therefore different: maximum interoperability across organisations, formal logical semantics, and the ability to compose facts from unrelated sources.
The RDF model has exactly one primitive: the triple. A triple is a three-part statement of the form (subject, predicate, object). Read it like a sentence: subject predicate object. "Riya is a Person." "Riya knows Rahul." "Riya has age 27."
<http://example.org/riya> rdf:type <http://example.org/Person> .
<http://example.org/riya> foaf:name "Riya Sharma" .
<http://example.org/riya> foaf:age 27 .
<http://example.org/riya> <http://example.org/knows> <http://example.org/rahul> .
<http://example.org/rahul> rdf:type <http://example.org/Person> .
<http://example.org/rahul> foaf:name "Rahul Verma" .
Six triples encode what we earlier expressed as two property-graph nodes plus one edge. Everything is uniform. There are no nodes versus edges; there are no labels versus properties. Just triples. The "type" of a resource (rdf:type Person) is a triple. The "name" of a resource (foaf:name "Riya Sharma") is a triple. The "knows" relationship is a triple. The atomic data unit is the triple, full stop.
URIs identify resources. Every subject and predicate is a URI — a globally unique identifier in the same namespace as URLs on the web. <http://example.org/riya> is not necessarily a clickable web page; the URI is just a name with a guaranteed-unique structure. The fact that two organisations can independently mint URIs in their own namespaces (<http://flipkart.com/data/user/12345>, <http://wikidata.org/entity/Q42>) without colliding is the foundation of RDF's federation story. Merge two RDF datasets and triples about the same URI line up automatically.
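That federation story can be shown in a few lines: if triples are plain tuples, merging two independently produced datasets is just set union, and facts about the same URI line up because the URI is the identity. The data below is invented for illustration:

```python
# Sketch: RDF federation in miniature. Two triple sets minted by different
# (imaginary) organisations merge by set union -- no schema mapping step.

catalog = {   # triples from a retailer export (invented)
    ("http://example.org/prod7", "ex:name", "Pegasus 41"),
    ("http://example.org/prod7", "ex:price", 12995),
}
reviews = {   # triples from a separate review aggregator (invented)
    ("http://example.org/prod7", "ex:rating", 4.5),
}

merged = catalog | reviews   # merging IS set union in the RDF model

# All three facts now attach to the same resource automatically:
facts_about_prod7 = {(p, o) for (s, p, o) in merged if s == "http://example.org/prod7"}
print(len(facts_about_prod7))
```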
Objects can be URIs or literals. The object of a triple is either another resource (a URI) or a literal value (a string, integer, date, with optional language tag and datatype). (riya, knows, rahul) has a URI object — it is a relationship. (riya, name, "Riya Sharma"@en) has a literal object — it is a property in the property-graph sense.
Why uniformity is both RDF's strength and its weakness: because everything is a triple, you can merge two RDF datasets by dumping their triples into the same store and the meaning is preserved — facts about the same URI from different sources line up automatically. This is the federation story. The cost is verbosity: a single conceptual entity ("Riya") explodes into half a dozen triples, and adding a property to a relationship (the since on a friendship) requires either reification — a four-triple workaround that creates a "statement about a statement" — or a named graph, both of which add cognitive overhead. In a property graph that property is just one key on the edge.
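For reference, the standard reification workaround looks like this in Turtle (using the rdf: prefix from the earlier examples; ex:since is an illustrative predicate): four bookkeeping triples to describe the statement, plus the annotation itself.

```turtle
# Standard RDF reification: a resource that stands for the statement
# "riya knows rahul", which the annotation then attaches to.
ex:stmt1 rdf:type      rdf:Statement ;
         rdf:subject   ex:riya ;
         rdf:predicate ex:knows ;
         rdf:object    ex:rahul ;
         ex:since      2019 .
```

Note that ex:stmt1 describes the statement but does not assert it; the plain triple ex:riya ex:knows ex:rahul still has to be stored separately.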
The query language for RDF is SPARQL (SPARQL Protocol and RDF Query Language, W3C standard, version 1.1 ratified 2013, 1.2 in late stages). SPARQL's syntax is also pattern-matching, but the patterns are written as triples:
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?fofName ?fofAge WHERE {
  ?riya foaf:name "Riya Sharma" .
  ?riya ex:knows ?friend .
  ?friend ex:knows ?fof .
  ?fof foaf:name ?fofName .
  ?fof foaf:age ?fofAge .
  ?fof ex:city "Bengaluru" .
}
Each line in the WHERE block is a triple pattern with variables (the ? prefix) where you want SPARQL to find bindings. The query asks: find a ?riya whose name is "Riya Sharma", find someone she knows, find someone they know, and return that person's name and age provided they live in Bengaluru. Same query as the Cypher example, expressed via triple patterns instead of an ASCII picture.
Beyond SELECT queries, SPARQL also supports CONSTRUCT (build new triples from query results — handy for inference), ASK (boolean queries), DESCRIBE (return all triples about a resource), and SPARQL Update for INSERT/DELETE.
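To demystify how an engine evaluates a WHERE block, here is a toy matcher in Python: each triple pattern filters and extends a set of variable bindings, and a conjunction of patterns is evaluated as nested joins. Data and vocabulary are invented; real SPARQL engines add indexes and join reordering on top of exactly this idea:

```python
# Sketch: conjunctive triple-pattern matching with variable bindings.
# Toy data; variables are strings starting with "?".

TRIPLES = {
    ("ex:riya",  "foaf:name", "Riya Sharma"),
    ("ex:riya",  "ex:knows",  "ex:rahul"),
    ("ex:rahul", "ex:knows",  "ex:meera"),
    ("ex:meera", "foaf:name", "Meera Iyer"),
    ("ex:meera", "ex:city",   "Bengaluru"),
}

def is_var(term):
    return term.startswith("?")

def match(pattern, bindings):
    """Yield each binding dict extended by one triple that fits the pattern."""
    for triple in TRIPLES:
        b = dict(bindings)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                if term in b and b[term] != value:
                    ok = False
                    break
                b[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield b

def query(patterns):
    solutions = [{}]
    for pat in patterns:                       # a conjunction is a nested join
        solutions = [b2 for b in solutions for b2 in match(pat, b)]
    return solutions

# The friends-of-friends query from the text, as pattern tuples:
rows = query([
    ("?riya", "foaf:name", "Riya Sharma"),
    ("?riya", "ex:knows",  "?f"),
    ("?f",    "ex:knows",  "?fof"),
    ("?fof",  "ex:city",   "Bengaluru"),
    ("?fof",  "foaf:name", "?name"),
])
print([r["?name"] for r in rows])   # ['Meera Iyer']
```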
Vocabularies, ontologies, and inference: the layer above RDF
RDF rarely travels alone. The semantic-web stack adds two layers on top of the bare triple model.
RDFS (RDF Schema) lets you declare classes and subclasses, properties and subproperties, domains and ranges. "ex:Customer rdfs:subClassOf ex:Person" is itself a triple — the schema lives in the same triple store as the data. A reasoner that sees (riya rdf:type Customer) and the subclass triple can infer (riya rdf:type Person) automatically without that triple ever being stored.
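That inference can be sketched as a fixed-point computation over the triple store, applying the RDFS type-propagation rule until nothing new is derived. Toy data, not a real reasoner:

```python
# Sketch: RDFS subclass inference. The schema triple lives in the same
# store as the data, exactly as described in the text.

triples = {
    ("ex:Customer", "rdfs:subClassOf", "ex:Person"),   # schema triple
    ("ex:riya",     "rdf:type",        "ex:Customer"), # data triple
}

def rdfs_closure(store):
    """Propagate rdf:type up rdfs:subClassOf to a fixed point."""
    store = set(store)
    while True:
        derived = {
            (s, "rdf:type", sup)
            for (s, p, cls) in store if p == "rdf:type"
            for (sub, q, sup) in store if q == "rdfs:subClassOf" and sub == cls
        }
        if derived <= store:          # nothing new: fixed point reached
            return store
        store |= derived

inferred = rdfs_closure(triples)
print(("ex:riya", "rdf:type", "ex:Person") in inferred)   # True -- derived, never stored
```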
OWL (Web Ontology Language) goes further. You can declare that a property is symmetric (ex:friendOf — if Riya is a friend of Rahul, Rahul is a friend of Riya), transitive (ex:ancestorOf), inverse (ex:parentOf is the inverse of ex:childOf), or functional (only one value allowed per subject). You can express disjoint classes, equivalence, cardinality constraints, and complex class definitions. An OWL reasoner uses these axioms to derive new triples from existing ones — inference becomes a first-class capability.
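The same fixed-point idea covers the simpler OWL axioms. The toy sketch below hard-codes one symmetric and one transitive property rather than reading the axiom declarations from triples, which a real OWL reasoner would do; all data is invented:

```python
# Sketch: two OWL axiom types as closure rules over toy triples.

triples = {
    ("ex:riya", "ex:friendOf",   "ex:rahul"),    # friendOf: symmetric
    ("ex:asha", "ex:ancestorOf", "ex:bina"),     # ancestorOf: transitive
    ("ex:bina", "ex:ancestorOf", "ex:chitra"),
}

def owl_closure(store):
    store = set(store)
    while True:
        new = set()
        # owl:SymmetricProperty: (a friendOf b) => (b friendOf a)
        new |= {(o, p, s) for (s, p, o) in store if p == "ex:friendOf"}
        # owl:TransitiveProperty: (a ancestorOf b), (b ancestorOf c) => (a ancestorOf c)
        new |= {(s1, p1, o2)
                for (s1, p1, o1) in store if p1 == "ex:ancestorOf"
                for (s2, p2, o2) in store if p2 == "ex:ancestorOf" and s2 == o1}
        if new <= store:
            return store
        store |= new

inferred = owl_closure(triples)
print(("ex:rahul", "ex:friendOf", "ex:riya") in inferred)      # True
print(("ex:asha", "ex:ancestorOf", "ex:chitra") in inferred)   # True
```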
This is what people mean when they say "RDF has formal semantics". The data model is grounded in description logic; reasoners like Pellet, HermiT, and the one shipped inside Stardog can answer queries that involve derived facts, not just stored facts. Property graphs have nothing equivalent at the data-model level — you can implement inference in application code or with stored procedures, but it is not a native feature of Cypher or Gremlin.
For most application teams, formal semantics are over-engineering. For pharma researchers integrating SNOMED CT (the standard medical terminology, 350,000 concepts) with the Gene Ontology (47,000 terms) with the Disease Ontology and ChEBI (chemicals), formal semantics are the only way the integration can work without writing custom reconciliation code for every pair.
A side-by-side comparison: Riya knows Rahul
The single example below shows the same fact — Riya knows Rahul, with both being people who joined in 2024 — expressed in both models. This is the most useful comparison in the chapter.

Property graph — two nodes and one edge, with the relationship metadata on the edge:

(riya:Person {name: "Riya Sharma", joined: 2024})-[:KNOWS {since: 2019}]->(rahul:Person {name: "Rahul Verma", joined: 2024})

RDF — the same facts as triples (the since on the relationship cannot be a plain triple; it needs reification or RDF-star):

ex:riya rdf:type ex:Person .
ex:riya ex:name "Riya Sharma" .
ex:riya ex:joined 2024 .
ex:riya ex:knows ex:rahul .
ex:rahul rdf:type ex:Person .
ex:rahul ex:name "Rahul Verma" .
ex:rahul ex:joined 2024 .
The contrast above explains why most application teams find property graphs faster to ship with. To add metadata to a relationship in Neo4j you write (riya)-[:KNOWS {since: 2019, on: "Instagram"}]->(rahul) and you are done. To do the same in RDF you either reify the statement (verbose but standard) or you use RDF-star — an extension supported by Apache Jena, GraphDB, and others that lets you write <<ex:riya ex:knows ex:rahul>> ex:since 2019 and treats the embedded triple as a first-class subject. RDF-star solves the verbosity problem and is the modern answer, but it is an extension layered on top of the original model rather than the core.
Worked: an Indian e-commerce recommendation graph
Build a tiny recommendation graph for a Flipkart-style store. The data: User user42 (Riya in Bengaluru) bought Product prod7 (Nike Pegasus 41) on 2026-03-12; Product prod7 is in Category cat3 (Running Shoes); Category cat3 is a subcategory of cat1 (Footwear). A handful of facts. Express the same data in both graph models, then write a recommendation query in both.
Property graph (Cypher). Loading the data:
CREATE (u:User {id: "user42", name: "Riya", city: "Bengaluru"})
CREATE (p:Product {sku: "prod7", name: "Pegasus 41", price: 12995})
CREATE (c3:Category {id: "cat3", name: "Running Shoes"})
CREATE (c1:Category {id: "cat1", name: "Footwear"})
CREATE (u)-[:BOUGHT {on: date("2026-03-12")}]->(p)
CREATE (p)-[:IN_CATEGORY]->(c3)
CREATE (c3)-[:SUBCATEGORY_OF]->(c1)
Recommendation query: "find products in the same top-level category that Riya has not bought yet":
MATCH (riya:User {id: "user42"})-[:BOUGHT]->(bought:Product)
      -[:IN_CATEGORY]->()-[:SUBCATEGORY_OF*0..]->(top:Category)
      <-[:SUBCATEGORY_OF*0..]-()<-[:IN_CATEGORY]-(rec:Product)
WHERE NOT (riya)-[:BOUGHT]->(rec)
RETURN DISTINCT rec.name, rec.price
ORDER BY rec.price
LIMIT 10
The *0.. is the variable-length path operator — match zero or more SUBCATEGORY_OF edges, so the query catches both products in the same leaf category (Running Shoes) and products in sibling categories under Footwear (e.g. Casual Shoes). The pattern reads like the picture you would draw on a whiteboard.
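What *0.. computes can be sketched as a reachability closure in plain Python, using a toy catalogue that mirrors the worked data plus one hypothetical sibling leaf category (cat4, Casual Shoes) so the recommendation is non-trivial:

```python
# Sketch: variable-length paths as a reachability closure.
# Toy catalogue mirroring the worked example, plus a sibling category.

subcategory_of = {"cat3": "cat1", "cat4": "cat1"}   # leaves -> Footwear
in_category = {"prod7": "cat3", "prod9": "cat4"}    # prod9 is hypothetical
bought = {("user42", "prod7")}

def ancestors(cat):
    """cat itself plus zero-or-more SUBCATEGORY_OF hops -- Cypher's *0.. ."""
    out = {cat}
    while cat in subcategory_of:
        cat = subcategory_of[cat]
        out.add(cat)
    return out

# Categories (at any level) reachable from what user42 bought:
tops = {a for (_, p) in bought for a in ancestors(in_category[p])}

# Recommend products whose category shares any of those ancestors,
# minus what the user already bought:
recs = {p for p, c in in_category.items()
        if ancestors(c) & tops and ("user42", p) not in bought}
print(recs)   # {'prod9'}
```

The zero-hop case matters: because ancestors(cat) includes cat itself, products in the same leaf category are recommended too, matching the *0.. semantics rather than *1.. .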
RDF (SPARQL). Loading the same data, in Turtle syntax:
@prefix ex: <http://flipkart.example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:user42 rdf:type ex:User ; ex:name "Riya" ; ex:city "Bengaluru" .
ex:prod7 rdf:type ex:Product ; ex:name "Pegasus 41" ; ex:price 12995 .
ex:cat3 rdf:type ex:Category ; ex:name "Running Shoes" .
ex:cat1 rdf:type ex:Category ; ex:name "Footwear" .
ex:user42 ex:bought ex:prod7 .
ex:prod7 ex:inCategory ex:cat3 .
ex:cat3 ex:subcategoryOf ex:cat1 .
Recommendation query in SPARQL using property paths (the + and * operators on predicates):
PREFIX ex: <http://flipkart.example.org/>
SELECT DISTINCT ?recName ?recPrice WHERE {
  ex:user42 ex:bought ?bought .
  ?bought ex:inCategory ?leaf .
  ?leaf ex:subcategoryOf* ?top .
  ?other ex:subcategoryOf* ?top .
  ?rec ex:inCategory ?other .
  ?rec ex:name ?recName ;
       ex:price ?recPrice .
  FILTER NOT EXISTS { ex:user42 ex:bought ?rec . }
}
ORDER BY ?recPrice
LIMIT 10
Both queries return the same recommendations. Both walk the same graph topology. The Cypher version uses ASCII-art patterns; the SPARQL version uses triple patterns with variables. Cypher is roughly half the line count and noticeably easier for someone who has not used either before. SPARQL is more uniform — every clause is a triple pattern, no special node-vs-edge syntax — and integrates cleanly if you also want to merge in product data from an external RDF source like a manufacturer's catalog or a public taxonomy.
The takeaway: same underlying graph, same traversal logic, same result, two genuinely different developer experiences. For a recommendation engine inside one company's product, the Cypher version is what most teams ship. For a query that has to join Flipkart's data with a public RDF taxonomy of footwear categories published by an industry body, the SPARQL version is the natural fit.
When to choose which
The decision is not about which model is "better" — both are mature, both have scaled to billion-edge production deployments, both have active vendor ecosystems. The decision is about which is better fitted to your problem. Three questions sort it.
Question 1: are you building a product feature or integrating a knowledge base? Product features — recommendations, fraud detection, social graphs, network analysis, master data management inside one company — almost always benefit from property graphs. Faster developer onboarding, less verbose data model, more readable queries, less ceremony. Knowledge bases that integrate data from many sources — biomedical, government, scholarly, multi-organisation enterprise — benefit from RDF. URIs and ontologies are the price you pay for plug-and-play federation.
Question 2: do you need formal reasoning? If your application requires deriving new facts from declared rules — "an employee of a subsidiary of a parent company is also an employee of the parent for compliance purposes", "a substance that is a sub-class of a class with property X also has property X" — RDF with OWL gives you this declaratively, with off-the-shelf reasoners. Property graphs require you to write the inference rules in application code or stored procedures.
Question 3: who else needs to query your data? If only your application queries the graph, the choice is internal — pick whichever your team finds productive. If you publish data for the world to query (Wikidata, DBpedia, open government data, public scientific datasets), RDF is the lingua franca; consumers expect SPARQL endpoints, JSON-LD serialisation, and standard vocabularies (FOAF for people, Schema.org for things, Dublin Core for documents).
The market reflects this split. Neo4j is the largest property-graph vendor, used heavily for enterprise knowledge graphs (NASA, eBay, Cisco), fraud detection (Italian financial regulator UIF), social and recommendation systems. JanusGraph (Apache Foundation, originally Titan) and TigerGraph target large-scale property-graph deployments. Memgraph is the in-memory Cypher-compatible competitor. Amazon Neptune supports both models in one service. Apache Jena is the open-source RDF stack of choice — Java, with the Fuseki HTTP server and the ARQ SPARQL engine. Stardog is the commercial RDF leader, used heavily in pharma and finance for knowledge graphs with reasoning. Virtuoso powers DBpedia and many large LOD (Linked Open Data) deployments. GraphDB (by Ontotext) is widely used in publishing and enterprise. Wikidata itself runs on a custom Blazegraph deployment with 1.5 billion triples and a public SPARQL endpoint that anyone can query.
The two communities have started to converge in recent years. RDF-star and SPARQL-star bring property-on-edge expressiveness to RDF without reification. The new ISO GQL standard borrows ideas from SPARQL while keeping the Cypher syntax. Multi-model vector-plus-graph databases like Weaviate (chapter 157) blur the lines further. But for the next decade, the basic split remains: pick property graphs for app development, pick RDF for data integration.
The next chapter, native adjacency storage and index-free adjacency, goes one layer down — into the storage representation that makes graph traversals fast regardless of which data model sits on top.
References
- Neo4j Cypher Manual — the canonical reference for property-graph query syntax and semantics.
- W3C RDF 1.1 Concepts and Abstract Syntax — the formal definition of the RDF data model.
- SPARQL 1.1 Query Language (W3C Recommendation) — the standard for querying RDF.
- Apache Jena documentation — the open-source Java RDF/SPARQL toolkit.
- Robinson, Webber, Eifrem, Graph Databases (2nd ed., O'Reilly, 2015) — the property-graph standard reference.
- Angles, Arenas, Barceló, et al., Foundations of Modern Graph Query Languages (ACM Computing Surveys, 2017) — formal comparison of property-graph and RDF query languages.