Graph Database
- An online database with CRUD operations that exposes a graph data model.
- Optimized for transactional (OLTP) workloads - transactional integrity and operational availability.
Relationships are first-class citizens, unlike relational databases (foreign keys) or other NoSQL stores (map-reduce).
This allows building models that map closely to the problem domain.
Two Dimensions
Graph databases vary along two axes: The underlying storage (native graph storage & non-native graph storage) & The procesisng engine (native graph processing & non-native graph processing).
quadrantChart
title The Graph Database Space
x-axis "Non-native storage" --> "Native storage"
y-axis "Non-native processing" --> "Native processing"
Neo4j: [0.9, 0.9]
OrientDB: [0.8, 0.95]
Titan: [0.4, 0.85]
InfiniteGraph: [0.25, 0.8]
FlockDB: [0.15, 0.2]
AllegroGraph: [0.8, 0.2]
Legend:
- Storage = how data is persisted on disk.
- Processing = how the engine traverses/queries data at runtime.
| Native Processing | Non-native Processing | |
|---|---|---|
| Native Storage | Neo4j, OrientDB | AllegroGraph |
| Non-native Storage | Titan, InfiniteGraph | FlockDB |
These are engineering trade-offs, not inherently good or bad. Native/native is fastest for graph workloads. Non-native/non-native is easiest to operate.
Native Graph Storage
Custom storage format designed for graphs - nodes, relationships, and properties stored in dedicated structures. E.g. Neo4j uses fixed-size record files for nodes and relationships.
- Pro: Engineered for graph performance and scalability.
- Con: Less battle-tested than mature backends.
Non-native Graph Storage
Graph data serialized into an existing backend (e.g. MySQL, Cassandra). The graph is an abstraction layer on top.
- Pro: Ops teams already know the backend.
- Con: Overhead from translating between graph and non-graph representations.
Native Graph Processing (Index-Free Adjacency)
Each node physically stores a pointer to its neighbors. Traversing an edge = following a pointer.
- Pro: O(1) per hop, regardless of total graph size.
- Con: Non-traversal queries can be difficult or memory-intensive.
Non-native Graph Processing
To find a node’s neighbors, the engine does an index lookup (e.g. B-tree scan).
- Pro: Index-based lookups are well understood and flexible.
- Con: O(log n) per hop, degrades as the graph grows.