High-Level Architecture & Conventions
This section describes the architecture of rust-analyzer.
Official site: Link.
- rust-analyzer input/output:
- Input (Ground state): Source code data from the client. Everything is kept in memory.
- Mapping from file paths to their contents.
- Project structure metadata represented as a crate graph (crate roots, cfg flags, crate dependencies).
- Output (Derived state): “Structured semantic model” of the code.
- A representation of the project that is fully resolved, type-wise and reference-wise.
Optimizations:
- Incremental:
- Input can be a delta of changes.
- The output is a fresh, up-to-date code model.
- Lazy: The output is computed on demand.
Parser - parser Crate
- A hand-written recursive descent tree-agnostic parser.
- Output: A sequence of events like “start node” and “finish node”. The design follows Kotlin’s parser, which is a good resource for learning how to deal with syntax errors and incomplete input.
- The TreeSink and TokenSource traits bridge the tree-agnostic parser with rowan trees.
Architecture Invariant - Tree-Agnostic Parser
- The parser functions as a pure transformer, converting one flat stream of events into another.
- Dual independence: The parser is not locked into:
- A specific tree structure (output format).
- A specific token representation (input format).
- Benefits:
- Token independence allows using the same logic to parse:
- Standard source code (text -> tokens).
- Macro expansion (token trees -> tokens).
- Synthetic code generated programmatically.
- Tree independence makes it easy to vary the syntax tree implementation, and it enables light-parsing:
- No tree nodes need to be allocated.
- For tasks like “find all function names in this 10k-line file,” a listener can consume the event stream directly, pick out the names inside function nodes, and ignore everything else, finishing the task in a fraction of the time needed to build a full tree.
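A minimal sketch of the tree-agnostic idea, with hypothetical, simplified Kind and Event types (rust-analyzer's real event and sink types differ): the same flat event stream feeds one consumer that builds a nested outline and another that only collects function names, never allocating tree nodes.

```rust
/// Hypothetical node/token kinds (not rust-analyzer's real SyntaxKind).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { SourceFile, Fn, Name, Ident, Keyword }

/// Hypothetical flat parser events.
#[derive(Debug, Clone, Copy)]
enum Event<'a> {
    Start(Kind),
    Token(Kind, &'a str),
    Finish,
}

/// Consumer 1: materializes the nesting as an indented outline.
fn build_tree(events: &[Event<'_>]) -> String {
    let mut out = String::new();
    let mut depth = 0;
    for ev in events {
        match *ev {
            Event::Start(kind) => {
                out.push_str(&format!("{}{:?}\n", "  ".repeat(depth), kind));
                depth += 1;
            }
            Event::Token(kind, text) => {
                out.push_str(&format!("{}{:?} {:?}\n", "  ".repeat(depth), kind, text));
            }
            Event::Finish => depth -= 1,
        }
    }
    out
}

/// Consumer 2: a "light" listener that only collects function names.
/// It keeps a stack of open node kinds but never allocates tree nodes.
fn fn_names<'a>(events: &[Event<'a>]) -> Vec<&'a str> {
    let mut names = Vec::new();
    let mut stack: Vec<Kind> = Vec::new();
    for ev in events {
        match *ev {
            Event::Start(kind) => stack.push(kind),
            Event::Finish => {
                stack.pop();
            }
            // An identifier directly inside a Name node that is inside a Fn node.
            Event::Token(Kind::Ident, text) => {
                if stack.ends_with(&[Kind::Fn, Kind::Name]) {
                    names.push(text);
                }
            }
            Event::Token(..) => {}
        }
    }
    names
}
```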
Architecture Invariant - Infallible Parser
- Parsing never fails.
- The parser returns (T, Vec<Error>) rather than Result<T, Error>.
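As an illustration of the (T, Vec<Error>) shape, here is a hypothetical toy parser for comma-separated integers (unrelated to the real Rust grammar): it always returns whatever it could parse, together with the errors it found.

```rust
/// Never fails: malformed items are skipped and reported,
/// well-formed ones are kept.
fn parse_ints(input: &str) -> (Vec<i64>, Vec<String>) {
    let mut values = Vec::new();
    let mut errors = Vec::new();
    for (i, item) in input.split(',').enumerate() {
        match item.trim().parse::<i64>() {
            Ok(v) => values.push(v),
            Err(_) => errors.push(format!("item {}: expected integer, found {:?}", i, item.trim())),
        }
    }
    (values, errors)
}
```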
Syntax Tree Structure & Parser - syntax Crate
Based on libsyntax-2.0.
- rowan: The underlying library used to construct the raw, untyped syntax trees (Green/Red trees).
- ast module: Provides a type-safe API layer on top of the raw rowan tree.
- ungrammar: A grammar description format used to automatically generate the syntax_kinds and ast modules.
Architecture Invariant - syntax Crate as an API Boundary
- The syntax crate knows nothing about salsa or LSP. It is an API boundary.
- Benefits:
- It can be used for lightweight tooling without needing a full build or semantic analysis.
Architecture Invariant - Syntax Tree as a Value Type
- The syntax tree is self-contained, defined solely by its contents without relying on global context (like interners).
- Pure syntax: Unlike traditional compiler trees, it strictly excludes semantic information (such as type inference data).
- Benefits:
- IDE optimization: Critical for tools like rust-analyzer, where assists and refactors require frequent tree modifications.
- Simplified transformation: Keeping the tree “dumb” (purely structural) allows easy code manipulation without the complexity of managing semantic state during edits.
Architecture Invariant - Syntax Tree per File
- A syntax tree is built for a single file.
- Benefits: Enables parallel parsing of all files.
Architecture Invariant - Incomplete Syntax Tree
- Syntax trees are designed to tolerate incomplete or invalid code (common during live editing).
- AST accessor methods return Option types to safely handle missing data.
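A sketch of a typed layer over an untyped tree, assuming a hypothetical RawNode type standing in for a rowan node and a simplified layout where the function name is a direct IDENT child (in rust-analyzer it sits inside a NAME node). The accessor returns Option, so incomplete code such as fn ( yields None instead of a panic.

```rust
/// Hypothetical untyped node: a kind plus children (stand-in for a rowan node).
#[derive(Debug)]
enum RawNode {
    Node { kind: &'static str, children: Vec<RawNode> },
    Token { kind: &'static str, text: String },
}

/// Typed view in the spirit of `ast`: a thin wrapper around a raw node.
struct FnDef<'a>(&'a RawNode);

impl<'a> FnDef<'a> {
    /// Succeeds only if the raw node really is a function.
    fn cast(node: &'a RawNode) -> Option<FnDef<'a>> {
        match node {
            RawNode::Node { kind: "FN", .. } => Some(FnDef(node)),
            _ => None,
        }
    }

    /// Missing pieces are normal while typing, so this returns Option.
    fn name(&self) -> Option<&'a str> {
        let node: &'a RawNode = self.0;
        let RawNode::Node { children, .. } = node else { return None };
        children.iter().find_map(|child| match child {
            RawNode::Token { kind: "IDENT", text } => Some(text.as_str()),
            _ => None,
        })
    }
}
```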
Query database - base-db Crate
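- salsa: A crate used for incremental and on-demand computation. salsa resembles a key-value store, but it can also compute derived values using specified functions.
- base-db provides the basic infrastructure for interacting with salsa and defines most of the input queries (facts supplied by the client of the analyzer).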
Architecture Invariant - File-System Agnostic
- Nothing is known about the file system or file paths.
- FileId: An opaque type that represents a file. There is no operation to get a std::path::Path out of a FileId.
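To make the key-value analogy concrete, here is a toy store with one input (file text, keyed by an opaque FileId) and one lazily memoized derived value. Db, set_file_text and line_count are hypothetical names, not salsa's API, and unlike salsa the sketch invalidates every memo on any change instead of tracking which inputs each query actually read.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Opaque file handle: deliberately no way to turn it into a path.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FileId(u32);

#[derive(Default)]
struct Db {
    revision: u64,
    file_text: HashMap<FileId, String>,                      // input, set by the client
    line_count_memo: RefCell<HashMap<FileId, (u64, usize)>>, // derived: (revision, value)
    computations: Cell<usize>,                               // how often we really recomputed
}

impl Db {
    /// Setting an input bumps the global revision.
    fn set_file_text(&mut self, file: FileId, text: &str) {
        self.file_text.insert(file, text.to_string());
        self.revision += 1;
    }

    /// Derived value: computed on demand, reused while the revision is unchanged.
    fn line_count(&self, file: FileId) -> usize {
        if let Some(&(rev, value)) = self.line_count_memo.borrow().get(&file) {
            if rev == self.revision {
                return value;
            }
        }
        self.computations.set(self.computations.get() + 1);
        let value = self.file_text.get(&file).map_or(0, |t| t.lines().count());
        self.line_count_memo.borrow_mut().insert(file, (self.revision, value));
        value
    }
}
```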
Analyzer (Macro expansion, Name resolution, Type inference) - hir-xxx Crates
- hir-expand: Macro expansion.
- hir-def: Name resolution.
- hir-ty: Type inference. (Like any hyphenated crate name, it is written hir_ty when referenced from Rust code.)
- These crates define the various IRs of the core.
- The hir-xxx crates have a strong ECS (Entity-Component-System) flavor:
- ECS architecture: Instead of rich objects, compiler entities (like functions or structs) are represented as raw integer IDs (handles), similar to game entities.
- Database-driven: You cannot access data directly from an ID. You must query the central salsa database (e.g., db.function_data(id)), which stores the actual content in “component” arrays.
- Little abstraction: The code is intentionally explicit about database access. It avoids helper layers to keep dependency tracking transparent and overhead low.
- These crates integrate deeply with salsa and chalk: types and trait bounds are lowered into chalk’s logic-programming form so that chalk can solve the trait queries that type inference depends on.
Architecture Invariant - Incremental
- The separation of “identity” (ID) from “data” in ECS lets rust-analyzer update only the data that changed without breaking references to the ID elsewhere, which keeps incremental updates fast.
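A sketch of the identity/data split, assuming hypothetical FunctionId, FunctionData and Db types (real hir-def IDs are interned through salsa rather than stored as Vec indices): other code holds only the copyable ID, so replacing the data behind it does not invalidate those references.

```rust
/// Raw, copyable handle: identity only, no data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FunctionId(u32);

/// The "component" data, stored separately from the identity.
#[derive(Debug, Clone, PartialEq)]
struct FunctionData {
    name: String,
    param_count: usize,
}

/// Hypothetical database with a single component table indexed by the raw ID.
struct Db {
    functions: Vec<FunctionData>,
}

impl Db {
    fn alloc_function(&mut self, data: FunctionData) -> FunctionId {
        self.functions.push(data);
        FunctionId((self.functions.len() - 1) as u32)
    }

    /// All access goes through the database; an ID alone cannot be dereferenced.
    fn function_data(&self, id: FunctionId) -> &FunctionData {
        &self.functions[id.0 as usize]
    }

    /// Replacing the data keeps every stored FunctionId valid.
    fn update_function(&mut self, id: FunctionId, data: FunctionData) {
        self.functions[id.0 as usize] = data;
    }
}
```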
High-Level IR - hir Crate
- An API boundary for consuming rust-analyzer as a library.
- hir wraps internal raw IDs (ECS-style) into semantic structs (e.g., Function) to provide a familiar object-oriented interface for library consumers.
- “Thin handle” pattern: These structs hold no data (only the ID) and require the db to be passed into every method call (e.g., func.name(db)), bridging the stateless handles with the stateful salsa database.
- Analogy: Internally, the ECS-style code is like SQL, and hir is like an ORM.
- Syntax inversion (object-oriented vs functional):
- In pure ECS, logic lives in external systems (e.g. db.function_visibility(id)).
- The hir crate inverts this to an object-oriented style (func.visibility(db)), making the API discoverable via IDE autocomplete.
- Encapsulated “joins”:
- Pure ECS requires you to manually query multiple tables to piece together information (e.g., get parent module ID → look up module data → find visibility).
- The hir crate abstracts these multi-step database lookups into single, coherent methods.
- Semantic types:
- ECS deals with efficient storage (raw u32 IDs).
- hir deals with high-level meaning, exposing semantic types (like struct Type) rather than implementation details (like struct TypeId).
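The thin-handle pattern and an encapsulated join can be sketched as follows. All types are hypothetical stand-ins (hir::Function's real methods and return types differ); the point is that Function and Module store only IDs and every method takes db explicitly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FunctionId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ModuleId(u32);

/// Hypothetical ECS-style storage: plain tables indexed by raw IDs.
struct Db {
    function_names: Vec<String>,
    function_parents: Vec<ModuleId>,
    module_names: Vec<String>,
}

/// hir-style thin handles: only an ID, no data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Function { id: FunctionId }
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Module { id: ModuleId }

impl Function {
    fn name(self, db: &Db) -> String {
        db.function_names[self.id.0 as usize].clone()
    }
    fn module(self, db: &Db) -> Module {
        Module { id: db.function_parents[self.id.0 as usize] }
    }
    /// One discoverable method hides a multi-step lookup:
    /// function -> parent module -> module name.
    fn qualified_name(self, db: &Db) -> String {
        format!("{}::{}", self.module(db).name(db), self.name(db))
    }
}

impl Module {
    fn name(self, db: &Db) -> String {
        db.module_names[self.id.0 as usize].clone()
    }
}
```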
Architecture Invariant - Inert Data Structure
- hir presents a fully resolved, inert view of the code, abstracting away the dynamic computations occurring in internal crates.
- “Inert” here is relative to the db object: given a db, hir looks like a plain data structure.
- Syntax-to-HIR bridge: It manages the complex one-to-many mapping between raw syntax and semantic definitions (via the Semantics type).
- The “Uber-IDE” pattern: To resolve a specific syntax node to an HIR entity (essential for “Go to Definition”), it employs a recursive strategy used by Roslyn and Kotlin: it resolves the syntax parent to an HIR owner, then queries that owner’s children to re-identify the target node.
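The recursive parent-first lookup can be sketched with hypothetical arenas: syntax nodes that know their parent, and HIR definitions that remember which syntax node they came from. For simplicity the sketch assumes every ancestor of the target node is itself lowered to a definition, which real code does not require.

```rust
/// Hypothetical syntax arena: parents[n] is the parent of node n (root: None).
struct SyntaxTree {
    parents: Vec<Option<usize>>,
}

/// Hypothetical HIR definition: its source node and the definitions it owns.
struct Def {
    name: &'static str,
    syntax: usize,
    children: Vec<usize>, // indices into Hir::defs
}

struct Hir {
    defs: Vec<Def>,
    root: usize, // the module that owns the whole file
}

/// Resolve the syntactic parent to its HIR owner first, then search that
/// owner's children for the definition built from `node`.
fn to_def(tree: &SyntaxTree, hir: &Hir, node: usize) -> Option<usize> {
    match tree.parents[node] {
        None => Some(hir.root), // the file itself maps to its module
        Some(parent) => {
            let owner = to_def(tree, hir, parent)?;
            hir.defs[owner]
                .children
                .iter()
                .copied()
                .find(|&child| hir.defs[child].syntax == node)
        }
    }
}
```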
IDE - ide-xxx Crates
- Top-level API boundary: The ultimate entry point for external clients (LSP servers, text editors) to interact with rust-analyzer.
- ide consumes the semantic model provided by the hir crate to implement concrete user features like code completion, goto definition, and refactoring.
- ide is protocol-agnostic, designed to be used via LSP, custom protocols (like FlatBuffers), or directly as a library within an editor.
- This crate introduces the concept of change over time:
- AnalysisHost: The mutable state container to which you apply_change.
- Analysis: An immutable, transactional snapshot of the state used for querying.
- Modular architecture:
- ide: The public facade and home for smaller features.
- ide-db: Shared infrastructure (e.g. reference search).
- ide-xxx: Isolated crates for major features (completion, diagnostics, assists, SSR).
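A sketch of the host/snapshot split, assuming a hypothetical State that is just a map of file texts. Here apply_change uses copy-on-write (Arc::make_mut), so an older Analysis keeps seeing the old state; rust-analyzer itself does not copy the state but instead cancels queries still running on older snapshots when a change is applied.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical ground state: file contents by name.
#[derive(Clone, Default)]
struct State {
    files: HashMap<String, String>,
}

/// One transactional change.
struct Change {
    file: String,
    text: String,
}

/// Mutable owner of the state.
#[derive(Default)]
struct AnalysisHost {
    state: Arc<State>,
}

/// Immutable snapshot: cheap to clone, can be handed to another thread.
#[derive(Clone)]
struct Analysis {
    state: Arc<State>,
}

impl AnalysisHost {
    fn apply_change(&mut self, change: Change) {
        // Copy-on-write: if snapshots still share the state, clone it first.
        Arc::make_mut(&mut self.state).files.insert(change.file, change.text);
    }
    fn analysis(&self) -> Analysis {
        Analysis { state: Arc::clone(&self.state) }
    }
}

impl Analysis {
    fn file_text(&self, file: &str) -> Option<&str> {
        self.state.files.get(file).map(String::as_str)
    }
}
```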
Architecture Invariant - View Layer
- View/ViewModel layer: ide acts as the “View” (MVC) or “ViewModel” (MVVM), translating complex compiler data into simple, editor-friendly terms (offsets, text labels) rather than internal definitions or syntax trees.
- The API is built with POD types. All inputs and outputs are conceptually serializable (no complex object graphs or HIR types exposed).
- The boundary is explicitly drawn at the “UI” level, following the philosophy popularized by the Language Server Protocol.
- It talks in the language of the text editor, not the language of the compiler.
rust-analyzer Crate
- Defines the binary for the language server: the entry point.
- It acts as the network/protocol adapter that connects the pure logic of the ide crate to the outside world (functional core, imperative shell).
Architecture Invariant - LSP & JSON Awareness
- rust-analyzer is the only place where LSP types and JSON serialization exist.
- Lower crates (ide, hir) remain pure and protocol-agnostic. They must not derive Serialize or Deserialize for LSP purposes.
- rust-analyzer maintains its own set of serializable types. It manually converts the ide crate’s Rust-native data structures (like TextRange) into LSP’s wire-format structures (like Range with line/character) before sending them over the wire.
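A sketch of the manual conversion at the edge: a byte-offset range, as a protocol-agnostic API might expose it, is turned into LSP-style line/character positions. The struct names are hypothetical mirrors of the LSP shapes; LSP counts character in UTF-16 code units by default. The sketch rescans the text on every call and assumes offsets fall on char boundaries, whereas a real implementation would precompute a line index.

```rust
/// Hypothetical protocol-agnostic range: byte offsets into the file text.
#[derive(Debug, Clone, Copy)]
struct TextRange { start: usize, end: usize }

/// Hypothetical wire-format mirrors of LSP's Position and Range.
#[derive(Debug, PartialEq)]
struct Position { line: u32, character: u32 }
#[derive(Debug, PartialEq)]
struct Range { start: Position, end: Position }

/// Byte offset -> (zero-based line, UTF-16 column).
fn to_position(text: &str, offset: usize) -> Position {
    let before = &text[..offset];
    let line = before.matches('\n').count() as u32;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let character = before[line_start..].encode_utf16().count() as u32;
    Position { line, character }
}

/// The explicit conversion done only in the protocol layer.
fn to_lsp_range(text: &str, range: TextRange) -> Range {
    Range { start: to_position(text, range.start), end: to_position(text, range.end) }
}
```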
Architecture Invariant - Protocol-Wise Statelessness
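- The server is stateless (a la HTTP), in the sense that it doesn’t know about previous requests. When a request depends on an earlier one (e.g. resolving the edit for a completion item), it must carry enough information to recreate that context from scratch.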
Utilities - stdx Crate
- rust-analyzer avoids small helper crates. stdx is the crate that stores all small reusable utilities.
Macro Crates
- Core abstraction (tt): Macros are defined purely as TokenTree → TokenTree transforms, isolated from other compiler parts. The tt crate defines this structure (single tokens or delimited sequences).
- Declarative macros (mbe): The mbe crate implements “Macros By Example” (macro_rules!). It handles parsing, expansion, and the translation between the IDE’s syntax trees and the raw token trees.
- Procedural macros: Proc-macros run in a separate process to isolate the IDE from crashes in user code.
- Server (proc-macro-srv): Loads the dynamic libraries (built by Cargo) and executes the macros.
- Client (proc-macro-api): Communicates with the server, sending and receiving token trees.
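A sketch of the TokenTree → TokenTree view of macros, with a hypothetical two-variant TokenTree (the real tt crate distinguishes more leaf and delimiter kinds) and a made-up macro that turns its whole input into one string-literal token.

```rust
/// Hypothetical, simplified token tree.
#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    Leaf(String),                                        // a single token
    Subtree { delimiter: char, tokens: Vec<TokenTree> }, // a delimited sequence
}

/// A hypothetical macro expressed as a pure TokenTree -> TokenTree function.
fn to_string_literal(input: &TokenTree) -> TokenTree {
    fn flatten(tt: &TokenTree, out: &mut Vec<String>) {
        match tt {
            TokenTree::Leaf(text) => out.push(text.clone()),
            TokenTree::Subtree { delimiter, tokens } => {
                let close = match delimiter { '(' => ')', '[' => ']', '{' => '}', _ => *delimiter };
                out.push(delimiter.to_string());
                for t in tokens {
                    flatten(t, out);
                }
                out.push(close.to_string());
            }
        }
    }
    let mut parts = Vec::new();
    flatten(input, &mut parts);
    // {:?} on a str produces a quoted, escaped string literal.
    TokenTree::Leaf(format!("{:?}", parts.join(" ")))
}
```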
Architecture Invariant - Isolation
- Because arbitrary macro code can panic or segfault (crashing the editor), rust-analyzer executes it in a separate process. This allows the main IDE to survive fatal errors and recover gracefully.
- salsa’s incremental system assumes all functions are pure (deterministic). Since proc-macros can be non-deterministic (e.g., reading external files or random numbers), they violate this core assumption and require special handling to prevent database corruption or infinite invalidation loops.
Virtual File System - vfs-xxx and paths Crates
- Virtual file system (VFS): These crates provide an abstraction layer that generates consistent snapshots of the file system, insulating the compiler from raw, messy OS paths.
- The architecture does not assume a single unified file system. A single rust-analyzer process can serve multiple remote machines simultaneously, so the same path string could exist on two different machines and refer to different content.
- “Witness” API: To resolve this ambiguity, path APIs generally require a “file system witness” (an existing anchor path) to identify which file system the operation targets.
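A sketch of the witness idea, with hypothetical FsId and VfsPath types (not the API of the paths or vfs crates): joining a relative path always goes through an existing path, and the result inherits that path's file system.

```rust
/// Hypothetical identifier of the machine / file system a path lives on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FsId(u32);

/// An absolute path that always knows which file system it belongs to.
#[derive(Debug, Clone, PartialEq, Eq)]
struct VfsPath {
    fs: FsId,
    path: String,
}

impl VfsPath {
    /// `self` is the witness: the result inherits its file system, so equal
    /// path strings on two machines are never confused.
    fn join(&self, relative: &str) -> VfsPath {
        let base = self.path.trim_end_matches('/');
        VfsPath { fs: self.fs, path: format!("{}/{}", base, relative) }
    }
}
```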
Interning - intern Crate
- Uses Arc (atomic reference counting) to ensure identical data (like strings or paths) is stored only once in memory.
- Optimized for “value types” that are defined by their content (e.g., the path std::vec::Vec), rather than “entities” defined by an ID (e.g., Function #42).
- db-independent: Unlike salsa’s integer IDs, Interned<T> owns its data, allowing access and inspection without needing a reference to the compiler database (db).
- Interning enables instant equality checks by comparing memory pointers instead of scanning content, which is critical for frequently compared items like file paths.
- Interning serves as a lower-level optimization layer for static, immutable data that doesn’t require the full overhead of incremental dependency tracking.
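A minimal Arc-based interner, assuming a hypothetical Interner type and a HashSet as the backing store (the real intern crate's Interned<T> is more general and uses a global table): interning the same content twice returns the same allocation, so equality can be checked with Arc::ptr_eq.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Each distinct string is stored exactly once.
#[derive(Default)]
struct Interner {
    set: HashSet<Arc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Arc<str> {
        // Arc<str>: Borrow<str>, so we can look up by &str without allocating.
        if let Some(existing) = self.set.get(s) {
            return Arc::clone(existing);
        }
        let new: Arc<str> = Arc::from(s);
        self.set.insert(Arc::clone(&new));
        new
    }
}
```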
Architectural Policies
Stability Guarantees
- rust-analyzer avoids new stability guarantees so that it can move fast.
- The internal ide API is explicitly unstable.
- Stability is only guaranteed at the LSP level (managed by the protocol) and the input level (Rust language/Cargo).
- De-facto stability: rust-project.json became stable implicitly by virtue of having users, a lesson to explicitly mark APIs as unstable/opt-in before release.
Code Generation
- The API for syntax trees (syntax::ast) and sections of the manual (features, assists, config) are generated automatically.
- To simplify builds, rust-analyzer does not use itself for codegen. It uses syn and manual string parsing instead.
Cancellation (Concurrency)
- The problem: If the user types while the IDE is computing (e.g., highlighting), the result is immediately stale.
- The solution: The salsa database maintains a global revision counter.
- When input changes, salsa bumps the counter and waits until all other threads using the database have finished.
- Threads still computing on the old revision check the counter, notice the mismatch, and panic with a special Canceled value (so rust-analyzer requires unwinding).
- The ide boundary catches this panic and converts it into a Result<T, Canceled>.
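A single-threaded sketch of the mechanism, assuming hypothetical Database and Canceled types (salsa's real types and API differ): a stale computation unwinds with a marker payload via std::panic::resume_unwind, and the boundary turns exactly that payload into an Err while letting genuine panics continue.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicU64, Ordering};

/// Marker payload for a stale computation.
#[derive(Debug)]
struct Canceled;

/// Holds the global revision counter, bumped whenever inputs change.
struct Database {
    revision: AtomicU64,
}

impl Database {
    /// Called at query boundaries: unwinds if inputs changed since `started_at`.
    fn check_canceled(&self, started_at: u64) {
        if self.revision.load(Ordering::SeqCst) != started_at {
            // resume_unwind does not invoke the panic hook, so nothing is printed.
            panic::resume_unwind(Box::new(Canceled));
        }
    }
}

/// The boundary: converts only the Canceled unwind into a Result.
fn catch_canceled<T>(f: impl FnOnce() -> T) -> Result<T, Canceled> {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => Ok(value),
        Err(payload) => match payload.downcast::<Canceled>() {
            Ok(_) => Err(Canceled),
            Err(other) => panic::resume_unwind(other), // a real bug: keep unwinding
        },
    }
}
```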
Testing Strategy
- Tests are concentrated on three system boundaries:
- Outer (rust-analyzer crate): “Heavy” integration tests via LSP/stdio. These validate the protocol but are slow (they read real files).
- Middle (ide crate): The most important layer. Tests drive AnalysisHost (simulating an editor) and check results against expectations.
- Inner (hir crate): Tests semantic models using rich types and snapshot testing (via the expect-test crate).
Key Testing Invariants
- Data-driven: Tests use string fixtures (representing multiple files) rather than calling API setup functions manually. This allows significant API refactorings.
- No libstd: Tests do not pull in the real libstd/libcore, which keeps them fast; all necessary code is defined within the test fixture.
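A sketch of how a string fixture can describe several files. The //- /path header lines follow rust-analyzer's fixture convention, but parse_fixture is a hypothetical helper: it keeps only the path and ignores metadata such as crate:main.

```rust
/// Splits a multi-file fixture into (path, contents) pairs.
/// A line starting with "//- " opens a new file; its first word is the path.
fn parse_fixture(fixture: &str) -> Vec<(String, String)> {
    let mut files: Vec<(String, String)> = Vec::new();
    for line in fixture.lines() {
        if let Some(header) = line.strip_prefix("//- ") {
            let path = header.split_whitespace().next().unwrap_or("").to_string();
            files.push((path, String::new()));
        } else if let Some((_, text)) = files.last_mut() {
            // Body lines belong to the most recently opened file.
            text.push_str(line);
            text.push('\n');
        }
    }
    files
}
```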
Error Handling
- No IO in core: Internal crates (ide, hir) do not interact with the outside world, so they cannot fail with IO errors. Broken code is not treated as an error condition either: analyses return partial data plus errors, (T, Vec<Error>), rather than Result<T, Error>.
- Panic resilience: Since bugs are inevitable, every LSP request is wrapped in catch_unwind so a crash in one feature doesn’t kill the server.
- Macros: The always! and never! macros are used instead of assert! to handle impossible states gracefully.
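A sketch of the idea behind always! and never!, written as hypothetical local macros (rust-analyzer's own versions differ in detail, e.g. in how they report): they log the violation instead of aborting and return the condition, so the caller can take a fallback path.

```rust
/// Reports if the condition is false, then returns it.
macro_rules! always {
    ($cond:expr) => {{
        let ok: bool = $cond;
        if !ok {
            eprintln!("assertion failed: {}", stringify!($cond));
        }
        ok
    }};
}

/// Reports if the "impossible" condition is true, then returns it.
macro_rules! never {
    ($cond:expr) => {{
        let bad: bool = $cond;
        if bad {
            eprintln!("impossible condition hit: {}", stringify!($cond));
        }
        bad
    }};
}

fn first_char_len(text: &str) -> usize {
    // Should never happen, but degrade gracefully instead of crashing.
    if never!(text.is_empty()) {
        return 0;
    }
    text.chars().next().map_or(0, char::len_utf8)
}
```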
Observability
- Profiling: Includes a custom low-overhead hierarchical profiler (hprof), enabled via an environment variable (RA_PROFILE).
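A sketch of a hierarchical profiler built from RAII guards, with hypothetical span and Span names; gating on RA_PROFILE and the printed report are left out. Each guard records its nesting depth and elapsed time when dropped.

```rust
use std::cell::{Cell, RefCell};
use std::time::{Duration, Instant};

thread_local! {
    static DEPTH: Cell<usize> = Cell::new(0);
    static REPORT: RefCell<Vec<(usize, &'static str, Duration)>> = RefCell::new(Vec::new());
}

/// Guard for one profiled region.
struct Span {
    label: &'static str,
    start: Instant,
    depth: usize,
}

/// Entering a span increases the nesting depth for the current thread.
fn span(label: &'static str) -> Span {
    let depth = DEPTH.with(|d| {
        let v = d.get();
        d.set(v + 1);
        v
    });
    Span { label, start: Instant::now(), depth }
}

impl Drop for Span {
    /// Leaving a span restores the depth and records (depth, label, elapsed).
    fn drop(&mut self) {
        DEPTH.with(|d| d.set(d.get() - 1));
        let entry = (self.depth, self.label, self.start.elapsed());
        REPORT.with(|r| r.borrow_mut().push(entry));
    }
}
```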
Serialization
- The trap of ease: While #[derive(Serialize)] is easy to add, it creates rigid IPC boundaries (backward compatibility contracts) that are extremely difficult to change later.
- To preserve internal flexibility, types in core crates like ide and base-db are deliberately not serializable.
- Serialization is pushed to the “edge” (the client). External clients must define their own stable schemas (e.g., rust-project.json) and manually convert them into internal structures, isolating the core compiler from protocol versioning issues.