Research Plan

This page outlines the research plan for dboxide. The goal is to investigate the following concepts and techniques, which are central to building a modern, efficient parser and language server.

Research Topics

  • Lexing & Token representation:
    • Ideal representation of a syntax token (source offset/pointer, kind, processed value).
    • Handling of non-ASCII characters (UTF-8).
    • Handling of trivia tokens (whitespace, comments) and error tokens in a lossless syntax tree.
    • Handling of multi-word and unreserved keywords.
    • Handling of ambiguous tokens (e.g., < as a less-than sign or as a generic bracket).
    • Efficient storage and computation of token positions for incremental parsing.
    • On-demand lexing vs lexing all at once.
  • String interning:
    • What is string interning and why is it useful in a compiler?
    • Where and how is it typically applied in a compilation pipeline (e.g., during lexing, parsing, or semantic analysis)?
    • How can it be implemented efficiently?
    • What are the trade-offs of using string interning?
  • Parsing & Syntax tree:
    • Good representation for a lossless syntax tree.
    • Storing error nodes and partial nodes in the lossless syntax tree.
    • Resilient parsing and error recovery techniques.
    • Designing good error messages.
  • Incremental parsing & Red-green trees:
    • What are red-green trees and how do they work?
    • How can they be used to implement incremental parsing?
    • What are the performance implications of using red-green trees?
  • Program analysis & Query-based compiler:
    • Utilizing a query-based architecture (like salsa) for program analysis, lexing, and parsing.
    • Module system features, including module resolution.
    • Name resolution and related problems.
    • Resources:
  • Other topics:
    • Language server implementation
    • Parallelism in compilation
    • SQL dialect awareness
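
To make the first two topics concrete, here is a minimal Rust sketch of one possible starting point. It is not dboxide's design. `TokenKind`, `Token`, `Symbol`, `Interner`, `lex`, and `run_len` are hypothetical names chosen for illustration. The sketch assumes tokens store only a kind and a byte length (offsets are recovered by summing lengths), that whitespace and unrecognized characters are kept as tokens so the stream stays lossless, and that the interner is a plain `HashMap` plus `Vec` with no arena or concurrency support.

```rust
use std::collections::HashMap;

/// Hypothetical token kinds; a real SQL lexer would need many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Ident,
    Number,
    Whitespace, // trivia, kept so the token stream stays lossless
    Error,      // unrecognized character, kept rather than dropped
}

/// A compact token: kind plus length in bytes (not characters).
/// Absolute offsets are not stored; they are the running sum of lengths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub len: u32,
}

/// Handle to an interned string; comparing two symbols is an integer compare.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// A simple interner that stores each distinct string once.
#[derive(Default)]
pub struct Interner {
    map: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `text`, or allocates a new one.
    pub fn intern(&mut self, text: &str) -> Symbol {
        if let Some(&sym) = self.map.get(text) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(text.to_owned());
        self.map.insert(text.to_owned(), sym);
        sym
    }

    /// Looks up the text behind a symbol produced by this interner.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

/// Byte length of the longest prefix of `s` whose chars all satisfy `pred`.
fn run_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.find(|c: char| !pred(c)).unwrap_or(s.len())
}

/// Lexes the whole input at once. Every byte of `src` ends up in some token.
pub fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = src;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_whitespace() {
            (TokenKind::Whitespace, run_len(rest, char::is_whitespace))
        } else if first.is_ascii_digit() {
            (TokenKind::Number, run_len(rest, |c| c.is_ascii_digit()))
        } else if first.is_alphabetic() || first == '_' {
            (TokenKind::Ident, run_len(rest, |c| c.is_alphanumeric() || c == '_'))
        } else {
            // One error token per unrecognized char; len_utf8 handles non-ASCII.
            (TokenKind::Error, first.len_utf8())
        };
        tokens.push(Token { kind, len: len as u32 });
        rest = &rest[len..];
    }
    tokens
}
```

Storing lengths instead of absolute offsets means an edit early in a file does not invalidate the tokens after it, which bears on the incremental-parsing questions above. The cost is that finding a token's position requires a prefix sum. Whether that trade-off suits dboxide is one of the things the plan sets out to answer.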