
Phases

The sqltyped macro is a mini-compiler that takes a SQL string as input and outputs a Scala AST. Effectively it turns this:

sql("select name, age from person where age > ?")

into this:

    {
      val name = Witness("name")
      val age = Witness("age")

      new Query1[Int, FieldType[name.T, String] :: FieldType[age.T, Int] :: HNil] {
        def apply(i1: Int)(implicit conn: Connection) = {
          val stmt = conn.prepareStatement("select name, age from person where age > ?")
          stmt.setInt(1, i1)
          withResultSet(stmt) { rs =>
            val rows = collection.mutable.ListBuffer[FieldType[name.T, String] :: FieldType[age.T, Int] :: HNil]()
            while (rs.next) {
              rows.append("name" ->> rs.getString(1) :: "age" ->> rs.getInt(2) :: HNil)
            }
            rows.toList
          }
        }
      }
    }

It does the conversion in several phases.

Phases

Phase 1: parse SQL

Compilation starts by parsing the SQL statement. Successful parsing produces a Statement[Option[String]]. Statement is the internally used data type for the SQL AST and is parameterized by the type of the table references in column definitions. For instance, in the statement SELECT name, p.age from person p the column name has the table reference None, and the column age has the table reference Some("p").

    SqlParser.parse(sql: String): ?[Statement[Option[String]]]
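
To illustrate how the table reference parameter works, here is a minimal sketch using hypothetical, simplified types (Column, SimpleSelect); these are not sqltyped's real Ast classes:

    // Hypothetical, simplified model (not sqltyped's actual AST) showing how a
    // statement is parameterized by the type of its table references.
    case class Column[T](name: String, tableRef: T)
    case class SimpleSelect[T](columns: List[Column[T]], from: String)

    // After parsing "SELECT name, p.age FROM person p" the table references are
    // still unresolved, hence Option[String]:
    val parsed: SimpleSelect[Option[String]] =
      SimpleSelect(List(Column("name", None), Column("age", Some("p"))), from = "person")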

Phase 2: resolve tables

The next phase resolves all table references to point to a concrete table. For instance, in the statement SELECT name, p.age from person p both columns reference the table Table("person").

    Ast.resolveTables(stmt: Statement[Option[String]]): ?[Statement[Table]]
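
Continuing the hypothetical sketch from phase 1, resolution replaces every Option[String] reference with a concrete table. The real resolver handles aliases, joins and subselects; this naive version assumes a single-table select:

    // Naive sketch: every column reference resolves to the single table in FROM.
    case class Table(name: String)

    def resolve(s: SimpleSelect[Option[String]]): SimpleSelect[Table] =
      SimpleSelect(s.columns.map(c => Column(c.name, Table(s.from))), s.from)

    // Both columns of "SELECT name, p.age FROM person p" now point at Table("person").
    val resolved: SimpleSelect[Table] = resolve(parsed)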

Phase 3: type AST

The typer infers the input and output types of a statement by reading column types from the database schema.

    Typer.infer(stmt: Statement[Table], useInputTags: Boolean): ?[TypedStatement]
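
The shape of the result can be pictured roughly as follows; the case class and its field names are made up for illustration and are not sqltyped's actual TypedStatement:

    // Hypothetical shape of the typing result for
    // "select name, age from person where age > ?".
    case class SimpleTypedStatement(
      input: List[Class[_]],             // types of the ? placeholders
      output: List[(String, Class[_])],  // (column name, column type) pairs
      returnsMultipleRows: Boolean)

    val typed = SimpleTypedStatement(
      input  = List(classOf[Int]),
      output = List("name" -> classOf[String], "age" -> classOf[Int]),
      returnsMultipleRows = true)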

Phase 4: analyze AST

In the analysis phase the inferred types are refined. For instance, by analyzing the query select count(1) from person where age > ? we can infer that it always returns exactly one row. Thus, its type can be refined from the initial Int => List[Long] to the more exact Int => Long.

    Analyzer.refine(stmt: TypedStatement): ?[TypedStatement]
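
A sketch of that kind of refinement rule, reusing the hypothetical SimpleTypedStatement from above:

    // Hypothetical refinement rule: an aggregate-only select without GROUP BY
    // returns exactly one row, so the multiplicity flag can be dropped.
    def refine(stmt: SimpleTypedStatement, aggregateOnly: Boolean, hasGroupBy: Boolean): SimpleTypedStatement =
      if (aggregateOnly && !hasGroupBy) stmt.copy(returnsMultipleRows = false)
      else stmt

    // For "select count(1) from person where age > ?" this is the step that
    // turns Int => List[Long] into Int => Long.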

Optional phase: fallback to JDBC-based inference

sqlτyped uses custom SQL parsers to parse SQL statements. The advantage of custom parsing is that it enables better type inference: it is possible to do a more thorough query analysis than what the JDBC API provides. In addition, some JDBC drivers are notoriously bad when it comes to query analysis (MySQL, I'm looking at you ;). The disadvantage of custom parsing is that it takes time to polish the parsers to support all the quirks and nonstandard syntax of every SQL dialect. To get the best of both worlds, sqlτyped first tries its more exact inference and, if that fails, falls back to JDBC-based analysis.
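
For reference, the raw material a JDBC based analysis has to work with is the driver's prepared statement metadata. The sketch below is plain JDBC, independent of sqltyped's internals; some drivers return incomplete metadata here, which is exactly why the custom analysis exists:

    import java.sql.Connection

    // Reads output column types and input parameter types via JDBC metadata.
    def jdbcInfo(conn: Connection, sql: String): (List[(String, Int)], List[Int]) = {
      val stmt = conn.prepareStatement(sql)
      try {
        val rsMeta = stmt.getMetaData            // may be null for some drivers
        val pMeta  = stmt.getParameterMetaData
        val columns =
          if (rsMeta == null) Nil
          else (1 to rsMeta.getColumnCount).toList.map(i =>
            rsMeta.getColumnName(i) -> rsMeta.getColumnType(i))  // java.sql.Types code
        val params = (1 to pMeta.getParameterCount).toList.map(pMeta.getParameterType)
        (columns, params)
      } finally stmt.close()
    }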

Phase 5: code generation

Finally, the Scala AST is generated and embedded into the compiled program.

    SqlMacro.codeGen(stmt: TypedStatement, sql: String): Tree
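
Putting the phases together: assuming ?[A] composes monadically (the exact combinators are not shown on this page) and ignoring how the schema and dialect are threaded through, the whole pipeline reads roughly as:

    // Rough sketch only: assumes ?[A] supports flatMap/map; schema and dialect
    // wiring is omitted.
    def compile(sql: String, useInputTags: Boolean): ?[Tree] =
      for {
        parsed   <- SqlParser.parse(sql)
        resolved <- Ast.resolveTables(parsed)
        typed    <- Typer.infer(resolved, useInputTags)
        refined  <- Analyzer.refine(typed)
      } yield SqlMacro.codeGen(refined, sql)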

Dialects

Various SQL dialects can be supported by providing an implementation of

    trait Dialect {
      def parser: SqlParser
      def typer(schema: Schema): Typer
    }
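
A hypothetical custom dialect then just wires a parser and a typer together. Whether Typer can be instantiated exactly like this is an assumption; the point is only the shape of the trait:

    // Hypothetical: reuse the generic parser, plug in a typer built from the schema.
    object MyDialect extends Dialect {
      def parser = new SqlParser {}                  // assumes the generic parser is usable as-is
      def typer(schema: Schema) = new Typer(schema)  // constructor is an assumption
    }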

Extending SQL syntax

Dialect-specific SQL syntax is added by extending the generic SqlParser. For instance, MysqlDialect adds support for ON DUPLICATE KEY UPDATE by adding the following parser to the parsing of insert statements.

    lazy val onDuplicateKey = 
      "on".i ~> "duplicate".i ~> "key".i ~> "update".i ~> repsep(assignment, ",")

Extending built-in database function definitions

Typer contains the types of the built-in functions provided by the database. New function definitions can be added by extending extraAggregateFunctions or extraScalarFunctions. For instance, MysqlDialect adds a definition for the datediff function.

    override def extraScalarFunctions = Map(
      "datediff"  -> (f2(date, date) -> option(int))
    )

Functions are typed using a small DSL. f2(date, date) -> option(int) gives the type (Date, Date) => Option[Int].
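
Another entry can be added in the same style. Note that f1 and varchar below are assumptions made by analogy; only f2, date, int and option appear on this page:

    // Hypothetical additional entry; f1 and varchar are assumed combinators.
    override def extraScalarFunctions = Map(
      "datediff" -> (f2(date, date) -> option(int)),    // (Date, Date) => Option[Int]
      "lcase"    -> (f1(varchar)    -> option(varchar)) // String => Option[String]
    )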