Benchmarking Single-result Query Performance in Slick Using JMH

Background

Slick offers a typesafe DSL for accessing your database tables as if they were Scala collections, with a similar API. However, since at the very end the DSL is translated into SQL queries, you sometimes need to be very careful how you use it. In this post you’re going to see how a subtle difference in the query DSL can affect the performance significantly.

Test case

To illustrate the performance differences, we’re going to fetch a single row from a large table defined as follows:

case class User(id: Int, name: String)

class Users(tag: Tag) extends Table[User](tag, "users") {

  def id = column[Int]("id")

  def name = column[String]("name")

  def * = (id, name) <> (User.tupled, User.unapply)
}

val users = TableQuery[Users]

Now consider the two following queries:

val q1 = users.result.head
val q2 = users.take(1).result.head

Can you tell which of those is going to be faster and why? It turns out that q1 is going to fetch all the rows from the table and only then take the first one, while q2 is going to add a LIMIT 1 clause to the generated SQL query, thus reducing the number of rows fetched to one. The full generated SQL queries are, respectively:

select "id", "name" from "users" -- q1
select "id", "name" from "users" limit 1 -- q2

So it’s rather obvious that q2 is going to be much more efficient. But let’s check to make sure.

Benchmarks

To check the actual performance differences, I used JMH - the Java micro benchmarking tool - and the sbt-jmh plugin (kudos to Andrzej Ludwikowski for recommending it!). JMH basically lets you test any part of your existing code, e.g. a single method - hence the micro in the name. It takes care of warming up the JVM, computing statistics etc. - the entire boilerplate you would like to skip. The benchmarks can either be configured using command-line parameters or with annotations - let’s see how to use the latter approach.

Benchmark parameters

Firstly, you want to specify two types of queries to benchmark, called take, for the limited variant, and head for the second one. Secondly, you’d like to check how the queries behave for different numbers of records in the table - let’s assume 10k, 50k, 100k and 500k. The initial banchmark code looks like:

@State(Scope.Benchmark)
class SingleResultBenchmark {

  @Param(Array("take", "head"))
  var queryType: String = ""

  @Param(Array("10000", "50000", "100000", "500000"))
  var numberOfRecords: Int = 0
}

This is certainly not the best Scala code you could imagine, but this is due to the way JMH works - the benchmarking Java code is generated by the compiler plugin based on our code, i.e. our code is instrumented by JMH. The resulting limitations are:

you need to use vars for the parameters, so that the fields don’t become final in the Java code,
you need to initialize the fields with some dummy values, which are not going to be used anyway (unfortunately an abstract class won’t work),
all the parameters need to be specified as strings, even for numeric parameters like the numberOfRecords.

In order to be able to declare fields (i.e. introduce some internal state) on the benchmarked class, you need to annotate it with @State with the scope of your choice - in our case Scope.Benchmark indicates that the state is going to be shared across all the threads within a single benchmark.

When using multiple benchmark parameters, the benchmarks are going to be executed for every possible combination of those, so in our case there are going to be 8 different benchmarks.

Initialization

Before running the actual benchmark, we’d like to initialize the database (an embedded H2) with the number of records specified by the numberOfRecords parameter. To easily generate an arbitrary number of instances of the User case class, let’s use a nice random-data-generator library, which leverages ScalaCheck and scalacheck-shapeless to provide a simple random generation API. The setup part of the benchmarking code looks like:

private val db = Database.forConfig("h2")
private val users = TableQuery[Users]

@Setup
def prepare(): Unit = {
  val result = for {
    schemaExists <- db.run(MTable.getTables(Users.TableName).headOption.map(_.nonEmpty))
    _ <- if (schemaExists) Future.successful() else db.run(users.schema.create)
    _ <- db.run(users.delete)
    _ <- db.run(users ++= random[User](numberOfRecords))
  } yield ()

  Await.ready(result, Duration.Inf)
}

The prepare() method, annotated with @Setup checks if the database schema exists, creates it if it doesn’t, then clears the users table and fills it with an arbitrary number of Users.

The code under test

This is the most straightforward part, since you just pick one of the queries based on the queryName parameter and execute it:

private val queries = Map(
  "take" -> users.take(1).result.head,
  "head" -> users.result.head
)

@Benchmark
@BenchmarkMode(Array(Mode.AverageTime))
def query(): Unit = Await.ready(db.run(queries(queryType)), Duration.Inf)

Here, the query() method is annotated with @Benchmark to indicate that this is the actual part of the application that you want to benchmark, and with BenchmarkMode to indicate that you want to measure the execution time (the default mode is to measure the throughput).

Results

Now you’re ready to run the benchmarks, which is as simple as:

  sbt jmh:run

Please note that with the default parameters there will be 10 forks for every parameter combination, each fork consisting of 20 warm-up iterations and 20 actual ones, which leads to quite a number of executions in total and takes some time to complete. To run fewer forks and thus reduce the time, you can e.g. add a command-line parameter:

  sbt "jmh:run -f 1"

which reduces the number of forks to one.

Below are the results of running the (lengthy) benchmark with the default parameters on a MacBook with 2,3 GHz Intel Core i7 and 16 GB of RAM under macOS 10.2.3 (Sierra):

  (numberOfRecords)  (queryType)  Mode  Cnt  Score    Error  Units
              10000         take  avgt  200  0.001 ±  0.001   s/op
              10000         head  avgt  200  0.008 ±  0.001   s/op
              50000         take  avgt  200  0.001 ±  0.001   s/op
              50000         head  avgt  200  0.035 ±  0.003   s/op
             100000         take  avgt  200  0.001 ±  0.001   s/op
             100000         head  avgt  200  0.064 ±  0.005   s/op
             500000         take  avgt  200  0.001 ±  0.001   s/op
             500000         head  avgt  200  3.571 ±  0.168   s/op

They certainly prove that the query with take(1) is much faster - you could even say that its performance is constant with the growing number of records, while the non-limited query tends to get slower when the record count increases (which is expected).

Summary

Although the Slick API resembles Scala collections a lot, you always need to remember that there’s a database underneath and thus wisely choose how you use the API. In case you’re not sure which of several approaches is faster, you can have a look at the actual SQL generated by Slick. When just looking at the queries is not sufficient, you can use a micro benchamrking tool like JMH to verify your guesses.

A full working example with the code presented above is available on GitHub. Enjoy!

Jacek Kunicki

to code or not to be