Slick offers a typesafe DSL for accessing your database tables as if they were Scala collections, with a similar API. However, since the DSL is ultimately translated into SQL queries, you sometimes need to be careful how you use it. In this post you're going to see how a subtle difference in the query DSL can affect performance significantly.
To illustrate the performance differences, we’re going to fetch a single row from a large table defined as follows:
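A minimal sketch of such a table definition - the `User` fields below (id, name, email) are an assumption, not the exact schema from the post:

```scala
import slick.jdbc.H2Profile.api._

// Assumed shape of the entity - the exact columns are illustrative
case class User(id: Option[Long], name: String, email: String)

class Users(tag: Tag) extends Table[User](tag, "users") {
  def id    = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def name  = column[String]("name")
  def email = column[String]("email")

  def * = (id.?, name, email) <> (User.tupled, User.unapply)
}

object Schema {
  val users = TableQuery[Users]
}
```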
Now consider the following two queries:
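A sketch of the two queries, assuming a `val users = TableQuery[Users]` for the table defined above (the identifier names are assumptions):

```scala
// q1 - no LIMIT in the generated SQL; the first row is taken
// from the fully fetched result
val q1 = users.result.head

// q2 - take(1) is translated into a LIMIT 1 clause by Slick,
// so the database only ever returns a single row
val q2 = users.take(1).result.head
```

Both actions produce the same value for a non-empty table; they differ only in how much work the database and the driver do to get there.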
Can you tell which of these is going to be faster, and why? It turns out that `q1` is going to fetch all the rows from the table and only then take the first one, while `q2` is going to add a `LIMIT 1` clause to the generated SQL query, thus reducing the number of rows fetched to one. The full generated SQL queries are, respectively:
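You don't have to guess what Slick generates - every SQL action exposes the statements it will run via the `statements` method. A sketch (the exact SQL text depends on your schema and Slick version, so the output comments are approximate):

```scala
// print the SQL generated for the non-limited query
users.result.head.statements.foreach(println)
// something like: select "id", "name", "email" from "users"

// print the SQL generated for the take(1) variant
users.take(1).result.head.statements.foreach(println)
// something like: select "id", "name", "email" from "users" limit 1
```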
So it's rather obvious that `q2` is going to be much more efficient. But let's check to make sure.
To check the actual performance differences, I used JMH - the Java micro-benchmarking tool - and the sbt-jmh plugin (kudos to Andrzej Ludwikowski for recommending it!). JMH basically lets you benchmark any part of your existing code, e.g. a single method - hence the micro in the name. It takes care of warming up the JVM, computing statistics etc. - all the boilerplate you would like to skip. The benchmarks can be configured either with command-line parameters or with annotations - let's see how to use the latter approach.
Firstly, you want to specify two types of queries to benchmark, called `take` for the limited variant and `head` for the second one. Secondly, you'd like to check how the queries behave for different numbers of records in the table - let's assume 10k, 50k, 100k and 500k. The initial benchmark code looks like:
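A sketch of that initial skeleton - the class name is an assumption, while the parameter names and values follow the text:

```scala
import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
class SlickQueryBenchmark {

  // vars with dummy initial values - JMH injects the actual
  // parameter for each benchmark run
  @Param(Array("take", "head"))
  var queryName: String = ""

  // @Param values must be strings, even for a numeric field
  @Param(Array("10000", "50000", "100000", "500000"))
  var numberOfRecords: Int = 0
}
```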
This is certainly not the best Scala code you could imagine, but this is due to the way JMH works - the benchmarking Java code is generated by the compiler plugin based on our code, i.e. our code is instrumented by JMH. The resulting limitations are:
- you need to use `var`s for the parameters, so that the fields don't become `final` in the Java code,
- you need to initialize the fields with some dummy values, which are not going to be used anyway (unfortunately, an abstract class won't work),
- all the parameters need to be specified as strings, even numeric ones like `numberOfRecords`.
In order to be able to declare fields (i.e. introduce some internal state) on the benchmarked class, you need to annotate it with `@State` with the scope of your choice - in our case, `Scope.Benchmark` indicates that the state is going to be shared across all the threads within a single benchmark.
When using multiple benchmark parameters, the benchmarks are going to be executed for every possible combination of those, so in our case there are going to be 8 different benchmarks.
Before running the actual benchmark, we'd like to initialize the database (an embedded H2) with the number of records specified by the `numberOfRecords` parameter. To easily generate an arbitrary number of instances of the `User` case class, let's use the nice random-data-generator library, which leverages ScalaCheck and scalacheck-shapeless to provide a simple random generation API. The setup part of the benchmarking code looks like:
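A sketch of that setup, continuing the benchmark class above - the `db` handle, the `users` `TableQuery` and the timeout are assumptions, and `random[User](n)` is the random-data-generator API for producing `n` arbitrary instances:

```scala
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import slick.jdbc.meta.MTable
// random-data-generator's entry point (assumed import path)
import com.danielasfregola.randomdatagenerator.RandomDataGenerator._

@Setup
def prepare(): Unit = {
  val init = for {
    // check the H2 metadata to see if the schema already exists
    tables <- MTable.getTables
    _      <- if (tables.isEmpty) users.schema.create else DBIO.successful(())
    // clear the table, then fill it with arbitrary Users
    _      <- users.delete
    _      <- users ++= random[User](numberOfRecords)
  } yield ()

  Await.result(db.run(init), 1.minute)
}
```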
The `prepare()` method, annotated with `@Setup`, checks if the database schema exists, creates it if it doesn't, then clears the `users` table and fills it with an arbitrary number of `User` instances.
The code under test
This is the most straightforward part, since you just pick one of the queries based on the `queryName` parameter and execute it:
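A sketch of the benchmarked method, again as part of the benchmark class; the timeout value is an assumption:

```scala
import java.util.concurrent.TimeUnit

@Benchmark
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.SECONDS)
def query(): User = {
  // pick the query variant based on the JMH parameter
  val action = queryName match {
    case "take" => users.take(1).result.head
    case "head" => users.result.head
  }
  // returning the fetched value prevents the JVM from treating
  // the measured call as dead code
  Await.result(db.run(action), 10.seconds)
}
```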
The `query()` method is annotated with `@Benchmark` to indicate that this is the actual part of the application that you want to benchmark, and with `@BenchmarkMode` to indicate that you want to measure the average execution time (the default mode is to measure throughput).
Now you're ready to run the benchmarks, which is as simple as executing `sbt jmh:run`.
Please note that with the default parameters there will be 10 forks for every parameter combination, each fork consisting of 20 warm-up iterations and 20 actual ones, which leads to quite a number of executions in total and takes some time to complete. To run fewer forks and thus reduce the time, you can e.g. add a command-line parameter:
sbt "jmh:run -f 1"
which reduces the number of forks to one.
Below are the results of running the (lengthy) benchmark with the default parameters on a MacBook with a 2.3 GHz Intel Core i7 and 16 GB of RAM, under macOS 10.12.3 (Sierra):
| (numberOfRecords) | (queryType) | Mode | Cnt | Score | Error   | Units |
|-------------------|-------------|------|-----|-------|---------|-------|
| 10000             | take        | avgt | 200 | 0.001 | ± 0.001 | s/op  |
| 10000             | head        | avgt | 200 | 0.008 | ± 0.001 | s/op  |
| 50000             | take        | avgt | 200 | 0.001 | ± 0.001 | s/op  |
| 50000             | head        | avgt | 200 | 0.035 | ± 0.003 | s/op  |
| 100000            | take        | avgt | 200 | 0.001 | ± 0.001 | s/op  |
| 100000            | head        | avgt | 200 | 0.064 | ± 0.005 | s/op  |
| 500000            | take        | avgt | 200 | 0.001 | ± 0.001 | s/op  |
| 500000            | head        | avgt | 200 | 3.571 | ± 0.168 | s/op  |
They certainly prove that the query with `take(1)` is much faster - you could even say that its performance stays constant as the number of records grows, while the non-limited query gets slower as the record count increases (which is expected).
Although the Slick API closely resembles Scala collections, you always need to remember that there's a database underneath, and thus choose wisely how you use the API. In case you're not sure which of several approaches is faster, you can have a look at the actual SQL generated by Slick. When just looking at the queries is not sufficient, you can use a micro-benchmarking tool like JMH to verify your guesses.
A full working example with the code presented above is available on GitHub. Enjoy!