lunes, 22 de julio de 2019

Faster Pagination in Mysql – Why Order By With Limit and Offset is Slow?

Taken from https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/

Queries with LIMITs and OFFSETs are common in application that require pagination and in some cases might work well for a while.

In many cases though, they become slow and painful once the OFFSET has a high value.

Why OFFSET is so slow?

Well, in most cases, low offset queries are not slow. The problem starts with high OFFSET values.

If your query is using the following limit clause: “LIMIT 50000, 20”, it’s actually requesting the database to go through 50,020 rows and throw away the first 50,000. This action can have a high cost an impact response time.

We tested the following OFFSET values with the following query, to present the performance deterioration as the OFFSET grows.

The query was executed on a table that holds user performed events (an analytics table) with 150,000 records. The data is real user information and not auto generated.

SELECT
    *
FROM
    events
WHERE
    date > '2010-01-01T00:00:00-00:00'
        AND event = 'editstart'
ORDER BY date
LIMIT 50;

Offset Query Duration (ms)
0 1
50 1
1000 13
10000 150
25000 500
50000 930
100000 1750



How to optimize slow OFFSET queries?

To optimize slow OFFSET queries, you can either limit the amount of permitted pages in a pagination view, or simply just not use OFFSET.

A good alternative for using OFFSET will be the Seek Method, which is also highly recommended by both Lukas Eder and Markus Winand in their blogs.

In simple words, the seek method is all about finding a unique column or set of columns that identifies each row. Then, instead of using the OFFSET clause, we can just use that unique value as a bookmark that presents the position of the last row we’ve fetched and query the next set of rows by starting from this position in the WHERE clause.

For example, looking at the queries we executed before, assuming the last event id in offset 999,999 was ‘111866’, the query will be:

SELECT    *
FROM
    events
WHERE
    (date,id) > ('2010-07-12T10:29:47-07:00',111866)
        AND event = 'editstart'
ORDER BY date, id
LIMIT 10

Please note that you need to make sure to order by the unique columns, so that the order is always kept the same between pages, otherwise you might get an unexpected behavior.

This is a comparison of the performance between both methods. The interesting observation here is not only that the performance of the Seek method is better, but that it’s also more stable no matter how far you paginate into the table.


No hay comentarios.:

Publicar un comentario