December 19, 2024

The Most Comprehensive Interpretation of New Features in MySQL 8.0 - Part 2

Summarize all the new features interpretations of the MySQL 8.0 version so far.

In this series, the author will collate and summarize all the new features interpretations of the MySQL 8.0 version so far. They will be classified according to functionality, performance, security, optimizer and other aspects, and will be explained in multiple articles in the future.

Part 1 mainly talks about what functional enhancements MySQL 8.0 has. This article is part 2, introducing what progress MySQL 8.0 has made in terms of performance.

1. Transaction scheduling based on competition awareness

In version 8.0.3, MySQL introduced a new transaction scheduling algorithm, CATS(Contention-Aware Transaction Scheduling). Before the CATS algorithm, MySQL used the FIFO algorithm. Transactions that arrive first obtain locks first. If lock waiting occurs, they are queued according to the FIFO algorithm. CATS is more complex and smarter than FIFO, and in high-load and high-contention scenarios, the performance improvement is significant.

2. Parallel replication based on WriteSet

In general, MySQL's parallel replication has so far gone through three relatively key time nodes: "concurrent among databases", "group commit", and "write set"; it can be said that there are always talented people emerging, and the former waves die on the beach; in general, the latter ones are much better than the previous ones!

The MySQL 5.7.22 version introduced a new mechanism, WriteSet, to track the dependencies between transactions. This feature is used to optimize the speed of applying binlogs on the Secondary. In scenarios where the concurrency on the Primary is low, it can significantly improve the speed of replaying binlogs on the Secondary. The parallel replication scheme based on WriteSet completely solves the problem of MySQL replication delay. Only these two parameters need to be set.

binlog_transaction_dependency_tracking  = WRITESET                 #    COMMIT_ORDER     
transaction_write_set_extraction        = XXHASH64

‍

3. Enhanced JSON features

MySQL 8 has significantly improved its support for JSON. It has added the JSON_EXTRACT() function that extracts data from JSON fields based on path query parameters, as well as the JSON_ARRAYAGG() and JSON_OBJECTAGG() aggregation functions for combining data into JSON arrays and objects respectively.

In Primary-Secondary replication, a new parameter binlog_row_value_options is added to control the transmission mode of JSON data. It allows partial modification of Json types. Only the modified parts are recorded in the binlog, reducing the occupation of resources by large JSON data when there are only a few modifications.

4. Enhanced spatial data types

MySQL 8 has significantly improved spatial data types and functions, supporting more spatial analysis functions and spatial type objects. The spatial analysis function and performance have been greatly enhanced.

5. Improvement of doublewrite

Before the MySQL 8.0.20 version, the doublewrite storage area is located in the system table space. Starting from version 8.0.20, doublewrite has its own independent table space file. This change can reduce the write latency of doublewrite, increase throughput, and provide greater flexibility in setting the storage location of the doublewrite file.

6. Hash Join

The hash join function was introduced in MySQL 8.0.18 version. For equi-join connections that do not use indexes, hash join can be used for optimization. In version 8.0.20, hash join was strengthened. Even if the join connection does not use equi-conditions, hash join optimization can still be used. The join connection originally using the BNL algorithm will all be replaced by hash join.

6.1 NestLoopJoin algorithm

Simply put, it is a double loop. Traverse the outer table (driving table). For each row record of the outer table, then traverse the inner table, and then judge whether the join condition is met, and then determine whether to output the record to the previous execution node.

From an algorithmic perspective, this is O(M*N).

6.2 Hash Join

It is an optimization for the equal-join scenario. The basic idea is to load the outer table data into memory and establish a hash table. In this way, only one traversal of the inner table is needed to complete the join operation and output the matching records.

Of course, it is good if the data can be fully loaded into memory, and the logic is also simple. Generally, this kind of join is called CHJ (Classic Hash Join). MariaDB has already implemented this HashJoin algorithm before.

If the data cannot be fully loaded into memory, it needs to be loaded into memory in batches, and then joined in batches. The implementation of these join algorithms is specifically introduced below.

7. Anti join

The MySQL 8.0.17 version introduced an anti join optimization. This optimization can internally convert not in(subquery), not exists(subquery), in(subquery) is not true, and exists(subquery) is not true in the where condition into an anti join to remove the subquery inside. This optimization can improve performance by about 20% in some scenarios.

The applicable scenario cases of anti join are usually as follows:

Find data that is in set A but not in set B.
Find customers who have not purchased goods in the current quarter.
Find students who have not passed the exam this year.
Find the part of a certain doctor's patients who have not had a medical examination in the past three years.

8. Redo Log optimization

A new feature of MySQL 8.0 is the lockless commit of redo log. Before 8.0, each user thread competed through mutexes and serially wrote the log buffer, so it could ensure the sequential and gapless growth of the lsn.

MySQL 8.0 solves the performance impact caused by competing for locks when user threads write redo log through lockless redo logs. At the same time, writing redo logs to files and flushing redo log from user threads are separated and extracted into separate threads. User threads are only responsible for writing redo log to the log buffer and no longer care about the details of redo log flushing. They only need to wait for notifications from the log_writer thread or the log_flusher thread.

9. Histogram (statistical information)

The optimizer will use the data of column_statistics to judge the distribution of field values and obtain a more accurate execution plan.

You can collect or delete histogram information through ANALYZE TABLE table_name [UPDATE HISTOGRAM on column_name with N BUCKETS | DROP HISTOGRAM ON clo_name].

The histogram statistics the data distribution of certain fields in the table and provides a reference for optimizing and selecting an efficient execution plan. The histogram is essentially different from an index. Maintaining an index has a cost. Every insert, update, and delete needs to update the index, which will have a certain impact on performance. However, a histogram is created once and never updated unless it is explicitly updated. Therefore, it will not affect the performance of insert, update, and delete.

10. Disabling QC (Query Cache)

Starting from MySQL 8.0, the query cache is no longer used.

With the progress of technology and the test of time, the engineering team of MySQL found that there are not many benefits to enabling caching.

First of all, the effect of the query cache depends on the cache hit rate. Only queries that hit the cache can improve performance, so its performance cannot be predicted.

Secondly, another major problem with the query cache is that it is protected by a single mutex. On servers with multiple cores, a large number of queries will lead to a large amount of mutex contention.

Another reason for MySQL 8.0 to cancel the query cache is that research shows that the closer the cache is to the client, the greater the benefit. MySQL 8.0 has added some other tools for performance intervention to support. In addition, there are also third-party tools like ProxySQL that can also act as an intermediate cache.

In part 3, we will introduce other sections. Stay tuned.