-
Notifications
You must be signed in to change notification settings - Fork 714
Data Loading
By default, MyRocks configurations are optimized for short transactions, and not for data loading. MyRocks has a couple of special session variables to speed up data loading dramatically.
- Creating Secondary Indexes
When loading data into empty tables, it is highly recommended to drop all secondary indexes first, then loading data, and adding all secondary indexes after finishing loading data. MyRocks has a feature called "Fast Secondary Index Creation". Fast Secondary Index Creation is automatically used when executing CREATE INDEX or ALTER TABLE ... ADD INDEX. With Fast Secondary Index Creation, the secondary index entries are directly written to bottommost RocksDB levels and bypassing compactions. This significantly reduces total write volume and CPU time for decompressing and compressing data on higher levels.
- Loading Data
As described above, loading data is highly recommended for tables with Primary Key only (No Secondary Indexes), with all secondary indexes added after loading data.
When loading data into MyRocks tables, there are two recommended session variables.
SET session sql_log_bin=0;
SET session rocksdb_bulk_load=1;
When converting from large MyISAM/InnoDB tables, you can create MyRocks tables as below.
SET session sql_log_bin=0;
SET session rocksdb_bulk_load=1;
ALTER TABLE large_myisam_table ENGINE=RocksDB;
SET session rocksdb_bulk_load=0;
Using sql_log_bin=0
avoids writing to binary logs.
With rocksdb_bulk_load=1
, MyRocks enters special mode to write all inserts into bottommost RocksDB levels, and skips writing data into MemTable and the following compactions. This is very efficient way to load data.
The rocksdb_bulk_load
mode has a few conditions:
-
All the data must be inserted in primary key order (or the reverse). Inserting the rows with one or more rows out of order will result in an error and may result in some of the data being inserted in the table and some not. To resolve the problem, simply fix the data order of the insert, truncate the table, and restart. If the data is in the reverse order (i.e. the data is descending on a normally ordered primary key or is ascending on a reverse ordered primary key), the rows will be cached in chunks to switch the order to match the expected order.
-
None of the data being bulk loaded can overlap with existing data in the table. The easiest way to ensure this is to always bulk load into an empty table, but the mode will allow loading some data into the table, doing other operations, and then returning and bulk loading addition data if there is no overlap between what is being loaded and what already exists.
-
The data may not be visible until bulk load mode is ended (i.e. the
rocksdb_bulk_load
is set to zero again). The method that is used is building up SST files which will later be added as-is to the database. Until a particular SST has been added the data will not be visible to the rest of the system, thus issuing a SELECT on the table currently being bulk loaded will only show older data and will likely not show the most recently added rows. Ending the bulk load mode will cause the most recent SST file to be added. When bulk loading multiple tables, starting a new table will trigger the code to add the most recent SST file to the system -- as a result, it is inadvisable to interleave INSERT statements to two or more tables during bulk load mode.
If rocksdb_bulk_load
does not fit your needs there are two other settings that might help: unique_check
and rocksdb_commit_in_the_middle
. By default unique_check
is enabled and rocksdb_commit_in_the_middle
is disabled.
If unique_check
is disabled, MyRocks skips unique constraint checking. In other words, you have to make sure by yourself that source data doesn't break unique constraint. If violating unique constraint, older rows are silently overwritten by newer rows with the same key. REPLACE and INSERT ON DUPLICATE KEY UPDATE statements are not allowed with this option (errors are returned).
If rocksdb_commit_in_the_middle
is enabled, MyRocks implicitly commits every rocksdb_bulk_load_size
records (default is 1,000) in the middle of your transaction. If your data loading fails in the middle of the statement (LOAD DATA or bulk INSERT), rows are not entirely rolled back, but some of rows are stored in the table. To restart data loading, you'll need to truncate the table and loading data again.
If you are loading large data without enabling rocksdb_bulk_load
or rocksdb_commit_in_the_middle
, please make sure transaction size is small enough. All modifications of the ongoing transactions are kept in memory.
Documentation license here.
Installation
MyRocks
- Overview
- Transaction
- Backup
- Performance Tuning
- Monitoring
- Migration
- Internals
- Vector Database
DocStore
- Document column type
- Document Path: a new way to query JSON data
- Built-in Functions for JSON documents
MySQL/InnoDB Enhancements