Commit f682827
Bug#37318367 Inplace ALTER on table with spatial idx might cause lost rows if concurrent purge
Description:
------------
A table with a secondary index on a spatial column may lose records during an inplace ALTER TABLE operation if records are deleted and purged concurrently. This issue is specific to spatial indexes and is similar to Bug#36846567, which was reported for tables with a clustered index.

Analysis:
---------
During the inplace ALTER TABLE rebuild, the records in the clustered index are scanned with the help of a cursor, which tracks the record position in the index. If the table has spatial index(es), then during the scan each corresponding builder instance caches the records in a tuple vector using the method Builder::batch_add_row(). batch_add_row() is called through a bulk_inserter lambda. Once the cursor reaches the end of the page, the cached tuples are inserted (processed) into the spatial index(es) using the method Builder::batch_insert(), which is called through a batch_inserter lambda.

While processing the records, if the batch_inserter detects that the server is running out of space in the redo log, it deep copies the remaining user records in the tuple vector of the current builder, and from the remaining builders of the other spatial index(es) as well. The batch_inserter then calls savepoint(), which first backs the cursor off by one position so that it points to a user record, saves that new position, and then releases the latch(es) held by the mini-transaction.

When the scan resumes, the cursor must be positioned either on the user record which comes next after the position saved before or, if there is no such record, it must be restored to the supremum on the last leaf page.

At present, PCursor::restore_from_savepoint() calls resume() to restore the cursor position. In the reported scenario, records are purged and the remaining records are rearranged in the index. Here is what happens:

1. resume() calls PCursor::restore_position(), which is a wrapper over btr_pcur_t::restore_position().
2. btr_pcur_t::restore_position() first restores the position to the record saved before, or its predecessor, and returns false because it is not the original record.
3. The caller of step 2, PCursor::restore_position(), detects the returned value false and decides to move to the next record.
4. Finally, the caller of step 3, PCursor::resume(), notices that the cursor is positioned on the last record of the page, so it moves the cursor to the successor record.

Notice that step 4 assumes step 3 restored the cursor to the saved record (or its predecessor), but that is not the case; as a result, the record on which the cursor was positioned at step 3 is lost. How exactly? After step 3, the cursor is either (a) positioned on the saved record, or, if that record was removed meanwhile, (b) on a user record which comes after it, or (c) on a supremum record which comes after the hole it left. The logic in PCursor::resume() correctly handles case (a) by advancing and case (c) by doing nothing, but fails in case (b), as it advances the cursor needlessly.

Fix:
----
As part of the Bug#36846567 fix, it was identified that the existing PCursor wrapper APIs attempted to handle multiple scenarios, making it difficult to implement them correctly. The decision was made to separate the APIs so that each performs a specific task neatly. This fix introduces a similar set of save/restore APIs for the batch inserter mode used for the R-Tree index(es) scan:

- save_previous_user_record_as_last_processed: This save API stores the position of the last user record on the page.
- restore_to_first_unprocessed: This restore API positions the cursor on the record which comes next after the cursor position saved by the paired save API.

Change-Id: I89627e7904293dcc58ccbb8b8a64f73b56e8dc93
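The case analysis above can be sketched with a toy model: a flat sorted list of record ids stands in for the leaf pages, and -1 stands for the supremum. All names here are hypothetical illustrations of the logic described in the commit message, not the real InnoDB (btr_pcur_t / PCursor) API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RestoreResult {
  std::size_t pos;  // index of the record the cursor landed on
  bool exact;       // true if it is the very record that was saved
};

// Models btr_pcur_t::restore_position(): land on the saved record or, if it
// was purged meanwhile, on its closest surviving predecessor.
static RestoreResult restore_position(const std::vector<int> &index,
                                      int saved) {
  std::size_t pos = 0;
  for (std::size_t i = 0; i < index.size(); ++i)
    if (index[i] <= saved) pos = i;
  return {pos, index[pos] == saved};
}

// Pre-fix resume: step 3 moves to the next record when the exact match was
// lost, and step 4 advances once more -- which skips a surviving record in
// case (b).
static int buggy_resume(const std::vector<int> &index, int saved) {
  RestoreResult r = restore_position(index, saved);
  std::size_t pos = r.pos;
  if (!r.exact && pos + 1 < index.size()) ++pos;  // step 3
  if (pos + 1 < index.size()) ++pos;              // step 4
  return index[pos];
}

// Post-fix restore_to_first_unprocessed: land on the first record greater
// than the saved position; -1 models the supremum (nothing left to process).
static int fixed_restore(const std::vector<int> &index, int saved) {
  for (int rec : index)
    if (rec > saved) return rec;
  return -1;
}
```

With records {1, 2, 7, 8} surviving the purge and the position saved after record 4, the pre-fix path lands on record 8 and record 7 is lost, while the fixed path lands on record 7; when the saved record survives (case (a)), both paths agree.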
1 parent e39b921 commit f682827

File tree

5 files changed: +203 −5 lines changed

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
connect con1, localhost, root,,;
CREATE PROCEDURE test.insert_records (IN record_count INT)
BEGIN
DECLARE counter INT DEFAULT 1;
WHILE counter <= record_count DO
INSERT INTO test.t1 (srid)
VALUES (ST_PointFromText('POINT(1 1)'));
SET counter = counter + 1;
END WHILE;
END //
SET GLOBAL innodb_limit_optimistic_insert_debug=3;
############################################################
# Test#1 Bug#37318367 Delete part of records from two pages
# to lose a record
############################################################
connection default;
CREATE TABLE t1 (id int AUTO_INCREMENT NOT NULL, srid POINT NOT NULL SRID 0,
PRIMARY KEY(id), SPATIAL INDEX spi(srid)) ENGINE=InnoDB;
CALL insert_records(8);
SET GLOBAL innodb_purge_stop_now = ON;
DELETE FROM t1 WHERE id > 2 AND id < 7;
connection con1;
SET DEBUG_SYNC='ddl_batch_inserter_latches_released SIGNAL latches_released WAIT_FOR go EXECUTE 2';
# Send ALTER TABLE INPLACE to rebuild the index.
SET SESSION debug="+d,ddl_instrument_log_check_flush";
ALTER TABLE t1 ENGINE=InnoDB, ALGORITHM=INPLACE;
connection default;
SET DEBUG_SYNC='now WAIT_FOR latches_released';
SET DEBUG_SYNC='now SIGNAL go';
SET DEBUG_SYNC='now WAIT_FOR latches_released';
SET GLOBAL innodb_purge_run_now = ON;
SET DEBUG_SYNC='now SIGNAL go';
connection con1;
# Reap ALTER TABLE.
SET SESSION debug="-d,ddl_instrument_log_check_flush";
# Ensure all data is present after index rebuild.
# Test cleanup.
DROP TABLE t1;
# cleanup
connection default;
DROP PROCEDURE test.insert_records;
SET GLOBAL innodb_limit_optimistic_insert_debug=0;
disconnect con1;
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# Save the initial number of concurrent sessions
--source include/count_sessions.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc

--enable_connect_log
--connect (con1, localhost, root,,)

--DELIMITER //

CREATE PROCEDURE test.insert_records (IN record_count INT)
BEGIN
DECLARE counter INT DEFAULT 1;

WHILE counter <= record_count DO
INSERT INTO test.t1 (srid)
VALUES (ST_PointFromText('POINT(1 1)'));
SET counter = counter + 1;
END WHILE;
END //

--DELIMITER ;

# Limit the max number of records per page to 3
SET GLOBAL innodb_limit_optimistic_insert_debug=3;

--echo ############################################################
--echo # Test#1 Bug#37318367 Delete part of records from two pages
--echo # to lose a record
--echo ############################################################

--connection default
CREATE TABLE t1 (id int AUTO_INCREMENT NOT NULL, srid POINT NOT NULL SRID 0,
PRIMARY KEY(id), SPATIAL INDEX spi(srid)) ENGINE=InnoDB;

# 8 records are inserted into four pages as follows due to
# sysvar innodb_limit_optimistic_insert_debug=3 set above:
# p1: {1}, p2: {2,3,4}, p3: {5,6,7}, p4: {8}
CALL insert_records(8);

SET GLOBAL innodb_purge_stop_now = ON;

# Remove records in the right half of page p2 and the left half of page p3
DELETE FROM t1 WHERE id > 2 AND id < 7;

let $before_count = `SELECT COUNT(*) FROM t1`;
--connection con1
SET DEBUG_SYNC='ddl_batch_inserter_latches_released SIGNAL latches_released WAIT_FOR go EXECUTE 2';

--echo # Send ALTER TABLE INPLACE to rebuild the index.

# Force the log free check so that the latch can be released at the end of the page
SET SESSION debug="+d,ddl_instrument_log_check_flush";

# Save the cursor position at the end of record#4 and wait for the second
# 'go' signal to resume operation.
--send ALTER TABLE t1 ENGINE=InnoDB, ALGORITHM=INPLACE

--connection default
# Wait for the first hit of the sync point at the end of the first page ...
SET DEBUG_SYNC='now WAIT_FOR latches_released';
# ... which we ignore by immediately resuming the alter table operation
SET DEBUG_SYNC='now SIGNAL go';
# Now wait for the second hit of the sync point at the end of the second page
SET DEBUG_SYNC='now WAIT_FOR latches_released';
SET GLOBAL innodb_purge_run_now = ON;
--source include/wait_innodb_all_purged.inc

# Resume the alter table operation after page p2 is processed. Since the purge
# has completed, the remaining records are arranged as follows:
# p1: {1, 2, 7}, p4: {8}
# Without the fix, the cursor is positioned on record#8 instead of record#7,
# thus the latter is skipped.
SET DEBUG_SYNC='now SIGNAL go';

--connection con1
--echo # Reap ALTER TABLE.
--reap;
SET SESSION debug="-d,ddl_instrument_log_check_flush";

--echo # Ensure all data is present after index rebuild.
let $after_count = `SELECT COUNT(*) FROM t1`;
if($before_count != $after_count) {
--echo # Records, before rebuild=$before_count != after rebuild=$after_count
SELECT id FROM t1;
--die "Test failed due to record count mismatch after index rebuild"
}

--echo # Test cleanup.
DROP TABLE t1;

--echo # cleanup
--connection default
DROP PROCEDURE test.insert_records;
SET GLOBAL innodb_limit_optimistic_insert_debug=0;
--disconnect con1
# Wait till all disconnects are completed
--source include/wait_until_count_sessions.inc

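The page layouts quoted in the test comments above (p1: {1}, p2: {2,3,4}, p3: {5,6,7}, p4: {8} before the purge; p1: {1,2,7}, p4: {8} after it) can be reproduced with a toy model of purge followed by page merging. This is only a sketch of the effect the test relies on, with hypothetical names; it is not InnoDB's actual merge heuristic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Pages = std::vector<std::vector<int>>;

// Toy model: purge physically removes the delete-marked records, then any
// page left with fewer than half of max_per_page records is merged into its
// left neighbour when the combined page still fits. Just enough behaviour to
// reproduce the layout the test comments describe.
static Pages purge_and_merge(Pages pages, const std::vector<int> &purged,
                             std::size_t max_per_page) {
  for (auto &page : pages) {
    page.erase(std::remove_if(page.begin(), page.end(),
                              [&](int id) {
                                return std::find(purged.begin(), purged.end(),
                                                 id) != purged.end();
                              }),
               page.end());
  }
  Pages out;
  for (auto &page : pages) {
    if (page.empty()) continue;
    if (!out.empty() && page.size() < (max_per_page + 1) / 2 &&
        out.back().size() + page.size() <= max_per_page) {
      // Underfull page: merge it into the page to its left.
      out.back().insert(out.back().end(), page.begin(), page.end());
    } else {
      out.push_back(page);
    }
  }
  return out;
}
```

Purging ids 3..6 from the four pages above with a 3-record limit collapses the index to two pages, which is exactly the layout the cursor has to cope with when the ALTER resumes.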
storage/innobase/ddl/ddl0par-scan.cc

Lines changed: 4 additions & 2 deletions
@@ -205,8 +205,9 @@ dberr_t Parallel_cursor::scan(Builders &builders) noexcept {
        for (size_t j = i + 1; j < batch_insert.size(); ++j) {
          batch_insert[j]->batch_insert_deep_copy_tuples(thread_id);
        }
-       thread_ctx->savepoint();
+       thread_ctx->save_previous_user_record_as_last_processed();
        latches_released = true;
+       DEBUG_SYNC_C("ddl_batch_inserter_latches_released");
      }
      return DB_SUCCESS;
    });
@@ -219,7 +220,8 @@ dberr_t Parallel_cursor::scan(Builders &builders) noexcept {
  }

  if (latches_released) {
-   return thread_ctx->restore_from_savepoint();
+   /* Resume from the savepoint (above). */
+   thread_ctx->restore_to_first_unprocessed();
  }

  return DB_SUCCESS;

storage/innobase/include/row0pread.h

Lines changed: 8 additions & 2 deletions
@@ -257,12 +257,18 @@ class Parallel_reader {
  @return DB_SUCCESS or error code. */
  [[nodiscard]] dberr_t restore_from_savepoint() noexcept;

- /** Save the current user record position, commit any active mtr. */
+ /** @see PCursor::save_current_user_record_as_last_processed */
  void save_current_user_record_as_last_processed() noexcept;

- /** Restore the saved user record position to resume the scan */
+ /** @see PCursor::restore_to_last_processed_user_record */
  void restore_to_last_processed_user_record() noexcept;

+ /** @see PCursor::save_previous_user_record_as_last_processed */
+ void save_previous_user_record_as_last_processed() noexcept;
+
+ /** @see PCursor::restore_to_first_unprocessed */
+ void restore_to_first_unprocessed() noexcept;
+
  /** Thread ID. */
  size_t m_thread_id{std::numeric_limits<size_t>::max()};
storage/innobase/row/row0pread.cc

Lines changed: 49 additions & 1 deletion
@@ -232,9 +232,25 @@ class PCursor {
  record. If all records were removed then restore the cursor to the infimum;
  this must be fine if the next step of the caller is to advance the cursor.
  @note This method must be paired with
- save_current_user_record_as_last_processed() to restore the record position */
+ save_current_user_record_as_last_processed() to save the record position */
  void restore_to_last_processed_user_record() noexcept;

+ /** This method must be called after all records on a page are processed and
+ cursor is positioned at supremum. Under this assumption, it stores the
+ position BTR_PCUR_AFTER the last user record on the page.
+ This method must be paired with restore_to_first_unprocessed() to restore to
+ a record which comes right after the value of the stored last processed
+ record @see restore_to_first_unprocessed for details. */
+ void save_previous_user_record_as_last_processed() noexcept;
+
+ /** Restore the cursor to the record which comes next after the saved
+ position by the paired method save_previous_user_record_as_last_processed().
+ If there is no such record, it will restore to supremum on the last leaf
+ page.
+ @note This method must be paired with
+ save_previous_user_record_as_last_processed() to save the record position */
+ void restore_to_first_unprocessed() noexcept;
+
  /** Move to the next block.
  @param[in] index Index being traversed.
  @return DB_SUCCESS or error code. */
@@ -403,6 +419,29 @@ void PCursor::restore_to_last_processed_user_record() noexcept {
  ut_ad(same_row);
}

+void PCursor::save_previous_user_record_as_last_processed() noexcept {
+  ut_a(m_pcur->is_after_last_on_page());
+  m_pcur->store_position(m_mtr);
+  ut_a(m_pcur->m_rel_pos == BTR_PCUR_AFTER);
+  m_mtr->commit();
+}
+
+void PCursor::restore_to_first_unprocessed() noexcept {
+  ut_a(m_pcur->m_rel_pos == BTR_PCUR_AFTER);
+  m_mtr->start();
+  m_mtr->set_log_mode(MTR_LOG_NO_REDO);
+  m_pcur->restore_position(BTR_SEARCH_LEAF, m_mtr, UT_LOCATION_HERE);
+  if (m_pcur->m_pos_state == BTR_PCUR_IS_POSITIONED_OPTIMISTIC) {
+    /* The BTR_PCUR_IS_POSITIONED_OPTIMISTIC means the implementation of
+    btr_pcur_t::restore_position() successfully re-acquired a block stored in
+    hint and it wasn't modified meanwhile, and in such case it does not advance
+    the cursor to the next position even though BTR_PCUR_AFTER was used, so we
+    have to do it ourselves. */
+    ut_a(!m_pcur->is_after_last_on_page());
+    m_pcur->move_to_next_on_page();
+  }
+}
+
dberr_t PCursor::move_to_user_rec() noexcept {
  auto cur = m_pcur->get_page_cur();
  const auto next_page_no = btr_page_get_next(page_cur_get_page(cur), m_mtr);
@@ -479,6 +518,15 @@ void Parallel_reader::Thread_ctx::
  m_pcursor->restore_to_last_processed_user_record();
}

+void Parallel_reader::Thread_ctx::
+    save_previous_user_record_as_last_processed() noexcept {
+  m_pcursor->save_previous_user_record_as_last_processed();
+}
+
+void Parallel_reader::Thread_ctx::restore_to_first_unprocessed() noexcept {
+  m_pcursor->restore_to_first_unprocessed();
+}
+
dberr_t PCursor::move_to_next_block(dict_index_t *index) {
  ut_ad(m_pcur->is_after_last_on_page());
  dberr_t err = DB_SUCCESS;