Commit 48a8f44

Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#1]
Problem:
A MySQL Server which has been disconnected from schema distribution fails to set up event operations since the columns of the table can't be found in the event.

Analysis:
The ndbcluster plugin uses NDB table definitions which are cached by the NdbApi. These cached objects are reference counted and there can be multiple versions of the same table in the cache. The intention is that it should be possible to continue using a table even though it changes in NDB.

When a table is changed in NDB this cache needs to be invalidated, both on the local MySQL Server and on all other MySQL Servers connected to the same cluster. Such invalidation is especially important before installing in DD and setting up event subscriptions.

The local MySQL Server cache is invalidated directly when releasing the reference from the NdbApi after having modified the table. The other MySQL Servers are primarily invalidated by using schema distribution. Since schema distribution is event driven the invalidation will happen promptly, but as with all things in a distributed system there is a possibility that these events are not handled for some reason. This means there must be a fallback mechanism which invalidates stale cache objects.

The reported problem occurs because there is a stale NDB table definition in the NdbApi: it has the same name but different columns than the current table in NDB. In most cases the NdbApi continues to operate on a cached NDB table definition, but when setting up events the "mismatch on version" is detected inside the NdbApi (due to the relation between the event and the table). This causes the cache to be invalidated and the current version to be loaded from NDB. However, the caller is still using the "old" cached table definition, and thus when it tries to subscribe to the columns they cannot be found.

Solution:
1) Invalidate the NDB table definition in the schema event handler that handles a newly created table. This covers the case where a table is dropped directly in NDB, using for example ndb_drop_table or ndb_restore, and then subsequently created using SQL. This scenario is covered by the existing metadata_sync test cases, which trigger the detection in 4) when this part of the fix is absent.

2) Invalidate the NDB table definition before table schema synchronization installs tables in DD and sets up event subscriptions. This function handles the case when schema distribution is reconnecting to the cluster and a table it knew about earlier has changed while the schema distribution event handlers have not been active. This scenario is tested by the drop_util_table test case.

3) Invalidate the NDB table definition in the schema distribution event handler that is used for drop table and cluster failure. At this point it is known that the table does not exist or that its status is unknown. Earlier this invalidation was only performed if there was a version mismatch in the event vs. table relation.

4) Detect when the problem occurs by checking that the NDB table definition has not been invalidated (by NdbApi event functions) in the function that sets up the event subscription. It's currently not possible to handle the problem this low down, but at least it can be detected and a fix added to the callers. This detection is only done in debug compile.

Change-Id: I4ed6efb9308be0022e99c51eb23ecf583805b1f4
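The failure mode described in the analysis can be illustrated with a small toy model. This is only a sketch under stated assumptions: `TableDef`, `DictCache` and `setup_event` are hypothetical stand-ins invented for illustration and are not part of the NdbApi or the ndbcluster plugin. The model shows a caller that keeps a stale cached definition while event setup detects the version mismatch, and the same flow once the cache is invalidated first.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for a cached table definition (not an NdbApi type).
struct TableDef {
  int version;
  std::set<std::string> columns;
  bool invalid;  // set when the cache drops this definition
};

// Hypothetical stand-in for a shared, reference counted dictionary cache.
struct DictCache {
  std::map<std::string, TableDef> ndb;  // authoritative definitions "in NDB"

  // Return the cached definition, loading it from 'ndb' on a cache miss.
  std::shared_ptr<TableDef> get(const std::string &name) {
    auto it = cache_.find(name);
    if (it != cache_.end()) return it->second;
    auto src = ndb.find(name);
    if (src == ndb.end()) return nullptr;
    auto def = std::make_shared<TableDef>(src->second);
    cache_[name] = def;
    return def;
  }

  // Mark the cached definition invalid and drop it from the cache.
  // Callers still holding the shared_ptr keep the old, now invalid, object.
  void invalidate(const std::string &name) {
    auto it = cache_.find(name);
    if (it == cache_.end()) return;
    it->second->invalid = true;
    cache_.erase(it);
  }

 private:
  std::map<std::string, std::shared_ptr<TableDef>> cache_;
};

// Toy event setup: a version mismatch silently invalidates the cache entry,
// then every column of the caller's definition must exist in the current one.
bool setup_event(DictCache &cache, const std::string &name,
                 const std::shared_ptr<TableDef> &caller_def) {
  const TableDef &current = cache.ndb.at(name);
  if (caller_def->version != current.version) cache.invalidate(name);
  for (const auto &col : caller_def->columns) {
    if (current.columns.count(col) == 0) return false;  // column not found
  }
  return true;
}

// The reported problem: the caller holds a stale definition.
bool stale_caller_fails() {
  DictCache cache;
  cache.ndb["t1"] = TableDef{1, {"a", "b"}, false};
  auto old_def = cache.get("t1");
  cache.ndb["t1"] = TableDef{2, {"a", "c"}, false};  // changed without notice
  const bool ok = setup_event(cache, "t1", old_def);
  return !ok && old_def->invalid;
}

// The approach of the fix: invalidate before fetching the definition.
bool invalidate_first_succeeds() {
  DictCache cache;
  cache.ndb["t1"] = TableDef{1, {"a", "b"}, false};
  cache.get("t1");
  cache.ndb["t1"] = TableDef{2, {"a", "c"}, false};
  cache.invalidate("t1");
  auto fresh = cache.get("t1");
  return setup_event(cache, "t1", fresh) && !fresh->invalid;
}
```

In this model the `invalid` flag plays the role of the status checked by the debug assert in 4): it is set on the object the caller still holds, which is how the stale use can be detected after the fact.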
1 parent f7b75ce commit 48a8f44

3 files changed: +23 -10 lines changed

storage/ndb/plugin/ha_ndbcluster_binlog.cc

Lines changed: 13 additions & 10 deletions
@@ -1240,18 +1240,13 @@ static void ndbcluster_binlog_event_operation_teardown(THD *thd, Ndb *is_ndb,
       Ndb_event_data::get_event_data(pOp->getCustomData());
   NDB_SHARE *const share = event_data->share;

-  // Invalidate any cached NdbApi table if object version is lower
-  // than what was used when setting up the NdbEventOperation
-  // NOTE! This functionality need to be explained further
   {
-    Thd_ndb *thd_ndb = get_thd_ndb(thd);
-    Ndb *ndb = thd_ndb->ndb;
-    Ndb_table_guard ndbtab_g(ndb, share->db, share->table_name);
-    const NDBTAB *ev_tab = pOp->getTable();
-    const NDBTAB *cache_tab = ndbtab_g.get_table();
-    if (cache_tab && cache_tab->getObjectId() == ev_tab->getObjectId() &&
-        cache_tab->getObjectVersion() <= ev_tab->getObjectVersion())
+    // Since table has been dropped or cluster connection lost the NdbApi table
+    // should be invalidated in the global dictionary cache
+    Ndb_table_guard ndbtab_g(is_ndb, share->db, share->table_name);
+    if (ndbtab_g.get_table()) {
       ndbtab_g.invalidate();
+    }
   }

   // Close the table in MySQL Server
@@ -3198,6 +3193,8 @@ class Ndb_schema_event_handler {
     if (schema->node_id == own_nodeid()) return;

     write_schema_op_to_binlog(m_thd, schema);
+    ndbapi_invalidate_table(schema->db, schema->name);
+    ndb_tdc_close_cached_table(m_thd, schema->db, schema->name);

     if (!create_table_from_engine(schema->db, schema->name,
                                   true, /* force_overwrite */
@@ -5058,6 +5055,12 @@ static int ndbcluster_setup_binlog_for_share(THD *thd, Ndb *ndb,
       return -1;
     }
   }
+  // The function that check if event exist will silently mark the NDB table
+  // definition as 'Invalid' when the event's table version does not match the
+  // cached NDB table definitions version. This indicates that the caller have
+  // used a stale version of the NDB table definition and is a problem which
+  // has to be fixed by the caller of this function.
+  assert(ndbtab->getObjectStatus() != NdbDictionary::Object::Invalid);

   if (share->have_event_operation()) {
     DBUG_PRINT("info", ("binlogging already setup"));

storage/ndb/plugin/ndb_dd_sync.cc

Lines changed: 6 additions & 0 deletions
@@ -1187,6 +1187,12 @@ bool Ndb_dd_sync::synchronize_table(const char *schema_name,
                                     const char *table_name) const {
   ndb_log_verbose(1, "Synchronizing table '%s.%s'", schema_name, table_name);

+  {
+    // Invalidate potentially stale cached table
+    Ndb_table_guard ndbtab_g(m_thd_ndb->ndb, schema_name, table_name);
+    ndbtab_g.invalidate();
+  }
+
   Ndb_table_guard ndbtab_g(m_thd_ndb->ndb, schema_name, table_name);
   const NdbDictionary::Table *ndbtab = ndbtab_g.get_table();
   if (!ndbtab) {
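
The hunk above uses a short-lived guard in its own scope to drop a possibly stale cache entry, then opens a second guard that loads the current definition. A minimal sketch of that pattern follows. `ToyDict`, `ToyGuard` and `synchronize` are hypothetical names invented for illustration and do not reproduce the real `Ndb_table_guard` interface or behavior; the table definition is reduced to a single version number.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy dictionary: one version number per table.
struct ToyDict {
  std::map<std::string, int> ndb_version;     // authoritative versions
  std::map<std::string, int> cached_version;  // possibly stale copies

  // Return the cached version, loading it on a miss; nullptr if no table.
  const int *get(const std::string &name) {
    auto c = cached_version.find(name);
    if (c != cached_version.end()) return &c->second;
    auto s = ndb_version.find(name);
    if (s == ndb_version.end()) return nullptr;
    return &(cached_version[name] = s->second);
  }

  void drop_cached(const std::string &name) { cached_version.erase(name); }
};

// Hypothetical scoped guard, loosely modelled on the usage in the diff.
class ToyGuard {
 public:
  ToyGuard(ToyDict &dict, const std::string &name)
      : dict_(dict), name_(name), tab_(dict.get(name)) {}

  const int *get_table() const { return tab_; }

  // Drop the cache entry so that the next guard has to reload it.
  void invalidate() {
    if (tab_ != nullptr) {
      dict_.drop_cached(name_);
      tab_ = nullptr;
    }
  }

 private:
  ToyDict &dict_;
  std::string name_;
  const int *tab_;
};

// Invalidate in an inner scope, then fetch again with a fresh guard.
// Returns the version seen by the second guard, or -1 if the table is gone.
int synchronize(ToyDict &dict, const std::string &name) {
  {
    ToyGuard stale_g(dict, name);
    stale_g.invalidate();
  }
  ToyGuard g(dict, name);
  const int *tab = g.get_table();
  return tab != nullptr ? *tab : -1;
}

bool sync_sees_current_version() {
  ToyDict dict;
  dict.ndb_version["t1"] = 1;
  dict.get("t1");              // cache version 1
  dict.ndb_version["t1"] = 2;  // table changes without notification
  const bool was_stale = (*dict.get("t1") == 1);
  return was_stale && synchronize(dict, "t1") == 2;
}

bool sync_missing_table() {
  ToyDict dict;
  return synchronize(dict, "nope") == -1;
}
```

Keeping the first guard in its own block means it has gone out of scope before the second lookup happens, so the second guard cannot pick up the entry that was just dropped.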

storage/ndb/src/ndbapi/NdbDictionaryImpl.cpp

Lines changed: 4 additions & 0 deletions
@@ -5466,6 +5466,10 @@ NdbEventImpl *NdbDictionaryImpl::getEvent(const char *eventName,
         ((Uint32)tab->m_id != ev->m_table_id) ||
         (table_version_major(tab->m_version) !=
          table_version_major(ev->m_table_version))) {
+      // Table id or version does not match the table in the NdbApi dict cache,
+      // the cached table is invalidated and fetched from NDB again. For NdbApi
+      // user this have the effect that a different version of the table is used
+      // after calling NdbApi event functions.
       DBUG_PRINT("info", ("mismatch on verison in cache"));
       releaseTableGlobal(*tab, 1);
       tab = fetchGlobalTableImplRef(InitTable(ev->getTableName()));
