You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#1]
Problem:
A MySQL Server which has been disconnected from schema distribution
fails to setup event operations since the columns of the table can't be
found in the event.
Analysis:
The ndbcluster plugin uses NDB table definitions which are cached by the
NdbApi. These cached objects are reference counted and there can be
multiple versions of the same table in the cache, the intention is that
it should be possible to continue using the table even though it
changes in NDB. When changing a table in NDB this cache need to be
invalidated, both on the local MySQL Server and on all other MySQL
Servers connected to the same cluster. Such invalidation is especially
important before installing in DD and setting up event subscriptions.
The local MySQL Server cache is invalidated directly when releasing the
reference from the NdbApi after having modified the table.
The other MySQL Servers are primarily invalidated by using schema
distribution. Since schema distribution is event driven the invalidation
will happen promptly but as with all things in a distributed system
there is a possibility that these events are not handled for some
reason. This means there must be a fallback mechanism which
invalidates stale cache objects.
The reported problem occurs since there is a stale NDB table definition
in the NdbApi, it has the same name but different columns than the
current table in NDB. In most cases the NdbApi continues
to operate on a cached NDB table definition but when setting up events
the "mismatch on version" will be detected inside the NdbApi(due to the
relation between the event and the table), this causes the cache to be
invalidated and current version to be loaded from NDB. However the
caller is still using the "old" cached table definition and thus when
trying to subscribe the columns they can not be found.
Solution:
1) Invalidate NDB table definition in schema event handler that handles
new table created. This covers the case where table is dropped directly
in NDB using for example ndb_drop_table or ndb_restore and then
subsequently created using SQL. This scenario is covered by the existing
metadata_sync test cases who will be detected by 4) before this part of
the fix.
2) Invalidate NDB table definition before table schema synchronization
install tables in DD and setup event subscripotion. This function
handles the case when schema distribution is reconnecting to the cluster
and a table it knew about earlier has changed while schema distribution
event handlers have not been active. This scenario is tested by the
drop_util_table test case.
3) Invalidate NDB table definition when schema distribution event
handler which is used for drop table and cluster failure occurs. At this
time it's well known that table does not exists or it's status is
unknown. Earlier this invalidation was only performed if there was a
version mismatch in the the event vs. table relation.
4) Detect when problem occurs by checking that NDB table definition has
not been invalidated (by NdbApi event functions) in the function that
setup the event subscription. It's currently not possible to handle the
problem this low down, but at least it can be detected and fix added to
the callers. This detection is only done in debug compile.
Change-Id: I4ed6efb9308be0022e99c51eb23ecf583805b1f4
0 commit comments