Description
The issue was first discovered in AbsaOSS/spline#869.
The error occurs under a combination of circumstances: cluster mode + Docker + acquireHostList=true.
My understanding of what is happening is as follows.
When the VST connection is established, the respective HostHandler asks the VstCommunication class to refresh the host list from the server. When the new hosts are added to the set, the old ones (unless they point to exactly the same ip:port) are immediately discarded, along with all associated connection pools and sockets.
The problem is that the connection instance that has just been created and that triggered the host list refresh in the first place (the one returned from the VstCommunication.connect() method) holds a pointer to a host that might have just been discarded, with its socket closed, during this refresh. As a result, under these circumstances VstCommunication.connect() returns a connection that is dead at the moment of creation, with all the consequences.
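To make the sequence concrete, here is a minimal, self-contained sketch of the suspected failure mode. It does not use the driver's code: the Connection and HostSet classes, the connect() helper, and all addresses are hypothetical stand-ins. They only mimic the behavior described above, where refreshing the host list closes every host whose ip:port is not in the newly acquired list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaleConnectionSketch {

    /** Hypothetical stand-in for a socket-backed connection to one host. */
    static final class Connection {
        final String address;
        private boolean open = true;

        Connection(String address) { this.address = address; }

        void close() { open = false; }

        boolean isOpen() { return open; }
    }

    /** Hypothetical stand-in for the driver's host set (not the real class). */
    static final class HostSet {
        private final Map<String, Connection> hosts = new LinkedHashMap<>();

        /** Returns the existing connection for this address, or creates one. */
        Connection add(String address) {
            return hosts.computeIfAbsent(address, Connection::new);
        }

        /** Closes and drops every host that is not in the acquired list, then adds the new ones. */
        void refresh(List<String> acquired) {
            hosts.entrySet().removeIf(entry -> {
                if (acquired.contains(entry.getKey())) {
                    return false; // same ip:port, keep it
                }
                entry.getValue().close(); // discarded together with its socket
                return true;
            });
            acquired.forEach(this::add);
        }
    }

    /** Mimics the described connect(): create the connection, then refresh the host list. */
    static Connection connect(HostSet hosts, String clientSideAddress, List<String> serverSideAddresses) {
        Connection connection = hosts.add(clientSideAddress);
        hosts.refresh(serverSideAddresses); // may close the connection created one line above
        return connection;
    }

    public static void main(String[] args) {
        // Assumed addresses: the client reaches the server via localhost,
        // while the server reports its Docker-internal IPs.
        Connection connection = connect(
                new HostSet(),
                "localhost:8529",
                Arrays.asList("172.17.0.2:8529", "172.17.0.3:8529"));
        System.out.println(connection.address + " open=" + connection.isOpen());
    }
}
```

In this sketch the returned connection is already closed when the client-side address differs from every address the server reports, and it stays open when the two match. One possible direction for a fix, offered only as an assumption, is to re-resolve the returned connection against the refreshed host set before handing it back to the caller.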
This is exactly what happens when ArangoDB runs in a virtualized environment (Docker in our case) where the networking is organized in such a way that the client process addresses the server via a different IP (or host name) than the one the server sees from inside its network.
The issue is reproducible by spinning up a DB cluster via arangodb-starter in Docker and running the ArangoDBTest.execute_acquireHostList_enabled() test method against it.