bad connection errors are incomplete and do not provide context of the instance being connected to #1385

Closed
@sjmudd

Issue description

I manage an application called orchestrator, which talks to thousands of MySQL servers and is used to detect MySQL failures. If a master or intermediate master fails, it re-arranges the replication topology of the cluster so that the cluster can continue to be used. This system has a reasonably short connect timeout of 1 second.

The problem I see is that we get quite a lot of errors when connecting to "the database" prior to doing operations such as reconfiguring the server or reconfiguring from which other MySQL server it should replicate. The failed connect errors are the concern. See: https://jira.percona.com/browse/DISTMYSQL-261 for some context of the issue from the application side.

Orchestrator is currently using the v1.6.0 version of the MySQL driver. I'm aware there's a newer version v1.7.0 but as far as I can see there's no change in logic around the topic being discussed.

The error message itself is extremely vague: "driver: bad connection" does not indicate the actual problem, and the logging does not indicate the host:port on which it happens. I'm continually polling a large number of MySQL servers, and orchestrator is intended to determine whether each MySQL host is healthy, so identifying both the source of a connection failure and its exact cause is important.

The bad connection errors are not that frequent but do add up. The problem is that with the current code it's hard to identify, from a single log line, either the host the error came from or the specific condition that triggered it.

I see that currently the driver logging combines multiple different conditions under the same umbrella term errBadConnNoWrite and in some cases it logs the error independently of the error returned to the caller.

It would seem better, given recent changes in Go error handling, to split the errBadConnNoWrite errors returned to the caller into a separate error for each condition that triggers them. Applications could still detect the general case with errors.Is(err, errBadConnNoWrite), but because the driver would wrap the specific error it picked up earlier in its own code, the full error would reach the caller and could be identified more completely.

Ideally I'd like the driver to include in the error something like the mc.cfg.Addr value of the host being talked to. If that is not considered acceptable, the caller would have to be adapted to record this information for every connection so that it can be logged alongside the error received from the driver.

Summarising: the exact cause of driver: bad connection errors is not clearly identified in the error returned to the caller.
I believe it would be good to return a more detailed error that distinguishes each of the cases in which it is returned, and to suggest that users check for errBadConnNoWrite with errors.Is() if they need the existing behaviour. If possible, include the address of the host where the error happens.

If there is a better way to identify the specific errors and the host to which they correspond then please share your thoughts.

Error log

See: https://jira.percona.com/browse/DISTMYSQL-261 for some details/examples

Configuration

Driver version (or git SHA):

  • v1.6.0

Go version:

  • Not entirely sure. This is seen on CentOS 8 where I see the version pulled in by yum to be 1.18.4
  • current quay.io/centos/centos:stream8 docker images now show 1.19.2.
  • packages were built in Nov '22 so the version may be a bit older. I can find out if needed.

Server version:

  • MySQL 8.0, versions currently from 8.0.28-8.0.31
  • MySQL 5.7.40 or so

Server OS:

  • CentOS 8, x86_64
