Commit 09271fb

ImgBotApp authored and michaelklishin committed
Extract troubleshooting section in Networking into a separate guide
Also finishes cherry-picking image optimization from master.
1 parent f0ffe99 commit 09271fb

3 files changed

+280
-198
lines changed

site/documentation.xml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -165,6 +165,9 @@ limitations under the License.
165165
<li>
166166
<a href="networking.html">Networking</a>
167167
</li>
168+
<li>
169+
<a href="troubleshooting-networking.html">Troubleshooting Network Connectivity</a>
170+
</li>
168171
<li>
169172
<a href="ssl.html">Using TLS for Client Connections</a>
170173
</li>

site/networking.xml

Lines changed: 3 additions & 198 deletions
Original file line numberDiff line numberDiff line change
@@ -983,204 +983,9 @@ client unexpectedly closed TCP connection
983983
<doc:section name="troubleshooting-where-to-start">
984984
<doc:heading>Troubleshooting Network Connectivity</doc:heading>
985985

986-
<doc:subsection name="troubleshooting-intro">
987-
<doc:heading>Methodology</doc:heading>
988-
<p>
989-
Troubleshooting of network connectivity issues is a broad topic. There are entire
990-
books written about it. This guide provides some starting points for the most common issues.
991-
</p>
992-
993-
<p>
994-
Networking protocols are <a href="https://en.wikipedia.org/wiki/OSI_model#Comparison_with_TCP.2FIP_model">layered</a>.
995-
So are problems with them. An effective troubleshooting
996-
strategy typically uses the process of elimination to pinpoint the issue (or multiple issues),
997-
starting at higher levels. Specifically for messaging technologies, the following steps
998-
are often effective and sufficient:
999-
1000-
<ul>
1001-
<li>Verify client configuration</li>
1002-
<li>
1003-
Verify server configuration using <code><a href="/man/rabbitmqctl.1.man.html">rabbitmqctl</a> status</code> (specifically the <code>listeners</code> section)
1004-
and <code>rabbitmqctl environment</code>
1005-
</li>
1006-
<li>Check server logs (see above)</li>
1007-
<li>Verify hostname resolution</li>
1008-
<li>Verify TCP port connectivity</li>
1009-
<li>Verify IP routing</li>
1010-
<li>If needed, take and analyze a traffic dump (traffic capture)</li>
1011-
</ul>
1012-
1013-
These steps, when performed in sequence, usually help identify the root cause of
1014-
the vast majority of networking issues. Troubleshooting tools and techniques for
1015-
levels lower than the <a href="https://en.wikipedia.org/wiki/Internet_protocol_suite#Internet_layer">Internet (networking) layer</a>
1016-
are outside of the scope of this guide.
1017-
</p>
1018-
</doc:subsection>
1019-
1020-
<doc:subsection name="troubleshooting-verify-client">
1021-
<doc:heading>Verify Client Configuration</doc:heading>
1022-
1023-
<p>
1024-
All developers and operators have been there: typos,
1025-
outdated values, issues in provisioning tools, mixed up
1026-
public and private key paths, and so on. Step one is to
1027-
double check application and client library
1028-
configuration.
1029-
</p>
1030-
</doc:subsection>
1031-
1032-
<doc:subsection name="troubleshooting-verify-server">
1033-
<doc:heading>Verify Server Configuration</doc:heading>
1034-
1035-
<p>
1036-
Verifying server configuration helps prove that RabbitMQ is running
1037-
with the expected set of settings related to networking. It also verifies
1038-
that the node is actually running. Here are the recommended steps:
1039-
1040-
<ul>
1041-
<li>Make sure the node is running using <code><a href="/man/rabbitmqctl.1.man.html">rabbitmqctl</a> status</code></li>
1042-
<li>Verify <a href="/configure.html">config file is correctly placed and has correct syntax/structure</a></li>
1043-
<li>Inspect the <code>listeners</code> section in <code><a href="/man/rabbitmqctl.1.man.html">rabbitmqctl</a> status</code> output</li>
1044-
<li>Inspect effective configuration using <code><a href="/man/rabbitmqctl.1.man.html">rabbitmqctl</a> environment</code></li>
1045-
</ul>
1046-
</p>
1047-
1048-
<p>
1049-
The listeners section will look something like this:
1050-
1051-
<pre class="sourcecode erlang">
1052-
% ...
1053-
{listeners,
1054-
[{clustering,25672,"::"},
1055-
{amqp,5672,"::"},
1056-
{'amqp/ssl',5671,"::"},
1057-
{http,15672,"::"}]}
1058-
% ...
1059-
</pre>
1060-
1061-
In this example, there are 4 TCP listeners on the node:
1062-
1063-
<ul>
1064-
<li>Inter-node and CLI tool communication port, <code>25672</code></li>
1065-
<li>AMQP 0-9-1 (and 1.0, if enabled) listener for non-TLS connections, <code>5672</code></li>
1066-
<li>AMQP 0-9-1 (and 1.0, if enabled) listener for TLS-enabled connections, <code>5671</code></li>
1067-
<li><a href="/management.html">HTTP API</a>, <code>15672</code></li>
1068-
</ul>
1069-
1070-
All listeners are bound to all available interfaces.
1071-
</p>
1072-
<p>
1073-
Inspecting TCP listeners used by a node helps spot non-standard port configuration,
1074-
protocol plugins (e.g. <a href="/mqtt.html">MQTT</a>) that are supposed to be configured but aren't,
1075-
cases when the node is limited to only a few network interfaces, and so on.
1076-
</p>
1077-
</doc:subsection>
1078-
1079-
1080-
<doc:subsection name="troubleshooting-hostname-resolution">
1081-
<doc:heading>Hostname Resolution</doc:heading>
1082-
1083-
<p>
1084-
It is very common for applications to use hostnames or URIs with hostnames when connecting
1085-
to RabbitMQ. <a href="https://en.wikipedia.org/wiki/Dig_(command)">dig</a> and <a href="https://en.wikipedia.org/wiki/Nslookup">nslookup</a> are
1086-
commonly used tools for troubleshooting hostname resolution.
1087-
</p>
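Where <code>dig</code> or <code>nslookup</code> are not available, the same check can be approximated from a scripting language. The following is a minimal sketch in Python, not part of the original guide: the function name <code>resolve</code> is hypothetical, and the default port 5672 is only the standard AMQP port mentioned in this guide. It asks the operating system resolver (the one client libraries normally use) which addresses a hostname maps to.

```python
import socket


def resolve(hostname, port=5672):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to.

    An empty list means the system resolver could not resolve the name.
    The helper name and the default port are assumptions for illustration.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling <code>resolve</code> with the hostname the application is configured to use and comparing the result with the addresses of the RabbitMQ hosts can reveal stale or missing DNS records.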
1088-
</doc:subsection>
1089-
1090-
<doc:subsection name="troubleshooting-port-access">
1091-
<doc:heading>Port Access</doc:heading>
1092-
1093-
<p>
1094-
Besides hostname resolution and IP routing issues,
1095-
TCP port inaccessibility for outside connections is a common reason for
1096-
failing client connections. <a href="https://en.wikipedia.org/wiki/Telnet">telnet</a> is a commonly
1097-
used, very minimalistic tool for testing TCP connections to a particular hostname and port.
1098-
</p>
1099-
<p>
1100-
Failed or timing out <code>telnet</code> connections
1101-
strongly suggest there's a proxy, load balancer or firewall
1102-
that blocks incoming connections on the target port. It
1103-
could also mean that the RabbitMQ process is not running on the
1104-
target node or is using a non-standard port. Those scenarios
1105-
should be eliminated at the step that double checks server
1106-
listener configuration.
1107-
</p>
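The check that <code>telnet</code> performs can also be scripted. The sketch below is an illustration under stated assumptions rather than a tool from the original guide: the function name <code>port_open</code> and the 3 second default timeout are hypothetical choices. It only attempts a TCP connection and reports whether it succeeded.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    False covers refused connections, timeouts and unresolvable hostnames,
    so a False result still needs the manual steps described in this guide.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A <code>False</code> result for a port that appears in the node's listener list points at something between the client and the node (a firewall, proxy or load balancer) rather than at RabbitMQ itself.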
1108-
<p>
1109-
There's a great number of firewall, proxy and load balancer tools and products.
1110-
<a href="https://en.wikipedia.org/wiki/Iptables">iptables</a> is a commonly used
1111-
firewall on Linux and other UNIX-like systems. There is no shortage of <code>iptables</code>
1112-
tutorials on the Web.
1113-
</p>
1114-
<p>
1115-
Open ports, TCP and UDP connections of a node can be inspected using <a href="https://en.wikipedia.org/wiki/Netstat">netstat</a>,
1116-
<a href="https://linux.die.net/man/8/ss">ss</a>, or <a href="https://en.wikipedia.org/wiki/Lsof">lsof</a>. <a href="/cli.html">rabbitmqctl status</a>
1117-
can be used to list configured ports as well.
1118-
</p>
1119-
<p>
1120-
For the list of ports used by RabbitMQ and its various
1121-
plugins, see above. Generally all ports used for external
1122-
connections must be allowed by the firewalls and proxies.
1123-
</p>
1124-
</doc:subsection>
1125-
1126-
<doc:subsection name="troubleshooting-ip-routing">
1127-
<doc:heading>IP Routing</doc:heading>
1128-
1129-
<p>
1130-
Messaging protocols supported by RabbitMQ use TCP and require IP routing between
1131-
clients and RabbitMQ hosts to be functional. There are several tools and techniques
1132-
that can be used to verify IP routing between two hosts. <a href="https://en.wikipedia.org/wiki/Traceroute">traceroute</a> and <a href="https://en.wikipedia.org/wiki/Ping_(networking_utility)">ping</a>
1133-
are two common options available for many operating systems. Most routing table inspection tools are OS-specific.
1134-
</p>
1135-
1136-
<p>
1137-
Note that both <code>traceroute</code> and <code>ping</code> rely on <a href="https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol">ICMP</a>
1138-
while RabbitMQ client libraries and inter-node connections use TCP.
1139-
Therefore a successful <code>ping</code> run alone does not guarantee successful client connectivity.
1140-
</p>
1141-
1142-
<p>
1143-
Both <code>traceroute</code> and <code>ping</code> have Web-based and GUI tools built on top.
1144-
</p>
1145-
</doc:subsection>
1146-
1147-
<doc:subsection name="troubleshooting-traffic-captures">
1148-
<doc:heading>Capturing Traffic</doc:heading>
1149-
1150-
<p>
1151-
All network activity can be inspected, filtered and analyzed using a traffic capture.
1152-
</p>
1153-
1154-
<p>
1155-
<a href="https://en.wikipedia.org/wiki/Tcpdump">tcpdump</a> and its GUI sibling <a href="https://www.wireshark.org">Wireshark</a>
1156-
are the industry standards for capturing traffic, filtering and analysis. Both support all protocols supported by RabbitMQ.
1157-
See the <a href="/amqp-wireshark.html">Using Wireshark with RabbitMQ</a> guide for an overview.
1158-
</p>
1159-
</doc:subsection>
1160-
1161-
<doc:subsection name="troubleshooting-tls">
1162-
<doc:heading>TLS Connections</doc:heading>
1163-
1164-
<p>
1165-
For connections that use TLS there is a separate <a href="/troubleshooting-ssl.html">guide on troubleshooting TLS</a>.
1166-
</p>
1167-
1168-
<p>
1169-
When adopting TLS it is important to make sure that clients
1170-
use the correct port to connect (see the list of ports above)
1171-
and that they are instructed to use TLS (perform TLS
1172-
upgrade). A client that is not configured to use TLS will
1173-
successfully connect to a TLS-enabled server port but its connection
1174-
will then time out since it never performs the TLS upgrade that the server
1175-
expects.
1176-
</p>
1177-
1178-
<p>
1179-
A TLS-enabled client connecting to a non-TLS enabled port will successfully
1180-
connect and try to perform a TLS upgrade which the server does not expect, thus
1181-
triggering a protocol parser exception. Such exceptions will be logged by the server.
1182-
</p>
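One way to tell the two mismatches above apart is to probe the port and see whether a TLS handshake completes. The following Python sketch is an assumption-laden diagnostic aid, not something the original guide provides: the function name <code>speaks_tls</code> is hypothetical, and certificate verification is deliberately disabled so that a self-signed certificate is not mistaken for a non-TLS port. It must not be used as a template for production client code.

```python
import socket
import ssl


def speaks_tls(host, port, timeout=3.0):
    """Return True if a TLS handshake completes on host:port, else False.

    Diagnostic only: peer verification is turned off on purpose, so this
    says nothing about whether the certificate would be trusted.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake before returning.
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers ssl.SSLError, timeouts and refused connections.
        return False
```

If the probe fails on the port a TLS-enabled client uses, or succeeds on the port a plain client uses, the client and listener configuration most likely disagree about TLS.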
1183-
</doc:subsection>
986+
<p>
987+
<a href="/troubleshooting-networking.html">Troubleshooting of networking-related issues</a> is covered in a separate guide.
988+
</p>
1184989
</doc:section>
1185990
</body>
1186991
</html>
