Netbird relay connection stale for some peers (workaround found) #3936

Silex · 2025-06-06T08:06:29Z

Hello

With a self-hosted netbird 0.45.1 and peers on 0.45.3 and 0.36.5 that are relayed due to CGNAT issues (one peer is a 5G router, the other is a Windows PC behind a corporate firewall), the relay becomes "stale" after a while: you can no longer ping between the peers, yet the status still says it's connected:

$ netbird status -d

pictet-nvr1.netbird.stvs:
  NetBird IP: 100.70.94.175
  Public key: wNWlJ95DqnJMCdXX77gZwVLB4oDDInwp7DpACxy/SV4=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://netbird.stvs.com:443
  Last connection update: 7 hours, 9 minutes ago
  Last WireGuard handshake: 7 hours, 10 minutes ago
  Transfer status (received/sent) 711.3 MiB/18.1 GiB
  Quantum resistance: false
  Routes: -
  Networks: -
  Latency: 52.905573ms

$ wg show

peer: wNWlJ95DqnJMCdXX77gZwVLB4oDDInwp7DpACxy/SV4=
  endpoint: 127.0.0.1:38500
  allowed ips: 100.70.94.175/32
  latest handshake: 7 hours, 13 minutes, 32 seconds ago
  transfer: 711.28 MiB received, 18.11 GiB sent
  persistent keepalive: every 25 seconds

As you can see, the latest handshake is far too old. A simple workaround is to stop/start netbird, but that kills all other connections (the PC is connected to many routers). Another workaround is to remove the problematic router from the policy group and add it again to force an update, but having to do that manually is annoying.

I guess one could also use wg set to remove the offending peer, and netbird would then recreate the WireGuard peer? So maybe I can monitor the latest handshakes and "kill" the peers that are stuck?
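The monitoring half of that idea can be sketched as follows. This is only an illustration, not something netbird provides: it assumes the tab-separated `<public key> <unix timestamp>` format printed by `wg show <interface> latest-handshakes` (with `0` meaning no handshake yet), that the netbird interface is named `wt0`, and an arbitrary 300-second staleness threshold. The function names are made up for this example.

```python
import subprocess

# Assumption: a peer with keepalives enabled should re-handshake every few
# minutes, so 300 s without one is treated as "stale". Tune as needed.
STALE_AFTER_SECONDS = 300


def parse_latest_handshakes(output):
    """Parse `wg show <iface> latest-handshakes` output into {pubkey: epoch}."""
    handshakes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        pubkey, epoch = fields
        handshakes[pubkey] = int(epoch)
    return handshakes


def stale_peers(handshakes, now, max_age=STALE_AFTER_SECONDS):
    """Return sorted public keys with no handshake, or one older than max_age."""
    stale = []
    for pubkey, epoch in handshakes.items():
        # wg reports 0 when the peer has never completed a handshake.
        if epoch == 0 or now - epoch > max_age:
            stale.append(pubkey)
    return sorted(stale)


def read_handshakes(interface="wt0"):
    """Query the local WireGuard interface (assumed name: wt0)."""
    result = subprocess.run(
        ["wg", "show", interface, "latest-handshakes"],
        capture_output=True, text=True, check=True,
    )
    return parse_latest_handshakes(result.stdout)
```

Calling `stale_peers(read_handshakes(), now=int(time.time()))` from a cron job or timer would list the stuck peers; what to do with that list is a separate question.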

Any ideas welcome.

Silex · 2025-06-06T08:28:29Z

I found this, which is interesting, but it seems netbird already does the right thing:

https://www.reddit.com/r/WireGuard/comments/k3d1hc/latest_handshake_few_hours_ago/

Silex · 2025-06-06T08:42:12Z

Just to clarify the setup:

Netbird runs on multiple 5G routers (Teltonika TRB500) and on multiple servers (Windows). The connections are relayed due to CGNAT/firewall issues.

One of these servers records cameras that are reached through the routers.

Almost every night, some of the routers' relayed connections become stale and thus the cameras are unreachable. Simply restarting netbird fixes the issue.

From the other servers, the connections to the routers are not stale most of the time, but it also happens from time to time.

The problematic server is a VM hosted by a different provider, so maybe the network issues are mainly due to that provider, but my guess is that it has more to do with the WireGuard tunnel not being correctly detected as broken (e.g. 5G router IP changed, 5G connection glitches, etc).

Silex · 2025-06-06T14:09:12Z

Meh, I thought it was the WireGuard tunnel, but it seems to go deeper than that:

When peer is unreachable:

peer: 6kq3/G775aJK5slDq1OyEyLFK4TvyZiurx+OddRotVw=
  endpoint: 127.1.189.16:51820
  allowed ips: 100.70.189.16/32
  transfer: 0 B received, 148 B sent
  persistent keepalive: every 25 seconds

When peer is reachable:

peer: 6kq3/G775aJK5slDq1OyEyLFK4TvyZiurx+OddRotVw=
  endpoint: 127.1.189.16:51820
  allowed ips: 100.70.189.16/32
  latest handshake: 28 seconds ago
  transfer: 796.04 KiB received, 247.33 KiB sent
  persistent keepalive: every 25 seconds

I removed/recreated the peer using plain wg set commands, but that does not reconnect the peer.

The only thing that works at this point is netbird down/up, or editing the peer policy so that netbird "resets" the config.
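Until a per-peer reset exists, the down/up workaround could at least be automated with some debouncing, so that a single glitch does not tear down every tunnel. The sketch below is an assumption-laden illustration: `StaleWatchdog` and `restart_netbird` are hypothetical names, the three-strike rule is arbitrary, and the only real commands used are `netbird down` and `netbird up` (which may need elevated privileges). It still restarts all peers, not only the stale one.

```python
import subprocess


def restart_netbird(run=subprocess.run):
    """Run `netbird down` then `netbird up`; raises if either command fails."""
    run(["netbird", "down"], check=True)
    run(["netbird", "up"], check=True)


class StaleWatchdog:
    """Restart netbird only after several consecutive checks saw stale peers."""

    def __init__(self, required_strikes=3, restart=restart_netbird):
        self.required_strikes = required_strikes  # assumed value, tune freely
        self.restart = restart
        self.strikes = 0

    def observe(self, stale_peer_count):
        """Record one check; return True if a restart was triggered."""
        if stale_peer_count == 0:
            self.strikes = 0  # everything healthy again, forget past strikes
            return False
        self.strikes += 1
        if self.strikes < self.required_strikes:
            return False
        self.strikes = 0
        self.restart()
        return True
```

Each periodic check would call `observe()` with the number of peers currently considered stale (for example, those whose latest handshake is older than a few minutes).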

Should I give 0.46.0 a try?

nazarewk · 2025-06-06T14:29:27Z

> I removed/recreated the peer using plain wg set commands but it does not reconnect the peer.

I'm pretty sure netbird uses an elaborate negotiation process to establish connectivity. I wouldn't expect wg set to have any chance of working unless the peer was directly reachable over the internet.

You can always try 0.46.0, but after looking briefly at the release notes, I don't see anything particularly relevant there.

Silex · 2025-06-06T15:52:41Z

@nazarewk thanks.

I'm trying to find a workaround so that I only reset the stale peer instead of the whole netbird connection. Any ideas? Removing and re-adding the WireGuard peer seemed smart, but I guess it's a dead end.

Silex · 2025-06-06T18:40:28Z

Hmm, forwarding UDP 51820 from the WAN to the peer does not seem to help establish a P2P connection. Any idea what to try?

Silex added the triage-needed label Jun 6, 2025
