Netbird relay connection stale for some peers (workaround found) #3936
Comments
I found this, which is interesting, but it seems netbird already does the right thing: https://www.reddit.com/r/WireGuard/comments/k3d1hc/latest_handshake_few_hours_ago/
Just to clarify the setup: Netbird runs on multiple 5G routers (Teltonika TRB500) and on multiple servers (Windows). The connections are relayed due to CGNAT/firewall issues. One of these servers records cameras served through the multiple routers. Almost every night, the relayed connections to some of the routers become stale and the cameras are unreachable. Simply restarting netbird fixes the issue. From the other servers, the connections to the routers are usually not stale, but it does happen from time to time. The problematic server is a VM hosted by a different provider, so maybe the network issues are mainly due to that provider, but my guess is that it has more to do with the wireguard tunnel not being correctly detected as broken (e.g. 5G router IP changed, 5G connection glitches, etc.).
Meh, I thought it was the wireguard tunnel, but it seems deeper than that: When peer is unreachable:
When peer is reachable:
I removed/recreated the peer using plain The only thing working at this point is netbird down/up or editing the peer policy so netbird "resets" the config. Should I give
I'm pretty sure it uses elaborate negotiation process to establish connectivity. I wouldn't expect You can always try the
@nazarewk thanks. I'm trying to find a workaround so that I only reset the stale peer instead of the whole netbird connection. Any idea? Removing & adding the wireguard peer seemed smart, but I guess it's a dead end.
Hmm, forwarding UDP 51820 from WAN to the peer does not seem to help the P2P connection. Any idea what to try?
Hello
With netbird self-hosted version 0.45.1 and peers on versions 0.45.3 and 0.36.5 that are relayed due to CGNAT issues (one peer is a 5G router, the other peer is a Windows PC behind a corporate firewall), after a while the relay becomes "stale" in the sense that you cannot ping between the peers anymore, yet it says it's connected:

    $ netbird status -d
    pictet-nvr1.netbird.stvs:
      NetBird IP: 100.70.94.175
      Public key: wNWlJ95DqnJMCdXX77gZwVLB4oDDInwp7DpACxy/SV4=
      Status: Connected
      -- detail --
      Connection type: Relayed
      ICE candidate (Local/Remote): -/-
      ICE candidate endpoints (Local/Remote): -/-
      Relay server address: rels://netbird.stvs.com:443
      Last connection update: 7 hours, 9 minutes ago
      Last WireGuard handshake: 7 hours, 10 minutes ago
      Transfer status (received/sent) 711.3 MiB/18.1 GiB
      Quantum resistance: false
      Routes: -
      Networks: -
      Latency: 52.905573ms

    $ wg show
    peer: wNWlJ95DqnJMCdXX77gZwVLB4oDDInwp7DpACxy/SV4=
      endpoint: 127.0.0.1:38500
      allowed ips: 100.70.94.175/32
      latest handshake: 7 hours, 13 minutes, 32 seconds ago
      transfer: 711.28 MiB received, 18.11 GiB sent
      persistent keepalive: every 25 seconds
As you can see, the latest handshake is far too old. A simple workaround is to stop and start netbird, but that kills all other connections (the PC is connected to many routers). Another workaround is to remove the problematic router from its policy group and add it again to force an update, but having to do that manually is annoying.
I guess one could also wg set one's way into removing the offending peer, and netbird would recreate the wireguard peer? So maybe I can monitor the latest handshakes and "kill" the peers that are stuck?

Any ideas welcome.
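For the "monitor latest handshakes" idea, here is a minimal sketch of the detection half only, not a tested fix. It parses the output of `wg show <interface> latest-handshakes` (one line per peer: public key, then a Unix timestamp, with 0 meaning no handshake yet) and lists peers whose handshake is older than a threshold. The interface name `wt0`, the 300-second threshold, and the function names are assumptions made for illustration. What to do with a stale peer is deliberately left out, since the comments report that removing and re-adding the WireGuard peer by hand did not seem to help.

```python
import subprocess

# Assumption: with a 25 s persistent keepalive a healthy peer should
# re-handshake every few minutes, so 5 minutes is treated as "stale".
STALE_AFTER_SECONDS = 300


def parse_latest_handshakes(output):
    """Parse `wg show <iface> latest-handshakes` text into {public_key: unix_ts}."""
    handshakes = {}
    for line in output.splitlines():
        parts = line.split()
        # Expect exactly "<public key> <unix timestamp>"; skip anything else.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        handshakes[parts[0]] = int(parts[1])
    return handshakes


def stale_peers(handshakes, now, max_age=STALE_AFTER_SECONDS):
    """Return sorted public keys whose last handshake is missing or too old."""
    stale = []
    for key, ts in handshakes.items():
        # A timestamp of 0 means the peer has never completed a handshake.
        if ts == 0 or now - ts > max_age:
            stale.append(key)
    return sorted(stale)


def read_handshakes(interface="wt0"):
    """Run wg and parse its output. The interface name is an assumption."""
    result = subprocess.run(
        ["wg", "show", interface, "latest-handshakes"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_latest_handshakes(result.stdout)
```

Calling `stale_peers(read_handshakes(), time.time())` from a scheduled task (with `import time`) would give the list of stuck peers to alert on or to feed into whatever reset mechanism turns out to work.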