When Time Broke the Internet
A look into how not to debug a network issue
I recently faced a major home networking hiccup, and after spending entirely too much time debugging it, I decided it was an interesting enough mystery to document as my first post here.
Background
I got interested in self-hosting and home networking a few months ago, and that took me down a huge rabbit hole of mixing and matching different tools and configurations. It started with a Netgear Nighthawk RAXE500, then spiraled into me buying an Intel NUC (BOXNUC8I7BEH) to run OPNsense and an Aruba 2530-8-PoE+ to manage the VLANs. I also had a spare ASUS RT-AX52, which I flashed with OpenWrt, and I did the same on an Etisalat S3 AC 2100.
In the final setup, my ISP router is in DMZ (I bought a new TP-Link XN020-G3 to run in bridge mode, but I don’t have it yet) and is connected to the Aruba. The NUC is configured as a router on a stick (I know it divides the bandwidth, but I didn’t want to invest in a Thunderbolt 3 Ethernet adapter). The Nighthawk is connected to the switch, and another long cable goes to my setup upstairs, where it connects to the RT-AX52. The RT-AX52 acts as a managed switch too, since it receives three VLANs (Trusted, Guest, and IoT). The S3 AC 2100 acts as a range extender for the IoT SSID.
To summarize:
| VLAN ID | Name |
|---|---|
| 1 | Management |
| 10 | Trusted |
| 20 | IoT |
| 50 | Guest |
| 99 | WAN |
Problem
We recently had a power issue, and since my UPS was under maintenance, the equipment went through several consecutive power failures. The outcome was no internet connection. I started with my RT-AX52, but it was not working correctly. I went to the switch and moved its cable to a different port (this is important), but it still didn’t work. Since I was near the Nighthawk, my device connected to that instead, and I didn’t notice. Then I opened OPNsense and ran some diagnostics. Pings to raw IP addresses were working, which meant DNS was the broken part. Since it had been a few months since I set everything up, I didn’t remember how anything was configured. There were something like five DHCP and DNS services to check, but those turned out to be disabled. Finally, I opened Unbound, checked the config, and saw that the forwarding to NextDNS looked fine.
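The "IP works, names don’t" check above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not what I actually ran: the function names are made up for this post, and the probe targets in the usage note are placeholders.

```python
import socket


def can_reach_ip(ip: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to a literal IP succeeds (no name lookup needed)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def classify(ip_ok: bool, dns_ok: bool) -> str:
    """Map the two probe results to a rough diagnosis."""
    if ip_ok and dns_ok:
        return "connectivity and DNS both look fine"
    if ip_ok:
        return "link is up, DNS resolution is broken"
    if dns_ok:
        return "names resolve (possibly from cache), but the probe IP is unreachable"
    return "no connectivity at all"
```

Calling `classify(can_reach_ip("1.1.1.1"), can_resolve("example.com"))` (both targets are assumptions, pick your own) would have reported the second case on that day.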
After spending some time on this, I finally noticed that the time on the NUC was not correct. Since DNSSEC was enabled, the wrong clock made DNSSEC validation fail. And because the NTP servers were configured as domain names instead of IP addresses, the NUC couldn’t resolve them to fix its time, so it was stuck in a vicious cycle. It took me a few seconds to SSH into the device and set the time manually. Then I added an NTP server by IP address as well so that this doesn’t happen again.
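Why does a wrong clock break DNSSEC? Each signature carries an inception and an expiration timestamp, and a validating resolver rejects it when its own clock falls outside that window. Here is a simplified sketch of that check. The dates are invented for illustration (I never recorded what the NUC’s clock actually said), and real validators compare 32-bit timestamps with wraparound handling rather than `datetime` objects.

```python
from datetime import datetime, timezone


def rrsig_time_valid(now: datetime, inception: datetime, expiration: datetime) -> bool:
    """A signature is only usable while inception <= now <= expiration."""
    return inception <= now <= expiration


# Hypothetical validity window for a signed record.
inception = datetime(2024, 5, 1, tzinfo=timezone.utc)
expiration = datetime(2024, 5, 15, tzinfo=timezone.utc)

# A correct clock versus one that fell back after losing power (assumed values).
correct_clock = datetime(2024, 5, 7, tzinfo=timezone.utc)
reset_clock = datetime(2000, 1, 1, tzinfo=timezone.utc)

print(rrsig_time_valid(correct_clock, inception, expiration))  # True
print(rrsig_time_valid(reset_clock, inception, expiration))    # False
```

With the second result, every signed answer looks invalid, including the lookup for the NTP server’s hostname, which is exactly the loop described above.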
Sequel
After that, I spent some time working on other stuff, and the internet was working fine. Then I went upstairs, and suddenly the internet was not working anymore. This confused me, since the connection had been working fine just a moment earlier. I restarted the RT-AX52, but that didn’t fix it. So I turned off the RT-AX52 for a while and put up with the weaker Wi-Fi signal from downstairs. Once I had some free time, I decided to debug it.
Since I was not getting an IP address, I manually set a static one on my machine and then opened the router settings. I tried the password, and it kept telling me it was wrong. Was my password wrong? I was using the one saved in my password manager. But who knows what happened. I couldn’t remember if I had changed it recently or if the router had just glitched.
I had to boot the router into OpenWrt’s failsafe mode, SSH into the machine, mount the partition, update the password, and reboot. Finally, I could access the router settings, and they showed that everything was configured correctly. I assumed the router had the same issue as the NUC (i.e., a bad date), but nope, the date was fine too. Then I started debugging the interfaces and slapped my head: I had forgotten that I had moved the cable to a different port on the switch. The RT-AX52 was expecting VLAN-tagged packets, while the new switch port was sending untagged traffic. I moved the cable back to the original port, and everything started working again.
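For anyone wondering what "tagged" versus "untagged" means on the wire: an 802.1Q tagged Ethernet frame has four extra bytes after the two MAC addresses, starting with the value 0x8100, and the low 12 bits of the next two bytes carry the VLAN ID. A rough Python sketch of that check follows. `build_frame` is a made-up helper that produces fake headers for the demo, not real captured traffic.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tagged frame


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a tagged Ethernet frame, or None if it is untagged."""
    if len(frame) < 16:
        return None  # too short to contain a tag after the two MAC addresses
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # low 12 bits of the tag control field hold the VLAN ID


def build_frame(vid: Optional[int]) -> bytes:
    """Build a minimal fake frame header for illustration (hypothetical helper)."""
    macs = bytes(12)  # zeroed destination and source MAC addresses
    ethertype_ipv4 = struct.pack("!H", 0x0800)
    if vid is None:
        return macs + ethertype_ipv4 + bytes(4)  # untagged, padded with dummy payload
    return macs + struct.pack("!HH", TPID_8021Q, vid) + ethertype_ipv4


print(vlan_id(build_frame(10)))    # 10 (the Trusted VLAN)
print(vlan_id(build_frame(None)))  # None (untagged)
```

The RT-AX52 was waiting for frames like the first one, and the port I had moved the cable to was only handing it frames like the second.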
Conclusion
Networking is hard. The amount of time I spent on this made me appreciate how much simpler it would be to just set up a mesh network and forget about it. Still, it was a valuable experience and gave me a newfound respect for network engineers. It also made me realize how important metrics and telemetry are, and I need to set those up. But that is a task for a different day, and the internet is working again.