The other day, I received a call about a PtP link in one of the industrial areas, a kind of “in the middle of the desert” location. The link had been relocated to a naval shipyard and had stopped working, although line-of-sight was still there and the link distance was shorter than before.
The setup had been up and running for the past year on two controller-less outdoor APs with external semi-directional antennas across 500 meters (1,640 feet). Beyond having to provide support to resolve this issue, this link meant a great deal to me personally. I had helped install, align, and establish the link in the middle of summer, here, in one of the distant areas of Abu Dhabi, UAE. So yes, imagine going out a bit later in the day for a couple of hours to make this work while basking in the sun for a brown tan at 50°C (122°F)!!!
I had to help get to the bottom of this, if only for the memory of that experience. Let’s get to the root cause with the 5-Why process:
Why wasn’t it working anymore?
The link was not being established between the mesh portal and the mesh point over the 5GHz band.
Why was the link not established?
The mesh portal kept changing channels.
Why did the mesh portal keep changing channels?
Radar signals were being detected on its channels, as seen on the spectrum analyzer, and an AP on a DFS channel must vacate it when radar is detected.
Why wasn’t the mesh portal using non-DFS channels?
The regulatory settings of the APs were showing that only the DFS channels were allowed to be utilized by this AP model.
Why wasn’t the AP allowing non-DFS channels, although they appeared to be permitted in the country’s regulatory domain?
It turns out that the regulatory domain certification for this AP model covers only the DFS channels. The country’s regulatory domain does allow the additional channels, but not for outdoor applications.
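To make that last answer concrete, here is a minimal Python sketch of how an outdoor AP can end up with a DFS-only channel list. The channel table is an illustrative ETSI-style assumption, not the actual TRA rule set or the AP vendor's certification data, and the names `Channel`, `CHANNEL_PLAN`, and `usable_channels` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    number: int
    dfs: bool         # radar detection (DFS) is required on this channel
    outdoor_ok: bool  # channel may be used outdoors


def _band(numbers, dfs, outdoor_ok):
    """Build a list of channels that share the same regulatory flags."""
    return [Channel(n, dfs, outdoor_ok) for n in numbers]


# ASSUMPTION: illustrative ETSI-style 5 GHz plan with 20 MHz channels.
# Always check the real regulatory domain and the AP's certification.
CHANNEL_PLAN = (
    _band(range(36, 49, 4), dfs=False, outdoor_ok=False)    # 36-48
    + _band(range(52, 65, 4), dfs=True, outdoor_ok=False)   # 52-64
    + _band(range(100, 141, 4), dfs=True, outdoor_ok=True)  # 100-140
)


def usable_channels(plan, outdoor, allow_dfs=True):
    """Return the channel numbers an AP may use for the given deployment."""
    return [
        c.number
        for c in plan
        if (c.outdoor_ok or not outdoor) and (allow_dfs or not c.dfs)
    ]


outdoor_all = usable_channels(CHANNEL_PLAN, outdoor=True)
outdoor_non_dfs = usable_channels(CHANNEL_PLAN, outdoor=True, allow_dfs=False)
print("Outdoor channels:", outdoor_all)
print("Outdoor non-DFS channels:", outdoor_non_dfs)
```

Under these assumed rules, every channel permitted outdoors also requires DFS, so the outdoor non-DFS list comes back empty, which is the situation the link ran into.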
How can we solve this situation?
There are several options to consider:
- Get the regulatory certification updated with additional channels for that AP model – Highly unlikely, since the TRA (Telecommunications Regulatory Authority) regulations won’t change soon or easily.
- Switch to a different regulatory domain – NOT RECOMMENDED as this is illegal.
- Try to find DFS channels that the radar might not use – Not practical, since we don’t have control over the radar, and many other users of the DFS channels could also share the air in that location, including defense and coast guard communication systems.
- Switch to a different connection method or technology, wired or wireless – Makes more sense than other options, but will require another cycle of design and implementation.
I have mentioned the obvious options for dealing with this situation. However, the most important thing, for this or any other issue a network engineer might face, is to have the right process and tools for troubleshooting Wi-Fi.
Learn or Develop Your Own Troubleshooting Process
Troubleshooting begins by identifying the problem and then working to gather the facts to help identify its possible causes. There are multiple troubleshooting processes that different vendors follow, or that your company or you individually might have developed.
The important thing is to always remember which layers of the OSI stack Wi-Fi operates on (i.e., the Physical and Data Link layers) and to start troubleshooting from there. Again, the recommendation is to have a process rather than jumping into the problem and trying to solve it arbitrarily. This saves the time and effort that might otherwise be spent testing services on other layers that the link also depends on.
In this case, you’d want to start by troubleshooting the wireless medium (condition, line-of-sight, noise, interferers, etc.), cabling, and antennas. Spectrum analysis tools can help you check the “cleanliness” of the air.
In other cases, other aspects could be checked, such as any hubs or repeaters and the physical state of the AP’s and distribution system’s NICs, before moving up to other layers. These could include switches, DHCP services, routers, controllers, and authentication servers, all the way up to the upper layers with advanced firewalls, proxy servers, and so on.
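The bottom-up order described above can be sketched as a small checklist runner. This is only an illustration under stated assumptions: the `Check` type, the `first_failure` helper, and the stubbed check results are hypothetical, and real checks would call whatever tools you actually use.

```python
from typing import Callable, List, NamedTuple, Optional


class Check(NamedTuple):
    layer: int                # OSI layer the check belongs to
    name: str
    run: Callable[[], bool]   # returns True when the check passes


def first_failure(checks: List[Check]) -> Optional[Check]:
    """Run checks from the lowest OSI layer upward; stop at the first failure."""
    for check in sorted(checks, key=lambda c: c.layer):
        if not check.run():
            return check
    return None


# Stubbed results loosely modelled on this case (hypothetical values).
checks = [
    Check(3, "DHCP lease obtained", lambda: True),
    Check(2, "Mesh link established", lambda: False),
    Check(1, "Line-of-sight clear", lambda: True),
    Check(1, "Spectrum clean (no radar)", lambda: False),
]

failed = first_failure(checks)
print("Start here:", failed.name if failed else "all checks passed")
```

Because the checks are sorted by layer, the failing Layer 1 spectrum check is reported before the Layer 2 symptom it causes, so effort goes to the cause first.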
For this problem, it was precisely at the step of checking the spectrum that the troubleshooter was able to detect radar interference. The AP’s software then confirmed that those channels were being avoided, leaving no free channels for the link. From there on, the rest of the process or methodology helps ensure that the proper steps are taken so that this or any other Wi-Fi problem is repaired and properly documented.
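How an AP runs out of channels can be sketched in a few lines. This is a simplified model, not any vendor's implementation: it assumes a 30-minute non-occupancy period after each radar detection (a commonly cited DFS figure, treated here as an assumption) and ignores the channel availability check an AP performs before transmitting. The class `DfsChannelSelector` and the channel list are hypothetical.

```python
NON_OCCUPANCY_S = 30 * 60  # ASSUMPTION: typical DFS non-occupancy period, in seconds


class DfsChannelSelector:
    """Tracks which DFS channels are blocked after a radar detection."""

    def __init__(self, channels):
        self.channels = list(channels)
        self.blocked_until = {}  # channel -> time (s) when it may be used again

    def radar_detected(self, channel, now):
        self.blocked_until[channel] = now + NON_OCCUPANCY_S

    def pick(self, now):
        """Return the first channel that is not blocked, or None if all are."""
        for channel in self.channels:
            if self.blocked_until.get(channel, 0) <= now:
                return channel
        return None


# Radar is seen on every channel shortly after the AP moves to it.
sel = DfsChannelSelector([100, 104, 108])
t = 0
history = []
while True:
    ch = sel.pick(t)
    if ch is None:
        break
    history.append(ch)
    sel.radar_detected(ch, t)
    t += 60  # radar shows up again one minute later

print("Channels tried:", history)
print("Channel available now:", sel.pick(t))
```

In this model the AP hops through every permitted channel and then has nothing left until the first non-occupancy timer expires, which matches the symptom of a mesh portal that keeps changing channels and never holds the link.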