Part IV: All Really is Well

May 3 08

Paul Weinstein

Previously on pdw @ zoomshare:

Recently we moved zoomshare into a new home.

We scheduled an overnight maintenance window to move the necessary zoomshare equipment.

Each new home has its own little quirks and idiosyncrasies. But as time goes by, one learns how to navigate them. They can become reassuring where originally they were unsettling.

As I closed in on being awake for 24 hours I knew I didn’t have a taxing day ahead of me; I had at least planned my schedule accordingly. However, I still had some work to do, and at the top of that list was a check-in with zoomshare in a few hours, so I set my alarm for 9 am and closed my eyes.

I don’t remember the alarm going off.

10 am. I could have used some more sleep, but that could wait just a few minutes. A quick check-in on how zoomshare was handling the morning traffic and then a few more hours of shuteye.

I don’t think I even got in my chair, let alone logged into my computer.

I had voicemail. In fact I had a voicemail from kree10 that was just a few minutes old.

Not good news.

Not good news at all, in fact. He was on his way back to the colocation facility. No one from the office was able to connect to any zoomshare site, verifying in turn that everything was in working order. Moreover, it was looking like a good percentage of our users were having issues as well, which meant it wasn’t localized to just one network connection or path.

In the case of zoomshare, one of the connecting pipes had a pin-size leak. When the “water pressure” (network traffic) was low, some droplets of water (network packets) escaped via the leak. Annoying, but manageable. However, when the pressure was turned up the leak started to turn into a flood: more than half of the network packets never made it to their destination.

And Now The Conclusion…

Troubleshooting computing issues can be difficult, more so when systems and services are divided up among different providers. Consider, for example, the problem most users have with their own system. A user encounters an error while using a specific piece of software and calls the software provider’s support number. They navigate an overly complicated phone system, get hold of a real person, describe the problem and try restarting the system, only to be told that the problem is not with the software package, that it is obviously a hardware issue, and that they should call the hardware company.

A second call later, this time to the hardware company, and the user is no closer to a resolution, since the hardware company’s tech support blames the issue on the software provider. All along, of course, the user doesn’t care whose fault it is but simply wants the problem fixed so they can get on with their task.

Alas, even we tech folks have to navigate the labyrinth of voicemail hell and deal with providers and support technicians who can be less than forthcoming with assistance when “it’s not our problem/fault”.

The Root of the Matter

Zoomshare moved into a new co-location facility in which one company provided space and power, and another provided the network connectivity to the Internet. Three parties had entered the dance.

Our network connection was experiencing a “leak” and we had to pinpoint that leak in our new home for zoomshare. After two days of testing our equipment we suspected the issue was elsewhere. But our new “landlord”, who tested their own setup, also suspected the issue was elsewhere, and our Internet Service Provider (ISP) at first couldn’t even confirm any network issue, let alone the “leak” we witnessed every time we turned up the “water pressure.”

We had run ourselves ragged trying everything we could: different equipment, different configurations. No one knew where to find the leak, but eventually everyone was able to acknowledge that a leak did exist. By the morning of the third day it was time to call a meeting of the brain trust with all three dancers together.

During the course of this third crunch day – as if all the other days hadn’t been crunch time – we retested all the integrated systems. First, our networking equipment and wiring. Next, with the landlord, the wiring leading up to our cage. Lastly, some testing with the ISP. The difference this time? Instead of playing he said, she said, we all had representatives physically present, all talking with each other, not at each other.

Eventually Holmes’ maxim – “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth” – did lead us to the source of the network leak. We eliminated the possible points one by one by testing each segment of the pipe, from “water main” to “faucet”. Alas, we had an added complication: navigating and coordinating the support procedures of various service providers, something it seems Mr. Holmes and Dr. Watson never had to concern themselves with.
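For the curious, the elimination process can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling we actually used: the function name, the 5% loss threshold, the segment names and the packet counts are all made up for the example, and the spot where the "leak" sits in the sample data is arbitrary.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical cutoff: a segment losing more than 5% of its packets
# under load is treated as "leaking". The value is an assumption.
LOSS_THRESHOLD = 0.05

# Each measurement is (segment name, packets sent, packets received),
# taken while the "water pressure" (traffic) is turned up.
Measurement = Tuple[str, int, int]


def first_leaky_segment(
    measurements: Sequence[Measurement],
    threshold: float = LOSS_THRESHOLD,
) -> Optional[str]:
    """Walk the segments in order, from "water main" to "faucet", and
    return the name of the first one whose loss exceeds the threshold.
    Return None if every segment checks out."""
    for name, sent, received in measurements:
        if sent <= 0:
            raise ValueError(f"no packets sent on segment {name!r}")
        loss = (sent - received) / sent
        if loss > threshold:
            return name
    return None


# Made-up numbers purely for illustration; these are not our real results.
example = [
    ("water_main", 1000, 998),
    ("middle_pipe", 1000, 430),
    ("faucet", 1000, 997),
]

print(first_leaky_segment(example))
```

The code is the easy part. The hard part, as we found, is getting a trustworthy measurement for each segment when each one belongs to a different party.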

By the end of the third day, for our zoomshare users at least, all really was well again.