Part II: On Two Hours Sleep

Feb 16 08

Paul Weinstein

This is the continuation of a story begun in “The Move” in which our heroes battle nefarious network gremlins in order to save zoomshare from imploding under its own weight.

By 7 am, when everyone else was waking, I was getting into bed. Other than a need for a good long nap, all seemed well. The move from colocation to colocation took a little longer than planned, but judging by the performance we saw before leaving the building, zoomshare was running better than ever.

As I closed in on being awake for 24 hours I knew I didn’t have a taxing day ahead of me; I had planned my schedule accordingly, at least. However, I still had some work to do, and top of that list was a check-in with zoomshare in a few hours, so I set my alarm for 9 am and closed my eyes.

I don’t remember the alarm going off.

10 am. I could have used some more sleep, but that could wait just a few minutes. A quick check-in on how zoomshare was handling the morning traffic and then a few more hours of shuteye.

I don’t think I even got in my chair, let alone logged into my computer.

I had voicemail. In fact I had a voicemail from kree10 that was just a few minutes old.

Not good news.

Not good news at all, in fact. He was on his way back to the colocation facility. No one from the office was able to connect to any zoomshare site to verify that everything was in working order. Moreover, it was looking like a good percentage of our users were having issues as well, which meant it wasn’t localized to just one network connection or path.

I gave kree10 a call. He asked me to meet him and to bring any spare network switches I might have with me. He was bringing one as well, just in case. So much for getting more shuteye.

I dressed, I disassembled part of my home network, unplugged my network switch, picked up my laptop and headed out the door.

The Odd Couple
Why bring along a network switch? I believe by the time I was on the road, heading back the way I had come, peenworm had determined that about 60% of the network packets from our office were being dropped on their way to the zoomshare servers in their new home. Most of the equipment was the same equipment that had been humming along at the old location just a day before. Most, but not all.

Some of the new equipment already in place before the move included a new network switch for managing network traffic between the various servers. When all is working correctly network switches properly inspect network data packets as they are received, determine their source and destination and forward the traffic appropriately. When things go wrong, well…things sure didn’t seem to be going right.

The funny thing was peenworm had seen something like this within the past few months at the old facility as well. It wasn’t caused by new equipment so much as by an uptick in network traffic. It was a network communication issue, a duplex mismatch to be exact.

Ok, see, a network connection can carry traffic in both directions at once or in only one direction at a time. Either way, this is known as duplexing. Our networking equipment, our switch, helps determine whether there is a full-duplex “two-way path” between the two connected parties or a half-duplex path on which they take turns. Just as it manages the who, what and where of the network traffic, the switch helps manage the how. But if any of the equipment on the network is misconfigured, so that the two ends of a link disagree, then boom, a network collision and network data disappears, literally into the ether.

In any case the quickest way to straighten out the issue was going to be swapping out the new network switch with another model. Traffic on the network would right itself, packets would stop getting lost and after a little rest we could properly reconfigure the newer network switch and place it back online when ready.

If only.

After meeting up with kree10 we tried our switches. One network switch replacement later and no improvement. Two network switches later and still no improvement. Something else was causing the network loss. But what?

I Dare Say Mr. Holmes
In “The Move” I mentioned feeling out of place as a software engineer in a network engineer’s world. In the world of computing there are hardware and software “layers” that allow engineers to develop ever more powerful tools. Each “layer of abstraction” removes a level of complexity out of one’s hands such that other problems can be tackled. As a web developer, for example, on any given day I don’t have to worry about programming networking protocols to manage duplexing issues, as that’s all been taken care of for me by someone else.

A third wheel, I said I felt like, sort of like Dr. Watson tagging along as Sherlock Holmes probes his client’s inner thoughts in search of clues … and an answer.

And yet in Arthur Conan Doyle’s tales Dr. Watson was a medical man, a practitioner of science. He too could put his analytical problem solving skills to the test, even in unfamiliar waters. After all Holmes’ axiom rings true in just about any logical situation, “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.”

If it’s not the network switch it must be some other piece of equipment. Alas, logic has a way of escaping one’s senses after 36 hours with just a few hours of sleep. Frustration starts to set in, and that sure was the case for all three of us after a few more hours of troubleshooting. The office was having issues connecting, but at the facility where we plugged into the net everything worked better than fine.

We had tried everything we could. It sure seemed like it wasn’t a problem with any of our equipment. Therefore, as Mr. Holmes would correctly point out, it must be a problem elsewhere. When the floodgates were opened and all of the traffic trying to get to zoomshare came rushing in, something started acting up.

But we couldn’t think straight; perhaps our logic was escaping us. We must have missed something. It couldn’t be a problem with the colocation facility, could it? We didn’t see any network issues.

Could it be with our new Internet service provider? The tech we talked to sure didn’t see any issues from his workstation.

And yet … To be continued in “The Analogy”