Wednesday, November 01, 2006

Wow!

That is all I can say and not have content blocked.

The company decided to sell off a small portion to interested buyers. It seemed easy enough: everything is stored in organisational units, so we could shave off that portion, provide it in an easy-to-load format for the purchasers, and be ready to go. Only 5 systems needed to have data transferred. The team of managers and "in the know" people put together to analyse what was needed said "1 week to develop a plan, another week to extract the data, 1 week of quality control": 3 weeks of work and then all done. We were steaming along steadily, well into week two with no major problems, when we started to get calls to our help desk: "System is hung", "I can't get logged in".

The help desk guys ran through their normal list of things to check; well, they tried to. Our knowledge base was down, which triggered pretty much every help desk person to pick up the phone and call somebody. Most of the non-help-desk IT people were in a meeting about the sale and hadn't noticed anything. It was very funny though, almost like those movies where all of the stars are in a room and all of their cell phones start going off at once to say they had better do something before the world as they know it comes to an end. I was up at the front showing some interesting data I had found in the financials system that was not organised properly but needed to be extracted when my BlackBerry started ringing. The IT manager's BlackBerry was vibrating away on the table, and the two SAs' cell phones were beeping.

After only a few seconds, we all started to file out of the meeting room in a straight line for the server room. I was about 10 feet from the server room, at the back of the line, when, wait for it, the fire alarm went off. The main building fire alarm was whooping and ringing. The hallway to the server room has one of those red fire alarm bells in it. I don't think there was a dry pair of pants in that hallway when that bell went off. Those things are freaking loud, and the hallway has a door from the lobby at one end and the server room door at the other, with nothing but concrete, linoleum and, at this particular point in time, 8 highly trained IT folks to absorb the sound. Despite the fire alarm, our manager had already been opening the server room door, so we continued in with the screaming, ringing racket of the fire bell going off about 8 feet behind us. As we all came into the server room we found out where the fire that triggered the alarm was: in our server room. There was a thick layer of smoke along the ceiling, and the back of our tape array rack was billowing smoke. The smell was almost enough to knock one out. Well, being the highly trained IT professionals we are, we were stepping on one another trying to get back out of the server room, a mess of arms and legs as 8 of us fit through the standard-sized door pretty much at once.

After a very undignified exit into the lobby, scaring the people there pretty much out of their wits, we composed ourselves a bit, realized we were still alive, and were moving quickly out of the building when one of the SAs spoke up and said "What about the fire suppression system in the room?" My answer, in a moment of pure brilliance at stating the obvious, was "It isn't working". We loitered around outside waiting for the billowing smoke to consume the entire building. After what seemed like an hour the fire department showed up; it was really only 7 minutes from the time the alarm went off to the time the FD pulled up. The IT manager and the building manager spoke with the fireman in charge and explained where the fire was.

We IT folks stood outside with the rest of the building population and waited and wondered. The two SAs were bickering back and forth over whose day it was to send the offsite tapes out, and wondering if our DR site was in good enough shape to run the company while this place was rebuilt. After another 10 minutes or so the firemen came out and said it was mostly just smoke, they had put the fire out, and after a few minutes we could go in and see what was up. The building manager would only let one SA and the IT manager into the room, for insurance reasons. They didn't touch anything until they had taken about 3,000 pictures and the insurance company had said over the phone that we could do what was needed to get our business running again.

The fire inspector figured out what the problem was; it was pretty obvious once you could see it. A power bar that comes built into the rack had ignited into a slow, smoldering burn, causing all 8 power cords plugged into it to start to smolder and put off smoke too. It apparently didn't get hot enough to trigger the fire suppression system in the room. It got plenty hot enough to melt all of the plastic off all of the power cords, damage the rack and a fibre network hub thing too, and generate vast quantities of smoke. All told, under $6,000 in damage, not including the room's new paint job, contractors to clean the room, and the IT department's time to inspect all of the equipment, since running it in a smoky environment is apparently bad for it.

The downtime was the remainder of the day it happened and the entire following day, but we were up and running at full capacity by 6am on day 3. One of the most junior help desk people we have, a great guy, summed up the entire thing in two words when he was told what was going on: "Mother F***er!" I will let you fill in the bleeped-out section.

I promise to post some documentation on primary and unique keys that I have been working on later this week. I also have a document on the pitfalls I have run into using CURSOR_SHARING of SIMILAR or FORCE, but that is a week or two away.


