Some People Can Pick Them

Re the story. It sounds better when told. But here goes.

It concerns a Pick system. Pick used file hashing based upon a given modulo
number. The speed of the file search depended upon having this value
correctly specified when the file was first created. At the time, if you
wanted to re-size a file then you had to calculate the new required modulo,
manually set this value for the file, then save the file to tape, delete the
file and then restore the file. Or if you needed to do this for many files,
then once you had set the new required values, do a complete system save
then a complete system restore. This was often how files were resized. The
issue, of course, was that once the restore started the whole system was
effectively ‘erased’ and could only be used again once the restore had
successfully completed. It was very common at the time for daily complete
saves to be done followed by a verify that checked that the written data
could be read and matched the data on disk. An actual restore was only done when it had
to be – for the obvious reason.
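
For anyone who has not met hashed files, here is a rough sketch in Python of why the modulo matters. It is an illustration only: CRC32 stands in for whatever hash Pick really used, and the helper names, the sample keys and the "round up to a prime" sizing rule are all assumptions made for the example, not the actual Pick procedure.

```python
import zlib
from collections import Counter


def group_for(key: str, modulo: int) -> int:
    # Stand-in hash (CRC32); the real Pick hashing algorithm is not shown here.
    return zlib.crc32(key.encode("utf-8")) % modulo


def group_sizes(keys, modulo: int) -> Counter:
    # Count how many records land in each group for a given modulo.
    return Counter(group_for(k, modulo) for k in keys)


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggest_modulo(record_count: int, records_per_group: int) -> int:
    # Assumed rule of thumb: enough groups for the records, rounded up to a prime.
    target = max(1, -(-record_count // records_per_group))
    while not is_prime(target):
        target += 1
    return target


# Hypothetical file that has grown to 5000 records but still has modulo 11.
keys = [f"CUST{n}" for n in range(5000)]
old = group_sizes(keys, 11)
new = group_sizes(keys, suggest_modulo(len(keys), 10))
print("largest group, modulo 11:", max(old.values()))
print("largest group, resized:", max(new.values()))
```

With too small a modulo every group holds hundreds of records, so each lookup has to scan a long chain. That is the slowdown the resize was meant to cure.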

OK. Now to the story. I’d installed a large multi-user system (32 users, which
was large back then) which had worked fine for a few months. Then I got a
telephone call saying it was now running much slower than it was when first
installed. Well the reason was obvious. The file modulo values for the hashing
were now too small. The end user knew nothing about how to resize etc so I
went to the user site to do the resize. Knowing the save itself took a few
hours, it was agreed I’d get there Friday lunchtime, do the required file
calculations, set the new modulo values etc so that when usage was finished,
we could do the save/restore overnight.

Well everything went OK to start with. Around 6pm on the Friday we were all
ready for the save. The backup medium was 1/2 inch tape. To be sure there
would be no tape issues, I used a new tape. Set the back-up going. OK. Went
to the pub for a couple of hours. Came back. Checked the system. The
verification was just finishing. OK. Crossed fingers and started the
restore. Watched it start the restore. Ok for 15 minutes. I then went to a
restaurant for dinner. Came back. And then the problem. Parity error when
reading data. Retries failed. Restore terminated. This is just what I didn’t
want. No restore meant no system and no data – and quite possibly no company
if I couldn’t get this system back up and running.

Well I tried the restore again. This time it didn’t get as far through the
restore as it did the first time before the same error. What restored the
first time wouldn’t now. Panic was starting to set in. Now from experience I
knew that the tapes/tape drive were susceptible to excess heat. That’s why
when the system was designed plenty of fans had been included to keep the
inside cool. So I felt the air flow where the fans were located on the back.
Hardly anything from one set. Oh ****. I took the back off to see what was
happening. Some of the fans weren’t moving. More ****. Checked the power to
the fans. Nope. No power. After some more investigation I found a blown
fuse. Ok. Just replace the fuse (I had spares – be prepared!). Done that.
Switched back on and the fuse blew straight away. There was a problem with
the fan tray. Well there was no way that that fan tray could be replaced
until Monday PM at the earliest. Oh dear. This system had to be running by
Monday morning at the latest. Now convinced that the issue was heat, I put
the back-up tape in the fridge to cool it down. I turned the room air
conditioning to the coldest it could do. Got my hairdryer that I used when I
stayed in hotels overnight and set it on to cold and balanced it so that the
cool air coming out went over the tape drive. Waited 30 minutes. I crossed
my fingers – and everything else – got the tape from the fridge and tried
the restore again. Got past the last error point. Got past the first error
point. Got to an hour – everything working OK. Got to 2 hours. OK. Got to
the end of the tape and the system started. Whepeeee! This was now after 2am.
I left a note, switched off the system and finally got to my
hotel. The next day I went back and explained to the user what had happened
and the issue. The system was started Monday morning still using the hair
dryer. The fan tray was replaced Monday night and everything then went
smoothly.

We added a heat detector to the hardware and I added Pick OS code to detect
and report excess heat which also shut down the system if over-heating
persisted. We never had the same issue again.
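
The logic of that check was roughly "report when hot, shut down if it stays hot". A minimal sketch of the idea in Python follows. This is not the original Pick code: the class name, the 40 degree limit and the three-reading persistence rule are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OverheatMonitor:
    limit_c: float = 40.0       # assumed temperature limit, degrees C
    max_hot_readings: int = 3   # assumed number of hot readings in a row before shutdown
    hot_streak: int = 0         # consecutive readings above the limit so far

    def check(self, temperature_c: float) -> str:
        # A normal reading clears the streak.
        if temperature_c <= self.limit_c:
            self.hot_streak = 0
            return "ok"
        # A hot reading is reported; persistent heat triggers a shutdown.
        self.hot_streak += 1
        if self.hot_streak >= self.max_hot_readings:
            return "shutdown"
        return "report"


monitor = OverheatMonitor()
for reading in [35, 45, 46, 38, 45, 46, 47]:
    print(reading, monitor.check(reading))
```

Resetting the streak on any normal reading means a brief spike is only reported, while sustained heat, as with a dead fan tray, stops the system.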