Capital City Christian Church
Explanation and Perspective on Documentation




Introduction

Ben, pick up here with updates...

Documentation means many things to many people. It can be busy work, helpful, a waste of time, useless, misleading, confusing...

From my perspective, (technology) documentation should be a tool that can be used as a reference, an inventory of hardware/software/configuration..., pseudo-policy, and/or pseudo-procedures (do some research to understand the difference between policy and procedure). Documentation is an indicator of excellence or neglect.

Very importantly, documentation is an invaluable resource for the people who will eventually replace me, and for their subsequent replacements, and so on. It gives someone who is first stepping into an environment somewhere to put their foot. It gives technicians who are new to this environment the information necessary to build a mental map of the environment, how it is laid out, and the reasons it is as it is, the why.

But here is the problem. Most people expect documentation to be the answers to problems, steps to fix something, and checklists of what to do in a given situation. While some of that is fine, most problems do not recur, at least not to such an exact degree that a previous solution still applies. Technology, and specifically digital networking (along with all of the ways digital networks are set up, managed, and used), is so situational that practically every new problem to be solved, need to be met, and adaptation to be made is unique enough that there is little value in documenting how specific problems or projects were solved or performed.

In fact, relying on how you fixed a problem last time usually hampers addressing a new but similar problem. Each instance of a problem, situation, or requirement is unique or different enough that blindly doing what worked before will often compound the current problem and further obscure the solution.


And to the points...

So, why document much of anything?

My solution is to focus on two purposes in documentation. The first is to create an ever-evolving snapshot of the components and arrangement of the network infrastructure and how it is used. The second is to provide training material that focuses on the skills needed to address many types of problems. If you are going to work with digital networking, you need to understand and be skilled with cabling, switches/routers, the protocols used on the network, and the arrangement, interconnection, and interoperation of these components. If you are dealing with security, you need to understand and be familiar with the Operating Systems involved, the different file systems involved, how applications and scripts operate in an Operating System, permissions, and especially how these can be used and combined in ways not intended by the developers and users.

These are examples of areas where your effectiveness is a result of your understanding and being familiar with the components of the environment, how they are intended to operate, the ability to identify exceptions, and the elimination of things that might appear involved but are not.

For example:

We have an LED street sign. It is physically on the Ethernet network. It is managed by a Windows application provided by the manufacturer.

One day it simply went black. Nothing was displayed. The Windows application reported that it could not connect with the sign's system.

So I started troubleshooting.

I started by checking the circuit breaker. Fine, but where was it? There are a bunch of breaker boxes throughout the church. Well, I went to the part of the church closest to the sign and started digging through closets and anywhere an electrician might put a breaker box. I found several, and none were labeled. Also, none appeared to be tripped. Knowing that this was the oldest part of the church building, that there had been LOTS of work done to the building, and that some of that work had been done by volunteers, I had to consider that the sign might be 'hard wired' to the power in some way. So a bunch more time was spent on a dead end. Much later I found that the breaker box was on an interior wall behind a picture.

Next I physically looked at the sign. I found no interfaces, no access doors, nothing to work with. There was, however, a power cable running underground from the sign toward the East. I knew this because a few feet from the sign, the cable was exposed due to soil erosion. I pulled on the exposed cable and it easily came above ground all the way to the base of the sign. In the other direction the cable pulled out of the ground, and its end was ragged and ripped. Well, that would explain a lot!

Then I noticed that on both sides of the sign, there was a single LED dot that was lit. How was that possible with what appeared to be a broken power cable and an unknown breaker box?

So I put the broken power cable (which had taken time to investigate) to the side for the moment. Maybe it was trash left behind and buried during some construction project.

Then, a few weeks later, the sign came back on. It had the same series of display screens and messages, just as it should.

Nothing had been done or changed that would account for the sign beginning to operate again.

Note that the sign is controlled by a Windows application that is only installed on one laptop in the Kidz Zone. Why? You'd have to go back several generations of volunteers to find a clue. Besides, what would it matter?

A couple of months before any of the sign problems began, that laptop had been replaced and given to me. It was old and out of date, its case was physically cracked and broken, and it had user accounts for people I had never heard of and no accounts for current staff. Knowing what it was, I stashed it in my office intending to somehow clone it so that the one process it performed would not be lost.

Guess what? Yep, I never found the time to clone that laptop. But it still worked fine.

Remember that at this point the sign had begun working again. Well, the time came when the messages on the sign needed to be changed.

So I fired up that old laptop. It booted just fine. I logged in, no problem. I confirmed it was fully operational on the network, and launched the application that manages the sign.

Error message: Sign could not be found.

"What! It is out in the parking lot - I saw it this morning!", I argued with the error message...

I checked the application's configuration and got the IP:PORT that pointed to the sign's network system. I checked the MySQL database and confirmed that was the IP:PORT for the sign's system.

I ran a ping... No response. I ran arp-scan... it was not listed by IP or MAC.

So was the sign's hardware so damaged that it was functioning as a sign, but not as a host on our network?

Given that information, I decided to run netdiscover just to cover more bases than I had so far. And yep, there it was. It was on the 192.168.1.0/24 subnet. Apparently it had 'rebooted' and used its default non-routable local IP address.

Next I changed the subnet that the old laptop was using (10.32.10.0/24) to the sign's subnet, and launched the sign application. And it was able to connect to the sign.

So I went into the administration app for the sign, set its IP to a static address on our production network, and was able to again manage the sign.
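
One way to see why the first checks came up empty is to compare the two subnets named above. The sketch below is only an illustration, not the procedure that was actually followed: it uses Python's standard ipaddress module, and the sign's exact address (192.168.1.100) is a made-up placeholder, since only its subnet is known from the account above.

```python
import ipaddress

# Subnets taken from the account above.
LAPTOP_SUBNET = ipaddress.ip_network("10.32.10.0/24")
SIGN_DEFAULT_SUBNET = ipaddress.ip_network("192.168.1.0/24")

# Hypothetical address: the account gives the sign's subnet, not its exact IP.
SIGN_DEFAULT_IP = ipaddress.ip_address("192.168.1.100")


def same_subnet(host, network):
    """Return True if the host address falls inside the given network."""
    return host in network


def describe(host, network):
    """Build a one-line summary of whether host belongs to network."""
    if same_subnet(host, network):
        return f"{host} is inside {network}"
    return f"{host} is outside {network}"


if __name__ == "__main__":
    # The laptop's subnet does not contain the sign's fallback address.
    print(describe(SIGN_DEFAULT_IP, LAPTOP_SUBNET))
    # The sign's default subnet does.
    print(describe(SIGN_DEFAULT_IP, SIGN_DEFAULT_SUBNET))
    # Both ranges are private address space.
    print("private address:", SIGN_DEFAULT_IP.is_private)
```

A host that has fallen back to an address outside the laptop's subnet will not answer a ping to its old address, and a scan limited to the laptop's own subnet will not list it, which is consistent with what happened here.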

With all of that being said, how could anyone 'document' the steps necessary to 'fix' or respond to the problem of the sign no longer working, or of its not being seen on our network when it did work? Is that likely to occur again? Not so much. Is a variation possible? Yep. Would approaching it the same way as I first did work again? Nope, it would simply waste a bunch of time and not help illuminate the actual problem.

What will help when other or similar problems occur in the future?

So, in our documentation I plan to include information about the physical and logical infrastructures. (Note: Take a few minutes to research and understand the differences between those two types of infrastructure. What do they consist of? What tools are used in each? How do they relate to one another? How do they interoperate? Throughout this documentation, and EVERYTHING you read or observe, if a word, phrase, or concept is not already a clear piece in your mental landscape of technology, then take a moment to understand it and fit it into your mental landscape. This must be an ongoing part of 'doing' technology!)
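
As a rough illustration of what one entry in such documentation might hold, here is a minimal sketch of an inventory record for the sign, written in Python. The record layout and every field name are assumptions made up for this example, not an existing format. The values come from the story above, and anything the story does not state is left marked as unknown rather than guessed.

```python
from dataclasses import dataclass, asdict

UNKNOWN = "UNKNOWN"  # marker for facts nobody has written down yet


@dataclass
class DeviceRecord:
    """One inventory entry; all field names are hypothetical."""
    name: str
    # Physical infrastructure: where it is and what it plugs into.
    location: str = UNKNOWN
    breaker_location: str = UNKNOWN
    network_cabling: str = UNKNOWN
    # Logical infrastructure: how it appears on the network.
    production_ip: str = UNKNOWN
    fallback_subnet: str = UNKNOWN
    managed_from: str = UNKNOWN


def undocumented(record):
    """Return the names of fields still marked UNKNOWN."""
    return [key for key, value in asdict(record).items() if value == UNKNOWN]


# Values below are taken from the story; production_ip is not given there.
sign = DeviceRecord(
    name="LED street sign",
    location="parking lot",
    breaker_location="interior wall, behind a picture",
    network_cabling="Ethernet",
    fallback_subnet="192.168.1.0/24",
    managed_from="manufacturer's Windows application on the old laptop",
)

if __name__ == "__main__":
    print("Still undocumented:", undocumented(sign))
```

A record like this does not say how to fix anything. It gives the next technician the breaker location and the fallback subnet up front, which are the two facts that cost the most time to discover in the story above.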