I really care about the concept of “going in production”, because it is the moment when the software comes to life. Going in production can sometimes be really difficult, and there are a lot of reasons behind it.

First of all we have the “It works on my machine” syndrome. The software runs perfectly on the developers’ machines, but when you move it to production servers or to customers’ machines, nothing works. If you ask a developer, you almost always get this answer: “Have you installed library XYZ? Then have you written the SuperSecretKey XYXCWKDSAF into the registry? After you have done this, you should run for a mile, turn around three times with your fingers on your nose, …” This is the sign that you need at least a document that explains how to deploy the software, or you’re gonna kill your developers.


Then the software begins to work with real customer data, and you see with horror that everything worked fine with test data, but when you import 10 million orders from your customer’s old system, the software stops working. Other problems can arise too: maybe the customer works with orders with negative amounts and the software does not permit that. In the end, you find that the test data were not representative, and now you are in trouble with a live system that cannot handle the data the customer really needs, and you end up with an unsatisfied customer.


Maybe the software begins to crash, and when you look at the log you find that users typed “Hello Folks!” where the software expects a DateTime… OK, it is not that simple, but it often happens that real users use the software in ways we never thought possible, and these usage patterns lead to crashes.
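Just to make the point concrete, here is a minimal Python sketch of handling this kind of input without crashing. The function name `parse_order_date` and the expected `YYYY-MM-DD` format are assumptions made for the example, not part of any real system described here.

```python
from datetime import datetime
from typing import Optional


def parse_order_date(raw: str) -> Optional[datetime]:
    """Parse user input as a date; return None instead of crashing.

    The YYYY-MM-DD format is an assumption chosen for this example.
    """
    try:
        return datetime.strptime(raw.strip(), "%Y-%m-%d")
    except ValueError:
        # The user typed something that is not a date (e.g. "Hello Folks!").
        return None


if __name__ == "__main__":
    for text in ["2010-03-15", "Hello Folks!"]:
        parsed = parse_order_date(text)
        if parsed is None:
            print(f"{text!r} is not a valid date, ask the user again")
        else:
            print(f"{text!r} parsed as {parsed.date()}")
```

The caller decides what to do with `None` (show a message, ask again), so bad input becomes a normal case instead of an exception found in the production log.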


You have your system in production, users begin to send feedback, and after some time you have a new version ready, but… now you have to update the database, update the client machines, and make sure that older versions can work with the new data. Another typical problem is that you cannot stop the production server, so you need to make changes while it is online.

The customer calls you saying “The system crashed”. If you ask for more details, maybe you get “The system crashes when we are working on it”. Please do not expect the customer to be technically skilled; he will rarely be able to tell you the real details of the error. If you want a laugh, just ask the customer “Can you send me the stack trace?”; the answer can range from “WHAT??” to “Hey, do not bother me with technical terms, the system crashed, NOW YOU’LL FIX IT”. Then you can only go into the production server and try to understand why, when, and what part of the system crashed… it sounds like you need a really careful logging system. Maybe you log everything to a file, but… you have no access to the production server because it is in the customer’s web farm, so you need to contact the IT manager and tell him you need the logs. The typical answer is “On which machine are those logs?” This leads to pain and frustration.
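As a rough idea of what such a logging system could look like, here is a small Python sketch based on the standard `logging` module. The names `ReportCollector`, `average_amount` and the `"orders"` logger are hypothetical; the handler only keeps reports in memory, as a stand-in for sending them by email or to a bug tracker.

```python
import logging


class ReportCollector(logging.Handler):
    """Collects formatted error reports in memory.

    Hypothetical stand-in for a handler that would forward each
    report to developers instead of leaving it on the server.
    """

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)
        self.reports = []

    def emit(self, record: logging.LogRecord) -> None:
        # format() appends the traceback when the record carries exc_info.
        self.reports.append(self.format(record))


logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
collector = ReportCollector()
collector.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(collector)


def average_amount(total: float, count: int) -> float:
    """Example operation that logs its inputs and stack trace on failure."""
    try:
        return total / count
    except ZeroDivisionError:
        logger.exception("average_amount failed: total=%r count=%r", total, count)
        raise
```

With something like this in place, the stack trace and the input values reach the developers directly, and nobody has to ask the customer or the IT manager for them.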


I could continue, but the real point is that a system in production can generate a high number of problems that do not arise during the development stage. So what is the solution? It turns out that it can be as simple as the sentence

“Go in production as often as you can”.

I'll give you just a small scenario. Set up a CC.net or similar integration machine and create a test production server. At each check-in the integration machine compiles the code, runs the tests and, if everything is OK, deploys the software to the test machine; let's call it the preproduction machine. Fill this machine with real data, and make it available to the customer. Now you immediately face all the problems of a live system in production, because you are actually developing against software in production, where you virtually go in production at each check-in ;) .

  1. You are forced to create a deploy script that automates the task of deploying the software.
  2. You are forced to maintain the customer's data on the preproduction machine, so you need to find a way to manage software updates.
  3. You immediately work with real data and real users; moreover, you immediately get feedback on the software.
  4. Since the software is in the development stage, bugs arise and you need a quick way to correct them, so you will set up a good error handling system such as ELMAH, or maybe a logging system that emails or sends all the data to a bug tracking system where developers will automatically see the details of the error, without the need to go into the production machine.
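The check-in flow described above (compile, run tests, deploy only if everything is OK) can be sketched in a few lines of Python. This is only an illustration: `run_pipeline` and the step functions are hypothetical names, and a real setup would use the integration server's own configuration rather than hand-written code like this.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully, so a
    later step (such as deploy) never runs after a failed earlier one.
    """
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical step implementations; real ones would call the
    # compiler, the test runner and the deploy script.
    steps = [
        ("compile", lambda: True),
        ("test", lambda: True),
        ("deploy to preproduction", lambda: True),
    ]
    print(run_pipeline(steps))
```

The important design choice is the early stop: a broken build or a failing test never reaches the preproduction machine that the customer is using.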

This means that you immediately begin to face the typical problems of live software, and you do not postpone solving them until you really are in production. Solving a problem when you are really in production can be a real pain, so you had better work hard to make every possible problem arise during the development stage. Developing against a production environment from the start can be difficult and takes time, but in the long run it really saves your life.

Alk.


2 Responses to “Going in production as often as you can”

  1. You should really define the context when making statements like this. We have hundreds of clients, some of which do not have IT departments, so they need to schedule downtime, pay for us to come and install the software, or better yet have a parallel ‘preproduction environment’ that they cannot afford. Not everybody is working for a single client here…

  2. When you have multiple customers I think that it is even more important to have a development environment with continuous integration, since you have multiple production machines, located in different places, with different environments, and so on. If your developers are used to working with a continuously live system, it will be simpler to manage such a scenario. Maybe you can keep in your dev environment a set of virtual machines, one for each major release of your software.
    The ideal scenario is something like this.
    Suppose your software has 3 major versions in production: 2.2, 2.4 and 2.6. You create three virtual machines with these versions, and at each check-in or once a day you run all the update scripts to bring these versions up to the current one, then run some tests to verify that everything is OK.
    After each set of tests you simply restore a snapshot backup of the virtual machines, so they are clean again for the next test.
    Alk.
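The upgrade test from this reply can be sketched in Python as follows. Everything here is an assumption made for illustration: the target version `3.0`, the update scripts, and the dictionaries that stand in for the databases on the virtual machines. Copying a dictionary plays the role of restoring a clean snapshot.

```python
import copy
from typing import Callable, Dict, List, Tuple

# (from_version, to_version, update script); all hypothetical.
Migration = Tuple[str, str, Callable[[Dict], None]]


def add_currency(db: Dict) -> None:
    db["currency"] = "EUR"


def add_discount(db: Dict) -> None:
    db["discount"] = 0


def add_notes(db: Dict) -> None:
    db["notes"] = ""


MIGRATIONS: List[Migration] = [
    ("2.2", "2.4", add_currency),
    ("2.4", "2.6", add_discount),
    ("2.6", "3.0", add_notes),
]

# Stand-ins for the virtual machine snapshots of each version in production.
SNAPSHOTS: Dict[str, Dict] = {
    "2.2": {"version": "2.2"},
    "2.4": {"version": "2.4", "currency": "USD"},
    "2.6": {"version": "2.6", "currency": "USD", "discount": 5},
}


def upgrade(db: Dict, target: str) -> List[str]:
    """Apply update scripts one after another until db reaches target."""
    applied = []
    while db["version"] != target:
        step = next((m for m in MIGRATIONS if m[0] == db["version"]), None)
        if step is None:
            raise ValueError(f"no update script from version {db['version']}")
        source, destination, script = step
        script(db)
        db["version"] = destination
        applied.append(f"{source}->{destination}")
    return applied


def check_all_upgrades(target: str = "3.0") -> Dict[str, List[str]]:
    """Upgrade a fresh copy of every snapshot and report the scripts applied."""
    results = {}
    for version, snapshot in SNAPSHOTS.items():
        db = copy.deepcopy(snapshot)  # "restore" a clean snapshot
        results[version] = upgrade(db, target)
        assert db["version"] == target
    return results
```

Running `check_all_upgrades()` at each check-in or once a day would show right away when an update script breaks the upgrade path from one of the versions still in production.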