24 October 2009

When you forget to prune your logging data


For the last 8 months I have worked on and off on a municipal IT project, which has had many different subcontractors before me. The system pretty much does what it has to do, but the code is pretty smelly and could use a good deal of refactoring.

It's a learning experience to read such code. Once in a while you find little gold nuggets and intelligent solutions to problems that you can really learn from. You also learn from the code smells and the truly "criminal" stuff that you find. It has made me think more about what constitutes a production-ready system. More on that in a coming post.

What I want to share is a latent bug that I found after one of the servers the system runs on started to act strangely and finally went dead. I found the following code:
    static void Main(string[] args)
    {
        try
        {
            Application app = new Application();
            app.Start();
        }
        catch (Exception ex)
        {
            Logger.Log(ex);
        }
    }
At first glance this looks fine (if you approve of a global exception handler like this), but once you've made the decision to log data of any kind, you must remember to prune your log so that it does not grow without bound. In the worst case, it takes up all the hard disk space.

Once this happens, every attempt to write to disk fails with an IOException. You see the trouble now?
  1. An exception occurs somewhere in the system
  2. The exception is caught by the catch statement
  3. The logger tries to write the details of the exception to a log
  4. The hard disk is full, so the write fails with an IOException
  5. Repeat from 2
This loop crashed the server due to a full hard disk and the consumption of all RAM.
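
One way to break the loop in steps 2 to 5 is to make sure the logger never raises an exception of its own. The following is only a minimal sketch: the original Logger class is not shown in this post, so SafeLogger, TryLog and LogPath are hypothetical names, and the log location is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the Logger class mentioned above. Its real
// implementation is not shown, so the path and entry format are assumptions.
public static class SafeLogger
{
    // Assumed log location, for illustration only.
    public static string LogPath = Path.Combine(Path.GetTempPath(), "app.log");

    // Returns true if the entry was written, false if logging itself failed.
    public static bool TryLog(Exception ex)
    {
        try
        {
            string entry = DateTime.UtcNow.ToString("o") + " " + ex + Environment.NewLine;
            File.AppendAllText(LogPath, entry);
            return true;
        }
        catch (IOException)
        {
            // Disk full, missing directory or locked file: drop this entry
            // instead of raising a new exception from the error handler.
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // No permission to write the log: same policy, drop the entry.
            return false;
        }
    }
}
```

A failed write is then reported through the return value, so the caller can decide what to do (for example, write to the console or the event log) without triggering yet another logging attempt.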

I guess the programmer who wrote the Logger class never considered that his implementation had the potential to crash an entire server because it lacked pruning of the logs. I'll not lie: only a few years back I would not have thought of it either.

So here's a rule of thumb you might try to remember:

If your code adds bytes to any persistent media, it is almost certain that it should also remove bytes with some regularity.
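
As an illustration of that rule, here is a minimal sketch of age-based pruning. It assumes the logger writes rolling files with a .log extension into one directory, which the post does not state; LogPruner and PruneOlderThan are hypothetical names, and the retention period is whatever the caller chooses.

```csharp
using System;
using System.IO;

// Hypothetical helper; assumes rolling *.log files in a single directory.
public static class LogPruner
{
    // Deletes *.log files in logDirectory whose last write time is older
    // than maxAge. Returns the number of files deleted.
    public static int PruneOlderThan(string logDirectory, TimeSpan maxAge)
    {
        DateTime cutoff = DateTime.UtcNow - maxAge;
        int deleted = 0;

        foreach (string path in Directory.GetFiles(logDirectory, "*.log"))
        {
            if (File.GetLastWriteTimeUtc(path) < cutoff)
            {
                File.Delete(path);
                deleted++;
            }
        }

        return deleted;
    }
}
```

Called at startup or from a scheduled job, for example as LogPruner.PruneOlderThan(logDirectory, TimeSpan.FromDays(30)), this keeps the disk usage of the logs bounded by the retention period rather than by the size of the disk.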



19 October 2009

When your tests rot

For the last 15 months I have worked as project manager, system architect and competency trainer at Transsoft A/S.

When I was hired, my task was to implement transparent project management, which would allow the board to know what development was doing and how the project was progressing. I also had to implement practices that would empower the team to deliver software of a verifiable quality level.

ERP software is supposed to last many years, often 10-15 years. This means that it must be built to be maintainable, and it must have a very extensible and flexible application architecture, which allows for easy implementation of customer-specific business rules and views.

In order to deliver a system with the kind of progress transparency and quality of architecture demanded by the board, I had the following agenda:

Concretely:
All these elements are needed in some form or shape to deliver the kind of software we write. But the test suite that results from test-first development is, to me, the most important element for ensuring steady progress and a certain level of quality in the code. A lot of good things emerge when you do Test-Driven Development/BDD, which I will not reiterate here.

For the last few months I have spent a lot of my time developing a course department at Transsoft. As a consequence, I haven't had the same focus on our practices and the quality of the code that I had before. My lack of focus on our practices has resulted in a rotting test suite.

After a major refactoring (that I gave the green light to), our tests were not kept in sync, and we could no longer rely on them. This has reduced the confidence of the team, as we can no longer run all our tests and know that adding a new feature or removing a bug has not changed the expected behavior of the system. In addition, the test suite is now technical debt and does the exact opposite of its intention - it reduces confidence instead of reinforcing it.

We have changed our sprint plan to give the developers time to refactor the tests and bring them back in sync with the codebase. We also did a brush-up on our DONE DONE DONE definition, as well as on why we use the practices we do. But this work takes time, velocity is reduced, and refactoring 500+ tests is a pain.

So my advice to any team practicing TDD or unit-testing is to NEVER, EVER ALLOW YOUR TESTS TO ROT!



05 October 2008

Comic anti-pattern

This one had me laughing out loud...

Jeremy's First Law of Continuous Integration:
"If you check in very often and/or first, you can make merge conflicts be someone else's problem"


