Thursday, February 23, 2006

Top Ten of Programming Advice to NOT follow

Copyright (C) 2006 Kristian Dupont Knudsen

A friend of mine is just beginning his professional programming career and
he asked me recently if I had any specific programming advice for him.
I found that I couldn't really think of anything interesting to say. There
is lots and lots of programming advice on the web. Some of it is good, some of
it less so. But most of it is simply plain obvious or just too general to be
interesting.
Then I thought well, since I feel that I do have something to say in that
area, why not try and address some of the most common things people are
being told with which I disagree?
So here goes:
The top ten list of programming advice to not follow:

10) Don't use exceptions
Okay, I just had to throw in this particular one because it keeps showing up
over at Joel's. :)
Joel feels strongly about this and since he is read by many, I think he is
considered somewhat of an authority in the area. However, exceptions are in
my opinion superior to error return codes for many reasons.
This discussion relates to programming language discussions which tend to be
dead ends, so I'll step carefully. I think the number one reason why I like
exceptions is that they make it easier to convince myself that my invariants
are enforced. Joel says: "Every time you call a function that can raise an
exception and don't catch it on the spot, you create opportunities for
surprise bugs caused by functions that terminated abruptly, leaving data in
an inconsistent state, or other code paths that you didn't think about." I
would argue the exact opposite: every time you call a function that merely
returns an error code, you risk such surprises, simply because you may not
react to this error code. Throwing exceptions is a way to ensure that code
is not executed with a broken precondition. Error codes rely on the
programmer to remember. In C++, Java and C#, constructors are the most
obvious example. If something goes wrong during the construction of an
object, throwing an exception is the only way to back out, unless you want
to leave behind a zombie object. Granted, throwing an exception will cause
your entire program to explode if you don't catch it, but at least you will
have valuable information about where things went wrong. Zombie objects, on
the other hand, are bombs with a timer set to go off in maybe a second or
so, when code that relies on that particular object fails. Now, a second may
not be a lot to you and me but to the computer, this is millions of
instructions away from the erroneous code. Tracking that is a lot harder.
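
To make the constructor point concrete, here is a minimal C++ sketch; the
Connection class and its url invariant are made up for illustration:

#include <iostream>
#include <stdexcept>
#include <string>

// A connection whose invariant is "the url is never empty".
class Connection {
public:
    explicit Connection(const std::string& url) : url_(url)
    {
        // A constructor has no return value to signal failure with;
        // throwing is the only way to refuse to construct the object.
        if (url_.empty())
            throw std::invalid_argument("Connection: empty url");
    }
    const std::string& url() const { return url_; }
private:
    std::string url_;
};

int main()
{
    try {
        Connection broken("");   // violates the precondition
    } catch (const std::invalid_argument& e) {
        // We hear about the problem here, at the point of failure,
        // not from a zombie object millions of instructions later.
        std::cout << e.what() << "\n";
    }
}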


9) Use unsigned integers for values that can only be positive
I'm a bit nervous about this one because it is used in the STL and if there
is one thing that I have learned it is that those guys thought things
through. STL is an example of strikingly good design. However, in STL, size
types - *::size_type are unsigned integers just like the old C size_t was.
Why that is I simply don't understand. Unsigned integer types seem nice at
first because they feel safer. They have a constraint that says "this
variable will never become negative" which makes perfect sense for a number
of variables, such as sizes. Problem is, this constraint is enforced by
wrapping rather than bounds checking (at least in C and C++ - in C# you can
turn on overflow checking and as far as I know, Java doesn't have unsigned
integer types). Hence, you simply end up having no way to tell whether it
has been violated or not - the bit which could be used for error indication
is now used to gain size. But we're talking one bit here, people. If you're
stretching your code so far as to require the size gained by that one bit,
you are probably in the danger zone of introducing errors of more serious
kinds.
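
A small C++ illustration of the wrapping problem (the variable names and
values are of course contrived):

#include <cstddef>
#include <iostream>

int main()
{
    std::size_t produced = 3;
    std::size_t consumed = 5;

    // With unsigned arithmetic the "impossible" negative result wraps
    // around to a huge positive number instead of signalling an error.
    std::size_t remaining = produced - consumed;
    std::cout << remaining << "\n";   // e.g. 18446744073709551614 on a 64-bit platform

    // With a signed type the violated constraint is at least visible
    // and can be asserted on.
    long long r = static_cast<long long>(produced) - static_cast<long long>(consumed);
    std::cout << r << "\n";           // -2
}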


8) Design classes parallel to their physical counterparts
I believe that one of the reasons why OO is so popular is that it is so easy
to grasp. Inheritance and specialization really do seem to occur in real
life, and the concept of classes fits nicely onto real-life phenomena. A Ford
is a car is a vehicle. Only this does not map very well to software. You're
not implementing a car. You're probably implementing a record of a car which
can carry a number of stock items from one city to another, with a certain
mileage and speed. This is not a car, it is a virtual reference to a certain
car. Thinking of it as a car will lead your design in the wrong direction.
Even if you're designing a model of a new prototype of car for Ford, which
will be rendered in 3D with physical modelling and all, you're _still_ not
implementing a car. People who think in such parallels are likely to find
themselves confused if they run into the "a square is a rectangle" problem.
In math, squares may well be subclasses of rectangles, but making Square
inherit from Rectangle is plainly wrong: once the objects are mutable, a
square cannot allow its width and height to be set independently, so it
cannot be substituted wherever a rectangle is expected.
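
Here is a sketch of how that goes wrong in code, using the classic textbook
class names:

#include <cassert>

class Rectangle {
public:
    virtual ~Rectangle() {}
    virtual void setWidth(int w)  { width_ = w; }
    virtual void setHeight(int h) { height_ = h; }
    int width() const  { return width_; }
    int height() const { return height_; }
private:
    int width_ = 0;
    int height_ = 0;
};

// A "square is a rectangle" - but to keep its own invariant it has to
// change both sides, which breaks code written against Rectangle.
class Square : public Rectangle {
public:
    void setWidth(int w) override  { Rectangle::setWidth(w); Rectangle::setHeight(w); }
    void setHeight(int h) override { Rectangle::setWidth(h); Rectangle::setHeight(h); }
};

void stretch(Rectangle& r)
{
    r.setWidth(4);
    r.setHeight(2);
    assert(r.width() == 4 && r.height() == 2);  // fails when r is a Square
}

int main()
{
    Square s;
    stretch(s);   // the assertion above fires
}
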
But while I'm on the topic, I want to bring in a favourite nit of mine,
namely that inheritance is overrated. It used to be the case with C++,
though it seems most of the deep hierarchy people have moved to Java and C#,
leaving the C++ community to evolve in a direction that I think is more
promising. Danny Kalev puts this rather precisely.


7) Make sure your team shares a common coding standard
I'm not quite going to tell you not to do this, simply that I think it is much
less important than people make it out to be. Programmers can see through
indentation
and naming conventions, really. But alas, some don't want to (I didn't for a
long time - I would actually spend a lot of time reformatting source code to
fit my liking if I had to use it. Sigh).

Consider these two versions of the same function:

public string FormatAsHeadline(string sourceText)
{
    string resultCode = "<h1>" + sourceText + "</h1>";
    HeaderCount++;
    return resultCode;
}

public string format_as_headline (string source_text) {
  string result_code="<h1>"+source_text+"</h1>";
  headerCount++;
  return result_code;
}

Now tell me, is there one of the versions of the code above you could not
immediately decipher? Of course not. The two pieces are semantically
identical but they look different. But not so different that they would fool
a programmer. Now, if you're going to tell me that style is important, at
least have the guts to admit that it only really is because of your
aesthetic nitpicking desires.


6) Write lots of comments
Make sure you comment your code. Otherwise, it will be impossible for anyone
to understand - including yourself in a year or so. I have heard that many
many times and thought that it was probably right. It makes sense after all,
and I have often found use of comments when trying to find meaning in
somebody else's code. Particularly, I have struggled with comment-less code
and sworn that I would never make that mistake. However, this is mostly the
case when the code is not self-explanatory. Which it should be. If you feel
the need to write comments in your code, I suggest you try to refactor
instead, so comments won't be needed. Renaming some variables or introducing
a function call will probably do the trick. Context is better documented
using assertions. In fact, a context that cannot be described using
assertions is probably a bad sign!
There are times when comments serve a purpose though. Algorithmics can be
hard to grasp and yet impossible to simplify through further abstractions.
In such cases, you can explain yourself with comments. I think my colleague
Lars Thorup pointed out a very good test for comments: they should contain
the word "because". This way, you know that you are answering a why rather
than a what.
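
As a small, made-up C++ illustration of both points - a better name plus
assertions instead of a "what" comment, and a "because" comment for the one
non-obvious decision:

#include <algorithm>
#include <cassert>
#include <vector>

// Before: double discount(const std::vector<double>& p);  // NB: p must be sorted!

// After: the name and the assertions document the context instead.
double discountOnCheapestItem(const std::vector<double>& sortedPrices)
{
    assert(!sortedPrices.empty());
    assert(std::is_sorted(sortedPrices.begin(), sortedPrices.end()));

    // because marketing wants the discount taken on the cheapest item
    // only, we use the front element rather than the total
    return sortedPrices.front() * 0.10;
}
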
Oh, and my favourite specialization of the comments advice: keep a history
of changes and author info etc. in the top of each file.
I've never actually heard anyone say that you should do this but I have seen
it so many times that there must be people out there recommending it. Why on
earth you would clutter the code with information that so obviously belongs
in the version control system is just beyond me.


5) Use accessors or properties rather than public fields
We've all learned this. Public fields are no good. Which is true, they're
not, because they break encapsulation. However, supplying accessors gives
you but a tiny bit more encapsulation. What you need to do is determine why
somebody from the outside needs to manipulate the inner workings of your
class. Often, you will find that there should really be a method or set of
methods that do the manipulation. Otherwise, your class might be the victim
of what Martin Fowler calls "feature envy", which means that other classes
seem to wish they had the fields that your class does. Maybe they should,
then?
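
As a hypothetical sketch of the difference, with an invented Account class:
instead of exposing the field through a getter/setter pair, move the
manipulation next to the data:

#include <stdexcept>

// Feature envy: every caller reaches in and does the arithmetic itself.
class Account {
public:
    double balance() const      { return balance_; }
    void   setBalance(double b) { balance_ = b; }
private:
    double balance_ = 0.0;
};
// elsewhere: account.setBalance(account.balance() - amount);

// Better: the behaviour moves next to the data it was envying, and the
// invariant is enforced in exactly one place.
class BetterAccount {
public:
    void withdraw(double amount)
    {
        if (amount < 0.0 || amount > balance_)
            throw std::invalid_argument("invalid withdrawal");
        balance_ -= amount;
    }
    double balance() const { return balance_; }
private:
    double balance_ = 0.0;
};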


4) Use the singleton pattern for variables that you KNOW you should have
only one instance of
Global variables are evil. And just because you put such a variable into a class
which in turn is put into a design pattern from nothing less than the GoF
book, it is no less so. Variables should live in the innermost scope
possible, since this makes every scope more deterministic. A method that
relies only on local variables is easier to analyze than one that also
relies on members of the class. Because instead of having to look around the
method for places where manipulation takes place, you now have to look
around the entire class. And guess what happens if you pull the variable
even further out? That's right - if global variables are used, you have to
look around the entire program for manipulation.
On a more philosophical level, what is "global"? The date could be said to
be a global variable, if one doesn't consider timezones and such, but that
is hardly what you mean. Neither is it what you get. "Global" in programming
terms means per process. Which is sometimes fine but it is actually a rather
arbitrary resolution given the distribution of many software systems. A
software system can consist of many processes running on many machines - and
each process may internally run many threads. In this perspective,
process-level variables are really somewhat of an odd size.
I should note that this goes for mutable variables only. I think. Global,
immutable variables - constants - are okay. For the simple reason that since
they do not vary, determinism is maintained and they can safely be accessed
by all threads as well as multiple processes.
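
To illustrate - the Config and ReportPrinter classes below are made up - the
singleton is still a process-wide mutable variable, while passing the
dependency in keeps it visible and in a narrower scope:

#include <map>
#include <string>

// Still a global, just dressed up: any code anywhere can mutate it, so you
// have to look around the entire program to reason about its state.
class Config {
public:
    static Config& instance()
    {
        static Config theOne;   // one per process - a rather arbitrary scope
        return theOne;
    }
    void set(const std::string& key, const std::string& value) { values_[key] = value; }
    std::string get(const std::string& key) const
    {
        auto it = values_.find(key);
        return it == values_.end() ? std::string() : it->second;
    }
private:
    Config() = default;
    std::map<std::string, std::string> values_;
};

// The alternative: the dependency is explicit at the call site, and the
// method only depends on what it is actually given.
class ReportPrinter {
public:
    explicit ReportPrinter(const Config& config) : config_(config) {}
    std::string title() const { return config_.get("report.title"); }
private:
    const Config& config_;
};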


3) Be tolerant with input and strict with output
Yet another piece of advice that seems intuitive. My program or function
should be able to accept almost anything but produce a very deterministic
and streamlined set of results. This seems like really diplomatic behaviour;
however, it easily conflicts with a principle that is of greater importance:
fail fast. A function that accepts a vast variety of input formats is harder
to test and harder to validate. Also, it allows problems to propagate down
the system - if a calling function supplied invalid data, this is less likely
to be detected. This brings us back to the exceptions point. You will want
to fail as soon as you realize something is wrong - and fail as effectively
as possible. Not pleasant to the user, perhaps, but much easier to find and
debug.
If you want to allow for various formats, for instance if your input is
entered by the user, you should split your function into two: a
normalization function and a processing function.
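
Here is a sketch of that split, with a made-up weight-parsing example:

#include <cctype>
#include <stdexcept>
#include <string>

// Normalization: the one place that is tolerant. It turns "  42 ", "42kg"
// and "42" into the canonical form, or fails if it cannot.
int normalizeWeight(const std::string& userInput)
{
    std::string digits;
    for (char c : userInput)
        if (std::isdigit(static_cast<unsigned char>(c)))
            digits += c;
    if (digits.empty())
        throw std::invalid_argument("no number found in input");
    return std::stoi(digits);
}

// Processing: strict and easy to test. It fails fast on bad data instead
// of letting the problem propagate further down the system.
double shippingCost(int weightInKilos)
{
    if (weightInKilos <= 0)
        throw std::invalid_argument("weight must be positive");
    return 4.5 + 0.75 * weightInKilos;
}

// Usage: shippingCost(normalizeWeight(" 42 kg "));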


2) Code all the corner cases immediately, cause otherwise you'll never go
back and fix things
Why do we programmers feel guilty when not finishing a function? We're
simply keeping focus! First of all, the "you'll never go back and fix it"
argument is just silly. If you don't then it's obviously because it's not
necessary! But this point is part of an entirely different discussion about
whether you focus on aesthetics or economics, which I will not bring up
here.
Known hacks and incomplete code can be documented, preferably with a failing
test. But you can write a todo-item in your task list for all I care.
Getting the broader picture together first is important because it shows you
whether your solution is on the right track or not. When you are certain,
focus on the details, but not before that. Which brings us to my number one
piece of advice not to follow:


1) Design first, then code
Okay, things are going political now, but even though you will find many
many people who disagree, I still feel that this is the single most valuable
lesson that I have learned. Designing first and then coding simply doesn't
work. The problem is that this is so counterintuitive that you more or less
have to find it out for yourself.
I think every programmer has experienced a project with no planning turning
into a mess. Changing such code can only be done with hacks and patches to
everyone's great frustration. It is at that time that you realize that the
only decent way to code is by designing things right from the start. Only
now the frustration is even greater when you realize that your beautiful
design isn't prepared for exactly this new feature that you are to implement
now. What to do then?
You should think before you code. Go ahead, but think for hours, not days.
Don't kid yourself into believing you can sketch an entire design document
with UML diagrams and everything without making mistakes. At least, don't
think you can do so any faster than you could have simply written the code.
Now, if you're not familiar with agile methodologies such as eXtreme
Programming, the whole concept of evolving design sounds like the very
problem programmers are trying to solve with all their clever ways of
abstracting things out. And indeed, evolving design only works well if you
follow a number of practices. Short iterations, automated testing and
frequent refactoring being the most important.
I suggest you read Martin Fowler's excellent article Is Design Dead? which
explains it all a lot better than I am capable of.
