Beautiful Code: An Introduction

3/2/2014 12:42PM

I don't remember why, but while I was working for A Far Site Better, they asked me what was most important in my coding. Their guess was that it worked. And, um, no. That isn't the most important thing for me when I am coding. I often write minimal test versions of code that do absolutely nothing, just so the process itself can be tested.

 

The most important thing for me is beautiful code. Whenever I start coding anything, I build out the groundwork for that code, write the comments, litter it with KERBLUH (that's my TODO marker), and make it really readable. Only after all of that is done do I start writing the actual code. This is not only why I have far fewer bugs every time I code something new, but also the main reason it is so easy for me to find 9 out of 10 problems in my code.

 

You see, every couple of years or so, I have come up with or found another programming method to use. These aren't tips or tricks to make things work better, though. Instead, they make it so you can look at source code and know very quickly what it does. In fact, this is the main reason my code can generally be understood by non-programmers too, even if they cannot read the source itself. I call that a complete victory.

 

I could probably write an entire article about each of these, so I am being very brief here. In fact, I will end up doing an entire series on this in the future. It really is THAT important to me. So many popular libraries and languages don't give a darn about readable or understandable code (I'm looking at you, jQuery, and the entire language of Perl), which makes me wonder how maintainable anything built with them really is.

 

Let's take a look at a number of rules I've added over the years to see why they make a big difference:

  1. Use meaningful, consistent names for everything.

    I know this sounds really obvious, but so many people do not follow this rule. One only has to look at what's popular to see that "the crowd" doesn't follow it, and to any outsider, the resulting code looks like complete gibberish. A function's name should tell you what it does. A variable's name should tell you what it is.

     

    This is one of those rules I only started applying to EVERYTHING a few years ago, but as soon as I updated my code, it became far more readable. For example, I used to use q everywhere as a generic counter variable. q doesn't mean much to anybody. Now I name it index when it is used as an index into an array, counter when it is a counter, and so on.

     

    What does a function named Widget mean to you? Not much, I'm sure. It has something to do with a widget, but beyond that, no clue. How about DisplayWidget? We're displaying a widget. Novel concept, eh? That's why just about everything in my libraries follows the [*Verb*][*Noun*] structure for names, except the really obvious ones, which are generally just [*Verb*]. If you are using a Widget class and there is a method called Display, then it logically follows that you are displaying the Widget at that point.

  2. Use all lower-case for local values, and caps-first camel case for global or shared values.

    By using different capitalization for local and global/shared values, it becomes obvious at a glance where a variable came from. HomePage? Oh, a global. homepage? Ah, a local. Makes sense, right?

     

    Now, I have to get sidetracked for a while, because next I have to explain what I think is the dumbest trend in programming, and it is happening all over the place. First, there is a naming convention going around the programming universe called camel case. It lets you merge multiple words into one and still keep the same meaning as if each word were separated by spaces: ItLooksSomethingLikeThis. It is ALMOST as readable as if there were spaces between each word.

     

    Now, throw in lazy people. Lazy people don't like that awful Shift key. I have heard several programmers defend their naming style because they "want to type fewer capital letters", so they avoid Shift as much as they can. Oooo! It'll trip you up! It's SHIFT, after all. You know, that freakin' key we all use constantly whenever we type a >, &, *, +, {, }, etc. We see these symbols a million times throughout our code, and we have to press that awful key to make every one of them appear on the screen. Why, then, have these lazy people permeated the programming world, where none of us is afraid of Shift?

     

    If you can't tell, I am talking about why the heck people are using lower-case-first camel case. Instead of DoSomething, it looks like doSomething. You know, the exact OPPOSITE of how capitalization works in English. Every time I see it, I think of those people who thought 1337 5p34\< was a good thing. ExCePt ThEy AlSo DiD tHiS. Why? I have absolutely no idea why this started perpetuating itself, other than the lazy-people theory above. Why did JavaScript decide it was a good idea? Should we call it javaScript instead of JavaScript? I'm sometimes quite tempted to, except then I feel like I'd hate myself for actually doing it.

     

    In any case, I can do nothing to fix existing languages. In JavaScript, I am forced to use the native methods with this ridiculous casing format. To outsiders and non-programmers, it looks like jargon and makes things HARDER to read. Like everything else I have been discussing, that is the complete opposite of what programming should be.

     

    Suffice it to say, I don't like it, and I don't use it. I want my caps-first camel case wherever possible. PS.GetContent is far more readable to me than PS.getContent.

  3. Every statement gets at least its own line.

    Absolutely no exceptions here! Not putting each statement on its own line is like not using spaces in English text. Yes, you CAN do it and still convey the same meaning, but anybody trying to read it will have to read much, much more slowly.

     

    Again, many popular libraries love to break this rule and merge all sorts of things into a single statement. The worst offender is the "chaining" I see a lot, which looks like: a->b()->c()->d("Something")->e();. In case you missed it, that just did four things all at once. Is that readily obvious just looking at it? Maybe, if you actually think chaining is a good idea.

  4. A function does one type of thing and returns one type of value.

    There are unusual exceptions, but this rule is also very important, again for consistency. It is broken time and again in some popular libraries, such as jQuery, which is very frustrating for those of us who don't want to learn all of a library's quirks just to work with it.

     

    Why load the exact same function with so many ways of working? Why have .css behave in two completely different ways (it either sets styles or returns style data), instead of providing a .GetCSS and a .SetCSS? Even those names would be wrong: neither has anything to do with the actual style sheets themselves (CSS does stand for Cascading Style Sheets, after all). They deal with the styles that CSS describes, so they break my rule 1 above too. Not that seeing .css means much in the first place. This is why I have .SetStyle and .GetStyle: not only does each do just its one task, but you can tell exactly what they do just by looking at them.

  5. Split up large functions into smaller functions.

    This is one you hear a lot, but until you've written 200+ line functions, and then written the same code as a few separate 20-40 line functions, you won't understand why it is such a common rule in programming.

     

    Let's take a simple example. Say you are building the HTML for a list of products in a <ul> <li> ... </li> </ul> kind of structure. The common approach would be something like this (in PHP, with comments removed for brevity):

    <?php
    public function DisplayItemList(array & $itemlist) {
      if (!count($itemlist))
        return;
    ?>
    <ul id="ItemList">
      <?php
        foreach ($itemlist as & $itemdata) {
      ?>
      <li>
        <?php echo(HTMLClean($itemdata["Name"])); ?>
      </li>
      <?php
        }
      ?>
    </ul>
    <?php
    }
    ?>

     

    Split into two functions, each separate piece is easier to see:

    <?php
    public function DisplayItemList(array & $itemlist) {
      if (!count($itemlist))
        return;
    ?>
    <ul id="ItemList">
      <?php
        foreach ($itemlist as & $itemdata) {
          $this->DisplayItemListItem($itemdata);
        }
      ?>
    </ul>
    <?php
    }
    protected function DisplayItemListItem(array & $itemdata) {
    ?>
      <li>
        <?php echo(HTMLClean($itemdata["Name"])); ?>
      </li>
    <?php
    }
    ?>

     

    Of course, once my other rules are applied too, it becomes clearer even to those of you who don't program:

    <?php
    // ********************************************************************************************************************
    // Created:       v1.0.0
    // Last Modified: v1.0.0
    // Displays a list of items.
    // Returns: Nothing
    // Parameters:
    //   itemlist                 Array, A list of the items.
    public function DisplayItemList(array & $itemlist) {
      // ==============================================================================================
      // Exit if we don't have any items
      if (!count($itemlist))
        return;
    
    
      // ==============================================================================================
      // Display the list of items
    ?>
    <ul id="ItemList">
      <?php
        foreach ($itemlist as & $itemdata) {
          $this->DisplayItemListItem($itemdata);
        }
      ?>
    </ul>
    <?php
    } // DisplayItemList
    
    
    
    // ********************************************************************************************************************
    // Created:       v1.0.0
    // Last Modified: v1.0.0
    // Displays a single item. Called by DisplayItemList.
    // Returns: Nothing
    // Parameters:
    //   itemdata                 Array, The data for the item.
    protected function DisplayItemListItem(array & $itemdata) {
    ?>
      <li>
        <?php echo(HTMLClean($itemdata["Name"])); ?>
      </li>
    <?php
    } // DisplayItemListItem
    ?>

  6. Functions should have a human-readable comment, in a consistent format.

    Just about every project now writes its function descriptions in one particular comment parser's format. You'll recognize it when you see a comment block start with /**. These blocks are readable by the comment parser, but once there is any sort of complexity, people can't really read them without a lot of mental translation.

     

    For this very reason, I have created my own comment format and a parser to go with it. Instead of something very computer-oriented for naming sections (they use @something {data}), my format lets the content speak for itself. Being consistent is important: if you are, the parser will understand.

     

    In my case, I list the version in which the function was created, then the version in which it was last modified, then a description of the function, then what value or values it returns (some things return an array, or null on error, for example), then optionally the details of the parameters (tabulated for maximum readability), examples, etc. Each of these is in a simple Something: format.

     

    Taking an example from TinyMCE, its bind method looks like this:

    /**
     * Binds a callback to an event on the specified target.
     *
     * @method bind
     * @param {Object} target Target node/window or custom object.
     * @param {String} names Name of the event to bind.
     * @param {function} callback Callback function to execute when the event occurs.
     * @param {Object} scope Scope to call the callback function on, defaults to target.
     * @return {function} Callback function that got bound.
     */
    self.bind = function(target, names, callback, scope) {

     

    My version would be the following (I'm guessing at the version numbers, but the additional comments tell you more simply because they exist):

    // ********************************************************************************************************************
    // Created:       v1.0.0
    // Last Modified: v4.0.10
    // Binds a call back to an event on the specified target.
    // Returns:
    //   Function, The call back function that got bound.
    //   undefined - An error occurred, generally due to a bad target.
    // Parameters:
    //   target                   Single DOM Element, The element to target.
    //   names                    String, Either a single event name, or a space separated list of event names. These should
    //                            NOT include 'on' in the beginning.
    //                            Example: 'mouseover mouseout', which will bind the onmouseover and onmouseout events.
    //   callback                 Function, The function to call back on that event.
    //   scope                    Optional - Object, The scope to call the call back function on. Use null or undefined to
    //                            use the target.
    //                            Default: undefined
    self.bind = function(target, names, callback, scope) {

  7. Comment every section of code except tiny functions that only have one or two lines.

    This largely happens automatically for me these days, because I write the comments before I code. The exception might seem a little silly, but just look back at my DisplayItemListItem example above. The function's description says what it does, and it's very easy to just look and see, "Yep, that does just display the item." If it got any more complicated, I would certainly add comments as I always do (see the DisplayItemList sample above).

  8. At the end of a long block of code, add a comment about what it was. And ALWAYS comment the ends of functions, classes, structs, and other large data definitions.

    This may sound odd at first, but it helps you figure out where the heck you are. When you can see what came before your current position on the page as well as what's ahead, it's a lot easier to tell what is going on. Splitting code into separate functions helps as well, but as you can see above, I always comment the ends of functions, so you only need to see the last line of a function to know which function it is.

  9. Use separator bars that shrink as the program moves through tiers, with appropriate spacing for each of them.

    This is another one of my own rules, and though it sounds strange at first, it turns out to be incredibly useful. I started using it a number of years ago because I got frustrated scrolling through code and having trouble seeing where functions started and ended. I still have this problem with other people's code, but never with my own anymore.

     

    The second condition, "with appropriate spacing ...", may seem a little odd and takes a tiny bit of explaining. After each "block", I add a set number of blank lines: 3 after a function, class, or other large structure or group of data; 2 after a large or top-level block of code; and 1 after smaller inner blocks of code. I've become so consistent at this in the last couple of years that any time I see my spacing pattern broken in older code (generally stuff from before 2012), I have to fix it. Fixing it in any one spot takes almost no time at all.

     

    In any case, separating code into chunks this way makes it very easy to see where each section of code is. The ********** bar I put above each function makes functions about as easy to tell apart as if they were on separate pages of a book. Putting the subsections under $$$$$$$$ bars (for longer functions with big subsections) or ====== bars (for general top-level subsections) splits out the major tasks of the function. And if I ever have to get down to the ::::::::: or ........ bars, I know that function probably needs to be split into multiple pieces.

  10. Absolutely always tab content properly, with 2 spaces per tab.

    This is kind of a given, but I had to bring it up because it is important. If your code isn't tabbed, you have no clue how it relates to whatever is above or below it. I chose 2 spaces per tab back in the MS-DOS days, when you only had 78 characters across the screen in MS Edit. The default of 8 was ridiculous to deal with, and 4 was almost as bad, so I stuck with 2. I have used 2 for nearly 20 years now, and it works out great. Whoever had the idea to use 8-space tabs really likes things seriously staggered.

  11. If possible, always break lines at 120 characters, and then tab the continued lines appropriately.

    Having to scroll horizontally is widely considered a "bad idea". In general, nobody wants to do it, and few people really do. This is particularly true while coding, where it is something you rarely do. So any code that runs past the edge of your viewport stays hidden; hidden code means you don't know what it's doing, and code doing something you don't know about is *NEVER* a good thing.

     

    I chose 120 characters back in about 2001, since screens kept getting higher and higher resolutions. I came from programming in the DOS days, when we only had 78 characters in Edit (or QuickBASIC), so having a huge screen was awesome. I could fit around 150 characters across my screen back then at the size I was using (1280x960 at 8px per character gives 160, but some space is taken up by the window borders and scroll bar). However, I found that 99% of the time, there was just a glaring, huge blank space over on the right. I tried 100, but it felt too cramped, so I settled on 120, which works out well for me.

     

    This rule is hard to see in little examples, so here is a larger one:

    <?php
      if ((is_string($scriptname) && $EmailAlert["Alert"]["ScriptName"] != strtolower($scriptname)) || (!is_string($scriptname) && $EmailAlert["Alert"]["ID"] != $scriptname)) {
        if (!EmailAlert_GetData($scriptname, $fromemail, $flags))
          return false;
      }
    ?>

     

    Break that if up, however, and you can understand it without scrolling horizontally (well, unless you're on a mobile device):

    <?php
      if (
          (is_string($scriptname) && $EmailAlert["Alert"]["ScriptName"] != strtolower($scriptname)) ||
          (!is_string($scriptname) && $EmailAlert["Alert"]["ID"] != $scriptname)
        ) {
        if (!EmailAlert_GetData($scriptname, $fromemail, $flags))
          return false;
      }
    ?>

     

    You might ask, "Why two tabs for the conditions inside the if?" My answer: see the opening ( and closing ) of the if? What sits between them requires its own tabulation. And since I don't move the final ) back to the if's own level (for an if or any other block starter), as that would look out of place, the conditions need what seems to be an extra tab. It makes sense in my head, and having used it for quite some time now, it also makes sense when reading the code.
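Rule 3's complaint about chaining is easiest to see side by side. The JavaScript sketch below is only an illustration: the Greeting class and its AddWord, AddPunctuation and GetText methods are hypothetical, made up so the same four calls can be written once as a chain and once as one statement per line. Chaining only works because each method returns this.

```javascript
// Hypothetical class, invented only to compare the two calling styles.
class Greeting {
  constructor() {
    this.Parts = [];
  }

  // Adds a word. Returns "this", which is what makes chaining possible.
  AddWord(word) {
    this.Parts.push(word);
    return this;
  }

  // Appends punctuation to the last word, if there is one. Also returns "this".
  AddPunctuation(mark) {
    if (this.Parts.length > 0) {
      this.Parts[this.Parts.length - 1] += mark;
    }
    return this;
  }

  // Returns: String, the words joined with single spaces.
  GetText() {
    return this.Parts.join(" ");
  }
} // Greeting



// Chained: four calls packed into a single statement.
const chainedtext = new Greeting().AddWord("Hello").AddWord("there").AddPunctuation("!").GetText();

// One statement per line: the same four calls, each visible on its own line.
const greeting = new Greeting();
greeting.AddWord("Hello");
greeting.AddWord("there");
greeting.AddPunctuation("!");
const separatetext = greeting.GetText();

console.log(chainedtext);   // Hello there!
console.log(separatetext);  // Hello there!
```

Both versions produce the same string; the only difference is how quickly a reader can see that four separate calls happen.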
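Rule 4's argument about .css versus .GetStyle and .SetStyle can be sketched in plain JavaScript. This is neither jQuery's code nor the author's library: GetStyle and SetStyle borrow the method names from the article, but their bodies, along with OverloadedCss and CreateFakeElement, are hypothetical, and the "element" is a plain object with a style property so the sketch runs outside a browser. The overloaded version returns a string or an object depending on how it is called, while each single-purpose function always returns one type of value.

```javascript
// Stand-in for a DOM element so the sketch runs outside a browser (an assumption).
function CreateFakeElement() {
  return { style: {} };
}

// Overloaded, .css-like style: one function, two jobs, two return types.
function OverloadedCss(element, name, value) {
  if (value === undefined) {
    // Getter: returns a String.
    return element.style[name] === undefined ? "" : String(element.style[name]);
  }
  // Setter: returns the element itself.
  element.style[name] = String(value);
  return element;
}

// Rule 4 style: each function does one thing and always returns one type of value.
// Returns: String, the style's current value, or "" if it has not been set.
function GetStyle(element, name) {
  const value = element.style[name];
  return value === undefined ? "" : String(value);
}

// Returns: Nothing.
function SetStyle(element, name, value) {
  element.style[name] = String(value);
}

const box = CreateFakeElement();
SetStyle(box, "color", "red");
console.log(GetStyle(box, "color"));                      // red
console.log(typeof OverloadedCss(box, "color"));          // string
console.log(typeof OverloadedCss(box, "color", "blue"));  // object
console.log(GetStyle(box, "color"));                      // blue
```

Code that calls GetStyle can always treat the result as a string, whereas code that calls OverloadedCss has to remember which form it used to know what comes back.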
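To show why a consistent Something: layout (rule 6) is easy for a program to read, here is a minimal JavaScript sketch of a parser for headers like the DisplayItemList and bind examples above. It is not the author's actual parser, whose code isn't shown in this article; ParseFunctionHeader and the shape of its result are hypothetical. It assumes each line starts with //, section names such as Created: or Returns: sit at the left edge, section content is indented, a parameter line is two spaces, the name, then its details, and deeper-indented lines continue the previous parameter. A description line that itself looks like Name: text would be misread as a section.

```javascript
// Minimal sketch of a parser for "Something:"-style function headers (hypothetical).
// Returns: Object with Description (String), Sections (name -> array of lines),
//          and Parameters (array of { Name, Details }).
function ParseFunctionHeader(commenttext) {
  const header = { Description: "", Sections: {}, Parameters: [] };
  let currentsection = null;

  for (const rawline of commenttext.split("\n")) {
    // Strip the leading "//" and one following space, keeping any further indentation.
    const line = rawline.replace(/^\s*\/\/ ?/, "");

    // Skip blank lines and separator bars such as "*****" or "=====".
    if (line.trim() === "" || /^[*$=:.]+$/.test(line.trim())) {
      continue;
    }

    // A section name at the left edge, e.g. "Created:       v1.0.0" or "Parameters:".
    const sectionmatch = line.match(/^([A-Z][A-Za-z ]*):\s*(.*)$/);
    if (sectionmatch) {
      currentsection = sectionmatch[1];
      header.Sections[currentsection] = sectionmatch[2] ? [sectionmatch[2]] : [];
      continue;
    }

    // Any other unindented line is part of the description.
    if (!line.startsWith(" ")) {
      header.Description += (header.Description ? " " : "") + line.trim();
      currentsection = null;
      continue;
    }

    // Indented lines belong to the current section.
    if (currentsection === "Parameters") {
      const parammatch = line.match(/^ {2}(\S+)\s+(.*)$/);
      if (parammatch) {
        header.Parameters.push({ Name: parammatch[1], Details: parammatch[2] });
      } else if (header.Parameters.length > 0) {
        // Deeper indentation continues the previous parameter's details.
        header.Parameters[header.Parameters.length - 1].Details += " " + line.trim();
      }
    } else if (currentsection !== null) {
      header.Sections[currentsection].push(line.trim());
    }
  }

  return header;
} // ParseFunctionHeader



// Example: part of the bind header from rule 6.
const sampleheader = [
  "// ****************************************",
  "// Created:       v1.0.0",
  "// Last Modified: v4.0.10",
  "// Binds a call back to an event on the specified target.",
  "// Returns:",
  "//   Function, The call back function that got bound.",
  "//   undefined - An error occurred, generally due to a bad target.",
  "// Parameters:",
  "//   target                   Single DOM Element, The element to target.",
  "//   names                    String, Either a single event name. These should",
  "//                            NOT include 'on' in the beginning.",
].join("\n");

const parsedheader = ParseFunctionHeader(sampleheader);
console.log(parsedheader.Sections["Last Modified"][0]);  // v4.0.10
console.log(parsedheader.Parameters.length);             // 2
```

The parser needs no @tags at all: the layout rules (left edge for section names, indentation for content) carry the structure, which is the point rule 6 makes about consistency.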
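Rule 9's shrinking separator bars can also be generated mechanically. In this JavaScript sketch the bar characters and their order come from the rule itself, but the widths are an assumption (the outermost bar ending at column 120 and each deeper tier ending 8 columns earlier), and MakeSeparator is a hypothetical helper rather than anything from the article.

```javascript
// Bar characters per tier, outermost first, as listed in rule 9:
// "*" functions, "$" big subsections, "=" top-level subsections, ":" and "." deeper levels.
const SeparatorCharacters = ["*", "$", "=", ":", "."];

// Assumed widths: the outermost bar ends at column 120, each deeper tier 8 columns earlier.
const SeparatorEndColumn = 120;
const SeparatorShrinkPerTier = 8;

// Builds one separator comment line.
// Returns: String, e.g. "  // ======...".
// Parameters:
//   tier       Number, 0 for the outermost bar through 4 for the innermost.
//   indent     Number, how many spaces the bar is indented.
function MakeSeparator(tier, indent) {
  if (!Number.isInteger(tier) || tier < 0 || tier >= SeparatorCharacters.length) {
    throw new RangeError("tier must be an integer from 0 to " + (SeparatorCharacters.length - 1));
  }
  const prefix = " ".repeat(indent) + "// ";
  const endcolumn = SeparatorEndColumn - tier * SeparatorShrinkPerTier;
  const barlength = Math.max(endcolumn - prefix.length, 3);
  return prefix + SeparatorCharacters[tier].repeat(barlength);
} // MakeSeparator



// Example: a function-level bar, then a top-level subsection bar inside the function.
const functionbar = MakeSeparator(0, 0);
const subsectionbar = MakeSeparator(2, 2);
console.log(functionbar.length);    // 120
console.log(subsectionbar.length);  // 104
```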
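Rule 11's 120-character limit is easy to check automatically. This JavaScript sketch reports every line over the limit; FindLongLines is a hypothetical helper, not a tool from the article. It counts characters with String.length, which assumes ASCII source indented with spaces (as in rule 10), since a literal tab character would count as one character no matter how wide it displays.

```javascript
// Reports every line longer than the limit (120 characters by default, per rule 11).
// Returns: Array of { LineNumber, Length } objects; empty if every line fits.
function FindLongLines(sourcetext, maxlength = 120) {
  const longlines = [];
  const lines = sourcetext.split(/\r?\n/);

  for (let index = 0; index < lines.length; index++) {
    // String.length counts UTF-16 code units, which matches columns for ASCII text.
    if (lines[index].length > maxlength) {
      longlines.push({ LineNumber: index + 1, Length: lines[index].length });
    }
  }

  return longlines;
} // FindLongLines



// Example: the second line is 130 characters long, so it is the only one reported.
const samplesource = "short line\n" + "x".repeat(130) + "\nanother short line";
console.log(JSON.stringify(FindLongLines(samplesource)));  // [{"LineNumber":2,"Length":130}]
```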

 

So, what does it all look like in the end? Why, just look at my source and you'll see! A great example is a minimal sound library I coded up recently: view the full source