Thursday, October 29, 2009

Initial design vs. maintenance and refactoring

As previously mentioned in this blog, our team is busy rewriting a complex application from Cobol to Perl.

While studying the old Cobol code, I often discover areas where the initial design was well thought out and well organized, but later became blurred and confused over 20 years of maintenance. So part of our analysis task is to do some archeology, trying to understand the historical layers, and to sort out what is still relevant to the current needs of our users.

Unfortunately, the same phenomenon also starts appearing in the Perl code ! Some parts were written two years ago, and then had to evolve for various reasons ... and sometimes the initial design becomes blurred in this process.

One may think that when this happens, it is because the initial design did not have the proper abstraction / parameterization hooks to make it easy to extend. But sometimes when doing the initial design of a component, you don't have a complete picture yet of what is going to surround that component ... or the requirements may have changed because this is a long-term project, and life doesn't stop while we are working on this application. So what is really needed to keep it clean is constant refactoring.

The problem is that maintenance operations do not have the same metrics : maintainers are evaluated by how many tickets they solved and how long it took; so there is a natural tendency to just "get it to work". Spending additional time on refactoring operations brings no immediate reward : users won't see a change, managers won't understand why you need to revisit code that was already written, and there is an additional risk of introducing regressions.

It's easy to understand that on a collective level there would be some rewards in the long term (better maintainability, cleaner architecture, etc.) ... but in the long term the maintainer will probably have gone to another project !

Saturday, October 17, 2009

how to reconcile audits and agile development ?

According to recommendations recently issued by the Swiss working group for IT government audit (Swiss chapter of ISACA, the international Information Systems Audit and Control Association), every important IT project in Swiss government should have at least 10 documents ready for the auditor (among about a hundred kinds of documents defined by the Swiss project management method Hermes) :

1. Feasibility study
2. Specifications
3. Cost effectiveness analysis
4. Integration into the IT environment
5. Requirements
6. Concept for an internal control system (ICS)
7. System architecture
8. Tests (test plan and documentation)
9. Acceptance by the user
10. Final assessment

The recommendations explicitly insist that this list also applies to "new so-called 'agile' development methods".

For our Perl project at the Geneva courts of law, this means that we must produce such documents to be ready for occasional auditors. The problem is that the Hermes method was mainly inspired by good old waterfall development methods on mainframe computers, and some of the documents listed above just do not make much sense in our context; so instead of helping to better structure and organize the project, they just represent an additional burden.

For example, some parts of the application start in an exploratory way, without formal specifications, and are progressively shaped into working functionalities; tests are not planned in a document, but written in a galaxy of test files; etc.

I guess that the pressure for formal deliverables in project management is probably stronger in government than in private companies, but nevertheless people doing big Perl projects in any context probably also have at least some of such constraints. Any testimonies on that ?

Tuesday, October 6, 2009

I don't hate the stash ... but I loathe the session.

A couple of days ago, John Napiorkowski asked : Does Anyone Else Hate the Stash?

True enough, the stash in Perl Catalyst is a big bag in global memory. Any method along the chain can read or write into that bag. Then you pass the whole thing to the Template Toolkit (TT), that integrates the Catalyst stash into its own stash, and again any template fragment can read or write into the TT stash. So when studying one particular component along the chain, it is indeed sometimes hard to track what is in the stash at that point, and where the data came from. I frequently need to resort to the perl debugger to sort out such situations.

Nevertheless, I don't hate the stash, because it is sooo convenient to let various software components collaborate at little cost. Setting up a more controlled way of passing information between components would be quite tedious and would imply more maintenance. It is like having several people in a team : if collaboration is harmonious, it leverages some multiplicative power; if not, people start treading on each other's toes, and the global result is unsatisfactory. A simple step for ensuring harmonious collaboration is to partition the stash namespace through additional levels of hashrefs (the same principle we use on CPAN for avoiding collisions between module names).
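As a minimal sketch of that partitioning idea, here is a plain hash standing in for the stash (this is not the Catalyst API itself, and the component names "search" and "cart" are hypothetical). Each component writes only under its own top-level key, so two components can both use a key named "count" without stepping on each other :

```perl
use strict;
use warnings;

# Stand-in for a per-request stash: a plain hash, not the Catalyst API.
my %stash;

# Hypothetical components "search" and "cart"; each one writes only
# under its own top-level key, one extra level of hashref deep.
$stash{search}{count} = 42;    # number of search results
$stash{cart}{count}   = 3;     # number of items in the cart

# Both "count" entries coexist because they live in separate sub-hashes.
print "search: $stash{search}{count}, cart: $stash{cart}{count}\n";
```

With a flat stash, the second assignment to "count" would silently have overwritten the first one.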

Furthermore, global memory in the stash is not too risky because it is very limited in time : at the end of the request the whole stash is cleaned up. Unfortunately, there is something much worse than the stash : the session !

Some colleagues tend to like putting stuff into session storage, because it's easy to program sequences of requests without having to propagate state through URL parameters or JSON data. I try to avoid it as much as possible because :
  • data in session storage is likely to produce unwanted "action at a distance"
  • the URL API is no longer RESTful (calling the same URL with same parameters might yield different results)
  • there is a cost in serializing / deserializing the session data at each request
  • session data is limited in size
  • so session data is not appropriate for storing recursive datastructures of unknown sizes

Programmers are tempted by the easy aspect of session data, but are not always conscious of the limitations above.

Saturday, October 3, 2009

YAPC presentation styles

One personal comment I got from the YAPC::EU::09 Survey results was : "Split slides so they contain less text". Well, while attending several talks, I felt exactly the reverse : I wished the speaker had condensed slides so they contain more text !

Finding the right balance is really a difficult question. It is true that I have a tendency to fill slides with a lot of material, in order to exploit complementary channels : while my voice gives the general idea, or emphasizes a particular point, the slide can convey more detailed information, and people in the audience can grab more content if they are especially interested in one particular aspect.

Probably I like this style because it corresponds to my own way of learning. When I was at school, at a time when beamers were rare and expensive, most teachers used physical transparencies. Some of them had the habit of putting up the transparency and immediately hiding it with a piece of paper; then they would progressively uncover the slide, one point at a time. I hated that habit, because I was forced to think at the same speed as the teacher. If I see the global picture at once, I can immediately choose which points seem more important to me, and focus on them, maybe already preparing a question, or think back to what was said before, or anticipate what is probably going to be said next. But if the teacher dictates the rhythm, and decides to pause for 5 minutes on a point which is important to him, but not to me, or decides to quickly skip over a detail which I need to elaborate in my head, then I'm in trouble.

The modern way of uncovering slides one point at a time is the Takahashi style (lots of slides, very few words, huge font), which seems much praised in the Perl community. I must admit that I was quite impressed the first time I saw a presentation in this style : it is quite efficient for a lightning talk, or to create some suspense at a particular point in a presentation. However, if many speakers adopt this style just out of fashion, without deliberate thinking about which effect they want to achieve, then at the end of the day it becomes quite boring, and I feel as if I had watched several hours of videoclips. After such a day I don't really remember what the highlights were.

Thursday, September 24, 2009

Design patterns, or why Java needs external crutches

In my last blog post about architecture and design I promised to come back to the topic of design patterns. This term was popularized by a famous book that proposed a catalogue of "Elements of Reusable Object-Oriented Software", and became a best-seller in the Java world (usually referred to as the "GoF" book, for "Gang of Four" authors). Sensing a gold mine, publishers quickly produced a whole line of similar books, where patterns were adapted to various domains and languages : for C#, for Ruby, etc. I almost expected to see patterns books for assembly code and for Befunge, but these never came out !

So what about patterns for Perl ? To my knowledge, no major publisher has released any book on that topic, which is kind of surprising because one would think that there is some money to be made. However, Phil Crow privately published Perlish patterns (cheaply available as an e-book), which is really worth reading; Phil proposes the following explanation in his introduction :

Eventually, I came to understand that there were several reasons why patterns were never as enthusiastically embraced in our community as they were in others. Some of the patterns just apply new names to common techniques. Some are represented in Perl's core, so we don't think much about using them, at least in their normal forms. Some apply better to languages which focus on object orientation.

I very much agree with this analysis. The original GoF book was refreshing to read, but when programming in Perl I never think in terms of those patterns, because the standard language features plus some common CPAN modules answer most of my needs for structuring my programs, even when assembling large bodies of functionality.

The fact that Java is so verbose, and that everything has to be an object, results in code that is often spread among lots and lots of classes. So to condense that information, it is no surprise that Java programmers need other abstractions like "patterns", so that they can think in terms of larger units. For the same reason, they also need sophisticated tools like Eclipse for navigating through the class hierarchy, and they need costly tools like Rational Rose to see and design the big picture, and generate code skeletons. I'm always surprised to hear such tools presented as strengths of the Java world, while to me they are just necessary consequences of the way Java code is laid out.

In standard Perl, we have hashrefs and arrayrefs, we have closures, multiple inheritance, namespace manipulation primitives, dynamic classes and dynamic methods, functional grep, map and other List::MoreUtils goodness; we can assemble those into dispatch tables, delegation structures, function and method templates and factories ... enough patterns to fill a whole life !
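To make this a bit more concrete, here is a small sketch (all names are made up for the illustration) that combines three of the ingredients listed above : a closure-based factory, a dispatch table stored in a plain hash, and a functional map.

```perl
use strict;
use warnings;

# A factory: returns a closure that remembers its own $factor.
sub make_multiplier {
    my ($factor) = @_;
    return sub { my ($n) = @_; return $n * $factor };
}

# A dispatch table: a hash mapping command names to code references.
my %dispatch = (
    double => make_multiplier(2),
    triple => make_multiplier(3),
    negate => sub { my ($n) = @_; return -$n },
);

# Look up the handler by name and call it; die on unknown names.
sub run_command {
    my ($name, $arg) = @_;
    my $handler = $dispatch{$name}
        or die "unknown command: $name\n";
    return $handler->($arg);
}

# map applies each command in turn to the same input.
my @results = map { run_command($_, 7) } qw(double triple negate);
print "@results\n";    # prints "14 21 -7"
```

Nothing here needs a class hierarchy : adding a new command is just one more entry in the hash.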

Monday, September 21, 2009

hit by operator precedence and right associativity

While studying a bug, I wrote the following test program :

use strict;
use warnings;
use Data::Dumper;
my $bool = 1;
my %h;
$bool ? $h{true} = 't' : $h{false} = 'f';
print Dumper(\%h);

The ternary expression starting with $bool was supposed to be a concise way to write a conditional, but the result was a disaster. Can you guess the output ?

Here it is : $VAR1 = { 'true' => 'f' };

This really seems totally insane : something is assigned to the 'true' slot of the hash, but the value comes from what was supposed to be in the 'false' slot !

OK, the ternary expression above is wrong, because the '?:' operator has higher precedence than '=', so one should really write

$bool ? ($h{true} = 't') : ($h{false} = 'f');

But how come perl issues no error, no warning, and happily produces a very strange result ? It seems that both sides of the conditional are executed simultaneously, and collapse in a mysterious way.

I tried running the script through B::Deparse to understand how it was parsed, but the output was exactly like the original source, so it really seems to be legal Perl !

It really took me a while until the 'aha' moment that made me realize that because of right-associativity, and because conditional expressions can be lvalues, and because "Unlike in C, the scalar assignment operator produces a valid lvalue" (perlop dixit), this was parsed as

($bool ? ($h{true} = 't') : $h{false}) = 'f';

So the $bool test chooses an lvalue between $h{true} and $h{false}, and it doesn't matter that this lvalue is first assigned a 't', because later the main assignment puts a 'f' into it.

Obvious, isn't it ?
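For readers who want to check this at home, here is a small script putting the forms side by side : the original expression, the parenthesized version, and a plain if/else, which is my own addition as the boring but safe alternative. The hash names %h1, %h2 and %h3 are only there to keep the three results apart.

```perl
use strict;
use warnings;

my $bool = 1;

# Original form: parsed as ($bool ? ($h1{true} = 't') : $h1{false}) = 'f'
my %h1;
$bool ? $h1{true} = 't' : $h1{false} = 'f';

# Parenthesized form: each assignment stays inside its own branch.
my %h2;
$bool ? ($h2{true} = 't') : ($h2{false} = 'f');

# Plain if/else: more verbose, but no precedence trap.
my %h3;
if ($bool) { $h3{true} = 't' } else { $h3{false} = 'f' }

print "h1: true => $h1{true}\n";    # f
print "h2: true => $h2{true}\n";    # t
print "h3: true => $h3{true}\n";    # t
```

In all three cases the 'false' slot is never created, because $bool is true and the other branch is never evaluated.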

Saturday, September 19, 2009

Learning architecture and design

Matt Trout asks about how people learned what they know about architecture and design.

As far as I am concerned, I've always been more or less interested in that subject, but only started to study it more seriously about ten years ago, when I left the academic research world and started writing real software instead of writing papers about software.

So where to go when one is interested in design ? One source of information is books. My sources are quite similar to the ones cited in Matt's article. Currently I'm reading Beautiful Architecture (actually I bought it at YAPC::EU::09), which I enjoy very much because the articles are of high quality and cover a vast territory. The previous book in that series, Beautiful Code, is also worth reading, although a bit less interesting to my taste. Of course I also read a couple of books about design patterns ... but I'll blog another time about those.

Despite the fact that they seem to sell well, books on design are not numerous ... probably because they are so hard to write ! I mean, writing a regular technical book is hard; producing good designs is hard; so writing a good book that highlights the design process is necessarily even harder. Actually, the books mentioned above are never an organized discourse starting at A and ending at Z; what they do is supply a catalog in which the reader can grab interesting ideas, and that's probably the best any book on design can do.

Which brings us to the point : books on design are nice, they open your mind, but that's seldom the place where you really learn. Design is acquired by practice, using or reading other people's designs, and then doing your own through trial and error. It reminds me of my counterpoint courses : although there are a few recipes, it's only after having studied a dozen Bach fugues and having painfully written two or three in the same style that one really understands what it means to design a fugue.

So I enjoy browsing through technical manuals and APIs of many components, even if I'll never use them. For example, it's quite instructive to study the difference between the object models of Microsoft Word and Microsoft Excel, two products in the same family, but with huge differences in design. Word is an infamous amalgamation of fuzzy concepts (anybody ever understood how automatic numbering works ?), while Excel is a beautiful join of many powerful features into a single orderly framework.

Here is a list of some designs that I considered particularly inspiring:

  • the NeXT operating system and programming environment had a nice language (Objective-C) and a beautiful OO architecture, with a generic notion of object inspector for editing object properties ... really a joy to use and program. Too bad all of this disappeared for lack of market penetration.
  • when Netscape first proposed to integrate Javascript into Web pages so that they could become dynamic, I had a "wow" moment : this opened so many possibilities ! Besides, the documentation was extremely well-written (many years after, it's still a useful reference, especially the chapters on how to exploit prototype-based inheritance).
  • in the same vein, I had another "wow" moment one or two years later when Internet Explorer first came with the notion of manipulating the DOM through scripting (initial versions of Javascript in Netscape could not do that). Again, this opened a whole new world, and the API was quite clean and very well documented (not respecting standards is another story). By the way, at that time the MSDN library site was really cool, with support for keyboard arrow commands while navigating through the documentation tree --- later on they moved to .NET technology, and were no longer able to support keyboard navigation !
  • the Apache architecture is amazingly well thought out for extensibility, with its clean separation of each phase in the request lifecycle, and the possibility to insert hooks in each of those phases. Actually I didn't study Apache directly, but only through mod_perl, which exposes almost everything of the Apache API to Perl programming, and is another piece of amazing design. I must say, however, that mod_perl has a peculiar way of doing OO, through a kind of home-made mixing of packages into common namespaces, which for the time was quite clever but took me some time to understand. I guess this would all be written with "roles" if redesigned in modern Perl.
  • several important CPAN modules would be worth discussing here, but that would take us too far ... maybe later in another blog entry. Let me just state that every time I came to discover another module by Andy Wardley, I felt a sense of beauty : to me, Template Toolkit, AppConfig, Pod::POM