Sunday, July 15, 2007

Haskell Mindset

There are several things an imperative programmer needs to address before fully getting to grips with Haskell:
  • It is lazy. X is not evaluated when the program 'reaches' 'X = blah'. In fact it might never be.
  • Changing state is not paramount. Forget the pigeon holes.
  • Function arguments are not always necessary.
  • Types can be inferred.
A good article to read about these is "Why Haskell Matters".
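To make the four points above a little more concrete, here is a minimal sketch. All the names (lazyDemo, firstSquares, total, countWords, double) are made up for illustration and are not from any library.

```haskell
-- Laziness: `expensive` is bound but never demanded, so it is never evaluated
-- and the call to `error` never fires.
lazyDemo :: Int
lazyDemo =
  let expensive = error "never evaluated" :: Int
      cheap     = 42
  in cheap

-- Laziness again: only the first five elements of an infinite list are built.
firstSquares :: [Integer]
firstSquares = take 5 (map (^ 2) [1 ..])

-- No pigeon holes: the running total is threaded through a fold
-- rather than stored in a mutable variable.
total :: [Int] -> Int
total = foldr (+) 0

-- No explicit argument: the function is defined by composing two others.
countWords :: String -> Int
countWords = length . words

-- No type signature: the compiler infers a general numeric type for this.
double x = x * 2
```

Loading this into GHCi and typing `:t double` shows the type that was inferred.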

All of the above are possible in an imperative language (or just about) but Haskell brings these to the fore and provides a syntax and semantics that are built around them. This encourages ways of thinking that are different from those encouraged by the OO paradigm.

One key concept in Haskell is that of Monads. Some people really struggle with these and there are numerous tutorials on them. Some of these tutorials are good and some are bad. The one that worked for me was the one by Jeff Newbern. It helps if you have the above concepts in mind before embarking on the journey.

Monads are also about sequencing of computations. In imperative languages, there is typically only one way that operations are sequenced. This can be encoded in Haskell using a programmer-defined Monad stack. See the description of HJS for an example of a simple stack for JavaScript that provides support for IO, state and exceptions.
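As a rough illustration of what such a stack looks like, here is a hand-rolled sketch that layers state and exceptions over IO. This is not the HJS stack: the Interp type and the helpers getS, putS, throwI, liftIO' and addPositive are hypothetical names, and a real program would more likely build the stack from a monad transformer library.

```haskell
import Control.Monad (when)

-- Hypothetical interpreter monad: an Int state, String exceptions, and IO.
newtype Interp a = Interp { runInterp :: Int -> IO (Either String (a, Int)) }

instance Functor Interp where
  fmap f (Interp g) = Interp $ \s -> do
    r <- g s
    pure (fmap (\(a, s') -> (f a, s')) r)

instance Applicative Interp where
  pure a = Interp $ \s -> pure (Right (a, s))
  Interp mf <*> Interp ma = Interp $ \s -> do
    rf <- mf s
    case rf of
      Left e        -> pure (Left e)
      Right (f, s') -> do
        ra <- ma s'
        pure (fmap (\(a, s'') -> (f a, s'')) ra)

-- Sequencing is defined here: run the first step, stop on an exception,
-- otherwise pass the new state on to the next step.
instance Monad Interp where
  Interp m >>= k = Interp $ \s -> do
    r <- m s
    case r of
      Left e        -> pure (Left e)
      Right (a, s') -> runInterp (k a) s'

getS :: Interp Int
getS = Interp $ \s -> pure (Right (s, s))

putS :: Int -> Interp ()
putS s = Interp $ \_ -> pure (Right ((), s))

throwI :: String -> Interp a
throwI e = Interp $ \_ -> pure (Left e)

liftIO' :: IO a -> Interp a
liftIO' io = Interp $ \s -> do
  a <- io
  pure (Right (a, s))

-- A small computation that uses all three layers.
addPositive :: Int -> Interp ()
addPositive n = do
  when (n < 0) (throwI ("negative input: " ++ show n))
  t <- getS
  liftIO' (putStrLn ("adding " ++ show n))
  putS (t + n)
```

In GHCi, `runInterp (mapM_ addPositive [1, 2, 3]) 0` runs the steps in order starting from state 0, while a list containing a negative number stops at the first failure and returns a Left.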


As the Haskell Wiki says:

Haskell is a general purpose, purely functional programming language featuring static typing, higher order functions, polymorphism, type classes, and monadic effects. Haskell compilers are freely available for almost any computer.

Some of my contributions to the Wiki and Hackage are:

HJS - A JavaScript interpreter.
Enterprise Haskell - Requirements for the use of Haskell in the real world.
HGene - The beginnings of a genealogy program in Haskell.

Lists considered harmful

A quick post inspired by the paper "Stream Fusion: From Lists to Streams to Nothing at All". All programming languages include features for lists/collections. The problem with your bog standard list is that there is no tie-back to what built the list. This means that any opportunity for optimisation by fusing the creation of the list with its use is lost.
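Here is a simplified sketch of the idea, in the spirit of the paper rather than a copy of its library. A stream carries its stepper function along with its state, so a consumer can drive the producer directly without an intermediate list. The function names mapS, filterS, sumS and sumOfDoubledEvens are my own for this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: finished, produce a value, or advance without producing.
data Step a s = Done | Yield a s | Skip s

-- A stream remembers the function that builds it as well as the current state.
-- This is the tie-back that a plain list lacks.
data Stream a = forall s. Stream (s -> Step a s) s

stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- map and filter only wrap the stepper; neither contains a loop of its own.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

-- The only loop in the pipeline lives in the consumer.
sumS :: Num a => Stream a -> a
sumS (Stream next s0) = go 0 s0
  where
    go acc s = case next s of
      Done       -> acc
      Skip s'    -> go acc s'
      Yield x s' -> go (acc + x) s'

sumOfDoubledEvens :: [Int] -> Int
sumOfDoubledEvens = sumS . mapS (* 2) . filterS even . stream
```

Because mapS and filterS are not recursive, a compiler has a much better chance of inlining the whole pipeline into the single loop in sumS. Whether it actually does so depends on the compiler and its optimisation settings.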

Conversations with a type checker

Haskell encourages a high level of thought prior to putting down characters. One feature that has often been noticed is that once a Haskell program has been written and type checks, it will usually do the right thing. Haskell moves the task from punching out characters to thinking about what you are writing and, importantly, getting the types consistent across the whole program.

Haskell does not force you to specify a type for everything. This enables you to develop a function iteratively and then to ask Haskell what it infers the type of the function to be. As an example, suppose you had a higher level function that you knew the general layout of. You know that the function calls other functions but are not sure what the types of these functions are. You can get an idea of their types by writing the top level function as if the lower functions were arguments to the higher function and then asking Haskell for the type of the top level function. The type signature will include information about the lower level functions.
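A small sketch of that conversation follows. The function totalFor and its arguments keep and price are invented for the example: they stand in for lower level functions that have not been written yet.

```haskell
-- The lower level functions `keep` and `price` are passed in as arguments,
-- and no type signature is given for the top level function.
totalFor keep price orders = sum (map price (filter keep orders))

-- Asking GHCi with `:t totalFor` gives a type of this shape
-- (the type variable names may differ):
--
--   totalFor :: Num b => (a -> Bool) -> (a -> b) -> [a] -> b
--
-- which says that `keep` must be a predicate on orders and that `price`
-- must turn an order into a number.
```

Once the inferred type looks right, it can be pasted back in as a signature and the real lower level functions written to match it.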

Customisation or Configuration?

Following on from Peter Batty's topic and as something that has bothered me over the last year ...

The usual definition given for the configuration vs customisation dichotomy in terms of non-code/code is a good starting point but there is more to it than that. Often it is helpful to look at the context and issues around a term rather than get hot and bothered about its definition. Here are a couple of other ways of looking at it:

Firstly, there is, of course, the Total Cost of Ownership issue. Customisation leads to unpredictable total cost of ownership for systems. This is made up of the cost of building the customisation in the first place, the cost of supporting the customisation (including the cost of handling problems at the boundary of the customisation and the product) and, the scary one, the cost of migrating the customisation to new versions of the product. All of these are high risk and involve the user in areas that are not core to the business. An option here is to outsource the activity and the risk.

The other way of looking at the distinction is in terms of what happens to my configuration and what happens to my customisation when the product version changes. With configuration the expectation is that the configuration is migrated seamlessly to the new platform; with customisation this is not expected to be the case. A parallel distinction is in what is supported and what is not - the product is expected to support my changing this (configuration) but not that (customisation).

Both of the above avoid the need to use code/non-code to distinguish between customisation and configuration and this is good because there are times when 'changing a few parameters' is either too cumbersome or more complex logic is required. Scripting provides a means to handle such cases and scripting through some form of DSL should not in principle be excluded from what is configuration. However, in practice the way that scripting is currently managed is not going to cut the mustard. The difficulty is that the 'scripts' usually sit outside the system and are stored as flat ASCII. This leads to problems when the product, for instance, changes the name of a table, adds a new parameter to a function or even removes a function.

Amongst the things that people look for when they require the product to be extended are:
  • Task Automation - This will involve getting data from the GUI (for instance the currently selected object), pushing data into the GUI (putting data into a text field) and triggering actions (pressing the 'Insert' button).
  • Business Rule Validation - On entering data into the field, the system will fire any 'hooked-in' validation rules.
  • Loading Data - This is either data from legacy systems as part of initial migration or on going alignment of data between systems.
  • Dumping Data - This is typically in the form of a report, but may also be a data export to another system.
Now, none of the above necessarily needs scripting. For the last two, there are standard ways of doing this for simple tasks and there are 'data alignment' products on the market if the task is complex. For the first, a macro/record facility would meet most needs and for the second, many validation rules can be configured into a rules engine. An observation about all these solutions is that they are all different and require different tools and skills (even if those tools are off the shelf). The temptation is to go for the 'scripting language' path as this can meet all of the above and more.
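To give a feel for what 'configured into a rules engine' might mean, here is a minimal sketch in which validation rules are plain data and the engine is a small interpreter over them. The Rule type, its constructors and the functions check and validate are all hypothetical and do not come from any particular product.

```haskell
import Data.Maybe (mapMaybe)

-- A hypothetical vocabulary of rules that could be stored as configuration.
data Rule
  = NonEmpty
  | MaxLength Int
  | OneOf [String]
  deriving (Show, Eq)

-- Check one rule against a field value: Nothing means the rule passed,
-- Just carries the failure message.
check :: Rule -> String -> Maybe String
check NonEmpty v
  | null v = Just "value must not be empty"
check (MaxLength n) v
  | length v > n = Just ("value exceeds " ++ show n ++ " characters")
check (OneOf allowed) v
  | v `notElem` allowed = Just ("value must be one of " ++ show allowed)
check _ _ = Nothing

-- Fire every rule hooked in to a field and collect the failures.
validate :: [Rule] -> String -> [String]
validate rules v = mapMaybe (`check` v) rules
```

Because the rules are data rather than code, the product can see exactly which rules exist and what they refer to. That visibility is what gets lost when the same logic lives in a free-form script.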

As Peter points out though, as the application becomes more tuned to a particular domain, it is expected that it includes all the things that the customer wants. Another way of putting this is that the product encodes 'best practice'.

About Me

Melbourne, Australia
I work for GE in Melbourne, Australia. The views expressed here do not necessarily represent those of GE.