Poundstone

Sunday, February 8, 2009

Corporate Wikis - Information At the Speed of Thought?

Businesses survive on a diet of information. That information needs to be easily accessible and well organised, and those who maintain it need to be able to change it easily, but within a framework of revision and editorial control.

Businesses remain too document-centric. Information is locked in boxes, with poor linkage to other information. The default process is to send documents to colleagues as email attachments.

Information is put onto intranet sites, but rather than as a standard HTML page it is embedded in a short document or slide pack. Downloading the document and starting the appropriate reader application takes long enough for the reader to disengage.

Documents remain important in some situations, particularly those relating to contracts, technical specifications and the like.

A solution that offers itself is a corporate wiki. Wikis are not suitable for all content, however, and the following issues present themselves; where they cannot be resolved, a document should be used instead.

Issues that need to be considered before implementing a wiki:
  • The platform needs to be fast (wiki, after all, means quick). Hardware and network access need to be swift; navigation, search and editing need to happen at the 'speed of thought'.
  • Requires some organisational and editorial structure.
  • Offline working.
  • Printing/document-forming.
  • Baselining. Whilst wikis do track updates, it is often necessary to form baselines, for example to align with a particular version of a product or a point in time (e.g. end of quarter).

Monday, November 3, 2008

GIS with Haskell 1

It's time to bite the bullet and do some GIS Haskelling.

My first project is to develop a simple map server in Haskell. Here are the ingredients:
  • PostgreSQL + PostGIS.
  • Some data to put into the database. For this I sourced some Australian suburb boundaries.
  • A library for manipulating GIS geometry: GEOS. In particular, this provides functions to parse the WKT strings from the database into geometry structures.
  • The Haskell CGI package.
  • Haskell bindings to the GEOS library.
  • An extension to HaXml adding SVG combinators.
The definition of the map is specified using a set of combinators, resulting in a DSL that looks a lot like the MapFile format of MapServer.

The following gives a map showing the suburbs centred on the city of Melbourne.
ex0 = map (connection "host=localhost user=postgres password=postgres dbname=australia"
       `u` size 700 700
       `u` extents 144.897079467773 (-37.8575096130371) 0.16821284479996734 0.1410504416999956
       `u` layer (table "suburbs"
              `u` geometry "the_geom"
              `u` klass (style (outlinecolour 255 0 0 1
                            `u` colour 100 255 100 1))))

The resulting SVG, when viewed, shows the suburb polygons drawn with the style above (green fill with red outlines).
The components that make up the map definition:
  • connection - supplies the database connection parameters.
  • size - the size of the SVG output to be generated.
  • extents - the extents of the map in world coordinates.
  • layer - defines a layer. The table property defines the database table to use, and the geometry property the column name to use. The klass property defines the style to use for drawing the geometry.
Layers are the key to this. Basic layers associate a geometry from the database with a style to be used when drawing the geometry.
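
The post does not show how the combinators themselves are implemented. One plausible sketch, with types and names that are my assumptions rather than the actual library, is that each property is an edit to a map-definition record and `u` simply composes two edits:

-- Illustrative only: MapDef and its fields are not the post's real types.
data MapDef = MapDef { mapSize :: (Int, Int), mapTable :: String }

-- A property is a record edit; `u` chains two edits left to right.
newtype Prop = Prop (MapDef -> MapDef)

u :: Prop -> Prop -> Prop
u (Prop f) (Prop g) = Prop (g . f)

size :: Int -> Int -> Prop
size w h = Prop (\m -> m { mapSize = (w, h) })

table :: String -> Prop
table t = Prop (\m -> m { mapTable = t })

Under this reading, size 700 700 `u` table "suburbs" is just the composition of two record updates.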

A slightly more complex example with two layers is the following:
ex1 = map (connection "host=localhost user=postgres password=postgres dbname=australia"
       `u` size 700 700
       `u` extents 144.897079467773 (-37.8575096130371) 0.16821284479996734 0.1410504416999956
       `u` layer (table "suburbs"
              `u` geometry "the_geom"
              `u` labelitem "name_2006"
              `u` klass (style (outlinecolour 255 0 0 1
                            `u` colour 100 255 100 1))
              `u` label (colour 255 255 0 1))
       `u` layer (table "suburbs"
              `u` geometry "geomunion(the_geom)"
              `u` klass (style (outlinecolour 0 0 0 1 `u` width 4))))

This is the same as before but with a new layer that draws a thick black border around the outer edge of all the suburb boundaries.

Next steps are to source some population data, colour-code the suburbs by population, and include a legend.

Friday, September 26, 2008

Financial Contracts, Haskell and Probability

This article brings together the ideas presented in the paper 'How to write a financial contract' (HWFC) and Martin Erwig's PFP library.

We are going to deal with a simple but common situation in finance: if I have a contract under which I am going to receive $100 in 3 years' time, what is that contract worth to me now? How much would I pay to obtain that contract? To calculate the worth we need to consider what else I could do with the money, and the most obvious alternative is to deposit it into a bank account that attracts interest.

The question can then be reposed as: if I put x into a bank account, what must x be so that the final amount after 3 years is $100? This is easy if the interest rate is fixed; not so easy if it varies.
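
For the fixed-rate case the answer is just the standard compound-discount formula; a quick sketch (textbook arithmetic, not code from this post):

-- Present value of an amount received after n years at a fixed annual rate r.
presentValue :: Double -> Int -> Double -> Double
presentValue amount years rate = amount / (1 + rate) ^ years

-- presentValue 100 3 0.10 gives approximately 75.13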

This blog piece provides a fragment of an implementation of HWFC that answers the above.

As this is literate Haskell, some preliminaries:

> module Main where
>
> import Probability

HWFC introduces the concept of a value process, which is a function from time to a random variable. We shall equate a random variable with a probability distribution, so a value process can be defined as:

> type PR a = Int -> Dist a

For our interest rate model, let us say that from one year to the next the interest rate can stay the same, or move up or down by 0.01 (rates here are written in percent, so this is a move of one basis point), all with equal likelihood. We can express this as:

> interest :: Floating a => a -> PR a
> interest i n = (n *. one) i
>   where one start = uniform [start + 1/100, start, start - 1/100]

The *. function repeats a random transition n times. The transition here takes this year's interest rate to a distribution over next year's rate.
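
As a sketch of what n *. trans amounts to, here is an equivalent definition (not the PFP source itself, and kept outside the literate code):

-- Compose a stochastic transition with itself n times in the Dist monad.
iterateTrans :: Int -> (a -> Dist a) -> (a -> Dist a)
iterateTrans 0 _ = certainly          -- zero steps: stay put with probability 1
iterateTrans n t = \x -> t x >>= iterateTrans (n - 1) t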

If this year the rate is 10%, after a couple of years the distribution looks like:

interest 10 2
10.0 33.3%
9.99 22.2%
10.01 22.2%
9.98 11.1%
10.02 11.1%

Let us put that to one side and look at the contracts side of things. I will short-circuit the approach in the paper and dive directly into the valuation.

> data Obs a = O { evalObs :: PR a }
>
> konst k = O (\t -> certainly k)
> lift f (O pr) = O (\t -> fmap f (pr t))
> lift2 f (O pr1) (O pr2) = O (\t -> joinWith f (pr1 t) (pr2 t))
> date = O (\t -> certainly t)
>
> data Contract = C { evalContract :: PR Float }
> cconst k = C $ \ _ -> certainly k
> when o c = C $ disc (evalObs o) (evalContract c)
>
>
> at t = lift2 (==) date (konst t)
> zcb t x = when (at t) x
>
> whenFirstTrue :: PR Bool -> Int
> whenFirstTrue prb = f 0 where f i = if prb i == certainly True then i else f (i+1)
>
> baseRate = 10

disc takes a boolean process and a value process: if the boolean process is true at the given time, it returns the value process at that time; otherwise it discounts the value at the first time the boolean process becomes true back to the given time.

> disc :: PR Bool -> PR Float -> PR Float
> disc prb prd t =
>   if prb t == certainly True
>     then prd t
>     else let t' = whenFirstTrue prb
>              s  = prd t'
>          in discount baseRate s (t' - t)
>
> discount :: Floating a => a -> Dist a -> PR a
> discount int final time =
>   let intspread = interest int time
>   in  joinWith (\i s -> s / (1 + i/100)) intspread final
>

Let's start with a trivial example to make sure that things are working as planned:

> ex1 = cconst 100

The value of this contract, as a random variable, is:

evalContract ex1 0

100.0 100.0%

> ex2 = zcb 3 (cconst 100)

The value of this contract is:

evalContract ex2 0

90.90909 25.9%
90.900826 22.2%
90.91736 22.2%
90.89256 11.1%
90.92562 11.1%
90.88431 3.7%
90.93389 3.7%

The PFP library has a function, expected, which gives the expected value of a distribution. The expected value of our contract is:

expected $ evalContract ex2 0

90.9091

Sunday, July 15, 2007

Haskell Mindset

There are several things an imperative programmer needs to address before fully getting to grips with Haskell:
  • It is lazy. X is not evaluated when the program reaches 'X = blah'; in fact, it may never be evaluated.
  • Changing state is not paramount. Forget the pigeon holes.
  • Function arguments are not always necessary.
  • Types can be inferred.
A good article to read about these is "Why Haskell Matters".
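
A tiny sketch of the first and fourth points:

nats :: [Integer]
nats = [0..]                             -- an infinite list: fine, because it is lazy

firstSquares = take 5 (map (^ 2) nats)   -- type inferred as [Integer]
-- only the five demanded elements are ever computed: [0,1,4,9,16]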

All of the above are possible in an imperative language (or just about) but Haskell brings these to the fore and provides a syntax and semantics that is built around them. This encourages other ways of thinking that are different from those encouraged by the OO paradigm.

One key concept in Haskell is that of monads. Some people really struggle with these, and there are numerous tutorials on them, some good and some bad. The one that worked for me was the one by Jeff Newbern. It helps to have the above concepts in mind before embarking on the journey.

Monads are also about the sequencing of computations. In imperative languages there is typically only one way that operations are sequenced; in Haskell the sequencing can be encoded in a programmer-defined monad stack. See the description of HJS for an example of a simple stack for JavaScript that provides support for IO, state and exceptions.
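
For flavour, here is a hedged sketch of that kind of stack; the names and the Env type are illustrative, not HJS's actual definitions:

import Control.Monad.Except (ExceptT, runExceptT)
import Control.Monad.State (StateT, runStateT, modify)

type Env = [(String, Int)]          -- hypothetical variable store

-- Exceptions layered over state layered over IO.
type Interp a = ExceptT String (StateT Env IO) a

runInterp :: Interp a -> Env -> IO (Either String a, Env)
runInterp m = runStateT (runExceptT m)

setVar :: String -> Int -> Interp ()
setVar name v = modify ((name, v) :)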

Haskell

As the Haskell Wiki says

Haskell is a general purpose, purely functional programming language featuring static typing, higher order functions, polymorphism, type classes, and monadic effects. Haskell compilers are freely available for almost any computer.


Some of my contributions to the Wiki and Hackage are:

HJS - A JavaScript interpreter.
Enterprise Haskell - Requirements for the use of Haskell in the real world.
HGene - The beginnings of a genealogy program in Haskell.

Lists considered harmful

A quick post inspired by the paper "Stream Fusion: From Lists to Streams to Nothing at All". All programming languages include features for lists/collections. The problem with your bog-standard list is that there is no tie-back to what built the list, so the opportunity for any optimisation you could get by fusing the creation of the list with its use is lost.
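
The core representation from the paper, condensed (the Skip constructor and the existential state are as presented there):

{-# LANGUAGE ExistentialQuantification #-}

-- A stream carries its generator function and current state, so a consumer
-- can be fused with the producer into a single loop with no intermediate list.
data Step s a = Done | Yield a s | Skip s
data Stream a = forall s. Stream (s -> Step s a) s

-- map over a stream: no list is built, just a new stepper function.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'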

Conversations with a type checker

Haskell encourages a high level of thought before putting down characters. One feature that has been noticed is that, once written, a Haskell program will usually do the right thing. Haskell moves the task from punching out characters to thinking about what you are writing and, importantly, getting the types consistent across the whole program.

Haskell does not force you to specify a type for everything. This enables you to develop a function iteratively and then ask Haskell what it infers the type of the function to be. As an example, suppose you have a higher-level function whose general layout you know. You know that the function calls other functions but are not sure what the types of those functions are. You can get an idea of their types by writing the top-level function as if the lower-level functions were arguments to it, and then asking Haskell for the type of the top-level function; the inferred signature will include information about the lower-level functions.
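
For example, with hypothetical helpers parse and render not yet written:

-- Pass the unwritten helpers in as arguments ...
process parse render input = render (parse input)

-- ... then ask GHCi for the inferred type:
-- ghci> :t process
-- process :: (t1 -> t2) -> (t2 -> t3) -> t1 -> t3
-- The signature tells you what types parse and render must have.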

Customisation or Configuration?

Following on from Peter Batty's topic and as something that has bothered me over the last year ...

The usual definition given for the configuration vs customisation dichotomy, in terms of non-code/code, is a good starting point, but there is more to it than that. Often it is helpful to look at the context and issues around a term rather than get hot and bothered about its definition. Here are a couple of other ways of looking at it:

Firstly, there is, of course, the Total Cost of Ownership issue. Customisation leads to an unpredictable total cost of ownership for systems. This is made up of the cost of building the customisation in the first place, the cost of supporting it (including the cost of handling problems at the boundary between the customisation and the product) and, the scary one, the cost of migrating the customisation to new versions of the product. All of these are high risk and involve the user in areas that are not core to the business. An option here is to outsource the activity and the risk.

The other way of looking at the distinction is in terms of what happens to my configuration and to my customisation when the product version changes. With configuration, the expectation is that it is migrated seamlessly to the new version; with customisation, this is not expected to be the case. A parallel distinction is in what is supported and what is not: the product is expected to support my changing this (configuration) but not that (customisation).

Both of the above avoid the need to use code/non-code to distinguish between customisation and configuration, and this is good because there are times when 'changing a few parameters' is either too cumbersome or more complex logic is required. Scripting provides a means to handle this, and scripting through some form of DSL should not, in principle, be excluded from what counts as configuration. In practice, however, the way that scripting is currently managed is not going to cut the mustard. The difficulty is that the 'scripts' usually sit outside the system and are stored as flat ASCII, which leads to problems when the product, for instance, changes the name of a table, adds a new parameter to a function, or even removes a function.

Amongst the things that people look for when they require the product to be extended are:
  • Task Automation - This will involve getting data from the GUI (for instance the currently selected object), pushing data into the GUI (putting data into a text field) and triggering actions (pressing the 'Insert' button).
  • Business Rule Validation - On entering data into a field, the system will fire any 'hooked-in' validation rules.
  • Loading Data - This is either data from legacy systems as part of initial migration or on going alignment of data between systems.
  • Dumping Data - This is typically in the form of a report, but may also be a data export to another system.
Now, not all of the above necessarily need scripting. For the last two there are standard ways of doing this for simple tasks, and there are 'data alignment' products on the market if the task is complex. For the first, a macro/record facility would meet most needs, and for the second, many validation rules can be configured into a rules engine. An observation about these solutions is that they are all different and require different tools and skills (even if those tools are off the shelf). The temptation is to go for the 'scripting language' path, as this can meet all of the above and more.

As Peter points out, though, as the application becomes more tuned to a particular domain, the expectation is that it will include everything the customer wants. Another way of putting this is that the product encodes 'best practice'.

Saturday, January 6, 2007

The name ...

Just started to read 'The Map that Changed the World' by Simon Winchester, an easy-to-read historical book about William Smith, who single-handedly developed the first geological map of most of Britain. Poundstones were used by farmers for measuring weight rather than buying a metallic weight; where William grew up, many farmers chose flattened circular stones which were in fact fossils - Clypeus ploti. Hence the name of this blog.

About Me

Melbourne, Australia
I work for GE in Melbourne, Australia. The views expressed here do not necessarily represent those of GE.