A couple of months ago while in New York I got a call from a friend in need
of some Haskell help. He was working on a project for a Comp. Bio course and he
needed to implement the Knuth Shuffle, but he was having a bit of touble using
Haskell arrays. His problem involved some type error that I couldn’t tackle
over the phone, so it had to wait until later.
Once I got home, I got onto Wikipedia and looked up Knuth Shuffle.
Seeing it was an inherently imperative algorithm, I wasn’t totally surprised my
friend hit a few bumps while trying to translate it to Haskell. I decided to go
ahead and implement my own version so I’d have a better understanding of what he
was doing. I called him later that night to help him fix the problem, but by
that time he had figured it out.
The implementation I wrote sat around on my hard drive untouched for a couple
months until I rediscovered it last week. For your viewing
pleasure, here it is:
module Permutation ( permute ) where
import Data.Array.MArray
import Data.Array.IO
import System.Random
permute :: [a] -> IO [a]
permute xs =
let len = length xs in
do a <- newListArray (0, len 1) xs
mapM_ (\i -> randomRIO (i, len 1) >>= swap a i) [0..len1]
getElems a
swap :: (Ix i) => IOArray i a -> i -> i -> IO ()
swap a i j = do e1 <- readArray a j
e2 <- readArray a i
writeArray a i e1
writeArray a j e2
Looking this code over, a question came to mind: can I pull the
permute function out of the IO monad by using
unsafePerformIO? The documentation
page for unsafePerformIO lays down some cursory guidelines for its
use:
This is the “back door” into the IO monad, allowing
IO computation to be performed at any time. For this to be safe,
the IO computation should be free of side effects and independent
of its environment.
If the I/O computation wrapped in unsafePerformIO performs side
effects, then the relative order in which those side effects take place
(relative to the main I/O trunk, or other calls to unsafePerformIO) is
indeterminate.
The second part of this disclaimer points out the essential problem with
allowing mutation in a lazy language: without any guarantee of when an
expression will be evaluated, one does not know when or in what order side
effects will take place. Hence the IO monad, which requires the
programmer sequence computations that may produce side effects or may depend on some mutable state.
But what if you don’t care when a side effect takes place? What if you don’t
care when a computation is performed relative to other side effects?
unsafePerformIO in essence allows you to say exactly that about
a computation; it removes that computation from the IO monad, thus
absolving the programmer of the task of specifying when, relative to other
computations, the computation should be performed. However the documentation
for this function tells us it is only “safe” if, i.e., it is necessary that,
“the IO computation … be free of side effects and independent of
its environment”. This certainly seems like a sufficient condition; if I can say
that about a computation, then it certainly doesn’t matter when or in what
relative order the computation is performed. Is it necessary? No, and I
belive the permute function above proves it.
permute does mutate global state: the state of Haskell’s random
number generator (stored in theStdGen) is modified by each call
to randomRIO. It is dependent on its environment: the value in
theStdGen determines the values produced each time
randomRIO is called. Despite this it would be absurd to claim
the state of Haskell’s random number generator matters to me each time I want
to permute a list, or that it matters when I modify the state of the random
number generator relative to other computations. Therefore, I can define
permute as follows:
permute :: [a] -> [a]
permute l = unsafePerformIO $
let len = length l in
do a <- newListArray (0, len 1) l
mapM_ (\i -> randomRIO (i, len 1) >>= swap a i) [0..len1]
getElems a
A few more issues need to be addressed. The documentation page for
unsafePerformIO mentions three precautions one must take when
compiling a module using this function. All of them address unintended behavior
that arises due to optimizations performed by GHC (under the assumption of referential transparency).The NOINLINE
pragma is essential. It doesn’t seem as though common subexpression elimination
or let-lifting will be an issue in the module. However, uses of permute could
fall victim to either of these optimizations. As an example, consider the
following expression:
[permute [1..4], permute [1..4]]
If this expression is in some compiled module, GHC could potentially (almost
certainly) identify the common subexpressions and transform the expression
it into the following:
let e = permute [1..4] in [e, e]
Obviously, this wouldn’t be a good thing, since instead of getting two (likely)
distinct permutations, one would always get the same permutation. To
avoid this, use the -fno-cse option
when compiling such a module.
The exception
A possible objection one might raise to all of this is that it is possible to
set the state of Haskell’s random number generator by using the function
setStdGen. After the state of the generator is set, the generator
ceases to be “random” and becomes observably deterministic and so it would
matter when this action is performed relative to calls to permute.
Such an objection I belive is right, or at least tenable. So don’t do it!