It's salt

... because it's about time I got a "blog."

Who’s Afraid of the Big Ol’ unsafePerformIO?

Note. I wrote this post a long time ago. Now I know more about Haskell, so I’ve reworked and republished it on Computationally Endowed, my new corner of the internet. You can find the new version of this post at its new location: Random Code, Permutations, and unsafePerformIO.

A couple of months ago while in New York I got a call from a friend in need of some Haskell help. He was working on a project for a Comp. Bio course and he needed to implement the Knuth Shuffle, but he was having a bit of touble using Haskell arrays. His problem involved some type error that I couldn’t tackle over the phone, so it had to wait until later.

Once I got home, I got onto Wikipedia and looked up Knuth Shuffle. Seeing it was an inherently imperative algorithm, I wasn’t totally surprised my friend hit a few bumps while trying to translate it to Haskell. I decided to go ahead and implement my own version so I’d have a better understanding of what he was doing. I called him later that night to help him fix the problem, but by that time he had figured it out.

The implementation I wrote sat around on my hard drive untouched for a couple months until I rediscovered it last week. For your viewing pleasure, here it is:

module Permutation ( permute ) where

import Data.Array.MArray
import Data.Array.IO

import System.Random


permute :: [a] -> IO [a]
permute xs =
  let len = length xs in
    do a <- newListArray (0, len - 1) xs
       mapM_ (\i -> randomRIO (i, len - 1) >>= swap a i) [0..len-1]
       getElems a

swap :: (Ix i) => IOArray i a -> i -> i -> IO ()
swap a i j = do e1 <- readArray a j
                e2 <- readArray a i
                writeArray a i e1
                writeArray a j e2

Looking this code over, a question came to mind: can I pull the permute function out of the IO monad by using unsafePerformIO? The documentation page for unsafePerformIO lays down some cursory guidelines for its use:

This is the “back door” into the IO monad, allowing IO computation to be performed at any time. For this to be safe, the IO computation should be free of side effects and independent of its environment.

If the I/O computation wrapped in unsafePerformIO performs side effects, then the relative order in which those side effects take place (relative to the main I/O trunk, or other calls to unsafePerformIO) is indeterminate.

The second part of this disclaimer points out the essential problem with allowing mutation in a lazy language: without any guarantee of when an expression will be evaluated, one does not know when or in what order side effects will take place. Hence the IO monad, which requires the programmer sequence computations that may produce side effects or may depend on some mutable state.

But what if you don’t care when a side effect takes place? What if you don’t care when a computation is performed relative to other side effects? unsafePerformIO in essence allows you to say exactly that about a computation; it removes that computation from the IO monad, thus absolving the programmer of the task of specifying when, relative to other computations, the computation should be performed. However the documentation for this function tells us it is only “safe” if, i.e., it is necessary that, “the IO computation … be free of side effects and independent of its environment”. This certainly seems like a sufficient condition; if I can say that about a computation, then it certainly doesn’t matter when or in what relative order the computation is performed. Is it necessary? No, and I belive the permute function above proves it.

permute does mutate global state: the state of Haskell’s random number generator (stored in theStdGen) is modified by each call to randomRIO. It is dependent on its environment: the value in theStdGen determines the values produced each time randomRIO is called. Despite this it would be absurd to claim the state of Haskell’s random number generator matters to me each time I want to permute a list, or that it matters when I modify the state of the random number generator relative to other computations. Therefore, I can define permute as follows:

permute :: [a] -> [a]
permute l = unsafePerformIO $
  let len = length l in
    do a <- newListArray (0, len - 1) l
       mapM_ (\i -> randomRIO (i, len - 1) >>= swap a i) [0..len-1]
       getElems a

A few more issues need to be addressed. The documentation page for unsafePerformIO mentions three precautions one must take when compiling a module using this function. All of them address unintended behavior that arises due to optimizations performed by GHC (under the assumption of referential transparency).The NOINLINE pragma is essential. It doesn’t seem as though common subexpression elimination or let-lifting will be an issue in the module. However, uses of permute could fall victim to either of these optimizations. As an example, consider the following expression:

[permute [1..4], permute [1..4]]

If this expression is in some compiled module, GHC could potentially (almost certainly) identify the common subexpressions and transform the expression it into the following:

let e = permute [1..4] in [e, e]

Obviously, this wouldn’t be a good thing, since instead of getting two (likely) distinct permutations, one would always get the same permutation. To avoid this, use the -fno-cse option when compiling such a module.

The exception

A possible objection one might raise to all of this is that it is possible to set the state of Haskell’s random number generator by using the function setStdGen. After the state of the generator is set, the generator ceases to be “random” and becomes observably deterministic and so it would matter when this action is performed relative to calls to permute. Such an objection I belive is right, or at least tenable. So don’t do it!