Interpreting Brainfuck in Haskell
Writing an interpreter for Brainfuck is almost a rite of passage for any programming language implementer, and it’s my turn now. In this post, we’ll write not one but four Brainfuck interpreters in Haskell. Let’s go!
Introduction
Brainfuck (henceforth BF) is the most famous of esoteric programming languages. Its fame lies in the fact that it is extremely minimalistic, with only eight instructions, and very easy to implement. Yet, it is Turing-complete and as capable as any other programming language1. Writing an interpreter for BF is a fun exercise, and so there are hundreds, maybe even thousands of them. Since BF is very verbose, optimizing BF interpreters is almost a sport, with people posting benchmarks of their creations. I can’t say that what I have in this post is novel, but it was definitely a fun exercise for me.
BF has eight instructions of one character each. A BF program is a sequence of these instructions. It may have other characters as well, which are treated as comments and are ignored while executing. An instruction pointer (IP) points at the next instruction to be executed, starting with the first instruction. The instructions are executed sequentially, except for the jump instructions that may cause the IP to jump to remote instructions. The program terminates when the IP moves past the last instruction.
BF programs work by modifying data in a memory that is an array of at least 30000 byte cells initialized to zero. A data pointer (DP) points to the current byte of the memory to be modified, starting with the first byte of the memory. BF programs can also read from standard input and write to standard output, one byte at a time using the ASCII character encoding.
The eight BF instructions each consist of a single character:
- `>` : Increment the DP by one to point to the next cell to the right.
- `<` : Decrement the DP by one to point to the next cell to the left.
- `+` : Increment the byte at the DP by one.
- `-` : Decrement the byte at the DP by one.
- `.` : Output the byte at the DP.
- `,` : Accept one byte of input, and store its value in the byte at the DP.
- `[` : If the byte at the DP is zero, then instead of moving the IP forward to the next command, jump it forward to the command after the matching `]` command.
- `]` : If the byte at the DP is nonzero, then instead of moving the IP forward to the next command, jump it back to the command after the matching `[` command.

Each `[` matches exactly one `]` and vice versa, and the `[` comes first. Together, they add conditions and loops to BF.
Some details are left to implementations. In our case, we assume that the memory cells are signed bytes that underflow and overflow without errors. Also, accessing the memory beyond array boundaries wraps to the opposite side without errors.
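For instance, with the signed byte cells we use later (Haskell's Int8), the arithmetic simply wraps around; a quick GHCi check:

> import Data.Int (Int8)
> (127 :: Int8) + 1
-128
> (-128 :: Int8) - 1
127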
For a taste, here is a small BF program that prints Hello, World! when run:
+++++++++++[>++++++>+++++++++>++++++++>++++>+++>+<<<<<<-]>+++
+++.>++.+++++++..+++.>>.>-.<<-.<.+++.------.--------.>>>+.>-.
As you can imagine, interpreting BF is easy, at least when doing it naively. So instead of writing one interpreter, we are going to write four, with increasing performance and complexity.
Setup
First, some imports:
{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeFamilies #-}
module Main where
import Control.Arrow ((>>>))
import Control.Monad (void)
import Data.Bits (shiftR, (.&.))
import Data.ByteArray qualified as BA
import Data.Char (chr, ord)
import Data.Functor (($>))
import Data.Int (Int8)
import Data.Kind (Type)
import Data.Vector qualified as V
import Data.Vector.Storable.Mutable qualified as MV
import Data.Word (Word16, Word8)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import Foreign.Storable qualified as S
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO qualified as IO
import Text.ParserCombinators.ReadP qualified as P
We use the GHC2021 extension, which enables a lot of useful GHC extensions by default. Our non-base imports come from the memory and vector libraries.
We abstract the interpreter interface as a typeclass:
class Interpreter a where
  data Program a :: Type
  parse :: String -> Program a
  interpret :: Memory -> Program a -> IO ()
An Interpreter is specified by a data type Program and two functions: parse parses a string to a Program, and interpret interprets the parsed Program.
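As a quick illustration, picking a concrete interpreter is just a matter of a type application, which is how the main function below dispatches on the command line flag. The helper here is hypothetical and not part of the post's code; StringInterpreter is defined later in the post.

-- Hypothetical helper: run a BF program with the StringInterpreter
-- instance defined later in this post.
runString :: Memory -> String -> IO ()
runString memory code =
  interpret @StringInterpreter memory (parse @StringInterpreter code)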
For modelling the mutable memory, we use a mutable storable vector (IOVector) of signed bytes (Int8) from the vector package. Since our interpreter runs in IO, this works well for us. Hence, the DP is modelled as an index into this vector, which we name the MemIdx type.
newtype Memory = Memory {unMemory :: MV.IOVector Int8}

type MemIdx = Int

newMemory :: Int -> IO Memory
newMemory = fmap Memory . MV.new

memorySize :: Memory -> Int
memorySize = MV.length . unMemory

readMemory :: Memory -> MemIdx -> IO Int8
readMemory = MV.unsafeRead . unMemory

writeMemory :: Memory -> MemIdx -> Int8 -> IO ()
writeMemory = MV.unsafeWrite . unMemory

modifyMemory :: Memory -> (Int8 -> Int8) -> MemIdx -> IO ()
modifyMemory = MV.unsafeModify . unMemory

nextMemoryIndex :: Memory -> MemIdx -> MemIdx
nextMemoryIndex memory memIdx = (memIdx + 1) `rem` memorySize memory

prevMemoryIndex :: Memory -> MemIdx -> MemIdx
prevMemoryIndex memory memIdx = (memIdx - 1) `mod` memorySize memory
We wrap the IOVector Int8 with a Memory newtype. newMemory creates a new memory array of bytes initialized to zero. memorySize returns the size of the memory. readMemory, writeMemory and modifyMemory are for reading from, writing to and modifying the memory respectively. nextMemoryIndex and prevMemoryIndex increment and decrement the array index respectively, taking care of wrapping at the boundaries.
Now we write the main function using the Interpreter typeclass functions:
main :: IO ()
main = do
  IO.hSetBuffering IO.stdin IO.NoBuffering
  IO.hSetBuffering IO.stdout IO.LineBuffering

  progName <- getProgName
  let usage = "Usage: " <> progName <> " -(s|a|b|o) <bf_file>"
  getArgs >>= \case
    [interpreterType, fileName] -> do
      code <- filter (`elem` "+-.,><[]") <$> readFile fileName
      memory <- newMemory 30000
      parseAndInterpret memory code usage interpreterType
    _ -> exitWithMsg usage
  where
    parseAndInterpret memory code usage = \case
      "-s" -> interpret @StringInterpreter memory $ parse code
      "-a" -> interpret @ASTInterpreter memory $ parse code
      "-b" -> interpret @BytecodeInterpreter memory $ parse code
      "-o" -> interpret @OptimizingBytecodeInterpreter memory $ parse code
      t -> exitWithMsg $ "Invalid interpreter type: " <> t <> "\n" <> usage

    exitWithMsg msg = IO.hPutStrLn IO.stderr msg >> exitFailure
The main function calls the parse and interpret functions for the right interpreter, with a new memory and the input string read from the file specified in the command line argument. We make sure to filter out non-BF characters when reading the input file.
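For example, assuming the Hello, World! program shown earlier is saved in a file named hello.bf (a made-up file name), the binary we build later in the post is invoked by picking an interpreter flag:

❯ ./bfi -s hello.bf
Hello, World!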
With the setup done, let’s move on to our first interpreter.
String Interpreter
A BF program can be interpreted directly from its string representation, going over the characters and executing the right logic for them. But strings in Haskell are notoriously slow because they are implemented as singly linked-lists of characters. Indexing into strings has linear time complexity, so instead of indexing, we traverse the program string using a char zipper.
data StringInterpreter
instance Interpreter StringInterpreter where
  data Program StringInterpreter = ProgramCZ CharZipper
  parse = ProgramCZ . czFromString
  interpret memory (ProgramCZ code) = interpretCharZipper memory code
Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus3.
data CharZipper = CharZipper
  {czLeft :: String, czFocus :: Maybe Char, czRight :: String}

czFromString :: String -> CharZipper
czFromString = \case
  [] -> CharZipper [] Nothing []
  (x : xs) -> CharZipper [] (Just x) xs

czMoveLeft :: CharZipper -> CharZipper
czMoveLeft = \case
  CharZipper [] (Just focus) right -> CharZipper [] Nothing (focus : right)
  CharZipper (x : xs) (Just focus) right -> CharZipper xs (Just x) (focus : right)
  z -> z

czMoveRight :: CharZipper -> CharZipper
czMoveRight = \case
  CharZipper left (Just focus) [] -> CharZipper (focus : left) Nothing []
  CharZipper left (Just focus) (x : xs) -> CharZipper (focus : left) (Just x) xs
  z -> z
This zipper is a little different from the usual implementations because we need to know when the focus of the zipper has moved out of the program boundaries. Hence, we model the focus as a Maybe Char. czFromString creates a char zipper from a string. czMoveLeft and czMoveRight move the focus left and right respectively, taking care of setting the focus to Nothing if we move outside the program string.
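A quick GHCi check of the zipper in action; the focus starts at the first character and becomes Nothing once we move past the end:

> czFocus (czFromString "+-")
Just '+'
> czFocus (czMoveRight (czFromString "+-"))
Just '-'
> czFocus (czMoveRight (czMoveRight (czFromString "+-")))
Nothing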
Parsing the program is thus the same as creating the char zipper from the program string. For interpreting the program, we write this function:
interpretCharZipper :: Memory -> CharZipper -> IO ()
interpretCharZipper memory = go 0
  where
    go !memIdx !program = case czFocus program of
      Nothing -> return ()
      Just c -> case c of
        '+' -> modifyMemory memory (+ 1) memIdx >> goNext
        '-' -> modifyMemory memory (subtract 1) memIdx >> goNext
        '>' -> go (nextMemoryIndex memory memIdx) program'
        '<' -> go (prevMemoryIndex memory memIdx) program'
        ',' -> do
          getChar >>= writeMemory memory memIdx . fromIntegral . ord
          goNext
        '.' -> do
          readMemory memory memIdx >>= putChar . chr . fromIntegral
          goNext
        '[' -> readMemory memory memIdx >>= \case
          0 -> go memIdx $ skipRight 1 program
          _ -> goNext
        ']' -> readMemory memory memIdx >>= \case
          0 -> goNext
          _ -> go memIdx $ skipLeft 1 program
        _ -> goNext
      where
        program' = czMoveRight program
        goNext = go memIdx program'
Our main driver here is the tail-recursive go function that takes the memory index and the program as inputs. It then gets the current focus of the program zipper, and executes the BF logic accordingly.

If the current focus is Nothing, it means the program has finished running, so we end the execution. Otherwise, we switch over the character and do what the BF spec tells us to do.
For `+` and `-`, we increment or decrement respectively the value in the memory cell at the current index, and go to the next character. For `>` and `<`, we increment or decrement the memory index respectively, and go to the next character.

For `,`, we read an ASCII encoded character from the standard input, and write it to the memory at the current memory index as a byte. For `.`, we read the byte from the memory at the current memory index, and write it out to the standard output as an ASCII encoded character. In either case, we then go to the next character.
For `[`, we read the byte at the current memory index, and if it is zero, we skip right over the part of the program till the matching `]` is found. Otherwise, we go to the next character.

For `]`, we skip left over the part of the program till the matching `[` is found, if the current memory byte is non-zero. Otherwise, we go to the next character.
The next two functions implement the skipping logic:
skipRight :: Int -> CharZipper -> CharZipper
skipRight !depth !program
  | depth == 0 = program'
  | otherwise = case czFocus program' of
      Nothing -> error "No matching ] while skipping the loop forward"
      Just '[' -> skipRight (depth + 1) program'
      Just ']' -> skipRight (depth - 1) program'
      _ -> skipRight depth program'
  where
    program' = czMoveRight program

skipLeft :: Int -> CharZipper -> CharZipper
skipLeft !depth !program
  | depth == 0 = czMoveRight program
  | otherwise = case czFocus program' of
      Nothing -> error "No matching [ while skipping the loop backward"
      Just ']' -> skipLeft (depth + 1) program'
      Just '[' -> skipLeft (depth - 1) program'
      _ -> skipLeft depth program'
  where
    program' = czMoveLeft program
The tail-recursive functions skipRight and skipLeft skip over parts of the program by moving the focus right and left respectively, till the matching bracket is found. Since the loops can contain nested loops, we keep track of the depth of loops we are in, and return only when the depth becomes zero. If we move off the program boundaries while skipping, we throw an error.
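For instance, when called with depth 1 while the focus is on a `[` (as interpretCharZipper does), skipRight leaves the focus on the instruction just after the matching `]`, even across nested loops:

> czFocus (skipRight 1 (czFromString "[-]+"))
Just '+'
> czFocus (skipRight 1 (czFromString "[[-]]+"))
Just '+'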
That’s it! We now have a fully functioning BF interpreter. To test it, we use these two BF programs: hanoi.bf and mandelbrot.bf.

hanoi.bf solves the Tower of Hanoi puzzle, animating the solution process as ASCII art. mandelbrot.bf prints an ASCII art rendering of the Mandelbrot set.
Both of these BF programs serve as good benchmarks for BF interpreters. Let’s test ours by compiling and running it4:
❯ nix-shell -p "ghc.withPackages (pkgs: with pkgs; [vector memory])" \
--run "ghc --make bfi.hs -O2"
[1 of 2] Compiling Main ( bfi.hs, bfi.o )
[2 of 2] Linking bfi [Objects changed]
❯ time ./bfi -s hanoi.bf > /dev/null
29.15 real 29.01 user 0.13 sys
❯ time ./bfi -s mandelbrot.bf > /dev/null
94.86 real 94.11 user 0.50 sys
That seems quite slow. We can do better.
AST Interpreter
Instead of executing BF programs from their string representations, we can parse them to an Abstract Syntax Tree (AST). This allows us to match brackets only once at parse time, instead of doing it repeatedly at run time. We capture loops as AST nodes, allowing us to skip them trivially.
data ASTInterpreter
instance Interpreter ASTInterpreter where
  data Program ASTInterpreter = ProgramAST Instructions
  parse = ProgramAST . parseToInstrs
  interpret memory (ProgramAST instrs) = interpretAST memory instrs
We represent the BF AST as a Haskell Algebraic Data Type (ADT):
type Instructions = V.Vector Instruction

data Instruction
  = Inc -- +
  | Dec -- -
  | MoveR -- >
  | MoveL -- <
  | GetC -- ,
  | PutC -- .
  | Loop Instructions -- []
  deriving (Show)
There is one constructor per BF instruction, except for loops, where the Loop constructor captures both the start and end of loop instructions. We use immutable boxed vectors for lists of instructions instead of Haskell lists so that we can index into them in constant time.
We use the parser combinator library ReadP to write a recursive-descent parser for BF:
parseToInstrs :: String -> Instructions
parseToInstrs code =
  V.fromList $ case P.readP_to_S (P.many instrParser <* P.eof) code of
    [(res, "")] -> res
    out -> error $ "Unexpected output while parsing: " <> show out

instrParser :: P.ReadP Instruction
instrParser = P.choice
  [ P.char '+' $> Inc,
    P.char '-' $> Dec,
    P.char '>' $> MoveR,
    P.char '<' $> MoveL,
    P.char ',' $> GetC,
    P.char '.' $> PutC,
    Loop . V.fromList <$> P.between (P.char '[') (P.char ']') (P.many instrParser)
  ]
All cases except the loop one are straightforward. For loops, we call the parser recursively to parse the loop body. Note that the parser matches the loop brackets correctly. If the brackets don’t match, the parser fails.
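For example, the parser turns a small program into the following AST in GHCi (vectors print like lists here):

> parseToInstrs "+[-]."
[Inc,Loop [Dec],PutC]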
Next, we interpret the BF AST:
interpretAST :: Memory -> Instructions -> IO ()
interpretAST memory = void . interpretInstrs 0 memory

interpretInstrs :: MemIdx -> Memory -> Instructions -> IO MemIdx
interpretInstrs memIdx !memory !program = go memIdx 0
  where
    go !memIdx !progIdx
      | progIdx == V.length program = return memIdx
      | otherwise = case program V.! progIdx of
          Inc -> modifyMemory memory (+ 1) memIdx >> goNext
          Dec -> modifyMemory memory (subtract 1) memIdx >> goNext
          MoveR -> go (nextMemoryIndex memory memIdx) $ progIdx + 1
          MoveL -> go (prevMemoryIndex memory memIdx) $ progIdx + 1
          GetC -> do
            getChar >>= writeMemory memory memIdx . fromIntegral . ord
            goNext
          PutC -> do
            readMemory memory memIdx >>= putChar . chr . fromIntegral
            goNext
          Loop instrs -> readMemory memory memIdx >>= \case
            0 -> goNext
            _ -> interpretInstrs memIdx memory instrs >>= flip go progIdx
      where
        goNext = go memIdx $ progIdx + 1
The AST interpreter code is quite similar to the string interpreter one. This time we use an integer as the IP to index into the Instructions vector. All cases except the loop one are pretty much the same as before.

For loops, we read the byte at the current memory index, and if it is zero, we skip executing the Loop AST node and go to the next instruction. Otherwise, we recursively interpret the loop body, and then execute the Loop node again with the updated memory index returned from the recursive call, so that the loop condition is re-checked.
And we are done. Let’s see how it performs:
❯ time ./bfi -a hanoi.bf > /dev/null
14.94 real 14.88 user 0.05 sys
❯ time ./bfi -a mandelbrot.bf > /dev/null
36.49 real 36.32 user 0.17 sys
Great! hanoi.bf runs 2x faster, whereas mandelbrot.bf runs 2.6x faster. Can we do even better?
Bytecode Interpreter
AST interpreters are well known to be slow because of how AST nodes are represented in the computer’s memory. The AST nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an AST, it jumps all over the memory, causing a slowdown. One solution to this is to convert the AST into a more compact and optimized representation known as Bytecode. That’s what our next interpreter uses.
data BytecodeInterpreter
instance Interpreter BytecodeInterpreter where
  data Program BytecodeInterpreter = ProgramBC BA.Bytes
  parse =
    parseToInstrs
      >>> translate
      >>> assemble
      >>> ProgramBC
  interpret memory (ProgramBC bytecode) = interpretBytecode memory bytecode
We reuse the parser from the AST interpreter, but then we convert the resultant AST into bytecode by translating and assembling it5. We use the Bytes byte array data type from the memory package to represent bytecode.
Unlike AST, bytecode has a flat list of instructions—called Opcodes—that can be encoded in a single byte each, with optional parameters. Because of its flat nature and compactness, bytecode is more CPU friendly to execute, which is where it gets its performance from. The downside is that bytecode is not human readable unlike AST.
data Opcode
  = OpInc
  | OpDec
  | OpMoveR
  | OpMoveL
  | OpGetC
  | OpPutC
  | OpLoop Opcodes
  | OpClear
  deriving (Show)

type Opcodes = [Opcode]
We use the Opcode ADT to model the BF opcodes. It corresponds one-to-one with the Instruction ADT, except for the extra OpClear opcode, which we put to use in the optimizing interpreter later.

The translate function translates Instructions to Opcodes:
translate :: Instructions -> Opcodes
translate = V.toList >>> map translateOpcode
  where
    translateOpcode = \case
      Inc -> OpInc
      Dec -> OpDec
      MoveR -> OpMoveR
      MoveL -> OpMoveL
      GetC -> OpGetC
      PutC -> OpPutC
      Loop instrs -> OpLoop $ translate instrs
The assemble function assembles Opcodes to a bytecode byte array:
assemble :: Opcodes -> BA.Bytes
assemble = BA.pack . concatMap assembleOpcode

assembleOpcode :: Opcode -> [Word8]
assembleOpcode = \case
  OpInc -> [0]
  OpDec -> [1]
  OpMoveR -> [2]
  OpMoveL -> [3]
  OpGetC -> [4]
  OpPutC -> [5]
  OpLoop body ->
    let assembledBody = concatMap assembleOpcode body
        bodyLen = length assembledBody + 3
     in if bodyLen > 65_536 -- 2 ^ 16
          then error $ "Body of loop is too big: " <> show bodyLen
          else do
            let assembledBodyLen = assembleBodyLen bodyLen
            [6] <> assembledBodyLen <> assembledBody <> [7] <> assembledBodyLen
  OpClear -> [8]
  where
    assembleBodyLen bodyLen =
      let lb = fromIntegral $ bodyLen .&. 0xff
          mb = fromIntegral $ (bodyLen .&. 0xff00) `shiftR` 8
       in [lb, mb] -- assumes Little-endian arch
The assembleOpcode function assembles an Opcode to a list of bytes (Word8s). For all cases except OpLoop, we simply return a unique byte for the opcode.

For OpLoop, we first recursively assemble the loop body. We encode both the body and the body length in the assembled bytecode, so that the bytecode interpreter can use the body length to skip over the loop body when required. We use two bytes to encode the body length, so we first check if the body length plus three is over 65536 (2^16), and throw an error if it is. Otherwise, the assembled loop consists of:
- a unique byte for loop start (6),
- followed by the body length encoded in two bytes (in the Little-endian order),
- then the assembled loop body,
- followed by a unique byte for loop end (7),
- finally followed by the encoded body length again.
We encode the body length at the end again so that we can use it to jump backward to the start of the loop, to continue looping. Let’s look at this example to understand the loop encoding better:
> code = "++++++++++++++++++++++++++++++++++++++++++++++++>+++++[<+.>-]"
> concatMap assembleOpcode . translate . parseToInstrs $ code
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,6,8,0,3,0,5,2,1,7,8,0]
Let’s focus on the last eleven bytes, which encode the loop [<+.>-]: the byte 6 marks the loop start, the next two bytes 8,0 encode the body length 8 in Little-endian order, the five bytes 3,0,5,2,1 are the assembled loop body (<+.>-), the byte 7 marks the loop end, and the final 8,0 encodes the body length once more.
The example also demonstrates the flat nature of assembled bytecode. Now, all we have to do is to interpret it:
interpretBytecode :: Memory -> BA.Bytes -> IO ()
interpretBytecode memory bytecode =
  MV.unsafeWith
    (unMemory memory)
    (BA.withByteArray bytecode . interpretBytecodePtr (memorySize memory) (BA.length bytecode))
Instead of using integer indices in the bytecode array and memory vector, this time we use C-style direct pointers6:
type ProgPtr = Ptr Word8
type MemPtr = Ptr Int8

interpretBytecodePtr :: Int -> Int -> MemPtr -> ProgPtr -> IO ()
interpretBytecodePtr memLen programLen memStartPtr progStartPtr =
  go memStartPtr progStartPtr
  where
    progEndPtr = progStartPtr `plusProgPtr` programLen
    memEndPtr = memStartPtr `plusMemPtr` memLen

    go !memPtr !progPtr
      | progPtr == progEndPtr = return ()
      | otherwise = readProg >>= \case
          0 -> modifyMem (+ 1) >> goNext -- Inc
          1 -> modifyMem (subtract 1) >> goNext -- Dec
          2 -> jump (nextMemPtr memStartPtr memEndPtr memPtr 1) 1 -- MoveR
          3 -> jump (prevMemPtr memStartPtr memEndPtr memPtr 1) 1 -- MoveL
          4 -> getChar >>= writeMem . fromIntegral . ord >> goNext -- GetC
          5 -> readMem >>= putChar . chr . fromIntegral >> goNext -- PutC
          6 -> readMem >>= \case -- Loop start
            0 -> readProg2 >>= jump memPtr
            _ -> jump memPtr 3
          7 -> readMem >>= \case -- Loop end
            0 -> jump memPtr 3
            _ -> readProg2 >>= jump memPtr . negate
          8 -> writeMem 0 >> goNext -- Clear
          op -> error $ "Unknown opcode: " <> show op
      where
        goNext = jump memPtr 1
        jump memPtr offset = go memPtr $ progPtr `plusProgPtr` offset

        readProg = S.peek progPtr
        readProg2 = -- assumes Little-endian arch
          fromIntegral <$> S.peek (castPtr @_ @Word16 $ progPtr `plusProgPtr` 1)
        readMem = S.peek memPtr
        writeMem = S.poke memPtr
        modifyMem f = readMem >>= writeMem . f
In Haskell, the pointer type Ptr is parametrized by the type of the data it points to. We have two types of pointers here: one that points to the bytecode program, and another that points to the memory cells. So in this case, the IP and DP are actually pointers.

The go function here is again the core of the interpreter loop. We track the current IP and DP in it, and execute the logic corresponding to the opcode the IP points to. go ends when the IP points to the end of the program byte array.

Most of the cases in go are similar to the previous interpreters. The only difference is that we use pointers to read the current opcode and memory cell. For the loop start opcode, we read the byte pointed to by the DP, and if it is zero, we read the next two bytes from the program bytecode, and use them as the offset to jump the IP by, skipping over the loop body. Otherwise, we jump the IP by 3 bytes to skip over the loop start opcode and the encoded loop body length bytes. For the loop end opcode, we follow similar steps, except we jump backward to the start of the loop.
The helper functions for doing pointer arithmetic are as follows:
plusProgPtr :: ProgPtr -> Int -> ProgPtr
plusProgPtr = plusPtr

plusMemPtr :: MemPtr -> Int -> MemPtr
plusMemPtr = plusPtr

nextMemPtr :: MemPtr -> MemPtr -> MemPtr -> Int -> MemPtr
nextMemPtr memStartPtr memEndPtr memPtr inc =
  let memPtr' = memPtr `plusMemPtr` inc
   in if memEndPtr > memPtr'
        then memPtr'
        else memStartPtr `plusPtr` (memPtr' `minusPtr` memEndPtr)

prevMemPtr :: MemPtr -> MemPtr -> MemPtr -> Int -> MemPtr
prevMemPtr memStartPtr memEndPtr memPtr inc =
  let memPtr' = memPtr `plusMemPtr` (-1 * inc)
   in if memPtr' >= memStartPtr
        then memPtr'
        else memEndPtr `plusPtr` (memPtr' `minusPtr` memStartPtr)
nextMemPtr and prevMemPtr implement wrapping of pointers, as we do for memory indices in nextMemoryIndex and prevMemoryIndex. Let’s see what the results of our hard work are:
❯ time ./bfi -b hanoi.bf > /dev/null
11.10 real 11.04 user 0.04 sys
❯ time ./bfi -b mandelbrot.bf > /dev/null
15.72 real 15.68 user 0.04 sys
1.3x and 2.3x speedups for hanoi.bf and mandelbrot.bf respectively over the AST interpreter. Not bad. But surely we can do even better?
Optimizing Bytecode Interpreter
We can optimize our bytecode interpreter by emitting specialized opcodes for particular patterns of opcodes that occur frequently. Think of it as replacing every occurrence of a long phrase in a text with a single word that means the same, leading to a shorter text and faster reading time. Since BF is so verbose, there are many opportunities for optimizing BF bytecode7. We are going to implement only one simple optimization, just to get a taste of how to do it.
data OptimizingBytecodeInterpreter
instance Interpreter OptimizingBytecodeInterpreter where
  data Program OptimizingBytecodeInterpreter = ProgramOBC BA.Bytes
  parse =
    parseToInstrs
      >>> translate
      >>> optimize
      >>> assemble
      >>> ProgramOBC
  interpret memory (ProgramOBC bytecode) = interpretBytecode memory bytecode
The optimizing bytecode interpreter is pretty much the same as the bytecode interpreter, with the optimize function called between the translation and assembly phases.
The patterns of opcodes we are optimizing for are [-] and [+]. When executed, both of these decrement or increment the current memory cell till it becomes zero. In effect, these patterns clear the current cell. We start with the OpClear opcode we added for clearing a cell:
data Opcode
  = OpInc
  | OpDec
  | OpMoveR
  | OpMoveL
  | OpGetC
  | OpPutC
  | OpLoop Opcodes
  | OpClear
  deriving (Show)

type Opcodes = [Opcode]
The optimize function recursively goes over the Opcodes, and emits optimized ones by replacing the patterns that clear the current cell with OpClear:
optimize :: Opcodes -> Opcodes
optimize = map $ \case
  OpLoop [OpDec] -> OpClear
  OpLoop [OpInc] -> OpClear
  OpLoop body -> OpLoop $ optimize body
  op -> op
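A couple of GHCi checks of the rewrite, including inside a nested loop (the outputs follow from the definitions above):

> optimize (translate (parseToInstrs "[-]>[+]"))
[OpClear,OpMoveR,OpClear]
> optimize (translate (parseToInstrs "+[>[-]<-]"))
[OpInc,OpLoop [OpMoveR,OpClear,OpMoveL,OpDec]]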
Then we modify the assembleOpcode function to emit a unique byte for OpClear:
assembleOpcode :: Opcode -> [Word8]
assembleOpcode = \case
  OpInc -> [0]
  OpDec -> [1]
  OpMoveR -> [2]
  OpMoveL -> [3]
  OpGetC -> [4]
  OpPutC -> [5]
  OpLoop body ->
    let assembledBody = concatMap assembleOpcode body
        bodyLen = length assembledBody + 3
     in if bodyLen > 65_536 -- 2 ^ 16
          then error $ "Body of loop is too big: " <> show bodyLen
          else do
            let assembledBodyLen = assembleBodyLen bodyLen
            [6] <> assembledBodyLen <> assembledBody <> [7] <> assembledBodyLen
  OpClear -> [8]
Finally, we modify the bytecode interpreter to execute the OpClear opcode:
go !memPtr !progPtr
  | progPtr == progEndPtr = return ()
  | otherwise = readProg >>= \case
      0 -> modifyMem (+ 1) >> goNext -- Inc
      1 -> modifyMem (subtract 1) >> goNext -- Dec
      2 -> jump (nextMemPtr memStartPtr memEndPtr memPtr 1) 1 -- MoveR
      3 -> jump (prevMemPtr memStartPtr memEndPtr memPtr 1) 1 -- MoveL
      4 -> getChar >>= writeMem . fromIntegral . ord >> goNext -- GetC
      5 -> readMem >>= putChar . chr . fromIntegral >> goNext -- PutC
      6 -> readMem >>= \case -- Loop start
        0 -> readProg2 >>= jump memPtr
        _ -> jump memPtr 3
      7 -> readMem >>= \case -- Loop end
        0 -> jump memPtr 3
        _ -> readProg2 >>= jump memPtr . negate
      8 -> writeMem 0 >> goNext -- Clear
      op -> error $ "Unknown opcode: " <> show op
We can see how the patterns [-] and [+], which may execute tens or even hundreds of operations, are now replaced by a single operation in the interpreter. This is what gives us the speedup in this case. Let’s run it:
❯ time ./bfi -o hanoi.bf > /dev/null
4.07 real 4.04 user 0.01 sys
❯ time ./bfi -o mandelbrot.bf > /dev/null
15.58 real 15.53 user 0.04 sys
hanoi.bf runs 2.7x faster, whereas mandelbrot.bf is barely 1% faster as compared to the non-optimizing bytecode interpreter. This demonstrates how different optimizations apply to different programs, and hence the need to implement a wide variety of them to be able to optimize all programs well.
Comparison
It’s time for a final comparison of the run times of the four interpreters:
| Interpreter | Hanoi | Mandelbrot |
|---|---|---|
| String | 29.15s | 94.86s |
| AST | 14.94s | 36.49s |
| Bytecode | 11.10s | 15.72s |
| Optimizing Bytecode | 4.07s | 15.58s |
The final interpreter is 7x faster than the baseline one for hanoi.bf, and 6x faster for mandelbrot.bf.
That’s it for this post. I hope you enjoyed it and took something away from it. In a future post, we’ll explore more optimizations for our BF interpreter. The full code for this post is available here.
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
BF is Turing-complete. That means it can be used to implement any computable program. However, it is a Turing tarpit, which means it is not feasible to write any useful programs in it because of its lack of abstractions.↩︎
A string interpreter also serves as a useful baseline for measuring the performance of BF interpreters. That’s why I decided to use strings instead of Data.Text or Data.Sequence, which are more performant.↩︎

I am a big fan of zippers, as evidenced by this growing list of posts that I use them in.↩︎
If you are unfamiliar, >>> is the left-to-right function composition function: f >>> g = g . f↩︎
While the only way to access byte arrays is pointers, we could have continued accessing the memory vector using indices. I benchmarked both methods, and found that using pointers for memory access sped up the execution of hanoi.bf by 1.1x and mandelbrot.bf by 1.6x as compared to index-based access. It’s also nice to learn how to use pointers in Haskell. This is why we chose to use Storable vectors for the memory.↩︎

See BFC, which touts itself as “an industrial-grade Brainfuck compiler”, with a huge list of optimizations.↩︎