MonadicT
I see dead objects!
Search

Introduction

I need a parser for a simple Domain Specific Language and I am writing it in Haskell using the Parsec combinator library. My future posts will show why I am building this parser but in this post, I will focus on how to construct a parser for an imperative language using Parsec.

I will make no attempt to explain Haskell features except to touch upon some details of applicatives. There are many resources available on the Internet for learning Haskell

The implementation will be using Applicatives which will be easier to read than one written in a monadic approach. The language I intend to parse is a context-free grammar (CFG) and applicatives will do just fine. It is helpful to review type signatures for <$>, <*>, *> and <* shown below.

:t (<$>)
(<$>) :: Functor f => (a -> b) -> f a -> f b

:t (<*>)
(<*>) :: Applicative f => f (a -> b) -> f a -> f b

:t (<*)
(<*) :: Applicative f => f a -> f b -> f a

:t (*>)
(*>) :: Applicative f => f a -> f b -> f b

<$> applies a function (a -> b) to an argument (f a) in a computational context and produces a new value (f b).

<*> extracts both the function (a -> b) and the argument from context (f a) producing a new value in (f b).

<* always returns the first argument and *> the second.

The syntax of the language that our parser will recognize is shown below.

script :: sequence of stmts
stmt :: var_decl
    | if_stmt
    | while_stmt
    | for_stmt
    | continue_stmt
    | break_stmt
    | "{" stmt... "}"

var_decl :: ident ":=" expr

bool :: "true" | "false"

expr :: bexpr bool_op bexpr
    | bexpr

bexpr :: rexpr relop rexpr
    | rexpr

rexpr :: term termOp term
    | term

term :: term factor_op factor
    | factor

factor :: ID
    | number
    | string
    | True
    | False
    | "(" expr ")"
    | "+|-" factor
    | ID "(" [exppr ["," expr]] ")"

term_op :: "+" | "-"

factor_op :: "*" | "/"

if_stmt :: "if" expr stmt | "if" bool_expr stmt "else" stmt

while_stmt :: "while" expr stmt

for_stmt :: "for" ID stmt

break_stmt :: "break"

continue_stmt :: "continue"

The result of parsing will be an abstract syntax tree (AST). In further posts, I will implement evaluation of the tree or use it generate code.

Implementation

Using Parsec in applicative style leads to remarkably concise and simple implementation.

Module declaration

Lists exports from this module. We export lower-level parse functions for testing purposes.

module Parser (Expr(..), Stmt(..), dslP, parse, exprP, stmtP) where

Imports

Notable imports are Parsec and friends. We also import some functions from Control.Applicative.

import Control.Applicative (liftA2, pure, (<*>), (<$>), (<*), (*>))
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.ParserCombinators.Parsec.Char (digit, letter,
       alphaNum, lower, upper)
import Text.ParserCombinators.Parsec.Language (emptyDef)
import qualified Text.ParserCombinators.Parsec.Token as Token

Lexer

tokenDef = Token.makeTokenParser $ emptyDef
     { Token.commentStart    = "/*"
            , Token.commentEnd      = "*/"
            , Token.commentLine     = "//"
            , Token.identStart      = letter
            , Token.identLetter     = alphaNum
            , Token.reservedNames   = [ "break"
                                      , "continue"
                                      , "else"
                                      , "false"
                                      , "if"
                                      , "print"
                                      , "true"
                                      , "while"
                                      ]
            , Token.reservedOpNames = ["+", "-", "*", "/", ":="
                                      , "<", ">", "|"
                                      , "and", "or", "not"]
            }


reserved = Token.reserved tokenDef
reservedOp = Token.reservedOp tokenDef
ident = Token.identifier tokenDef
integer = Token.integer tokenDef
float = Token.float tokenDef
stringLit = Token.stringLiteral tokenDef
ws = Token.whiteSpace tokenDef
symbol = Token.symbol tokenDef
parens = Token.parens tokenDef
braces = Token.braces tokenDef

Data types

data Expr =
    Add Expr Expr
    | Sub Expr Expr
    | Mul Expr Expr
    | Div Expr Expr
    | Eq Expr Expr
    | Less Expr Expr
    | Greater Expr Expr
    | Le Expr Expr
    | Ge Expr Expr
    | Ne Expr Expr
    | And Expr Expr
    | Or Expr Expr
    | Not Expr
    | Neg Expr
    | Call String [Expr]
    | V String
    | I Integer
    | D Double
    | S String
    | T
    | F
    deriving (Show, Eq)

data Stmt =
    Assign Expr Expr
    | Block [Stmt]
    | Print [Expr]
    | If Expr Stmt (Maybe Stmt)
    | While Expr Stmt
    | Break
    | Continue
    deriving (Show, Eq)

Useful combinators

commaSep p  = p `sepBy` (symbol ",")

Expression parser

This is the expression parser. This accepts semantically invalid expressions as there is no distinction between numerical, string and boolean expressions. In a future post, I will implement a semantic pass over the AST which will flag invalid expressions.

exprP :: Parser Expr
--exprP = termP `chainl1` termopP

exprP = bexprP `chainl1` bopP

bexprP = rexprP `chainl1` relopP

rexprP = termP `chainl1` termopP

termP :: Parser Expr
termP = factorP `chainl1` factoropP

factorP :: Parser Expr
factorP = Not <$> (notP *> factorP)
   <|> Neg <$> (symbol "-" *> factorP)
   <|> symbol "+" *> factorP
   <|> D <$> try float
   <|> I <$> try integer
   <|> S <$> stringLit
   <|> reserved "true" *> return T
   <|> reserved "false" *> return F
   <|> try callP
   <|> V <$> ident
   <|> parens exprP

relopP = (reservedOp "=" *> return Eq
           <|> reservedOp "<" *> return Less
           <|> reservedOp ">" *> return Greater
           <|> reservedOp "!=" *> return Ne
           <|> reservedOp "<=" *> return Le
           <|> reservedOp "<=" *> return Ge)

bopP = symbol "|" *> return Or
       <|> symbol "&" *> return And
       <|> reserved "or" *> return Or
       <|> reserved "and" *> return And

termopP = symbol "+" *> return Add
   <|> symbol "-" *>  return Sub

factoropP = symbol "*" *> return Mul
   <|> symbol "/" *> return Div

callP = Call <$> ident <*> parens (commaSep exprP)

notP = reservedOp "!" <|> reserved "not"

Statement parser

stmtP :: Parser Stmt
stmtP =   assignP
      <|> blockP
      <|> printP
      <|> try ifElseP
      <|> ifP
      <|> whileP
      <|> breakP
      <|> continueP

blockP = Block <$> braces (many stmtP)

printP = Print <$> (reserved "print" >> (commaSep exprP))

assignP = Assign <$> exprP <*> (reservedOp ":=" >> exprP)

ifP = If <$> ((reserved "if") >> exprP) <*> stmtP <*> return Nothing

ifElseP = If <$> ((reserved "if") >> exprP) <*> stmtP
                 <*> ((reserved "else") *> (Just <$> stmtP))

whileP = While <$> (reserved "while" >> exprP) <*> stmtP

breakP = reserved "break" *> return Break

continueP = reserved "continue" *> return Continue

DSL Parser

dslP :: Parser [Stmt]
dslP = ws *> many stmtP <* eof

Most problematic areas

With Parsec, it turns out to be important to order the parsers and adorn them with try. This was most evident in factorP. Very briefly, when there are two parsers one of which is a prefix of the orher, the parser of the longer input should be listed first. If a parser can fail after consuming some input, it should be wrapped in try so that the next parser will be tried at the correct input position.

Other than this, using Parsec to build parsers is pretty straightforward.

Test program

Here is test program that verifies the correctness of the parser. The tokenizer seems to have a bug. It correctly parses "1.2" as D 1.2 but parses -1.2 as I (-1). I will defer this issue for now!

Module imports

import Text.Parsec (parseTest)
import Data.List (intercalate)
import Text.Parsec.String
import Parser

Expression tests

Testing expression parsing.

exprTests :: [(String, Expr)]
exprTests = [("10", I 10)
         , ("-1", Neg (I 1))
         , ("- 1", Neg (I 1))
         , ("1.2", D 1.2)
         , ("-1.2", Neg (D 1.2))
         , ("- 1.3", Neg (D 1.3))
         , ("a", V "a")
         , ("\"a\"", S "a")
         , ("true", T)
         , ("false", F)
         , ("1 + 2", Add (I 1) (I 2))
         , ("1 + -2", Add (I 1) (Neg (I 2)))
         , ("1 + 2 * 3", Add (I 1) (Mul (I 2) (I 3)))
         , ("1 - 2", Sub (I 1) (I 2))
         , ("1 - 2 * 3", Sub (I 1) (Mul (I 2) (I 3)))
         , ("1 + 2 * 3 / 4", Add (I 1) (Div (Mul (I 2) (I 3)) (I 4)))
         , ("1 + a", Add (I 1) (V "a"))
         , ("1 = a", Eq (I 1) (V "a"))
         , ("1 = 2", Eq (I 1) (I 2))
         , ("true and true", And T T)
         , ("true & true", And T T)
         , ("true | true", Or T T)
         , ("true or true", Or T T)
         , ("1 = 2 & 2 = 4", And (Eq (I 1) (I 2)) (Eq (I 2) (I 4)))
         , ("a = b & c = d", And (Eq (V "a") (V "b")) (Eq (V "c") (V "d")))
         , ("a = b | c = d", Or (Eq (V "a") (V "b")) (Eq (V "c") (V "d")))
         , ("(a | b) & (c | d)", And (Or (V "a") (V "b")) (Or (V "c") (V "d")))
         , ("(a & b) | (c & d)", Or (And (V "a") (V "b")) (And (V "c") (V "d")))
         , ("-(1.2)", Neg (D 1.2))
         , ("+(1.2)", D 1.2)
         , ("not true", Not T)
         , ("not not true", Not (Not T))
         , ("true = false", Eq T F)
         , ("foo()", Call "foo" [])
         , ("foo(1)", Call "foo" [I 1])
         , ("foo(1, true)", Call "foo" [I 1, T])
         , ("foo(1, 2)", Call "foo" [I 1, I 2])
         ]

Statement tests

Testing statement parsing.

stmtTests :: [(String, Stmt)]
stmtTests = [ ("x := 1", Assign (V "x") (I 1))
            , ("print 1, 2", Print [I 1, I 2])
            , ("print 1", Print [I 1])
            , ("{}", Block [])
            , ("if true print \"T\" else print \"F\"",
               If T (Print [S "T"]) (Just (Print [S "F"])))
            , ("if true print 1", If T (Print [I 1]) Nothing)
            , ("break", Break)
            , ("continue", Continue)
            ]

Dsl tests

Example DSL and its expected result.

dslTests :: [(String, [Stmt])]
dslTests =  [ ("x := 1 y:= 2", [Assign (V "x") (I 1), Assign (V "y") (I 2)])
            , (" x := 1 ", [Assign (V "x") (I 1)])
            ]

Test runner

testParser accepts a list of input and expected results of parsing them. Inputs that don't produce the expected results are printed.

testParser:: (Eq a, Show a) => Parser a -> [(String, a)] -> IO ()
testParser p tests = do
  putStr (intercalate "\r\n"
           (filter (not . null)
             (map
               (\(s, e, r) -> case r of
                  Right ast -> if e == ast
                        then "" -- "Parsed: " ++  s
                        else "Error: " ++ s ++ " Exp: " ++
                              show e ++ " Act: " ++ show ast
                  Left e -> "Parse error: \n" ++ show e)
               (map (\(s, e) -> (s, e, parse p s s)) tests))))

Main program

Runs all tests defined above.

main :: IO ()
main = do
   testParser exprP exprTests
   testParser stmtP stmtTests
   testParser dslP dslTests
   putStr "\nDONE!\n"