I need a parser for a simple Domain Specific Language and I am writing it in Haskell using the Parsec combinator library. My future posts will show why I am building this parser but in this post, I will focus on how to construct a parser for an imperative language using Parsec.
I will make no attempt to explain Haskell features except to touch upon some details of applicatives. There are many resources available on the Internet for learning Haskell
The implementation will be using Applicatives which will be easier to
read than one written in a monadic approach. The language I intend to
parse is a context-free grammar (CFG) and applicatives will do just
fine. It is helpful to review type signatures for <$>, <*>,
*> and <* shown below.
:t (<$>) (<$>) :: Functor f => (a -> b) -> f a -> f b :t (<*>) (<*>) :: Applicative f => f (a -> b) -> f a -> f b :t (<*) (<*) :: Applicative f => f a -> f b -> f a :t (*>) (*>) :: Applicative f => f a -> f b -> f b
<$> applies a function (a -> b) to an argument (f a) in a
computational context and produces a new value (f b).
<*> extracts both the function (a -> b) and the argument from
context (f a) producing a new value in (f b).
<* always returns the first argument and *> the second.
The syntax of the language that our parser will recognize is shown below.
script :: sequence of stmts
stmt :: var_decl
| if_stmt
| while_stmt
| for_stmt
| continue_stmt
| break_stmt
| "{" stmt... "}"
var_decl :: ident ":=" expr
bool :: "true" | "false"
expr :: bexpr bool_op bexpr
| bexpr
bexpr :: rexpr relop rexpr
| rexpr
rexpr :: term termOp term
| term
term :: term factor_op factor
| factor
factor :: ID
| number
| string
| True
| False
| "(" expr ")"
| "+|-" factor
| ID "(" [exppr ["," expr]] ")"
term_op :: "+" | "-"
factor_op :: "*" | "/"
if_stmt :: "if" expr stmt | "if" bool_expr stmt "else" stmt
while_stmt :: "while" expr stmt
for_stmt :: "for" ID stmt
break_stmt :: "break"
continue_stmt :: "continue"
The result of parsing will be an abstract syntax tree (AST). In further posts, I will implement evaluation of the tree or use it generate code.
Using Parsec in applicative style leads to remarkably concise and simple implementation.
Lists exports from this module. We export lower-level parse functions for testing purposes.
module Parser (Expr(..), Stmt(..), dslP, parse, exprP, stmtP) where
Notable imports are Parsec and friends. We also import some functions
from Control.Applicative.
import Control.Applicative (liftA2, pure, (<*>), (<$>), (<*), (*>))
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.ParserCombinators.Parsec.Char (digit, letter,
alphaNum, lower, upper)
import Text.ParserCombinators.Parsec.Language (emptyDef)
import qualified Text.ParserCombinators.Parsec.Token as Token
tokenDef = Token.makeTokenParser $ emptyDef
{ Token.commentStart = "/*"
, Token.commentEnd = "*/"
, Token.commentLine = "//"
, Token.identStart = letter
, Token.identLetter = alphaNum
, Token.reservedNames = [ "break"
, "continue"
, "else"
, "false"
, "if"
, "print"
, "true"
, "while"
]
, Token.reservedOpNames = ["+", "-", "*", "/", ":="
, "<", ">", "|"
, "and", "or", "not"]
}
reserved = Token.reserved tokenDef
reservedOp = Token.reservedOp tokenDef
ident = Token.identifier tokenDef
integer = Token.integer tokenDef
float = Token.float tokenDef
stringLit = Token.stringLiteral tokenDef
ws = Token.whiteSpace tokenDef
symbol = Token.symbol tokenDef
parens = Token.parens tokenDef
braces = Token.braces tokenDef
data Expr =
Add Expr Expr
| Sub Expr Expr
| Mul Expr Expr
| Div Expr Expr
| Eq Expr Expr
| Less Expr Expr
| Greater Expr Expr
| Le Expr Expr
| Ge Expr Expr
| Ne Expr Expr
| And Expr Expr
| Or Expr Expr
| Not Expr
| Neg Expr
| Call String [Expr]
| V String
| I Integer
| D Double
| S String
| T
| F
deriving (Show, Eq)
data Stmt =
Assign Expr Expr
| Block [Stmt]
| Print [Expr]
| If Expr Stmt (Maybe Stmt)
| While Expr Stmt
| Break
| Continue
deriving (Show, Eq)
commaSep p = p `sepBy` (symbol ",")
This is the expression parser. This accepts semantically invalid expressions as there is no distinction between numerical, string and boolean expressions. In a future post, I will implement a semantic pass over the AST which will flag invalid expressions.
exprP :: Parser Expr
--exprP = termP `chainl1` termopP
exprP = bexprP `chainl1` bopP
bexprP = rexprP `chainl1` relopP
rexprP = termP `chainl1` termopP
termP :: Parser Expr
termP = factorP `chainl1` factoropP
factorP :: Parser Expr
factorP = Not <$> (notP *> factorP)
<|> Neg <$> (symbol "-" *> factorP)
<|> symbol "+" *> factorP
<|> D <$> try float
<|> I <$> try integer
<|> S <$> stringLit
<|> reserved "true" *> return T
<|> reserved "false" *> return F
<|> try callP
<|> V <$> ident
<|> parens exprP
relopP = (reservedOp "=" *> return Eq
<|> reservedOp "<" *> return Less
<|> reservedOp ">" *> return Greater
<|> reservedOp "!=" *> return Ne
<|> reservedOp "<=" *> return Le
<|> reservedOp "<=" *> return Ge)
bopP = symbol "|" *> return Or
<|> symbol "&" *> return And
<|> reserved "or" *> return Or
<|> reserved "and" *> return And
termopP = symbol "+" *> return Add
<|> symbol "-" *> return Sub
factoropP = symbol "*" *> return Mul
<|> symbol "/" *> return Div
callP = Call <$> ident <*> parens (commaSep exprP)
notP = reservedOp "!" <|> reserved "not"
stmtP :: Parser Stmt
stmtP = assignP
<|> blockP
<|> printP
<|> try ifElseP
<|> ifP
<|> whileP
<|> breakP
<|> continueP
blockP = Block <$> braces (many stmtP)
printP = Print <$> (reserved "print" >> (commaSep exprP))
assignP = Assign <$> exprP <*> (reservedOp ":=" >> exprP)
ifP = If <$> ((reserved "if") >> exprP) <*> stmtP <*> return Nothing
ifElseP = If <$> ((reserved "if") >> exprP) <*> stmtP
<*> ((reserved "else") *> (Just <$> stmtP))
whileP = While <$> (reserved "while" >> exprP) <*> stmtP
breakP = reserved "break" *> return Break
continueP = reserved "continue" *> return Continue
dslP :: Parser [Stmt] dslP = ws *> many stmtP <* eof
With Parsec, it turns out to be important to order the parsers and
adorn them with try. This was most evident in factorP. Very
briefly, when there are two parsers one of which is a prefix of the
orher, the parser of the longer input should be listed first. If a
parser can fail after consuming some input, it should be wrapped in
try so that the next parser will be tried at the correct input
position.
Other than this, using Parsec to build parsers is pretty straightforward.
Here is test program that verifies the correctness of the parser. The
tokenizer seems to have a bug. It correctly parses "1.2" as D 1.2
but parses -1.2 as I (-1). I will defer this issue for now!
import Text.Parsec (parseTest) import Data.List (intercalate) import Text.Parsec.String import Parser
Testing expression parsing.
exprTests :: [(String, Expr)]
exprTests = [("10", I 10)
, ("-1", Neg (I 1))
, ("- 1", Neg (I 1))
, ("1.2", D 1.2)
, ("-1.2", Neg (D 1.2))
, ("- 1.3", Neg (D 1.3))
, ("a", V "a")
, ("\"a\"", S "a")
, ("true", T)
, ("false", F)
, ("1 + 2", Add (I 1) (I 2))
, ("1 + -2", Add (I 1) (Neg (I 2)))
, ("1 + 2 * 3", Add (I 1) (Mul (I 2) (I 3)))
, ("1 - 2", Sub (I 1) (I 2))
, ("1 - 2 * 3", Sub (I 1) (Mul (I 2) (I 3)))
, ("1 + 2 * 3 / 4", Add (I 1) (Div (Mul (I 2) (I 3)) (I 4)))
, ("1 + a", Add (I 1) (V "a"))
, ("1 = a", Eq (I 1) (V "a"))
, ("1 = 2", Eq (I 1) (I 2))
, ("true and true", And T T)
, ("true & true", And T T)
, ("true | true", Or T T)
, ("true or true", Or T T)
, ("1 = 2 & 2 = 4", And (Eq (I 1) (I 2)) (Eq (I 2) (I 4)))
, ("a = b & c = d", And (Eq (V "a") (V "b")) (Eq (V "c") (V "d")))
, ("a = b | c = d", Or (Eq (V "a") (V "b")) (Eq (V "c") (V "d")))
, ("(a | b) & (c | d)", And (Or (V "a") (V "b")) (Or (V "c") (V "d")))
, ("(a & b) | (c & d)", Or (And (V "a") (V "b")) (And (V "c") (V "d")))
, ("-(1.2)", Neg (D 1.2))
, ("+(1.2)", D 1.2)
, ("not true", Not T)
, ("not not true", Not (Not T))
, ("true = false", Eq T F)
, ("foo()", Call "foo" [])
, ("foo(1)", Call "foo" [I 1])
, ("foo(1, true)", Call "foo" [I 1, T])
, ("foo(1, 2)", Call "foo" [I 1, I 2])
]
Testing statement parsing.
stmtTests :: [(String, Stmt)]
stmtTests = [ ("x := 1", Assign (V "x") (I 1))
, ("print 1, 2", Print [I 1, I 2])
, ("print 1", Print [I 1])
, ("{}", Block [])
, ("if true print \"T\" else print \"F\"",
If T (Print [S "T"]) (Just (Print [S "F"])))
, ("if true print 1", If T (Print [I 1]) Nothing)
, ("break", Break)
, ("continue", Continue)
]
Example DSL and its expected result.
dslTests :: [(String, [Stmt])]
dslTests = [ ("x := 1 y:= 2", [Assign (V "x") (I 1), Assign (V "y") (I 2)])
, (" x := 1 ", [Assign (V "x") (I 1)])
]
testParser accepts a list of input and expected results of parsing
them. Inputs that don't produce the expected results are printed.
testParser:: (Eq a, Show a) => Parser a -> [(String, a)] -> IO ()
testParser p tests = do
putStr (intercalate "\r\n"
(filter (not . null)
(map
(\(s, e, r) -> case r of
Right ast -> if e == ast
then "" -- "Parsed: " ++ s
else "Error: " ++ s ++ " Exp: " ++
show e ++ " Act: " ++ show ast
Left e -> "Parse error: \n" ++ show e)
(map (\(s, e) -> (s, e, parse p s s)) tests))))
Runs all tests defined above.
main :: IO () main = do testParser exprP exprTests testParser stmtP stmtTests testParser dslP dslTests putStr "\nDONE!\n"