We are now going to look at a solution which is concise, efficient and gives sophisticated error messages. It is also less than 20 lines of code. We’ll be using FParsec.
FParsec is a port of an Haskell library. It is a parser combinator library or, as I like to think of it, an internal DSL to build parsers in F#. My usual disclaimer: I’m not an expert in FParsec. It is likely that, if you are an expert, you can come up with more maintainable/efficient/elegant version of this parser.
The whole code is below:
let ws = " \t\n"
let specialChars = ".)(\\\n"
let pWs = spaces
let pName = manyChars (noneOf (ws + specialChars)) |>> EName
let pExpr, pExprRef = createParserForwardedToRef<Expression, Unit>()
let curry2 f a b = f(a,b)
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function)
let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
.>> pWs .>> pchar ')'
do pExprRef := pFunction <|> pApplication <|> pName
let pExpressions = sepBy pExpr spaces1
This mirrors pretty closely the grammar we are trying to parse. A program is a bunch of expressions separated by whitespaces.
let pExpressions = sepBy pExpr spaces1
sepBy is a combinator that takes a parser that defines what to parse and a parser that defines what the separator should be. And the separator should be one or more spaces …
pExpr is either a function, an application or a name. the operator <|> is a choice combinator that tries each parser in order. It tries the right parser just if the left parser fails and it doesn’t consume input. So it doesn’t backtrack. If you need a backtracking parser, you’ll need to get acquainted with the attempt combinator.
do pExprRef := pFunction <|> pApplication <|> pName
A function starts with a ‘', then comes a name, a dot and an expression.
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function)
I know that might look crazy (and maybe it is), but just bear with me. Someone , who I’m not sure I can name, once told me that functional programming is great to write code, but terrible to read it and debug it. The phrase stayed with me as containing a grain of truth. In any case, in the code above:
-
. is a combinator that says “use the left parser, discard its value and then use the right one, returning its value”. Try to guess what .>> does …
- pipe2 is a combinator that says “apply the first parser, the second parser and then call a function passing as parameters the values returned by the two parsers
- curry2 is a function combinator that transform a function that takes a tuple to a function that takes the parameters as untupled
let curry2 f a b = f(a,b)
An application works similarly, but differently …
let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
.>> pWs .>> pchar ')'
The only difference is that now we have to consume the optional whitespaces and the ‘)’ at the end. A rule of thumb that I use is to use >>. to flow the result through and pipeX when I need more than one result.
The last thing is pName, which consume chars until it finds either a whitespace or a special char.
let ws = " \t\n"
let specialChars = ".)(\\\n"
let pWs = spaces
let pName = manyChars (noneOf (ws + specialChars)) |>> EName
And there you have it, a lexer, a parser all in 20 lines of code. I don’t like the code that I wrote above much. I’m sure I could refine it plenty and it probably contains some bugs, but it gives an idea of what is possible with FParsec.
View comments on GitHub or email me