I've used Parsec to "tokenize" data from a text file. It was actually
quite easy, and everything is identified correctly.
So now I have a list/stream of self-defined "Tokens", and I'm stuck,
because I need to write my own Parsec token parsers to parse this
token stream in a context-sensitive way.
How do I do that?
A Token is something like:

    data Token = ZE String
               | OPSShort String
               | OPSLong String
               | Other String
               | ZECd String

Hi, Günther. You could write functions that pattern-match on various
sequences of tokens in a list; for an example, have a look at
the file Evaluator.hs in my Scheme interpreter haskeem. Or you could
build up more complex data structures entirely within Parsec; for
this I would point you at the file Parser.hs in my accounting program
umm. Both are on Hackage. Undoubtedly there are many more and probably
better examples, but I think these are at least a start...
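
To make the first suggestion concrete, here is a minimal sketch of
pattern-matching directly on a token list. It is not taken from haskeem;
the Record type and the grouping rule (a ZE token followed by its OPS
codes) are assumptions invented for illustration, since the thread does
not say what the tokens mean.

```haskell
-- Token type copied from the question; the deriving clause is added here.
data Token = ZE String
           | OPSShort String
           | OPSLong String
           | Other String
           | ZECd String
           deriving (Show, Eq)

-- Hypothetical target structure: a ZE header plus the OPS codes after it.
data Record = Record { header :: String, codes :: [String] }
  deriving (Show, Eq)

-- Walk the list, pattern-matching on its head.
-- The grouping rule is an assumption, not something stated in the thread.
records :: [Token] -> [Record]
records [] = []
records (ZE h : rest) =
  let (ops, rest') = span isOps rest
  in Record h (map opsText ops) : records rest'
records (_ : rest) = records rest   -- skip tokens outside a record

isOps :: Token -> Bool
isOps (OPSShort _) = True
isOps (OPSLong _)  = True
isOps _            = False

opsText :: Token -> String
opsText (OPSShort s) = s
opsText (OPSLong s)  = s
opsText _            = ""
```

This works well for simple grammars; once you need backtracking or
decent error messages, a parser over the token stream is usually nicer.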
Get the Parsec manual from Daan Leijen's home page then see the
section '2.11 Advanced: Seperate scanners'.
Though it is rarely mentioned, Parsec in its regular mode is a scannerless
parser. Unless you have complex formatting problems (e.g. indentation
sensitivity, as in Python's or Haskell's syntax), scannerless parsers are
often much more convenient than parsers with separate lexers (see the
grammar formalism SDF for many examples). With Parsec, if you want a
separate scanner, there is quite a lot of boilerplate to write in order
to use the technique in section 2.11. Usually I can get by
with the Token and Language modules, or do a few tricks with the
'symbol' parser instead.
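
As a rough sketch of what "getting by with the Token and Language
modules" can look like: the example below uses the Parsec 3 module names
(Text.Parsec.Token, Text.Parsec.Language), and the "let x = 42" input
format is made up purely for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (emptyDef)

-- A lexer built from the default (empty) language definition.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Hypothetical input format: let <name> = <integer>
-- Each Tok.* parser skips trailing whitespace, so no separate scanner
-- pass is needed.
binding :: Parser (String, Integer)
binding = do
  _    <- Tok.symbol lexer "let"
  name <- Tok.identifier lexer
  _    <- Tok.symbol lexer "="
  val  <- Tok.integer lexer
  return (name, val)

parseBinding :: String -> Either ParseError (String, Integer)
parseBinding = parse (Tok.whiteSpace lexer *> binding <* eof) "(input)"
```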
Parsec is monadic, so (>>=) allows you to write context-sensitive
parsers; see section '3.1. Parsec Prim' for a discussion and example.
Again, writing a context-sensitive parser can often be more trouble
than studying the format of the input and working out a context-free
grammar (if there is one).
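
A small sketch of what (>>=) buys you, using Parsec 3 module names. The
input format (a decimal count, a colon, then exactly that many letters)
is invented for this example and is not the one from the manual. The
point is only that the second parser is chosen from the result of the
first, which a context-free grammar cannot express in general.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Read a count, then parse exactly that many letters.
-- What follows the colon depends on the value parsed before it.
counted :: Parser String
counted =
  many1 digit >>= \ds ->
  char ':' >>
  count (read ds) letter

parseCounted :: String -> Either ParseError String
parseCounted = parse (counted <* eof) "(input)"
```

For instance, "3:abc" is accepted, while "3:ab" and "3:abcd" are
rejected.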
Maybe this can be of help (though it's for Parsec 2):
It's not the only example of this either, tagsoup-parsec is available
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe
On 12 January 2010 at 03:35:10, Günther Schmidt wrote:
That's pretty easy actually. You can use the function `token' to define
your own primitive parsers. It's defined in Parsec.Prim, if I remember
correctly. You may also want to add information about the position in the
source code to your lexemes. Here is some code to illustrate usage:
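
What follows is a minimal sketch of the idea rather than the poster's
own code. It assumes Parsec 3 module names (in Parsec 2 the same
function lives under Text.ParserCombinators.Parsec.Prim). The names
PosToken, TokParser, satisfyTok, ze, opsCode and record, and the little
grammar they implement, are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Token type from the question; the deriving clause is added here.
data Token = ZE String | OPSShort String | OPSLong String
           | Other String | ZECd String
           deriving (Show, Eq)

-- A token paired with the position where the scanner found it.
type PosToken = (SourcePos, Token)

-- Parsers that consume a list of positioned tokens instead of Chars.
type TokParser a = Parsec [PosToken] () a

-- Build a primitive parser from a partial matcher, using `token`:
-- how to show a token, how to get its position, and how to accept it.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok match = token (show . snd) fst (match . snd)

ze :: TokParser String
ze = satisfyTok $ \t -> case t of { ZE s -> Just s; _ -> Nothing }

opsCode :: TokParser String
opsCode = satisfyTok $ \t -> case t of
  OPSShort s -> Just s
  OPSLong s  -> Just s
  _          -> Nothing

-- Hypothetical grammar: a ZE token followed by any number of OPS codes.
record :: TokParser (String, [String])
record = do
  h  <- ze
  cs <- many opsCode
  return (h, cs)

parseRecords :: [PosToken] -> Either ParseError [(String, [String])]
parseRecords = parse (many record <* eof) "(tokens)"
```

The first-stage tokenizer would attach positions with getPosition; in a
quick experiment you can build them by hand with newPos. Once the
primitives exist, all the usual combinators (many, <|>, try, >>=) work
on the token stream exactly as they do on characters.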
Haskell-Cafe mailing list