Design Issues
This page documents outstanding and resolved design decisions.
Dependencies, Haskell Extensions
On which libraries should Language.C depend, and which Haskell extensions should be required?
I think a nice approach would be this one:
The core lexer, parser and pretty printer will be Haskell 98 plus hierarchical modules, depending on alex and happy. On top of that, we will use cabal flags to enable extensions: for example, Data.Generics instances if we have GHC, Uniplate instances if we have uniplate, MonadState ParserMonad instances if we have mtl or monadLib, and so on (suggested by Iavor).
There are some unresolved issues, though. If someone wants to use the Language.C library plus, say, the uniplate instances, is it possible to specify this dependency in the cabal file?
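As a rough illustration of the flag idea, the sketch below guards optional instances with a preprocessor macro. The macro name USE_GENERICS and the toy Ident type are assumptions made for this example, not part of Language.C; a cabal flag could define the macro through the cpp-options field. Note that CPP itself goes beyond Haskell 98, so in practice the optional instances might live in separate modules that are only built when the flag is on.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveDataTypeable #-}

#ifdef USE_GENERICS
import Data.Data (Data, Typeable)
#endif

-- Toy stand-in for an AST node (hypothetical, not the real Language.C type).
data Ident = Ident String
  deriving ( Eq, Show
#ifdef USE_GENERICS
           -- Only derived when the (assumed) USE_GENERICS macro is defined.
           , Data, Typeable
#endif
           )

-- Works the same whether or not the optional instances are compiled in.
identName :: Ident -> String
identName (Ident s) = s
```

Without the macro the module compiles with the Eq and Show instances only; with it, the generic-programming instances are added and nothing else changes for clients.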
C Constants
In C, there are integer constants, integer character constants, floating point constants and string literals. The question is how to represent them in the AST. In the original source, integers, chars and strings were represented using their Haskell counterparts Integer, Char and String, while floating point numbers were represented by Strings.
The problems with this approach are:
- integer, char and string constants in C have type flags (u,L,LL), which get lost.
- char constants in C are basically treated as integers of type char or wchar_t, but there is no guarantee that they match the Haskell notion of a character.
- Float constants don't have a normalized representation (this can be hard to change though).
- C99 has multi-character character constants (not very popular though). Their interpretation is implementation-dependent, so maybe it is useless to support them.
Not decided yet.
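To make the options more concrete, here is one possible representation, given purely as a sketch. All type, constructor and function names are hypothetical. It keeps the type flags next to the value, stores character constants as integer code units (so multi-character constants fit), and leaves floating point constants as source text. The small reader handles decimal integer constants only; octal and hexadecimal forms are left out.

```haskell
import Data.Char (isDigit, toLower)

-- Integer suffix flags (hypothetical names).
data IntFlag = FlagUnsigned | FlagLong | FlagLongLong
  deriving (Eq, Show)

-- Plain versus wide (L-prefixed) character and string constants.
data CharKind = PlainChar | WideChar
  deriving (Eq, Show)

data Constant
  = IntConst Integer [IntFlag]
  | CharConst CharKind [Integer]  -- code units; several for multi-character constants
  | FloatConst String             -- source text, not normalized
  | StrConst CharKind String
  deriving (Eq, Show)

-- Interpret an integer suffix. Lenient on purpose: the suffix is
-- lower-cased first, so mixed-case forms such as "lL" are accepted too.
parseIntFlags :: String -> Maybe [IntFlag]
parseIntFlags suffix = case map toLower suffix of
  ""    -> Just []
  "u"   -> Just [FlagUnsigned]
  "l"   -> Just [FlagLong]
  "ul"  -> Just [FlagUnsigned, FlagLong]
  "lu"  -> Just [FlagUnsigned, FlagLong]
  "ll"  -> Just [FlagLongLong]
  "ull" -> Just [FlagUnsigned, FlagLongLong]
  "llu" -> Just [FlagUnsigned, FlagLongLong]
  _     -> Nothing

-- Read a decimal integer constant with an optional suffix.
-- A leading zero (other than "0" itself) is rejected, since that
-- would be an octal constant in C.
readDecimalConst :: String -> Maybe Constant
readDecimalConst src = case span isDigit src of
  (digits@(d:_), suffix)
    | d /= '0' || digits == "0" ->
        fmap (IntConst (read digits)) (parseIntFlags suffix)
  _ -> Nothing
```

With this shape, a pretty printer could reproduce the suffix from the flag list, which the plain Integer representation cannot do.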
Hierarchical Modules (resolved)
The parser toolkit by Manuel predates hierarchical modules. I think it is appropriate to refactor (move) the modules into a hierarchy. This is almost no work, as I've automated this task for my last project.
Which hierarchy should be chosen? I chose:
- Language.C (root) - Facades
- Language.C.Toolkit (Parser toolkit)
- Language.C.Parser (Parser / Lexer)
- Language.C.AST (AST and pretty printing)
- Language.C.<...> (Add additional submodules here)
Supported C languages and subsets
Basically, the library will support C99 plus most GNU extensions.
Now some people will want to ensure that the AST only includes a subset of the supported language, in order to make processing easier. We have two ways of restricting the language: on the parser side, and on the AST side. Depending on whether an extension is conservative and easily recognizable in the AST, one or the other may be the better choice.
Currently, I think that the parser should have a flag selecting whether to accept extensions or only plain C99. To decide on other use cases, I need to look at a sensible C subset first.
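The AST-side variant of such a restriction could look roughly like the sketch below. The Dialect type, the toy Expr type and the function names are all made up for illustration and do not correspond to the real Language.C AST. Two GNU extensions that are easy to recognize in a syntax tree serve as examples: statement expressions and the conditional with an omitted middle operand.

```haskell
-- Which language the caller wants to accept (hypothetical flag type).
data Dialect = PlainC99 | GnuC99
  deriving (Eq, Show)

-- Toy expression AST, just large enough to show the idea.
data Expr
  = Var String
  | IntLit Integer
  | Add Expr Expr
  | StmtExpr [Expr]              -- GNU statement expression: ({ ... })
  | Cond Expr (Maybe Expr) Expr  -- Nothing in the middle: GNU "a ?: b"
  deriving (Eq, Show)

-- True if the expression uses one of the two GNU extensions modelled here.
usesGnuExtension :: Expr -> Bool
usesGnuExtension e = case e of
  Var _             -> False
  IntLit _          -> False
  Add a b           -> usesGnuExtension a || usesGnuExtension b
  StmtExpr _        -> True
  Cond _ Nothing _  -> True
  Cond c (Just t) f -> any usesGnuExtension [c, t, f]

-- Accept or reject an already parsed expression for the given dialect.
accept :: Dialect -> Expr -> Either String Expr
accept GnuC99 e = Right e
accept PlainC99 e
  | usesGnuExtension e = Left "GNU extension used in plain C99 mode"
  | otherwise          = Right e
```

A check like this only works for extensions that leave a visible trace in the AST; extensions that change how tokens are lexed or parsed would still need the parser-side flag.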
