The fun of parsing C++ with CEDET

First off, there's now a new CEDET branch in Emacs CVS, which you can check out with:

cvs -z3 co -r cedet-branch emacs

So it seems that CEDET really will be part of Emacs 23.2. Don't expect this branch to work right away, but it would be good if experienced CEDET users could check it out and report issues on emacs-devel.

I have been working a bit on CEDET's Semantic lately, primarily to improve its C++ parsing engine. Some time ago, I threw several different C++ libraries at it to see how well it works, and it's doing a pretty good job. The most important one is of course the STL, but I also tried Qt, Xapian, CLucene, and Vmime. I guess it's common knowledge that C++ is incredibly hard to parse, for humans and machines alike. One area where Semantic still had lots of problems was namespaces and 'using' statements. I noticed this especially with the Vmime library, which does a lot of 'using' trickery.

For example, consider the following:

namespace foo {
  struct aStruct { int a; };
}

namespace bar {
  using foo::aStruct;
}

The 'using' declaration brings 'foo::aStruct' into the 'bar' namespace scope, so that 'bar::aStruct' becomes something like an alias for 'foo::aStruct'. But you can also use a 'using' directive:

namespace anotherbar {
  using namespace foo;
}

which brings the whole namespace 'foo' into scope. This resembles a namespace alias, although you would usually define an alias with

namespace anotherbar = foo;

instead. Either way, Semantic should be able to cope with both. The two statements also differ in the order of symbol lookup, but that's a detail to be dealt with another day…

To make things worse, you can use these types with fully qualified names, like

  bar::aStruct mystruct;

or by bringing 'bar::aStruct' into scope beforehand, like

  using bar::aStruct;
  aStruct mystruct;

or by bringing the full namespace into scope

  using namespace bar;
  aStruct mystruct;
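
To see all of these pieces together, here is a small self-contained sketch that a C++11 compiler should accept. The namespace names come from the snippets above; the function names and the alias name 'fooAlias' are made up for illustration (I renamed the alias because 'anotherbar' is already taken by the namespace with the 'using' directive).

```cpp
#include <cassert>
#include <type_traits>

namespace foo {
  struct aStruct { int a; };
}

namespace bar {
  using foo::aStruct;      // using declaration: bar::aStruct names foo::aStruct
}

namespace anotherbar {
  using namespace foo;     // using directive: lookup in anotherbar also searches foo
}

namespace fooAlias = foo;  // namespace alias (hypothetical name)

// Every spelling below denotes the very same type.
static_assert(std::is_same<bar::aStruct, foo::aStruct>::value,
              "using declaration refers to foo::aStruct");
static_assert(std::is_same<anotherbar::aStruct, foo::aStruct>::value,
              "using directive makes foo::aStruct reachable");
static_assert(std::is_same<fooAlias::aStruct, foo::aStruct>::value,
              "alias refers to namespace foo");

// Variant 1: fully qualified name.
int viaQualifiedName() {
  bar::aStruct mystruct{1};
  return mystruct.a;
}

// Variant 2: bring the single type into scope first.
int viaUsingDeclaration() {
  using bar::aStruct;
  aStruct mystruct{2};
  return mystruct.a;
}

// Variant 3: bring the whole namespace into scope first.
int viaUsingDirective() {
  using namespace bar;
  aStruct mystruct{3};
  return mystruct.a;
}

// Runs all three variants; each returns the value it stored.
void checkAllSpellings() {
  assert(viaQualifiedName() == 1);
  assert(viaUsingDeclaration() == 2);
  assert(viaUsingDirective() == 3);
}
```

A tool that wants to offer completion on 'mystruct.' has to arrive at 'foo::aStruct' in all three functions.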

The difficulty is that, compared to a compiler, the Semantic parser works 'the other way round': it begins in the current scope and then works its way up to resolve the type. So you first have to check the type. If it's fully qualified (bar::aStruct), look in 'bar' for 'using' statements; otherwise check for 'using' statements in the local scope, check whether each one brings just one type into scope or a full namespace, and so on… Also keep in mind that namespaces can be nested and that an alias might refer to another alias (but let's just let recursion deal with that).
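
The lookup described above can be sketched as a toy model. This is not Semantic's code (Semantic is written in Emacs Lisp and works on its own tag structures); it is only a rough illustration in C++ of the recursive idea. All names here ('Scope', 'Program', 'resolveType', 'resolveIn', 'exampleProgram', 'fooAlias', 'local') are hypothetical, and walking up through enclosing scopes is left out to keep it short.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of one scope (a namespace or a function body).
struct Scope {
  std::set<std::string> types;                     // types declared directly here
  std::map<std::string, std::string> usingDecls;   // local name -> qualified target
  std::vector<std::string> usingDirectives;        // namespaces pulled in wholesale
  std::string aliasOf;                             // non-empty if this is a namespace alias
};

using Program = std::map<std::string, Scope>;      // scope name -> scope

// Split "a::b::c" into {"a::b", "c"}; an unqualified name yields {"", name}.
std::pair<std::string, std::string> splitQualified(const std::string& q) {
  const auto pos = q.rfind("::");
  if (pos == std::string::npos) return {"", q};
  return {q.substr(0, pos), q.substr(pos + 2)};
}

// Look for `name` in scope `ns`, following aliases and using statements.
// Returns the fully qualified name, or "" if nothing was found.
std::string resolveIn(const Program& prog, const std::string& ns,
                      const std::string& name, int depth) {
  if (depth > 16) return "";                       // crude guard against cycles
  const auto it = prog.find(ns);
  if (it == prog.end()) return "";
  const Scope& s = it->second;
  if (!s.aliasOf.empty())                          // alias: continue in the target
    return resolveIn(prog, s.aliasOf, name, depth + 1);
  if (s.types.count(name))                         // declared right here
    return ns.empty() ? name : ns + "::" + name;
  const auto ud = s.usingDecls.find(name);
  if (ud != s.usingDecls.end()) {                  // using declaration: one type
    const auto target = splitQualified(ud->second);
    return resolveIn(prog, target.first, target.second, depth + 1);
  }
  for (const auto& pulled : s.usingDirectives) {   // using directive: whole namespace
    const std::string found = resolveIn(prog, pulled, name, depth + 1);
    if (!found.empty()) return found;
  }
  return "";
}

// Resolve a type as written in the source, seen from scope `current`.
std::string resolveType(const Program& prog, const std::string& current,
                        const std::string& written) {
  const auto parts = splitQualified(written);
  // Qualified names start in the named namespace, others in the current scope.
  return resolveIn(prog, parts.first.empty() ? current : parts.first,
                   parts.second, 0);
}

// The namespaces from this post, plus a made-up alias and a made-up local scope.
Program exampleProgram() {
  Program p;
  p["foo"].types.insert("aStruct");
  p["bar"].usingDecls["aStruct"] = "foo::aStruct";
  p["anotherbar"].usingDirectives.push_back("foo");
  p["fooAlias"].aliasOf = "foo";
  p["local"].usingDirectives.push_back("bar");     // body with 'using namespace bar;'
  return p;
}
```

With this model, resolveType(exampleProgram(), "local", "aStruct") goes from 'local' to 'bar' via the directive, and from there to 'foo' via the declaration, ending at "foo::aStruct". The depth limit is just a lazy way to avoid looping forever on cyclic 'using' statements.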

Well, the Semantic from CVS should be able to deal with this stuff now, and in the end it didn't even take much code.

I also learned once again how important unit tests are, especially when you deal with stuff like this, where one little change can easily break other things. I've also set up a little continuous integration tool for CEDET, which will shout as soon as a test fails after a new commit, so that you can fix it immediately instead of bisecting the commits (much) later. 'git' speeds that up greatly, but there are still better ways to spend your time.

The next thing on the agenda is parsing Template Specialization… I'm sure that will be fun…

May I ask when this blog entry was posted? Consider adding the publication date to your blog posts. It's interesting to know if it's recent or not. I am a CEDET user (by means of ECB), and very interested in improvements to it. I threw Ogre3D code at it, but it fails to make sense of it. Probably due to namespaces? Cheers Jacob
Hi Jacob, I've added the posting date to the blog entries - thanks for the hint. As you can see, this entry is actually rather old. Please bring your problem with Ogre3D to the CEDET-devel mailing list, with an example of what you're trying to complete and how it fails. I doubt that it has to do with namespaces, since this is working pretty well by now; it more likely has to do with some preprocessor trickery, which is still a big problem for the Semantic analyzer in various areas - some details on this issue can be found on the EmacsWiki. The other problem is template specialization or even template metaprogramming; this is something I'm currently working on (actually, specializations are now correctly recognized in my personal repo).