Retiring nnrss in favor of nnshimbun in Gnus

OK, I'm more or less retiring nnrss in Gnus. The main problem is that more and more sites no longer publish their full content in their RSS feeds, so I have to read the articles via emacs-w3m anyway. So I figured, why not switch to nnshimbun completely? Because it's tedious to write a shimbun for every feed, of course (see here and here for details on shimbuns in emacs-w3m). But since almost all the sites I read publish RSS feeds, albeit mostly without content or with teasers only, why not use a general RSS shimbun which simply inherits from sb-rss.el? So I wrote just that (emacs-w3m already has something similar, called rss-hash, but it's meant for feeds that publish their full content).
This way, you can read every RSS feed with nnshimbun: it ignores the content in the feed and fetches the HTML page for you, and you can easily extract the actual content by specifying regular expressions that match where it starts and ends. It gets even better: I added detection for some popular blogging engines, namely Google's Blogger/Blogspot, WordPress and TypePad, and the shimbun will try to extract the content automatically if it encounters one of those.
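To illustrate the start/end idea, here is a minimal sketch of cutting the content out of a fetched page with two regexps. It is not the code from sb-rss-blogs.el; the function name my-extract-content is made up for this example, and whether the real shimbun keeps or drops the matched markers themselves is an assumption here (this sketch drops both).

```elisp
;; Hypothetical helper for illustration only, not part of emacs-w3m.
(defun my-extract-content (html start-regexp end-regexp)
  "Return the part of HTML between START-REGEXP and END-REGEXP.
The matched markers themselves are not included.  If either
regexp fails to match, return HTML unchanged."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Search for the start marker first; the end marker is then
    ;; searched for from wherever point was left.
    (let* ((beg (and (re-search-forward start-regexp nil t)
                     (match-end 0)))
           (end (and (re-search-forward end-regexp nil t)
                     (match-beginning 0))))
      (if (and beg end)
          (buffer-substring-no-properties beg end)
        html))))

;; Usage with a made-up page:
(my-extract-content
 "<html><a name=\"content\">Hello</a><h3 class=\"titlebar\">Footer</h3></html>"
 "<a name=\"content\">"
 "<h3 class=\"titlebar\">")
;; => "Hello</a>"
```

Falling back to the whole page when a regexp does not match is a design choice of this sketch: an unfiltered article is usually more useful than an empty one.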
The shimbun is called rss-blogs, although it works with practically any feed. To use it, get emacs-w3m from CVS and set the variable shimbun-rss-blogs-group-url-regexp, for example:
(setq shimbun-rss-blogs-group-url-regexp
      '(("Example: Wordpress" "http://emacs.wordpress.com/feed/")
        ("Example: w3m"
         "http://sourceforge.net/export/rss2_projnews.php?group_id=39518"
         "<a name=\"content\">"
         "<h3 class=\"titlebar\">")
        ("Example: w3m without removal"
         "http://sourceforge.net/export/rss2_projnews.php?group_id=39518"
         'none)))

In a nutshell: each entry consists of the name, the URL of the RSS feed, and optionally two regexps matching the start and end of the actual content. If you omit the regexps, the auto-detection of Blogger/WordPress/TypePad will kick in. If you just use 'none, no filtering will be done whatsoever. See also the doc-string of the variable.
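As a sketch of how a further feed could be added later, the snippet below appends one entry with each field annotated. The group name, the feed URL and both regexps are placeholders invented for this example, and it assumes the variable has already been set as shown earlier (add-to-list signals an error on an unbound variable).

```elisp
;; Placeholder entry: the name, URL and regexps are made up.
(add-to-list 'shimbun-rss-blogs-group-url-regexp
             '("Example: some blog"               ; group name
               "http://blog.example.org/feed/"    ; URL of the RSS feed
               "<div class=\"entry\">"            ; regexp: content starts here
               "<div class=\"comments\">"))       ; regexp: content ends here
```

Leaving out the last two strings would hand the page to the Blogger/WordPress/TypePad auto-detection instead.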
After you've set the variable, call gnus-group-make-shimbun-group, choose rss-blogs and then the name you've specified in the above variable.
This indeed works very well. One thing I've noticed though: when I add a blog to shimbun-rss-blogs-group-url-regexp, after doing gnus-group-make-shimbun-group and choosing rss-blogs, the blog isn't there. Even quitting Gnus and restarting doesn't make it show up. But quitting Emacs, fully reloading, and then doing gnus-group-make-shimbun-group etc. will make the new blog show up. Is there a "force gnus/shimbun/whatever to look at that variable again" type of function? Completely shutting down Emacs to add a blog to the list seems excessive!
For anyone following this: these issues should now be fixed in CVS.
Thanks for rss-blogs. This is fantastic stuff, and I look forward to using it more. I, too, am very eager to move from nnrss to nnshimbun, as I have been unable to get rid of duplicate entries with nnrss. Unfortunately, I have not been able to get rss-hash to work for feeds with full content (shimbun-rss-get-headers returns a "Wrong type argument: stringp"). I'd be curious to know if you have gotten rss-hash to work.
Yes, I have gotten rss-hash to work (with some changes, but those are all in CVS). If you tell me the feed which gives you this error, I'll look into it (post it here or mail to randomsample <AT> randomsample.de).
Is it possible to track comments with shimbun (or otherwise)? What I detest about RSS is that it is virtually impossible to follow replies or track a thread of conversation. I would love to massage blog posts + replies into Gnus with comments appearing as replies to the original message.
Short answer: no. Long answer: The problem is that almost all sites I know don't publish RSS feeds for comments. One exception is Google's Blogger, and the mentioned sb-rss-blogs.el supports those, although just as articles in its own group without any thread information. I think it would be possible to code this in a way that allows a nicely threaded view, but I'm afraid this would lead to a Gnus-only solution, which the emacs-w3m maintainers try to avoid (shimbun also supports Mew and Wanderlust). Then there's the problem with sites which do not publish comment feeds - here you would have to parse the HTML. It might be possible to find a generic solution e.g. for WordPress-based sites, where you often find similar div-tags which can be easily parsed, but usually you will have to adapt the parsing for every site, which would be a lot of work.