RSS Feeds for Wikipedia Current Events and NHL News

Published on Friday, May 26, 2017
Tags: programming, python, RSS

I subscribe to a fair number of feeds for news, blogs, articles, etc. I’m currently subscribed to 122 feeds, some of which have tens of articles a day (news sites) and some of which are dead. [1] Unfortunately, there were still a few sites that I was visiting manually each day to get updates from because they don’t offer any feeds. These included Wikipedia’s Current Events page [2] and the NHL News section.

Having the Wikipedia Current Events as a feed is a pretty specific thing that’s outside of the scope of MediaWiki, so I can understand why that doesn’t exist. The NHL not having news feeds over RSS or Atom though? That shocks me! I hope I’ve just been unable to find them and that they do exist. Please point me to them if they do!

Wikipedia Current Events

The Wikipedia Current Events feed is publicly available, and the code can be found in the GitHub repository clokep/wp-current-events-rss. Note that this pulls data on demand and thus always serves the most up-to-date versions of the articles. This works by:

  1. Pulling the last 7 days of Wikipedia’s current events pages (e.g. this one) using requests. Each day’s page is processed individually and becomes a separate article in the RSS feed.
  2. The wikicode for each article is converted to an AST using mwparserfromhell.
  3. Some of the headers and templates are removed from each article.
  4. It then converts each article back to HTML. (This was the surprisingly hard part. I couldn’t find a good library to do this and ended up writing this myself.)
  5. The articles are then turned into an RSS feed using feedgenerator.
  6. The feed itself is served via Flask.
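To make steps 1 and 5 a bit more concrete, here is a minimal sketch using only the standard library. It is not the code from the repository: it uses xml.etree.ElementTree as a stand-in for feedgenerator, does no fetching at all (the real app uses requests), and assumes the daily pages are titled like "Portal:Current events/2017 May 26". The function names are made up for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed title pattern for the daily pages (an assumption, not from the repo).
TITLE_PREFIX = "Portal:Current events/"
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def page_titles(today, days=7):
    """Return the daily page titles for the last `days` days, newest first."""
    titles = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        titles.append(f"{TITLE_PREFIX}{day.year} {MONTHS[day.month - 1]} {day.day}")
    return titles


def raw_url(title):
    """Build a URL that asks MediaWiki for a page's raw wikicode."""
    query = urlencode({"title": title, "action": "raw"})
    return f"https://en.wikipedia.org/w/index.php?{query}"


def build_rss(articles):
    """Turn (title, link, html) tuples into a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Wikipedia Current Events"
    ET.SubElement(channel, "link").text = "https://en.wikipedia.org/wiki/Portal:Current_events"
    ET.SubElement(channel, "description").text = "One item per day"
    for title, link, html in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        # ElementTree escapes the HTML so it survives as text in the feed.
        ET.SubElement(item, "description").text = html
    return ET.tostring(rss, encoding="unicode")
```

Calling page_titles(date.today()) gives the seven titles to fetch, and each converted article can then be passed to build_rss as a (title, link, html) tuple. The parsing and HTML conversion in steps 2 to 4 are left out here since they depend on mwparserfromhell and the custom converter.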

Feel free to check it out and let me know of any issues!

NHL News

After building the above, I figured there was no reason not to do the same for the NHL News section (and specifically for the Islanders). You can see the NHL feed or pick your favorite team. Again, the code is available on GitHub: clokep/nhl-news-rss. The stack is pretty similar to the above. It works by:

  1. Pulling the current NHL News page.
  2. Parsing the HTML with BeautifulSoup4 to pull out each article’s title, date, author, and short summary. (Note that the full article isn’t available on that page. It could be fetched by loading each article individually, but I didn’t implement that.)
  3. The articles are then turned into an RSS feed using feedgenerator.
  4. The feed itself is served via Flask.
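As a rough illustration of step 2, the sketch below pulls the title, date, author, and summary out of a page. It uses the standard library's html.parser as a stand-in for BeautifulSoup4, and the markup and class names in SAMPLE are hypothetical, since the real NHL page structure isn't shown here.

```python
from html.parser import HTMLParser

# Hypothetical class names; the real NHL markup will differ.
FIELDS = {"title", "date", "author", "summary"}

# Made-up sample page with placeholder content.
SAMPLE = """
<article>
  <h2 class="title">Example headline</h2>
  <span class="date">May 26, 2017</span>
  <span class="author">Example Author</span>
  <p class="summary">A short summary of the story.</p>
</article>
"""


class ArticleParser(HTMLParser):
    """Collect one dict per <article>, keyed by the class names in FIELDS."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._current = None  # dict for the <article> being read, if any
        self._field = None    # field whose text is being collected, if any

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._current = {}
            return
        css_class = dict(attrs).get("class")
        if self._current is not None and css_class in FIELDS:
            self._field = css_class
            self._current[css_class] = ""

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag == "article" and self._current is not None:
            self.articles.append({k: v.strip() for k, v in self._current.items()})
            self._current = None
        # Any closing tag ends the current field (fields are plain text here).
        self._field = None


def parse_articles(html):
    """Return a list of article dicts found in `html`."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.articles
```

Running parse_articles(SAMPLE) returns a single dict with the four fields, which is all the feed generation step needs for each item.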

Luckily, the NHL News site and the news page for each team are in the same format, so it’s just a matter of loading different URLs to get the different articles. It was pretty trivial to get the full list of teams and add support for all of them, so that’s included too! Articles are pulled during page load, so they should always be up to date.
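Since only the URL changes between the league and each team, the selection logic can be very small. The sketch below assumes the league news lives at /news and each team's news at /<slug>/news on www.nhl.com. That URL pattern, the TEAMS subset, and the news_url helper are all assumptions for illustration rather than the repository's actual code.

```python
# Assumed base URL and path layout (league: /news, team: /<slug>/news).
BASE_URL = "https://www.nhl.com"

# Hypothetical subset of team slugs; the real app supports every team.
TEAMS = {
    "islanders": "New York Islanders",
    "rangers": "New York Rangers",
}


def news_url(team=None):
    """Return the league news URL, or a team's news URL when a slug is given."""
    if team is None:
        return f"{BASE_URL}/news"
    if team not in TEAMS:
        raise KeyError(f"unknown team slug: {team}")
    return f"{BASE_URL}/{team}/news"
```

The same parsing and feed code can then be run against whatever news_url returns, which is why adding every team was cheap.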

I hope one (or both) of these is useful to people! Again, please let me know if you have any issues or ideas!

[1] I recently switched from using Thunderbird to Feedly in order to get cross-device syncing of read status on articles, but that’s not really related to the rest of this article. Switching has mostly worked out well, but I do miss the filtering capabilities of Thunderbird!

I also tried a few other services (e.g. The Old Reader), but most had too many weird social features. I just wanted to read feeds.

[2] I do know that Wikipedia page updates can be consumed via RSS, but I don’t want to know every time the article is updated, just the state of the article at the end of the day. (It also doesn’t work for the current events article since it’s dynamically generated from a bunch of templates.)

Note

An update, as of September 13, 2017:

  1. The links to each RSS feed were updated.
  2. These apps are now hosted together on https://www.to-rss.xyz.
  3. These projects are no longer being updated on GitHub. The combined site might be open sourced in the future.