Kurt McKee (kurtmckee) wrote,

Moving, part 6

I misspoke, apparently: the Unicode problem in the first tool stemmed from something weird going on in the xmlrpclib.Binary.decode() function. Extracting the raw UTF-8 data and decoding it myself gets me the data I expect.

New problem: some of my entries are not fully marked up as HTML. The paragraphs are not wrapped in <p> tags, so converting to Markdown produces a massive blob of text because html2text discards the newlines. I have to add the HTML tags before converting to Markdown.
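
Here is a rough sketch of both fixes, not the actual migration script. It is written for Python 3, where xmlrpclib lives at xmlrpc.client. The helper names binary_to_text and wrap_paragraphs are made up for illustration, and I am assuming that paragraphs in the affected entries are separated by blank lines and that any entry already containing a <p> tag can be left alone.

```python
import re
from xmlrpc.client import Binary  # this module is xmlrpclib in Python 2


def binary_to_text(blob):
    """Decode the raw bytes held by an XML-RPC Binary object as UTF-8."""
    # Binary keeps its payload in the .data attribute as bytes.
    return blob.data.decode("utf-8")


def wrap_paragraphs(entry):
    """Wrap blank-line-separated chunks of an entry in <p> tags.

    Assumption: an entry that already contains a <p> tag is treated
    as fully marked up and returned unchanged.
    """
    if re.search(r"<p[\s>]", entry, flags=re.IGNORECASE):
        return entry
    # Assumption: one or more blank lines separate paragraphs.
    chunks = re.split(r"(?:\r?\n){2,}", entry.strip())
    return "\n".join(
        "<p>{}</p>".format(chunk.strip()) for chunk in chunks if chunk.strip()
    )
```

The wrapped result can then be handed to html2text, for example html2text.html2text(wrap_paragraphs(entry)), so the paragraph breaks survive the conversion.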
Tags: website
