Beautiful Soup

I was helping my wife out with a quick script to scrape some data from a site that had a bunch of tables in it. Having only done some regex-based scraping before, I found the tables a bit of a challenge…

Until I found BeautifulSoup. It’s a Python library you can throw a blob of HTML at, and it gives you a pretty handy way to traverse the hierarchy and pull a bunch of stuff out. Make sure you use it along with html5lib, which parses messy markup the way a browser would.
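Here is a rough sketch of the kind of table scraping I mean. It assumes the current Beautiful Soup 4 package (imported as bs4), which may not be the version you have installed. The sample HTML and the table_rows helper are made up for illustration. If html5lib is missing, the sketch falls back to Python's built-in parser.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up sample markup standing in for a scraped page.
HTML = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.80</td></tr>
</table>
"""


def table_rows(html, parser="html5lib"):
    """Return every table row in `html` as a list of cell strings.

    Hypothetical helper: tries the requested parser first and falls
    back to the standard library's html.parser if it is not installed.
    """
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        soup = BeautifulSoup(html, "html.parser")

    rows = []
    for tr in soup.find_all("tr"):
        # Collect header and data cells alike, trimming whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(HTML)
for row in rows:
    print(row)
```

On a real page you would pass in the downloaded HTML instead of the sample string, and probably narrow things down to one table first with something like soup.find("table", id="prices").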

Also, since when did this easy_install thing exist? Maybe I just haven’t been doing serious Python lately, but this .egg stuff is pretty awesome. Deploying a collection of modules into a local site-packages directory was ridiculously easy. Are all the other Python deployment problems solved now too?
