Beautiful Soup

I was helping my wife out with a quick script to scrape some data from a site that had a bunch of tables in it. Having only done some regex-based scraping before, tables provided a bit of a challenge…

Until I found BeautifulSoup. It’s a Python library that you can throw a blob of HTML at, and it gives you a pretty handy way to traverse the hierarchy and pull a bunch of stuff out. Make sure you use it along with html5lib for better parsing.
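
Here’s a minimal sketch of what pulling rows out of a table can look like. It assumes the bs4 package (BeautifulSoup 4) is installed, and the HTML is made up for illustration. The sketch defaults to Python’s built-in html.parser so it runs without extras; pass 'html5lib' as the parser if you have it installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the page being scraped.
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""

def table_rows(markup, parser='html.parser'):
    """Return each table row as a list of cell strings."""
    soup = BeautifulSoup(markup, parser)
    rows = []
    for tr in soup.find_all('tr'):
        # Header and data cells are treated the same way here.
        cells = tr.find_all(['th', 'td'])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows

for row in table_rows(html):
    print(row)
```

From there it’s ordinary list handling, which beats writing regexes against table markup.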

Also, since when did this easy_install thing exist? Maybe I just haven’t been doing serious Python lately, but this .egg stuff is pretty awesome. Deploying a collection of modules into a local site-packages directory was ridiculously easy. Are all the other Python deployment problems solved now too?


The iOS6 maps “fiasco” brings up an interesting question I often think about: Why is it that, in the consumer technology world at least, vertical integration so often competes with delivering value to the user?

iOS Maps is an interesting case. Yes, it has some new stuff, but it also took some stuff away. It’s not clearly better, and it’s definitely worse for a huge chunk of city dwellers. I obviously don’t know the whole backstory behind the change, but it smells to me like another example where Apple thought that control over the feature was more useful in the long term than providing the current best experience (which is clearly Google Maps).

The business relationship between Apple and Google must also be behind this, but why does it have to be this way? Why is it that two tech giants with best-of-breed tech in different areas rarely seem to be able to cooperate to produce an awesome product for the user? (And as companies get larger, and their product overlap continues to grow, the effect tends to worsen.) It’s kinda depressing to see this same game play out over and over again. It also seems like the world maybe just hasn’t figured out the right way to make the business side work.

Nexus 7 backlight issue

A quick note to those who might be thinking about getting a Nexus 7: While I’ve found the device to be quite good overall, I’ve hit one hardware issue that is going to take a replacement to fix.

Specifically, there’s a bit of backlight flicker when the screen brightness is on a low setting and there is Wi-Fi data activity.

Here’s the obligatory giant internet thread about the problem.

If I were on the fence, I’d probably wait a few weeks to see if the problem works itself out through software updates or confirmed hardware fixes.

Update: Google sent me a replacement device, which had the same problem. If you look on the thread, others are having the same experience. So I sent the replacement and the original back for a refund. I guess we get used to the Apple QA sometimes :/. I still like the device, but this issue was killing one of my core use cases. I’ll have to wait and see if it gets fixed and re-order at a later time.

Also, updated above link to point to the right megathread.

Remote ctags with vim

Using a text editor inside a terminal really sucks. It’s really lame that it’s 2012, and we’re still doing it this way.

So I resolved to try to run MacVim locally to do my work. It has a netrw module that lets you do certain things like open a sftp:// url directly, and browse remote trees. Great. But what about ctags?

Turns out the ctags file format is pretty simple:

tag<tab>filename<tab>bunch of other stuff...

It also turns out that if in the filename field, you have a “sftp://…” style string, vim doesn’t freak out! Sweet!
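
To make the format concrete, here’s a small Python sketch that splits one tags line into its fields and swaps in an sftp:// filename. The sample line and the function name are made up for illustration.

```python
# One line from a tags file: tag, filename, then the ex command and extras.
# This sample line is hypothetical.
sample = 'parse_request\tlib/http.py\t/^def parse_request(raw):$/;"\tf\n'

def with_sftp_prefix(line, prefix):
    """Return the tags line with `prefix` prepended to its filename field."""
    # Split on the first two tabs only; the rest of the line is left alone.
    tag, fname, rest = line.split('\t', 2)
    return '\t'.join([tag, prefix + fname, rest])

print(with_sftp_prefix(sample, 'sftp://dev/www/'))
```

Only the second field changes; the tag and the search pattern pass through untouched.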

So my hack for the night was to write a script that:

  1. logs in remotely and kicks off a ctags run
  2. copies that tags file locally
  3. munges that file to replace relative paths in the remote fs with sftp://… prefixed paths that a local vim could use to access remote files
  4. writes some .vimrc to load that tags file
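
Steps 1 and 2 might look roughly like the Python sketch below. It’s not my actual script: the ssh host alias "dev", the remote directory "www", and the tags.raw file name on the remote side are assumptions for illustration, and it expects ssh, scp, and a ctags that supports -R to be available.

```python
import subprocess

# Assumptions for illustration: an ssh host alias "dev", a source tree in
# ~/www on that host, and ctags installed there.
HOST = 'dev'
REMOTE_DIR = 'www'

def remote_ctags_cmd(host, remote_dir):
    """Command that runs a recursive ctags on the remote host (step 1)."""
    return ['ssh', host, 'cd %s && ctags -R -f tags.raw' % remote_dir]

def fetch_tags_cmd(host, remote_dir, local_path):
    """Command that copies the remote tags file to local_path (step 2)."""
    return ['scp', '%s:%s/tags.raw' % (host, remote_dir), local_path]

def refresh_tags(local_path):
    """Run both steps; raises CalledProcessError if either command fails."""
    subprocess.check_call(remote_ctags_cmd(HOST, REMOTE_DIR))
    subprocess.check_call(fetch_tags_cmd(HOST, REMOTE_DIR, local_path))
```

Calling refresh_tags with a local path leaves a raw tags file there, ready for the munging step.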

And it works! I can use :tj to search for tags, and vim will automatically open up the right file over sftp and show me the tag. 

The only thing I can’t seem to work out is how to disable the extra “press Enter to continue” prompt that happens every time vim needs to open a new remote file. If I can get rid of that, this is almost a perfect solution.

Actually, one other problem is that vim doesn’t seem to be able to tab-complete sftp://… urls, which seems a little silly, since it has a pretty sophisticated directory browser built in.

For those who are curious, here’s my lame script to munge the tags file:

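import sys

sftp_prefix = 'sftp://dev/www/'

# Raw tags file copied from the remote machine, and the munged copy vim loads.
infile = open('/Users/kdeeter/workstate/tags.raw', 'r')
outfile = open('/Users/kdeeter/workstate/tags', 'w')

cnt = 0

for line in infile:
   # Each line is: tag<tab>filename<tab>everything else
   tag, fname, rest = line.split('\t', 2)
   # Drop the !_TAG header lines; prefix the filename on everything else
   if not tag.startswith('!_TAG'):
      outfile.write(tag + '\t' + sftp_prefix + fname + '\t' + rest)
   cnt += 1
   if cnt % 1000 == 0:
      # \r keeps the progress counter on a single line
      sys.stdout.write('%d tags munged\r' % cnt)
      sys.stdout.flush()

infile.close()
outfile.close()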


vim sessions, python, and git branches

Getting back to writing code again. One thing that I always wanted was a way for my vim to understand the sets of files I was working on in a particular git branch. I think I figured this out, along with how to use vim’s python interface.

In .vimrc

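python << ENDPYTHON
import vim
import commands

vim_sessions_dir = '~/www-vim-sessions/'

def save_git_session():
  # Name of the current branch: the line "git branch" marks with "* "
  branch = commands.getoutput(r"git branch 2> /dev/null | grep -e '\*' | sed 's/^..\(.*\)/\1/'")
  branch_session_file = vim_sessions_dir + branch + '.vim'
  session_save_cmd = ':mksession! ' + branch_session_file
  # Actually run :mksession to write the session file
  vim.command(session_save_cmd)
  print "saved session to", branch_session_file
  # Session is saved; quit vim
  vim.command(':qa')
ENDPYTHON

nnoremap <leader>z :python save_git_session()<CR>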

That says: here’s a function called save_git_session. It shells out to get the current git branch, builds a file name from it, runs “:mksession” with that file name to dump the current state, then quits vim.

All this gets mapped to <leader>z with nnoremap.
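
The branch-to-session-file mapping can also be sketched as standalone Python 3, outside of vim. This is an illustration and not what runs in my .vimrc: it asks git for the branch with "git rev-parse --abbrev-ref HEAD" instead of the grep/sed pipeline, and the function names and the slash-flattening are my own assumptions.

```python
import os
import subprocess

def current_branch():
    """Ask git for the checked-out branch name (empty string outside a repo)."""
    try:
        out = subprocess.check_output(
            ['git', 'rev-parse', '--abbrev-ref', 'HEAD'],
            stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        return ''
    return out.decode().strip()

def session_file(branch, sessions_dir='~/www-vim-sessions/'):
    """Map a branch name to a session file path inside sessions_dir."""
    # Branches like "feature/login" contain slashes; flatten them so the
    # session lands directly in sessions_dir (a choice made for this sketch).
    safe = branch.replace('/', '-')
    return os.path.join(os.path.expanduser(sessions_dir), safe + '.vim')
```

The flattening matters because a branch name with a slash in it would otherwise point at a subdirectory that doesn’t exist.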

Now on the bash side, I need a function to restore stuff from the session, which is as simple as:

function vr {
   SESSION_FILE="$HOME/www-vim-sessions/$(git branch 2> /dev/null | grep -e '^\* ' | sed 's/^..\(.*\)/\1/').vim"
   vim -S "${SESSION_FILE}"
}

Et Voila!
