Nasty bugs crawling down in some of my installed Python libraries.

The Problem

For many weeks I had a problem in one of our codebases using the requests Python library where doing a requests.get() would end up catching a weird exception nobody could figure out. See https://github.com/kennethreitz/requests/issues/2935

JUST TO BE CLEAR HERE, NO ACTUAL BUG WAS THERE IN THE FANTASTIC requests LIBRARY!

This is an example of exception we had been seeing:

Traceback (most recent call last):
  File "project/views.py", line 49, in get_url
    response = requests.get(url)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 54, in request
    session.close()
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 649, in close
    v.close()
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 264, in close
    self.poolmanager.clear()
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.py", line 99, in clear
    self.pools.clear()
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/_collections.py", line 93, in clear
    self.dispose_func(value)
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.py", line 65, in <lambda>
    dispose_func=lambda p: p.close())
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 410, in close
    conn = old_pool.get(block=False)
  File "/usr/local/lib/python2.7/Queue.py", line 176, in get
    raise Empty
KeyError: (1, True)

A Empty exception was being thrown, but a KeyError was being received.

The Diagnosis

I struggled hard trying to find the code that was, as many theorized, monkeypatching the Queue module. No luck there. The most commonly suggested suspect was gevent; however, I didn’t even have gevent installed, so I kept looking for things that could be monkeyparching stuff. e.g. do a regex search of \.(Empty|LifoQueue|get)\s*= and setattr\([^)]+["'] (Empty|LifoQueue|get)["']\). Nothing.

Obviously something was "patching" the Queue.Empty variable, but there was no way of telling what, who, when or why...

The Solution

I had almost lost any hope, when I decided to go hunting again and then I remembered about IRC channels... It was an aliviating moment when someone at Freenode’s #python (Yhg1s) suggested I could try protecting a Python module by directly modifying sys.modules and exchanging the module in question a tweaked object instead of a module, so when the bad guy was trying to modify Empty it would get exposed by throwing an exception. This is how I did it:

First, I protected the Queue module. To do this, I renamed the original Queue.py to _Queue.py:

mv /usr/local/lib/python2.7/Queue.py /usr/local/lib/python2.7/_Queue.py

And then in its place, I created another Queue.py which would mimic the real Queue module, only with a protected object to which no attribute assignments would be allowed, so the whole new /usr/local/lib/python2.7/Queue.py ended up like this:

class Protected(dict):
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError("'%s' object has no attribute '%s'" % (self.__class__.__name__, name))

    def __setattr__(self, name, value):
        raise AttributeError("can't set attribute")
import sys, _Queue
sys.modules['Queue'] = Protected(_Queue.__dict__)

As you see, I modified sys.modules['Queue'], so further imports of Queue would return the Protected object with all attributes from _Queue (the original Queue, only sort of immutable).

Finally, after doing that, I starting using the system, and I quickly saw a new exception:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/rest_framework/views.py", line 463, in dispatch
    response = handler(request, *args, **kwargs)
  File "project/apis/base.py", line 1184, in create
    self.perform_create(serializer)
  File "project/apis/base.py", line 1243, in perform_create
    return self._stamp_invoice(request, serializer)
  File "project/apis/base.py", line 1272, in _stamp_invoice
    serializer.save(**extra_kwargs)
  File "/usr/local/lib/python2.7/site-packages/rest_framework/serializers.py", line 191, in save
    self.instance = self.create(validated_data)
  File "project/apis/base.py", line 520, in create
    return self._issue_invoice(xml_etree)
  File "project/apis/base.py", line 982, in _issue_invoice
    with Balancer('dummy', owner, is_live) as ws:
  File "project/backends/balancer.py", line 72, in __enter__
    self.ws = self.get_pooler().get_stamper(self.owner, self.live)
  File "project/backends/balancer.py", line 34, in get_stamper
    except KeyError, Queue.Empty:
  File "/usr/local/lib/python2.7/Queue.py", line 9, in __setattr__
    raise AttributeError("can't set attribute")
AttributeError: can't set attribute

See the problem here:

try:
    ...
except KeyError, Queue.Empty:
    ...

Someone decided to catch both KeyError and Queue.Empty exceptions... only this isn’t the right way of catching two exceptions in Python 2. What this someone did in this library was actually assigning any KeyError exception object onto Queue.Empty, so if for instance a KeyError: (1, True) was caught, it’d get assigned and would replace old Queue.Empty (effectively monkeypatching it without even knowing)

I fixed this, obviously, by simply changing the code to:

try:
    ...
except (KeyError, Queue.Empty):
    ...

The Verdict

After checking around all the code we use, we found there are quite a few libraries doing things like that one above... producing bugs which end up overriding exception types, and these can hit you when you least expect it. Beware!

What I Learned

But most of all, this whole mess was a useful exercise to learn a good technique about how to "protect" a whole module from monkeypatching and being able to detect problems if you know something must be patching a module but you don’t know what.