
Everything is an Object

2016-09-06

Program in Python for any length of time and you are sure to hear the refrain "everything is an object", but I have only started to realize how deep the rabbit hole goes.

The Simple Case

An important early lesson for me in Python was learning how to "inspect" objects. I used that ability (and still do) to explore APIs and debug undocumented functionality (often my own).

>>> help(dir)
Help on built-in function dir in module builtins:

dir(...)
    dir([object]) -> list of strings
    
    If called without an argument, return the names in the current scope.
    Else, return an alphabetized list of names comprising (some of) the attributes
    of the given object, and of attributes reachable from it.
    If the object supplies a method named __dir__, it will be used; otherwise
    the default dir() logic is used and returns:
      for a module object: the module's attributes.
      for a class object:  its attributes, and recursively the attributes
        of its bases.
      for any other object: its attributes, its class's attributes, and
        recursively the attributes of its class's base classes.

>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']

dir is useful for checking the names that are currently in scope, but more often I use it to explore the available attributes of an object. This ability to introspect is a result of the language's deep object-orientation: Python doesn't differentiate between a primitive integer type and an Integer object, so there is no manual boxing or un-boxing of values.

>>> type(42)
<class 'int'>
>>> dir(42)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__',
'__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__',
'__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__',
'__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__int__',
'__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__',
'__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__',
'__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__',
'__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__',
'__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__',
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__',
'__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate',
'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
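
Because even a literal integer is a full-fledged object, the methods listed above can be called on it directly. A quick sketch (the parentheses around the literal are only there so the parser doesn't mistake the dot for a decimal point):

>>> (42).bit_length()
6
>>> (42).__add__(8)
50
>>> isinstance(42, object)
True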

I knew this, but I never gave much thought to how deeply ingrained this idea is in the language, or what it achieves (other than saving me the trouble of boxing and un-boxing).

CPython Internals

I've been watching Philip Guo's "Ten-Hour Codewalk" through the internals of the CPython interpreter and made a small but important connection between the simple example above and the more fundamental operations of Python.

As a quick reminder, Python compiles source files before "executing" any instructions in the interpreter. The compilation step is simple and does little more than catch SyntaxErrors; it is exposed directly through the built-in function compile:

>>> help(compile)
Help on built-in function compile in module builtins:

compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)
    Compile source into a code object that can be executed by exec() or eval().
    
    The source code may represent a Python module, statement or expression.
    The filename will be used for run-time error messages.
    The mode must be 'exec' to compile a module, 'single' to compile a
    single (interactive) statement, or 'eval' to compile an expression.
    The flags argument, if present, controls which future statements influence
    the compilation of the code.
    The dont_inherit argument, if true, stops the compilation inheriting
    the effects of any future statements in effect in the code calling
    compile; if absent or false these statements do influence the compilation,
    in addition to any features explicitly specified.

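To make that "little more than catch SyntaxErrors" claim concrete, here is a minimal sketch run at the interactive prompt (tracebacks abbreviated): an undefined name compiles without complaint and only fails when the code object is actually evaluated, while a malformed def fails at compile time.

>>> code = compile("undefined_name + 1", '<string>', 'eval')  # compiles without complaint
>>> eval(code)                                                # the NameError only appears at run-time
Traceback (most recent call last):
  ...
NameError: name 'undefined_name' is not defined
>>> compile("def broken(:", '<string>', 'exec')               # a malformed statement fails immediately
Traceback (most recent call last):
  ...
SyntaxError: invalid syntax
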
compile returns a code object, as seen here:

$ cat test.py
a_variable = 5

def fancy_func(z):
    x = 2
    return x + z

fancy_func(1)

$ python
>>> co = compile(open('test.py').read(), 'test.py', 'exec')
>>> dir(co)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__',
'__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount',
'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno',
'co_flags', 'co_freevars', 'co_kwonlyargcount', 'co_lnotab', 'co_name',
'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']
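
The co_* attributes hold everything the interpreter needs at run-time. For this particular test.py, the names and constants referenced by the bytecode look like this (the hexadecimal address is elided here and will differ from run to run):

>>> co.co_names
('a_variable', 'fancy_func')
>>> co.co_consts
(5, <code object fancy_func at 0x..., file "test.py", line 3>, 'fancy_func', 1, None)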

Further inspection of the code object is best done with the help of the disassembly module, dis. Each line of the disassembled output corresponds to a single bytecode instruction; the raw instructions themselves are packed into the byte-string attribute co_code.

>>> co = compile(open('test.py').read(), 'test.py', 'exec')
>>> co.co_code
b'd\x00\x00Z\x00\x00d\x01\x00d\x02\x00\x84\x00\x00Z\x01\x00e\x01\x00d\x03\x00\x83\x01\x00\x01d\x04\x00S'
$ python3 -m dis test.py 
  1           0 LOAD_CONST               0 (5)
              3 STORE_NAME               0 (a_variable)

  3           6 LOAD_CONST               1 (<code object fancy_func at 0x10e397a50, file "test.py", line 3>)
              9 LOAD_CONST               2 ('fancy_func')
             12 MAKE_FUNCTION            0
             15 STORE_NAME               1 (fancy_func)

  7          18 LOAD_NAME                1 (fancy_func)
             21 LOAD_CONST               3 (1)
             24 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             27 POP_TOP
             28 LOAD_CONST               4 (None)
             31 RETURN_VALUE
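
The nested code object for fancy_func is itself just the constant at index 1, and it can be disassembled the same way. A hedged sketch of what that looks like on the (pre-3.6) interpreter used above; exact offsets and variable indices may differ between versions:

>>> import dis
>>> fancy_code = co.co_consts[1]     # the nested code object for fancy_func
>>> dis.dis(fancy_code)
  4           0 LOAD_CONST               1 (2)
              3 STORE_FAST               1 (x)

  5           6 LOAD_FAST                1 (x)
              9 LOAD_FAST                0 (z)
             12 BINARY_ADD
             13 RETURN_VALUE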

The Point Is?

The big idea for me here is that the code object for a function is compiled once, loaded as a constant, and then assigned to a name. The operations of the function are fixed at compile time, but the specifics of its environment remain unbound until run-time. This is how Python implements closures: by deferring look-ups for variables and attributes until the function is actually called. Calling a function creates a new frame of execution, and the co_code attribute, the compiled code of the function, is only one part of the code object; the rest of the information it needs lives in the object's other attributes.
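
A minimal sketch of that deferral, using a hypothetical make_adder helper: the inner function's code object records n as a free variable (visible in co_freevars, one of the attributes listed earlier), and the value is only looked up in the enclosing frame when the function is called.

>>> def make_adder(n):
...     def add(x):
...         return x + n              # 'n' is looked up in the enclosing frame at call time
...     return add
...
>>> add_two = make_adder(2)
>>> add_two.__code__.co_freevars      # the compiled code only records the name
('n',)
>>> add_two(40)
42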

This was, for me, a revelatory connection that demystifies some techniques I am at least passingly familiar with but never dug too deeply into, things like monkey-patching and meta-programming. It also reinforces that nagging idea that deep down, Python is built entirely on dictionaries - an idea I'll explore more another time.
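
As one last hedged sketch of why those techniques stop looking like magic: a class's methods are simply entries in its dictionary, so monkey-patching is nothing more than re-binding a name at run-time (Greeter is a made-up example class, not anything from the post above).

>>> class Greeter:
...     def greet(self):
...         return "hello"
...
>>> 'greet' in Greeter.__dict__             # the method is just an entry in the class's dictionary
True
>>> Greeter.greet = lambda self: "goodbye"  # monkey-patching: re-bind the name at run-time
>>> Greeter().greet()
'goodbye'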