Python Dependency Inversion Principle

What does dependency inversion mean or look like in the context of a dynamically typed language like Python? First we’ll review the concept of dependency inversion and what it means in a statically typed language like Java. Then we’ll compare the differences in dependency inversion between the two types of languages.

Note: Robert C. Martin originally introduced this term and while I’m not certain I think it likely came out of the java world because I can’t imagine pythonistas or rubyists coming up with such a grandiose word for what is ultimately a fairly simple and specific means of indirection 🙂

Dependency Inversion

According to wikipedia, the dependency inversion principle states that:

  • High-level modules should not depend on low-level modules. Both should depend on abstractions (e.g. interfaces).
  • Abstractions should not depend on details. Details (concrete implementations) should depend on abstractions.

Put another way, it’s saying that if a component needs to be able to switch specific implentations (details), it should not have a source code dependency to those implementations.

By removing the source code dependency and replacing it with a dependency to an interface, you are “inverting” the depedency between the high level component and the low level details.

But that’s all very theoretical mumbo jumbo so lets look at some concrete examples.

Java

In java, this is an example without dependency inversion.

class Cat {
    public void greet() {
        AngryGreeter greeter = new AngryGreeter();
        greeter.greet();
   }
}

The Cat class is our component here. It is directly referencing the class AngryGreeter which contains the specific details of greeting.

In many cases, this source code dependency runs in the same direction as the flow of control. In other words, if Cat needs to invoke methods on instances of AngryGreeter, it needs to have a reference to a concrete greeter object and the most common way to do that is to just use the new keyword to construct the object where it’s needed.

The consequence of this relationship through this construction using new AngryGreeter is that it creates tight coupling between the two modules. More specifically, our Cat class now directly depends on the concrete greeting class AngryGreeter.

Now, if there is no need to add new greeting behaviors at all, this relationship is perfectly fine.

But if we want to add new behaviors such as a HappyGreeter, we would have to modify Cat. Taken to the extreme, if we needed to add 100 greeting behaviors over the course of 100 days and we wanted Cat to be able to perform all of them, we would have to change Cat 100 times!

To avoid that, we need to:

  1. Introduce an interface that concrete greeters impelement
  2. Pass concrete greeters as arguments into Cat (dependency injection)

Here’s what dependency inversion looks like in Java for this simple example:

public interface Greeter{
    void greet();
}

class AngryGreeter implements Greeter { 
    public void greet() {
        System.out.println("YOWL!");
    }
}

class Cat {
    private final Greeter greeter;
        
    public Cat(Greeter greeter) {
        this.greeter = greeter;
    }

    public void greet() {
        this.greeter.greet();
    }
}

Now, rather than Cat depending directly on AngryGreeter, both Cat and Angry greeter depend on the interface Greeter.

What we have now is a dependency that points in the opposite direction of the control flow, hence it’s inverted.

Now, we can add new greeters to our system without ever touching the Cat component. That’s pretty sweet. Previously, Cat had a direct, hard coded reference to AngryGreeter. Now it has a direct reference to the interface instead which will only change when the greeting API changes (and not when new greeting behavior that uses the same API is added).

Python

Now lets look at the first example implemented in Python:

class AngryGreeter:
    def greet(self):
        print("YOWL!")
class Cat:
    def greet(self):
        greeter = AngryGreeter()
        greeter.greet()

Since python is dynamically typed, there is no need to declare an interface the source code in the same way as Java. At run time, objects either can do something or they can’t.

Therefore, if we wanted to invert the dependencies, we can turn the direct reference to the object into a parameter:

class AngryGreeter:
    def greet(self):
        print("YOWL!")
class Cat:
    def greet(self, greeter)
        greeter.greet()

While this is easy to do in python (or any dynamically typed language), there is one glaring drawback when we do this: we don’t know exactly what the interface is without looking at what methods are actually being called.

So while inverting dependencies are easy because interfaces are implicit, the implictness of interfaces means that:

  • Cat may crash if given a greeter object that does not implement the same interface. Missing method?
  • You need to spend a lot more time reading existing source code (mostly existing concrete classes) in order to infer what the interface is

This is one of the fundamental differences (and probably source of many divorces) between statically typed and dynamically typed languages. Given a dynamically typed language like python, can we add more reassurance?

There’s a couple of things you can do.

The first is type hints, which can be useful for your IDE but does not enforce contracts at runtime. The second is abstract base classes which will provide runtime checking.

Abstract Base Classes

Python 3 introduced ABC classes to solve these problems (these are more akin to javas abstract interfaces since they can contain implementation).

import abc

class Greeter(abc.ABC):
    @abc.abstractmethod
    def greet(self):
        pass

class AngryGreeter(Greeter):
    def greet(self):
        print("YOWL!")

class HappyGreeter(Greeter):
    pass

class Cat:
    def greet(self, greeter):
        greeter.greet()

if __name__ == "__main__":
    c = Cat()
    c.greet(AngryGreeter())
    c.greet(HappyGreeter())

Now, you’ll get an exception when HappyGreeter is instantiated without the greet method. In terms of understanding what methods are expected, we don’t have to go digging through multiple classes – we can just look at the interface that the greeters implement.

Nand2tetris Python Assembler

Here’s my source code for the assembler for the nand2tetris HACK assembly language written in Python 3.

This implementation emphasizes readability above all else. Therefore, there are more function calls than necessary and many parts of the implementation assume valid inputs. It has been tested to work with all files provided in the course.

I hope this can serve as a useful reference for others.

import re
import sys
import argparse

def convert_assembly_to_binary_file(asm_file, binary_file):
    with open(asm_file, "r") as f:
        result = translate_lines(f.readlines())
        output = "\n".join([l for l in result if l]) 
        with open(binary_file, "w") as f:
            f.write(output)

def translate_lines(lines):
    lines = strip_whitespace_and_comments(lines)
    symbol_table = build_symbol_table(lines)
    translate_instruction = build_instruction_translator(symbol_table)
    return [translate_instruction(x) for x in lines]

def strip_whitespace_and_comments(lines):
    instructions = []
    for line in lines:
        stripped_line = line.strip() 
        if stripped_line:
            if not stripped_line.startswith("//"):
                if "//" in stripped_line:
                    instructions.append(stripped_line.split("//")[0].strip())
                else:
                    instructions.append(stripped_line)
    return instructions

def build_symbol_table(lines):
    symbols = {
        "R0": "0000000000000000",
        "R1": "0000000000000001",
        "R2": "0000000000000010",
        "R3": "0000000000000011",
        "R4": "0000000000000100",
        "R5": "0000000000000101",
        "R6": "0000000000000110",
        "R7": "0000000000000111",
        "R8": "0000000000001000",
        "R9": "0000000000001001",
        "R10": "0000000000001010",
        "R11": "0000000000001011",
        "R12": "0000000000001100",
        "R13": "0000000000001101",
        "R14": "0000000000001110",
        "R15": "0000000000001111",
        "SP": "0000000000000000",
        "ARG": "0000000000000010",
        "LCL": "0000000000000001",
        "THIS": "0000000000000011",
        "THAT": "0000000000000100",
        "KBD": "0110000000000000",
        "SCREEN": "0100000000000000"
    }
    is_address_instruction = lambda x: x.startswith("@")
    is_compute_instruction = lambda x: "=" in x or ";" in x
    label_value = lambda x: x.replace("(", "").replace(")", "").strip()
    current_line_num = 0
    for line in lines: 
        if is_address_instruction(line) or is_compute_instruction(line):
            current_line_num +=1 
        elif is_label(line):
            symbols[label_value(line)] = decimal_to_binary(current_line_num)
    base_address = 16
    for line in lines:
        if line.startswith("@"):
            value = line[1:]
            if value not in symbols and not value.isnumeric():
                symbols[value] = decimal_to_binary(base_address)
                base_address += 1
    return symbols

def build_instruction_translator(symbol_table):
    COMPUTATIONS = {
        "0": "0101010",
        "1": "0111111",
        "-1": "0111010",
        "D": "0001100",
        "A": "0110000",
        "!D": "0001101",
        "!A": "0110001",
        "-D": "0001111",
        "-A": "0110011",
        "D+1": "0011111",
        "A+1": "0110111",
        "D-1": "0001110",
        "A-1": "0110010",
        "D+A": "0000010",
        "D-A": "0010011",
        "A-D": "0000111",
        "D&A": "0000000",
        "D|A": "0010101",
        "M": "1110000",
        "!M": "1110001",
        "-M": "1110011",
        "M+1": "1110111",
        "M-1": "1110010",
        "D+M": "1000010",
        "D-M": "1010011",
        "M-D": "1000111",
        "D&M": "1000000",
        "D|M": "1010101"
    }
    DESTINATIONS = {
        "": "000",
        "M": "001",
        "D": "010",
        "MD": "011",
        "A": "100",
        "AM": "101",
        "AD": "110",
        "AMD": "111"
    }
    JUMPS = {
        "": "000",
        "JGT": "001",
        "JEQ": "010",
        "JGE": "011",
        "JLT": "100",
        "JNE": "101",
        "JLE": "110",
        "JMP": "111"
    }
    def fn(line):
        if is_label(line):
            return
        if line.startswith("@"):
            value = line[1:]
            if value in symbol_table:
                return symbol_table[value]
            return decimal_to_binary(int(value))
        dest, jump = "", ""
        comp = line.split("=").pop().split(";")[0]
        if "=" in line: 
            dest = line.split("=")[0]
        if ";" in line: 
            jump = line.split(";").pop()
        return f"111{COMPUTATIONS.get(comp, '0000000')}{DESTINATIONS.get(dest, '000')}{JUMPS.get(jump, '000')}"
    return fn

def is_label(line):
    return line.startswith("(") and line.endswith(")")

def decimal_to_binary(decimal_value):  
    return f"{decimal_value:0>16b}"

if __name__ == "__main__": 
    parser = argparse.ArgumentParser(description="Generates a hack binary file from assembly")
    parser.add_argument("asm_file", help="name of a HACK assembly file, i.e input.asm")
    parser.add_argument("binary_file", help="name of the HACK file, i.e output.hack")
    args = parser.parse_args()
    convert_assembly_to_binary_file(args.asm_file, args.binary_file)

Here’s what a translation for the Project 06 Max program looks like:

Line Number Before After
0 @R0 0000000000000000
1 D=M 1111110000010000
2 @R1 0000000000000001
3 D=D-M 1111010011010000
4 @OUTPUT_FIRST 0000000000001010
5 D;JGT 1110001100000001
6 @R1 0000000000000001
7 D=M 1111110000010000
8 @OUTPUT_D 0000000000001100
9 0;JMP 1110101010000111
10 (OUTPUT_FIRST)
11 @R0 0000000000000000
12 D=M 1111110000010000
13 (OUTPUT_D)
14 @R2 0000000000000010
15 M=D 1110001100001000
16 (INFINITE_LOOP)
17 @INFINITE_LOOP 0000000000001110
18 0;JMP 1110101010000111

The full specification for the nand2tetris HACK machine language can be found in the Project 6 materials on the course website.

Let me know if you have any questions!

Practical Guide to Python String Format Specifiers

Python 3.x introduced two key enhancements to string formatting:

  • The .format method and format specifiers with PEP-3101 (Python 3.0)
  • f-strings or literaly string interpolation with PEP-0498 (Python 3.6)

In this article I’m going to give specific examples on how format specifiers work using both .format and f-strings.

Format specifiers give you much greater control over how you want your output to look, but the grammar can feel foreign and confusing if you’re coming from python 2 where the extent of most string interpolation just involved using % symbols.

I expect you to have a basic working knowledge of how to use .format and f-strings, but here’s a refresher:

Using .format:

>>> name = "John"
>>> age = 15
>>> print("{} is age {}".format(name, age))
John is age 15
>>> print("{1} is age {0}".format(name, age))
15 is age John
>>> print("{name} is age {age}".format(name=name, age=age))
John is age 15

Using f-strings:

>>> name = "John"
>>> age = 15
>>> print(f"{name} is age {age}")
John is age 15

Now lets dig into format specifiers.

Format Specification

Here’s the grammar of a format specifier: [[fill]align][sign][#][0][width][grouping_option][.precision][type]

The official documentation on format specifiers does a great job explaining the meaning of each aspect of the format specifier grammar so I won’t repeat that here. I highly encourage you to read through the documentation on what each part of that grammar means first.

Lets start with 3 strings:

item_number = "103"
item_label = "Light Saber"
item_price = "256.128"

Every example will be accompanied by a use of both .format and f-strings. First, here’s a normal, default output without any format specification:

>>> print("{}, {}, {}".format(item_number, item_label, item_price))
103, Light Saber, 256.128
>>> print(f"{item_number}, {item_label}, {item_price}")
103, Light Saber, 256.128

Now I will progressively add format specifiers.

Again, here’s the full grammar of a format specifier: [[fill]align][sign][#][0][width][grouping_option][.precision][type].

Lets start with [[fill]align].

[[fill]align]

Lets make our item number take up a width of 50 characters:

>>> print("{:50}, {}, {}".format(item_number, item_label, item_price))
103                                               , Light Saber, 256.128
>>> print(f"{item_number:50}, {item_label}, {item_price}")
103                                               , Light Saber, 256.128

Now right adjust it:

>>> print("{:>50}, {}, {}".format(item_number, item_label, item_price))
                                               103, Light Saber, 256.128
>>> print(f"{item_number:>50}, {item_label}, {item_price}")
                                               103, Light Saber, 256.128

Don’t like blank spaces? Lets set x as the fill character (instead of spaces) and 50 as the width for item number. Make it left adjusted.

>>> print("{:x<50}, {}, {}".format(item_number, item_label, item_price))
103xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, Light Saber, 256.128
>>> print(f"{item_number:x<50}, {item_label}, {item_price}")
103xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, Light Saber, 256.128

Now make it right adjusted:

>>> print("{:x>50}, {}, {}".format(item_number, item_label, item_price))
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx103, Light Saber, 256.128
>>> print(f"{item_number:x>50}, {item_label}, {item_price}")
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx103, Light Saber, 256.128

Here’s tweaking both the fill and the width:

>>> print("{:j<10}, {}, {}".format(item_number, item_label, item_price))
103jjjjjjj, Light Saber, 256.128
>>> print(f"{item_number:j<10}, {item_label}, {item_price}")
103jjjjjjj, Light Saber, 256.128

What happens if you just give it a fill character and nothing else?

>>> print("{:x}, {}, {}".format(item_number, item_label, item_price))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'x' for object of type 'str'
>>> print(f"{item_number:j}, {item_label}, {item_price}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'j' for object of type 'str'

Woops! When you give a fill character, you must provide at least an alignment symbol (< or >).

>>> print("{:x<}, {}, {}".format(item_number, item_label, item_price))
103, Light Saber, 256.128

However, you dont need to provide a width – if left out, the width will be whatever your content takes up. But wait, what’s the point of providing the adjustment character then? You’re right – there is really no point – it’s really just for the language parser to know what the heck x means. A nice example of a leaky abstraction.

In summary:

  • left / right adjustments are really only useful in the context of providing a fix width value
  • if you set a fill character, you can’t leave out the adjustment specification
  • the default fill is a space
  • the default adjustment is left
  • the default width is the width of the string

[sign]

Lets add signs to our item_number and item_price:

>>> print(f"{item_number:+}, {item_label}, {item_price:+}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Sign not allowed in string format specifier

Woops! item_number and item_price are actually strings! You can only adds the sign symbol to numeric types. Lets re-declare those to be a number and float.

>>> item_number = 103
>>> item_price = 256.128
>>> print(f"{item_number:+}, {item_label}, {item_price:+}")
+103, Light Saber, +256.128

Great! Here’s the output if we negated item_number:

>>> print(f"{-item_number:+}, {item_label}, {item_price:+}")
-103, Light Saber, +256.128
>>> print("{:+}, {}, {:+}".format(-item_number, item_label, item_price))
-103, Light Saber, +256.128

What if we provide a negative sign?

>>> print(f"{-item_number:-}, {item_label}, {item_price:-}")
-103, Light Saber, 256.128
>>> print("{:-}, {}, {:-}".format(-item_number, item_label, item_price))
-103, Light Saber, 256.128

So if you do a negative sign, you get the default formatting behavior which is that only negative numbers have signs.

What happens now if we tried to apply our previous fill and alignment stuff to these numbers?

>>> print(f"{item_number:<50+}, {item_label}, {item_price:+}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code '+' for object of type 'int'

You can’t! But you can still align and fill (without providing sign):

>>> print(f"{item_number:<10}, {item_label}, {item_price:+}")
103       , Light Saber, +256.128

In summary:

  • you can only use signs on numeric types
  • you can’t use fill, alignment, and sign all at the same time. You can use fill and alignment on integers – which is probably sufficient for most use cases
  • by default if you don’t provide a sign, only negative numbers display with signs

[#]

Lets ignore the fill and align stuff moving forward – the rest of the specification (just like +) is specific to number types and, just like + will not work in conjunction with fill and align.

Add # to our signed formats:

>>> print("{:+#}, {}, {:+#}".format(-item_number, item_label, item_price))
-103, Light Saber, +256.128

Uh, didn’t do anything. That’s because you didn’t specify the type of integer you want formatted (it defaults to decimal).

Lets make it output binary:

>>> print("{:+#b}, {}, {:+#}".format(-item_number, item_label, item_price))
-0b1100111, Light Saber, +256.128

Hexadecimal:

>>> print("{:+#x}, {}, {:+#}".format(-item_number, item_label, item_price))
-0x67, Light Saber, +256.128

We haven’t covered [type] yet, but we’ll get to that later in more detail. For now, just know that for # to be useful, you need to also supply an integer format type.

[0][width]

I’m addressing [0] with [with] because they actually relate:

When no explicit alignment is given, preceding the width field by a zero (‘0’) character enables sign-aware zero-padding for numeric types. This is equivalent to a fill character of ‘0’ with an alignment type of ‘=’.

Here’s printing with 0 and without the 0 prefix to width:

>>> print(f"{item_number:+09}, {item_label}, {item_price:+9}")
+00000103, Light Saber,  +256.128
>>> print("{:+09}, {}, {:+9}".format(-item_number, item_label, item_price))
-00000103, Light Saber,  +256.128

As you can see, omitting the 0 will create a 9 character size output but without the 0 padding.

[grouping_option]

What if our item number was really big and we wanted some separators to make the magnitude more clear?

>>> print("{:+09_}, {}, {:+9}".format(-item_number, item_label, item_price))
-1_234_565_654, Light Saber,  +256.128
>>> print(f"{item_number:+09_}, {item_label}, {item_price:+9}")
+1_234_565_654, Light Saber,  +256.128

With commas!

>>> print("{:+09,}, {}, {:+9}".format(-item_number, item_label, item_price))
-1,234,565,654, Light Saber,  +256.128

[.precision]

Okay, now lets truncate our decimals for item_price into just a single precision:

>>> print("{:+09,}, {}, {:+9.1}".format(-item_number, item_label, item_price))
-1,234,565,654, Light Saber,    +3e+02

Oh weird – it’s giving my value in scientific notation. Turns out, when you specify a precision, the default display that gets used is exponent notation. Yes, it’s weird. That said, you can change the display back to fixed point by specifying a specific fixed point type as you’ll see next.

[type]

Lets add f for our item_price float interpolation to make it fixed point display.

>>> print("{:+09,}, {}, {:+9.1f}".format(-item_number, item_label, item_price))
-1,234,565,654, Light Saber,    +256.1

Hope that was helpful! I encourage you to go play around with the formatting options to really get familiar with it. It’s a very powerful formatting tool.

Scraping Roger Ebert’s reviews and finding his favorite movies on Amazon Prime

My wife and I are big fans of the late film critic Roger Ebert. We also share an Amazon prime membership.

I wondered: which of Roger Ebert’s favorite movies are available to watch for free on prime? Since there are hundreds of reviews by Roger Ebert, I had the perfect excuse for writing a web scraper!

In this article, I will:

  • Show my not so pretty scraping code
  • Discuss some roadblocks / gotchas I ran into along the way
  • Share with you the list of movies rated as great by Roger Ebert. That’s what you’re here for, right?

PS: If you just want to see the list of movies, just jump to the end of this article.

Code Quality Warning: I hacked this together as fast as I could without much refactoring, so it’s not the most readable or optimized. But it mostly works… for now.

Roadblocks

I hit a few roadblocks while working on this that I think are worth calling out and will clarify some of the decisions I made in the implementation.

scraping rogerebert.com

Performing a regular GET with an Accept: text/html header (which I think is the default for the requests library) against the url assigned to the variable ebert_url will always return the first page of movies (regardless of what you set the page query parameter to).

Solution? The Accept header field needs to be set to application/json for the server to return JSON containing movies for that specific page.

scraping amazon.com

No public API

First, there is no publicaly available Amazon API for their catalog search. It seems like you could email them to get authorization, but I didn’t want to waste my time doing that.

Not automation friendly

I started off using the requests library. Turns out that if you don’t set a proper browser agent, you’ll get a 503 and some message about how automation isn’t welcome. If you do fake a proper agent but you’re not setting cookies from the server respond, you’ll get:

Sorry, we just need to make sure you’re not a robot. For best results, please make sure your browser is accepting cookies.

I got frustrated and switched over to using a more stateful HTTP tool: mechanize.

That worked… 80% of the time? I noticed that if I run my scraper repeatedly it starts to get the anti-robot message again. Maybe there’s some pattern detection going on on the amazon servers?

Bad HTML …

You’ll notice that I’m using some regex in the function amazon_search to parse out the movie title search results on the page. The reason is that when I tried using beautifulsoup‘s find_all function on the search result tags, I got nothing. My guess is that there’s some invalid HTML on the page and confused the beautifulsoup html.parser parser which isn’t super lenient.

Turns out, rather than using regex, I could have switched over to use the html5lib parser.

For example: BeautifulSoup(match, features="html5lib").

The html5lib parser is the most lenient parser – much more lenient than html.parser. So if I needed to make additional changes to this function, I’d refactor it to use that parser and get rid of the nasty looking regex.

Results

Without further ado, here’s a table of all the great movies movies that are included with prime (sorted by most recent release).

If you want the full dataset, I’ve shared it via this google spreadsheet.

TitleYear ReleasedReview URLPrime URL
Moonstruck1987LinkLink
Fitzcarraldo1982LinkLink
Atlantic City1980LinkLink
Nosferatu the Vampyre1979LinkLink
The Long Goodbye1973LinkLink
“Aguirre, the Wrath of God”1972LinkLink
“The Good, the Bad and the Ugly”1968LinkLink
Gospel According to St. Matthew1964LinkLink
The Man Who Shot Liberty Valance1962LinkLink
Some Like It Hot1959LinkLink
Paths of Glory1957LinkLink
The Sweet Smell of Success1957LinkLink
The Night of the Hunter1955LinkLink
Johnny Guitar1954LinkLink
Beat the Devil1954LinkLink
Sunset Boulevard1950LinkLink
It’s a Wonderful Life1946LinkLink
Detour1945LinkLink
My Man Godfrey1936LinkLink
The General1927LinkLink

Enjoy.

Update (2020-6-10)

Lots of really neat discussion happened when I submitted this to hacker news. I’ll just highlight a few additional resources / things I learned that are useful.

And, of course, that there are fans of roger ebert everywhere. I’m glad some of you found this useful. Thank you.