Advanced Python: Custom Collections Made Easy

Have you ever wondered why subclassing the list, dict, set, and tuple classes is considered bad practice? Today, I will delve into the reasons behind this and present alternative classes that streamline the implementation and extension of built-in-like objects.

Why it’s considered bad practice

Subclassing built-in types like dict, set, list, and tuple can introduce complications due to method delegation, initialization quirks, unexpected behavior from internal optimizations, and potential performance hits. These issues stem from the fact that built-in types are implemented in C: their methods often call the underlying C implementation directly instead of going through your overrides, so overriding one method (say, __setitem__) does not necessarily change the behavior of others that you would expect to use it (such as update or the constructor). Instead, Python offers more robust alternatives through composition and the abstract base classes in the collections.abc module, which are specifically designed for extension and customization without the pitfalls of direct subclassing.
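
As a minimal sketch of the method delegation problem, consider a hypothetical UpperDict that tries to upper-case every key. The class name is made up for illustration, and the result shown is what CPython produces; other interpreters may behave differently.

```python
class UpperDict(dict):
    """A dict subclass that tries to upper-case every key on assignment."""

    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


d = UpperDict(a=1)  # the dict constructor does not route through our __setitem__
d["b"] = 2          # direct item assignment does call the override
d.update(c=3)       # dict.update bypasses the override as well

print(sorted(d))    # ['B', 'a', 'c'] on CPython
```

Only the key set through d["b"] was upper-cased. To get consistent behavior you would also have to override __init__, update, setdefault, and so on by hand.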

Note

Mixin classes are generally considered bad practice, as is multiple inheritance. So this post takes the path of least evil, as this is the tooling the standard library provides for this task.

If you are not interested in inheriting behavior, a Protocol-based approach may be more suitable. These classes should generally only be used when implementing built-in-like objects where you intend to override core behaviors.

The standard library is littered with weird code due to many additions to features over the years. This particular oddity is a result of the addition of type hints.
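
For comparison, the Protocol-based approach mentioned in the note could look roughly like the sketch below. SupportsLookup, Inventory, and describe are hypothetical names invented for this example. The point is that a Protocol only describes an interface: nothing is inherited, so no extra methods come for free.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsLookup(Protocol):
    """Hypothetical protocol: anything that supports lookup by key and len()."""

    def __getitem__(self, key): ...
    def __len__(self) -> int: ...


class Inventory:
    """Inherits from nothing, yet structurally satisfies SupportsLookup."""

    def __init__(self):
        self._stock = {"apples": 3}

    def __getitem__(self, key):
        return self._stock[key]

    def __len__(self):
        return len(self._stock)


def describe(source: SupportsLookup, key) -> str:
    return f"{key}: {source[key]} ({len(source)} item types)"


print(describe(Inventory(), "apples"))         # apples: 3 (1 item types)
print(isinstance(Inventory(), SupportsLookup))  # True
print(hasattr(Inventory(), "get"))              # False: no inherited behavior
```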

What are the alternatives?

Python provides a number of alternatives in the collections module. For simple dicts and lists, there are the UserDict and UserList classes. For tuples, sets, and more complex list-like and dict-like objects, we have a number of abstract mixin classes that allow us to create our own implementations. Provided by the collections.abc module, these classes are:

  • dict => MutableMapping
  • set => MutableSet
  • list => MutableSequence
  • tuple => no direct mixin alternative; Sequence and Hashable have to be combined instead.

This is not an exhaustive list of the mixin classes; see a complete list here.
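
For the simple case, here is a minimal sketch using UserDict. The UpperDict class is a hypothetical example. Because UserDict builds its constructor and update on top of item assignment, a single __setitem__ override is enough to affect all of them.

```python
from collections import UserDict


class UpperDict(UserDict):
    """A dict-like class that upper-cases every key on assignment."""

    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


d = UpperDict(a=1)  # the constructor goes through update, which uses __setitem__
d["b"] = 2
d.update(c=3)

print(sorted(d))    # ['A', 'B', 'C']
```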

Implementing the Mixins

Each of these classes requires you to implement a number of methods; in return, Python provides the rest of the interface based on them. Here are the basic interfaces and the extra methods they provide.

MutableMapping

To achieve a dict-like interface, we need to implement these methods:

from collections.abc import MutableMapping

class MyMapping(MutableMapping):
    def __getitem__(self, key):
        ...
    def __setitem__(self, key, item):
        ...
    def __delitem__(self, key):
        ...
    def __iter__(self):
        ...
    def __len__(self):
        ...

This will provide the following methods:

  • __contains__
  • __eq__
  • __ne__
  • keys
  • items
  • values
  • get
  • pop
  • popitem
  • clear
  • update
  • setdefault
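
To make this concrete, here is a minimal sketch of a complete implementation. CaseInsensitiveDict is a hypothetical example that stores keys lower-cased in an internal dict (composition) and assumes string keys. Only the five abstract methods plus a constructor are written by hand; everything else comes from MutableMapping.

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Hypothetical mapping that treats string keys case-insensitively."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        # update() is a mixin method; it routes every item through __setitem__.
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


headers = CaseInsensitiveDict({"Content-Type": "text/html"})
headers.setdefault("ACCEPT", "*/*")  # provided by the mixin

print("content-type" in headers)     # True
print(headers.get("Accept"))         # */*
print(dict(headers.items()))         # {'content-type': 'text/html', 'accept': '*/*'}
```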

MutableSequence

Just like the previous example, we implement the basic methods, and this provides us with a list-like interface.

from collections.abc import MutableSequence

class MySequence(MutableSequence):
    # __iter__ is not required here; it is provided from __getitem__
    def __getitem__(self, index):
        ...
    def __setitem__(self, index, item):
        ...
    def __delitem__(self, index):
        ...
    def __len__(self):
        ...
    def insert(self, index, item):
        ...

This will offer the following methods:

  • __contains__
  • __iter__
  • __reversed__
  • __iadd__
  • index
  • count
  • append
  • clear
  • reverse
  • extend
  • pop
  • remove

MutableSet

Like the previous examples, it only takes a handful of methods to provide us with a set-like interface.

from collections.abc import MutableSet

class MySet(MutableSet):
    def __contains__(self, item):
        ...
    def __iter__(self):
        ...
    def __len__(self):
        ...
    def add(self, item):
        ...
    def discard(self, item):
        ...

This will offer the following methods:

  • __le__
  • __lt__
  • __eq__
  • __ne__
  • __gt__
  • __ge__
  • __and__
  • __or__
  • __sub__
  • __xor__
  • __ior__
  • __iand__
  • __ixor__
  • __isub__
  • isdisjoint
  • clear
  • pop
  • remove
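
Here is a minimal sketch of a complete implementation: a hypothetical OrderedSet that remembers insertion order by storing its items as the keys of a dict. The set operators build their results by calling the class with an iterable, so the constructor is written to accept one.

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Hypothetical set that remembers insertion order (backed by a dict)."""

    def __init__(self, items=()):
        self._data = dict.fromkeys(items)

    def __contains__(self, item):
        return item in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def add(self, item):
        self._data[item] = None

    def discard(self, item):
        self._data.pop(item, None)

    def __repr__(self):
        return f"OrderedSet({list(self._data)})"


a = OrderedSet("abc")
b = OrderedSet("bcd")

print(a | b)             # OrderedSet(['a', 'b', 'c', 'd'])
print(a - b)             # OrderedSet(['a'])
print(a ^ b)             # OrderedSet(['a', 'd'])
print(a <= (a | b))      # True
print(a == {"a", "b", "c"})  # True: compares equal to a built-in set
```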

Hashable & Sequence

This example differs slightly from the previous ones: here we have to make use of multiple inheritance to create the tuple-like interface. This is because Sequence provides the collection interface but not the Hashable interface, which is required of a tuple-like object.

from collections.abc import Hashable, Sequence

class MyHashableSequence(Hashable, Sequence):
    # __iter__ is not required here; Sequence provides it from __getitem__
    def __hash__(self):
        ...
    def __getitem__(self, index):
        ...
    def __len__(self):
        ...

This will offer the following methods:

  • __contains__
  • __iter__
  • __reversed__
  • index
  • count
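
A minimal sketch of a full implementation follows, using a hypothetical Point class backed by a tuple. One addition beyond the required methods: Sequence does not provide __eq__, so the sketch defines it alongside __hash__ so that equal points also hash equally.

```python
from collections.abc import Hashable, Sequence


class Point(Hashable, Sequence):
    """Hypothetical immutable, tuple-like point."""

    def __init__(self, *coords):
        self._coords = tuple(coords)

    def __hash__(self):
        return hash(self._coords)

    def __eq__(self, other):
        # Not supplied by Sequence; defined here so equality matches the hash.
        if not isinstance(other, Point):
            return NotImplemented
        return self._coords == other._coords

    def __getitem__(self, index):
        return self._coords[index]

    def __len__(self):
        return len(self._coords)


p = Point(1, 2, 3)
print(list(reversed(p)))            # [3, 2, 1]
print(p.index(2), p.count(1))       # 1 1
print(2 in p)                       # True

labels = {p: "first"}               # usable as a dict key
print(labels[Point(1, 2, 3)])       # first
```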

As you can see, by implementing a minimal number of methods, we can benefit from a wealth of additional functionality. If necessary, we also have the option to override those methods. These classes will align with the same interface as the pre-existing types, enabling them to be utilized interchangeably. This provides a robust and extensible alternative to extending built-in collections through inheritance.
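
As a minimal sketch of overriding a provided method, consider a hypothetical EvenNumbers sequence whose items are computed rather than stored. The default __contains__ from Sequence scans the items one by one, so the sketch replaces it with a direct arithmetic check. Only integer indices are supported here.

```python
from collections.abc import Sequence


class EvenNumbers(Sequence):
    """Hypothetical read-only sequence of the first `count` even numbers."""

    def __init__(self, count):
        self._count = count

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("only integer indices are supported in this sketch")
        if index < 0:
            index += self._count
        if not 0 <= index < self._count:
            raise IndexError(index)
        return 2 * index

    def __contains__(self, value):
        # Override the mixin's linear scan with a constant-time check.
        return (
            isinstance(value, int)
            and value % 2 == 0
            and 0 <= value < 2 * self._count
        )


evens = EvenNumbers(1_000_000)
print(999_998 in evens)       # True, without scanning the sequence
print(7 in evens)             # False
print(evens[-1])              # 1999998
print(list(EvenNumbers(3)))   # [0, 2, 4]
```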

Final thoughts

The collections.abc module offers a streamlined way to extend Python’s data structures efficiently and in line with the language’s design principles.

The abstract base classes in collections.abc ensure that custom collections adhere to Python’s expected interfaces and behaviours, enhancing usability and integration with other parts of the Python ecosystem. This consistency is especially beneficial in large-scale applications, where predictable data structure behaviour is essential.

Ultimately, it is more advantageous to utilize collections.abc instead of directly subclassing built-in types like list, dict, set, and tuple. By embracing this Pythonic approach, custom collections become more robust, maintainable, and well-integrated into the Python environment. When extending a built-in data structure, consider turning to the collections.abc module first.
