The collections module from Python's standard library is one of the most valuable toolkits for any developer. It provides specialized data types that go beyond native types like lists, tuples, and dictionaries, solving common problems elegantly and efficiently. According to the official Python documentation for collections, this module implements alternative container data types with specific characteristics for different scenarios.
If you've ever needed to count the elements of a list, build an efficient queue, or treat multiple dictionaries as a single unit, the collections module has a ready-made solution. It complements Python's dictionaries, tuples, and sets, which are fundamental data structures in the language. In this guide, you will learn about each major component of the collections module through practical examples and real-world use cases.
What is the collections module?
The collections module was introduced in Python 2.4 and has been significantly expanded over the years. It provides alternatives to native data types for situations where regular lists, tuples, and dictionaries are not sufficient or efficient. Each class in the module solves a specific problem: maintaining insertion order, providing default values for missing keys, creating lightweight tuple-like objects with named fields, and more.
The Real Python guide on collections offers an excellent introduction to the fundamental concepts, classifying each data structure by its ideal use case. In CPython, the module's public interface is written in Python, while performance-critical classes such as deque and defaultdict are backed by C implementations, as you can see in the official CPython source code on GitHub.
Counter: counting like a professional
The Counter class is a dict subclass designed specifically for counting hashable objects. It stores elements as keys and their counts as values, making it trivial to answer questions like "which word appears most frequently in this text?" or "how many times does this number appear in the list?".
Creating and using a Counter
from collections import Counter
# From a list
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
counter = Counter(fruits)
print(counter)
# Counter({'apple': 3, 'banana': 2, 'orange': 1})

# From a string
letters = Counter("mississippi")
print(letters)
# Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

# From a dictionary
counter = Counter({'a': 4, 'b': 2, 'c': 1})
Essential Counter methods
Counter offers methods that go far beyond simple counting. The most_common() method returns the n most frequent elements, ideal for rankings:
from collections import Counter
sales = Counter(['iphone', 'iphone', 'samsung', 'iphone', 'xiaomi', 'samsung'])
print(sales.most_common(2))
# [('iphone', 3), ('samsung', 2)]
Arithmetic operations between Counters are naturally supported:
c1 = Counter(a=3, b=1, c=2)
c2 = Counter(a=1, b=2, c=3)
print(c1 + c2) # Sum: Counter({'c': 5, 'a': 4, 'b': 3})
print(c1 - c2) # Subtraction: Counter({'a': 2}) (only positives)
print(c1 & c2) # Intersection (minimum): Counter({'c': 2, 'a': 1, 'b': 1})
print(c1 | c2) # Union (maximum): Counter({'a': 3, 'c': 3, 'b': 2})
The Counter class is widely used in natural language processing, log analysis, and any scenario requiring frequency counting. Check the official documentation on Counter objects for complete API details.
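As a rough illustration of the word-frequency use case mentioned above, the sketch below counts words in a short string. The sample text and the regular expression used for tokenizing are assumptions made for this example, not part of any real pipeline.

```python
from collections import Counter
import re

# Hypothetical sample text, used only for illustration.
text = "the quick brown fox jumps over the lazy dog and the fox runs"

# Lowercase the text and extract words before counting.
words = re.findall(r"[a-z']+", text.lower())
word_counts = Counter(words)

top_two = word_counts.most_common(2)
print(top_two)              # [('the', 3), ('fox', 2)]

# A missing key returns 0 instead of raising KeyError, and is not stored.
print(word_counts["cat"])   # 0

# update() adds to the existing counts rather than replacing them.
word_counts.update(["fox", "fox"])
print(word_counts["fox"])   # 4
```

Because a missing key simply reads as 0, there is no need to check for membership before looking up a count.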
defaultdict: dictionaries with default values
How many times have you written code like this?
# data is any iterable of (key, value) pairs
dictionary = {}
for key, value in data:
    if key not in dictionary:
        dictionary[key] = []
    dictionary[key].append(value)
defaultdict eliminates this repetition. It is a dict subclass that calls a factory function to provide default values when a missing key is accessed:
from collections import defaultdict
# defaultdict with list as factory
data = [('a', 1), ('b', 2), ('a', 3), ('c', 4), ('b', 5)]
dd = defaultdict(list)
for key, value in data:
    dd[key].append(value)
print(dict(dd))
# {'a': [1, 3], 'b': [2, 5], 'c': [4]}
Common defaultdict use cases
The most commonly used factories with defaultdict are list, set, int, and dict:
from collections import defaultdict
# defaultdict(int) for automatic counting
count = defaultdict(int)
for word in ["one", "two", "one", "three", "one", "two"]:
    count[word] += 1
print(dict(count))
# {'one': 3, 'two': 2, 'three': 1}

# defaultdict(set) for sets
grouping = defaultdict(set)
grouping['even'].add(2)
grouping['even'].add(4)
grouping['odd'].add(1)
print(dict(grouping))
# {'even': {2, 4}, 'odd': {1}}

# defaultdict(dict) for nested dictionaries
nested = defaultdict(dict)
nested['user1']['name'] = 'Alice'
nested['user2']['name'] = 'Bob'
print(dict(nested))
# {'user1': {'name': 'Alice'}, 'user2': {'name': 'Bob'}}
defaultdict is especially useful when processing grouped data and building nested structures. See the official documentation on defaultdict for more examples and advanced use cases.
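One behavior worth keeping in mind is that indexing a missing key does more than return the default: it also stores it. The sketch below shows this, along with a lambda used as a factory for two-level counting. The department and visit data are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (department, employee) pairs, for illustration only.
records = [("eng", "Alice"), ("ops", "Bob"), ("eng", "Carol")]

by_department = defaultdict(list)
for department, employee in records:
    by_department[department].append(employee)

# Indexing a missing key calls the factory AND stores the result.
_ = by_department["sales"]
print("sales" in by_department)     # True

# .get() and the `in` operator do not trigger the factory.
print(by_department.get("legal"))   # None
print("legal" in by_department)     # False

# Any zero-argument callable works as a factory, including a lambda.
visits = defaultdict(lambda: defaultdict(int))
visits["home"]["monday"] += 1
visits["home"]["monday"] += 1
print(visits["home"]["monday"])     # 2
```

If you only want to probe for a key without creating it, use get() or a membership test rather than square-bracket access.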
namedtuple: tuples with field names
The namedtuple() function creates tuple classes whose fields can be accessed both by index and by name. This combines the immutability and efficiency of tuples with the readability of dictionaries:
from collections import namedtuple
# Defining a namedtuple
Point = namedtuple('Point', ['x', 'y'])
p1 = Point(10, 20)
p2 = Point(x=30, y=40)
print(p1.x, p1.y) # 10 20 (name access)
print(p2[0], p2[1]) # 30 40 (index access)

# Clean representation
print(p1) # Point(x=10, y=20)
Namedtuples in real applications
Namedtuples are ideal for representing lightweight records without the overhead of a full class. They are immutable, consume less memory than dictionaries, and are more readable than plain tuples:
from collections import namedtuple
# Employee record
Employee = namedtuple('Employee', ['id', 'name', 'role', 'salary'])
employees = [
    Employee(1, 'Alice Smith', 'Data Engineer', 12000),
    Employee(2, 'Bob Johnson', 'Data Scientist', 15000),
    Employee(3, 'Carol Williams', 'Python Developer', 10000),
]

# Filtering with a list comprehension: keep roles that contain 'Data'
data_team = [e for e in employees if 'Data' in e.role]
print(data_team[0].name) # Alice Smith
Beyond named fields, namedtuple provides the _asdict() method for dictionary conversion and _replace() for creating a new instance with altered fields:
p = Point(10, 20)
print(p._asdict()) # {'x': 10, 'y': 20}
p2 = p._replace(x=50)
print(p2) # Point(x=50, y=20)
According to the official documentation on namedtuple, this function is especially useful for replacing plain tuples when code readability matters.
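A few more namedtuple features are sketched below: default values, the _fields attribute, the _make() constructor, and immutability. The Point3D name is a hypothetical variation on the Point example, and the defaults argument assumes Python 3.7 or later.

```python
from collections import namedtuple

# `defaults` applies to the rightmost fields (assumes Python 3.7+).
Point3D = namedtuple("Point3D", ["x", "y", "z"], defaults=[0])

p = Point3D(1, 2)
print(p)                  # Point3D(x=1, y=2, z=0)
print(Point3D._fields)    # ('x', 'y', 'z')

# _make() builds an instance from any iterable, e.g. a parsed CSV row.
row = ["3", "4", "5"]
q = Point3D._make(int(v) for v in row)
print(q)                  # Point3D(x=3, y=4, z=5)

# Instances are immutable: assigning to a field raises AttributeError.
try:
    p.x = 99
except AttributeError:
    print("cannot modify a namedtuple field")

# Tuple unpacking still works because it is a regular tuple underneath.
x, y, z = q
print(x + y + z)          # 12
```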
deque: efficient queues and stacks
The deque (double-ended queue) class is a list-like sequence optimized for insertions and removals at both ends. Inserting or removing at the beginning of a Python list is O(n), because every remaining element has to shift, whereas deque performs these operations in O(1):
from collections import deque
# Creating a deque
queue = deque(['a', 'b', 'c'])

# Add to the end
queue.append('d')
print(queue) # deque(['a', 'b', 'c', 'd'])

# Add to the beginning
queue.appendleft('z')
print(queue) # deque(['z', 'a', 'b', 'c', 'd'])

# Remove from the end
last = queue.pop()
print(last) # d

# Remove from the beginning
first = queue.popleft()
print(first) # z
Rotating and limiting the deque
deque also supports two powerful features, rotation and a maximum length:
from collections import deque
# Rotation (useful for games and circular algorithms)
d = deque([1, 2, 3, 4, 5])
d.rotate(2)
print(d) # deque([4, 5, 1, 2, 3])
d.rotate(-1)
print(d) # deque([5, 1, 2, 3, 4])

# Maximum length (automatic circular buffer)
buffer = deque(maxlen=3)
buffer.append(1)
buffer.append(2)
buffer.append(3)
print(buffer) # deque([1, 2, 3], maxlen=3)
buffer.append(4) # Automatically removes oldest element
print(buffer) # deque([2, 3, 4], maxlen=3)
deque is the ideal structure for implementing task queues, browsing history (with maxlen), circular buffers, and sliding window algorithms. Check the official documentation on deque objects for a complete view of all available methods.
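The sliding-window use case can be sketched as a moving average over a stream of values. The moving_average function and the sample readings below are hypothetical names and data introduced for this example.

```python
from collections import deque

def moving_average(values, window_size):
    """Yield the mean of each full window of `window_size` consecutive values."""
    window = deque(maxlen=window_size)
    running_total = 0
    for value in values:
        if len(window) == window_size:
            # The oldest value is about to be evicted by append().
            running_total -= window[0]
        window.append(value)
        running_total += value
        if len(window) == window_size:
            yield running_total / window_size

readings = [10, 20, 30, 40, 50]  # hypothetical sensor readings
averages = list(moving_average(readings, 3))
print(averages)  # [20.0, 30.0, 40.0]
```

Setting maxlen lets the deque discard the oldest value automatically, so the function only has to keep the running total in sync.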
OrderedDict: dictionaries with guaranteed order
Before Python 3.7, regular dictionaries did not guarantee insertion order. OrderedDict was created to fill this gap. Today, with natively ordered dictionaries, its main differentiator is the move_to_end() method:
from collections import OrderedDict
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
# Move key 'a' to the end
od.move_to_end('a')
print(od)
# OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
# (Python 3.12+ prints the dict-style form: OrderedDict({'b': 2, 'c': 3, 'd': 4, 'a': 1}))

# Move 'd' to the beginning
od.move_to_end('d', last=False)
print(od)
# OrderedDict([('d', 4), ('b', 2), ('c', 3), ('a', 1)])
Additionally, OrderedDict considers order when comparing equality, unlike regular dictionaries:
from collections import OrderedDict
od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print(od1 == od2) # False (order matters!)
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 2, 'a': 1}
print(dict1 == dict2) # True (order doesn't matter)
OrderedDict is ideal for implementing LRU (Least Recently Used) caches and any structure that needs to track access or insertion order. See the official documentation on OrderedDict for more details and examples.
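The LRU cache idea can be sketched with move_to_end() and popitem(last=False). The LRUCache class and its get/put/keys methods are illustrative names invented for this example, not a standard library API.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch; the class name and API are illustrative."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the key as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # popitem(last=False) removes the oldest (least recently used) entry.
            self._items.popitem(last=False)

    def keys(self):
        return list(self._items)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes the most recently used key
cache.put("c", 3)     # evicts "b", the least recently used key
print(cache.keys())   # ['a', 'c']
```

For memoizing function calls, functools.lru_cache from the standard library is usually the simpler choice; a hand-written class like this one is mainly useful when you need direct control over the stored entries.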
ChainMap: multiple dictionaries as one
The ChainMap class groups multiple dictionaries or mappings into a single searchable view. Lookups traverse the dictionaries in the order they were passed, returning the first value found:
from collections import ChainMap
# Configuration with precedence
defaults = {'theme': 'light', 'language': 'en-US', 'notifications': True}
user = {'language': 'pt-BR', 'notifications': False}
environment = {'theme': 'dark'}
config = ChainMap(environment, user, defaults)
print(config['theme']) # 'dark' (from environment)
print(config['language']) # 'pt-BR' (from user)
print(config['notifications']) # False (from user)

# Updates affect only the first dictionary
config['theme'] = 'high-contrast'
print(environment['theme']) # 'high-contrast'
ChainMap is extremely useful for managing configurations with different precedence levels (default → user → environment), processing command-line arguments combined with defaults, and variable scopes in interpreters:
from collections import ChainMap
# Variable scope simulating an interpreter
global_scope = {'x': 10, 'y': 20, 'name': 'global'}
local_scope = {'x': 5, 'z': 30}
scope = ChainMap(local_scope, global_scope)
print(scope['x']) # 5 (local)
print(scope['y']) # 20 (global)
print(scope['z']) # 30 (local)

# Adding a new scope
scope = scope.new_child({'x': 1, 'w': 100})
print(scope['x']) # 1 (new local scope)
According to the official documentation on ChainMap, this class is particularly useful when you need to manage multiple namespaces without merging them.
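To make the "without merging" point concrete, the sketch below inspects the maps and parents attributes and shows where writes end up. The scope dictionaries are hypothetical values for illustration.

```python
from collections import ChainMap

global_scope = {"x": 10, "y": 20}
local_scope = {"x": 5}
scope = ChainMap(local_scope, global_scope)

# new_child() pushes a fresh mapping onto the front of the chain.
inner = scope.new_child({"x": 1})
print(inner["x"])           # 1
print(len(inner.maps))      # 3

# parents returns a new ChainMap without the first mapping.
print(inner.parents["x"])   # 5

# Writes go to the first mapping only, even if the key exists deeper.
inner["y"] = 99
print(inner.maps[0])        # {'x': 1, 'y': 99}
print(global_scope["y"])    # 20 (unchanged)

# The chain holds references, so changes to the underlying dicts show through.
global_scope["z"] = 30
print(inner["z"])           # 30
```

Since no data is copied, building a ChainMap is cheap, but each lookup may have to search several mappings before it finds the key.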
Other collections tools
Beyond the main classes, the collections module offers other valuable tools:
UserDict, UserList, and UserString
These classes are wrappers that make it easier to create subclasses of dict, list, and string. Unlike inheriting directly from these native types, these classes expose attributes like .data for accessing internal content, simplifying customization:
from collections import UserDict
class LowercaseDict(UserDict):
    # Normalize every key to lowercase before storing it
    def __setitem__(self, key, value):
        key = key.lower()
        super().__setitem__(key, value)

d = LowercaseDict()
d['Name'] = 'Alice'
print(d.data) # {'name': 'Alice'}
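The practical difference from subclassing dict directly can be sketched as follows. In CPython, the constructor and update() of the built-in dict do not call an overridden __setitem__, while UserDict routes both through it. LowerDict and LowerUserDict are hypothetical names for this comparison.

```python
from collections import UserDict

class LowerDict(dict):
    """Subclasses the built-in dict directly."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

class LowerUserDict(UserDict):
    """Same override, but built on UserDict."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

# The constructor and update() of the built-in dict skip the override.
plain = LowerDict(Name="Alice")
plain.update({"Role": "Engineer"})
print(plain)          # {'Name': 'Alice', 'Role': 'Engineer'}

# UserDict routes both through __setitem__, so keys are normalized.
wrapped = LowerUserDict(Name="Alice")
wrapped.update({"Role": "Engineer"})
print(wrapped.data)   # {'name': 'Alice', 'role': 'Engineer'}
```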
Best practices with collections
To get the most out of the collections module, consider the following recommendations:
- Use Counter instead of implementing your own counting logic with dictionaries; the code is more readable and efficient
- Prefer defaultdict whenever you need to check if a key exists before accessing it; this eliminates repetitive if key in dict blocks
- Choose namedtuple over plain tuples when data has semantic meaning; your code becomes self-documenting
- Use deque for queues and stacks instead of lists when there are frequent operations at the beginning of the collection
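- Reach for OrderedDict when your algorithm needs to reorder entries with move_to_end() or compare mappings in an order-sensitive way; for plain insertion order, a regular dict is enough since Python 3.7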
- Adopt ChainMap for managing configurations with multiple precedence layers without merging dictionaries
Performance: collections vs native types
An often underestimated aspect is the performance gain from using the right classes from the collections module. Here is a practical comparison:
from collections import deque
import time

n = 100000

# deque vs list for front insertion
lst = []
start = time.perf_counter()
for i in range(n):
    lst.insert(0, i)
print(f"list.insert(0): {time.perf_counter() - start:.3f}s")

dq = deque()
start = time.perf_counter()
for i in range(n):
    dq.appendleft(i)
print(f"deque.appendleft: {time.perf_counter() - start:.3f}s")

# Typical result: deque is orders of magnitude faster at this size; the exact ratio depends on n and the machine
The collections module implements each class with a data structure suited to its purpose. In CPython, deque is implemented as a doubly linked list of fixed-size blocks, which is why appends and pops at either end are O(1) while indexing in the middle is O(n). Counter is a dict subclass, so it inherits the C-optimized hash table of the native dict. For detailed benchmarks, see the performance guide in the official documentation.
Conclusion
The collections module is one of the most useful libraries in Python's stdlib. It provides elegant and efficient solutions for recurring programming problems: frequency counting with Counter, default values with defaultdict, named records with namedtuple, efficient queues with deque, guaranteed order with OrderedDict, and chained scopes with ChainMap.
Mastering these tools makes your code cleaner and more readable, and in the right situations it can also improve your application's performance noticeably. Each class was designed for a specific set of problems, and knowing which one to choose is a hallmark of an experienced Python developer.
To continue your studies, explore the complete official documentation of the collections module and practice implementing the examples from this guide in your own projects. Deep knowledge of Python's standard tools is one of the greatest differentiators of an efficient programmer.