If you have ever written a for loop in Python, you have already used iterators — even if you did not realize it. But when you need to process large data volumes, create infinite streams, or simply write more elegant code, understanding generators and iterators becomes essential.
In this definitive guide, you will learn what iterators and generators are, how they work internally, and how to use them to transform the way you write Python. With practical examples, you will see how these tools can reduce memory consumption, simplify your code, and open doors to more advanced programming patterns.
What Are Iterators?
In Python, an iterator is an object that implements the iteration protocol: the methods __iter__() and __next__(). The first returns the iterator itself, and the second returns the next element in the sequence. When there are no more elements, __next__() raises the StopIteration exception.
Virtually all data structures in Python are iterable: lists, tuples, dictionaries, sets, strings. When you write for item in lista, Python internally calls iter(lista) to get an iterator and then calls next() on each iteration.
lista = [10, 20, 30]
iterator = iter(lista)
print(next(iterator)) # 10
print(next(iterator)) # 20
print(next(iterator)) # 30
# print(next(iterator)) # StopIteration
Understanding this mechanism is the first step toward creating your own custom iterators and, more importantly, mastering generators — the most elegant way to create iterators in Python.
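To make the mechanism concrete, the for loop described above can be spelled out by hand. This is only a rough sketch of what the loop does, not the interpreter's actual implementation; the names manual_for and collected are made up for illustration.

```python
def manual_for(iterable):
    """Mimic `for item in iterable` using iter() and next() explicitly."""
    collected = []
    iterator = iter(iterable)        # what the for loop does first
    while True:
        try:
            item = next(iterator)    # fetch the next element
        except StopIteration:        # raised once the iterator is exhausted
            break
        collected.append(item)       # stands in for the loop body
    return collected


print(manual_for([10, 20, 30]))  # [10, 20, 30]
```

The try/except around next() is the part the for loop hides from you: StopIteration is the normal end-of-data signal, not an error.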
The Iteration Protocol
Any object that implements both __iter__ and __next__ is considered an iterator. The __iter__ method must return the iterator object itself, and __next__ must return the next value or raise StopIteration.
class Counter:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

for number in Counter(5):
    print(number)  # 0, 1, 2, 3, 4
While this works perfectly, creating iterators with classes is verbose. That is where generators come in.
What Are Generators?
A generator is a special function that uses the yield keyword instead of return. Unlike a normal function that runs and terminates, a generator function maintains its state between calls — it "pauses" at each yield and resumes from where it left off on the next call.
When you call a generator function, it does not execute immediately. Instead, it returns a generator object (which is an iterator). Each time you call next() on it, the function runs until the next yield and returns that value.
def simple_counter():
    print("Starting...")
    yield 1
    print("Continuing...")
    yield 2
    print("Finishing...")
    yield 3

gen = simple_counter()
print(next(gen))  # Starting... 1
print(next(gen))  # Continuing... 2
print(next(gen))  # Finishing... 3
That is the magic of generators: they enable lazy evaluation — values are computed on demand, only when needed.
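To see how much boilerplate a generator removes, here is a sketch of the class-based Counter from the previous section rewritten as a generator function. The name counter is chosen here for illustration; the behavior is meant to mirror the class version.

```python
def counter(limit):
    """Yield 0, 1, ..., limit - 1, mirroring the class-based Counter."""
    current = 0
    while current < limit:
        yield current      # pause here; resume on the next call to next()
        current += 1


gen = counter(5)
print(iter(gen) is gen)  # True: a generator object is its own iterator
print(list(gen))         # [0, 1, 2, 3, 4]
print(list(gen))         # []: the generator is exhausted after one pass
```

Python supplies __iter__ and __next__ for you, and the local variable current takes over the role of the instance attribute.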
Yield vs Return: What Is the Difference?
The fundamental difference between yield and return is the state behavior:
- return: terminates the function permanently and returns a value.
- yield: pauses the function, saves its entire state (local variables, code position), and returns a value. On the next call, the function resumes exactly where it left off.
A generator function can have multiple yield statements and can even mix yield with return. When return is used in a generator, it terminates iteration by raising StopIteration, and the value passed to return is available through the exception (though this is rarely used).
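As a small illustration of mixing yield with return, the sketch below catches StopIteration by hand and reads its value attribute. The function name limited and the string "done" are arbitrary choices for this example.

```python
def limited():
    yield 1
    yield 2
    return "done"   # ends iteration; the value travels on StopIteration


gen = limited()
print(next(gen))  # 1
print(next(gen))  # 2
try:
    next(gen)
except StopIteration as exc:
    result = exc.value   # the value passed to return
    print(result)        # done

# A for loop or list() swallows StopIteration, so the return value is not visible there.
print(list(limited()))  # [1, 2]
```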
For a complete technical deep dive, see PEP 255 (Simple Generators), which officially introduced generators in Python 2.2.
Generator Expressions
Just as you have list comprehensions to concisely create lists, there are generator expressions to create generators. The syntax is nearly identical but uses parentheses instead of square brackets.
# List comprehension: creates a full list in memory
squares_list = [x**2 for x in range(1000000)]  # ~8 MB for the list object alone, plus the integers it references
# Generator expression: creates a lazy generator
squares_gen = (x**2 for x in range(1000000))  # a few hundred bytes at most, regardless of the range size
print(next(squares_gen)) # 0
print(next(squares_gen)) # 1
print(next(squares_gen)) # 4
The memory difference is dramatic: the list comprehension allocates space for one million integers at once, while the generator expression creates a generator object whose size is small and constant, no matter how many values it will produce, and computes each value on demand.
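If you want to check the difference on your own machine, a quick measurement with sys.getsizeof is enough. Treat this as a rough sketch: getsizeof reports only the size of the container object itself (not the integers a list references), and the exact numbers vary with Python version and platform.

```python
import sys

n = 1_000_000
squares_list = [x**2 for x in range(n)]
squares_gen = (x**2 for x in range(n))

list_size = sys.getsizeof(squares_list)  # the list object only, not its elements
gen_size = sys.getsizeof(squares_gen)    # the generator object

print(f"list:      {list_size:,} bytes")
print(f"generator: {gen_size:,} bytes")

# Both produce the same values; the generator just produces them lazily.
print(sum(squares_gen) == sum(squares_list))  # True
```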
Generator expressions were introduced in Python 2.4 by PEP 289 (Generator Expressions), which details the motivation and design of this feature.
If you want to better understand list comprehensions before moving to generator expressions, check our complete guide: List Comprehension in Python: Complete Guide.
The Yield From Expression
Introduced in Python 3.3 (PEP 380), yield from allows a generator to delegate iterations to another generator or iterable. This simplifies generator composition and avoids verbose nesting.
def main_generator():
    yield from [1, 2, 3]
    yield from range(4, 7)
    yield from "ab"

for value in main_generator():
    print(value, end=" ")  # 1 2 3 4 5 6 a b
Without yield from, you would have to manually iterate over each sub-iterator:
def generator_without_yield_from():
    for item in [1, 2, 3]:
        yield item
    for item in range(4, 7):
        yield item
    for item in "ab":
        yield item
The yield from expression also handles bidirectional communication between generators automatically — values sent via send() are propagated to the delegated generator, and exceptions are properly transmitted.
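The bidirectional behavior is easier to see in a small example. In this sketch, accumulator and delegator are hypothetical names: values sent to the outer generator reach the inner one, and the inner generator's return value becomes the value of the yield from expression.

```python
def accumulator():
    """Inner generator: sums the values sent to it until it receives None."""
    total = 0
    while True:
        value = yield total
        if value is None:
            return total      # becomes the value of `yield from`
        total += value


def delegator(results):
    """Outer generator: forwards send() calls to accumulator()."""
    final = yield from accumulator()
    results.append(final)


results = []
gen = delegator(results)
running = [next(gen), gen.send(10), gen.send(5)]
print(running)  # [0, 10, 15]

try:
    gen.send(None)            # accumulator returns, delegator finishes
except StopIteration:
    pass
print(results)  # [15]
```

Writing the same forwarding logic with a plain for loop would require handling send(), throw(), and the return value manually.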
Infinite Generators
One of the most impressive applications of generators is creating infinite sequences. Since values are computed on demand, you can represent theoretically infinite sequences without consuming infinite memory.
def infinite_fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib = infinite_fibonacci()
for _ in range(10):
    print(next(fib), end=" ")  # 0 1 1 2 3 5 8 13 21 34
With infinite generators, you control exactly how many values to consume. The generator does not know it is "infinite" — it simply keeps producing values as long as you keep calling next().
The itertools standard library module provides several tools for working with generators and iterators, including islice(), which lets you "slice" infinite generators.
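As a quick sketch of slicing an infinite generator, the example below repeats the Fibonacci generator so that it runs on its own and takes two different slices with itertools.islice. The variable names first_ten and next_five are illustrative.

```python
from itertools import islice


def infinite_fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice(iterable, stop): take the first 10 values
first_ten = list(islice(infinite_fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# islice(iterable, start, stop): skip the first 10, then take the next 5
next_five = list(islice(infinite_fibonacci(), 10, 15))
print(next_five)  # [55, 89, 144, 233, 377]
```

Unlike list slicing, islice does not support negative indices, since it cannot know where an iterator ends.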
Practical Applications
1. Processing Large Files
Generators shine when you need to process files larger than available memory. A generator can read and process the file line by line:
def file_lines(path):
    with open(path, 'r', encoding='utf-8') as file:
        for line in file:
            yield line.strip()

def lines_with_word(lines, word):
    for line in lines:
        if word in line:
            yield line

def count_occurrences(lines, word):
    return sum(1 for line in lines if word in line)

# Chained usage: constant memory!
total = count_occurrences(file_lines("data.csv"), "Python")
print(f"Occurrences: {total}")
Each generator does one thing and does it well. Chaining keeps memory usage constant — at no point is the entire file in RAM.
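To try the chaining idea without a real data file, here is a self-contained sketch that writes a small temporary file and chains file_lines with lines_with_word. The sample text and the names sample_path and matches are assumptions made for this example.

```python
import os
import tempfile


def file_lines(path):
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            yield line.strip()


def lines_with_word(lines, word):
    for line in lines:
        if word in line:
            yield line


# Create a small sample file (hypothetical content, for demonstration only)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Python is fun\nGo is fast\nI like Python\n")
    sample_path = tmp.name

try:
    # Lines flow one at a time through both stages
    matches = list(lines_with_word(file_lines(sample_path), "Python"))
finally:
    os.remove(sample_path)

print(matches)  # ['Python is fun', 'I like Python']
```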
2. Data Pipelines
Generators are ideal for building data pipelines where each stage is an independent generator:
import random

def read_sensors():
    while True:
        yield random.uniform(20, 30)

def filter_values(data, minimum, maximum):
    for value in data:
        if minimum <= value <= maximum:
            yield value

def normalize(data, minimum, maximum):
    for value in data:
        yield (value - minimum) / (maximum - minimum)

pipeline = normalize(
    filter_values(read_sensors(), 22, 28),
    22, 28
)
for _ in range(5):
    print(f"{next(pipeline):.3f}")
3. Lazy Evaluation in Data Science
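In data science, generators prevent loading entire datasets into memory. Libraries like pandas fit this style of streaming processing: for example, pandas.read_csv() accepts a chunksize argument and then returns an iterator of DataFrame chunks instead of one large DataFrame.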
def batches(data, size):
    """Splits an iterable into fixed-size batches."""
    batch = []
    for item in data:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

data = range(100)
for i, batch in enumerate(batches(data, 10)):
    print(f"Batch {i}: {list(batch)}")
    if i >= 3:
        break
Bidirectional Iteration with send()
Generators are not just one-way data producers. The send() method lets you send values back into the generator, which receives them as the value of the yield expression. This turns generators into simple coroutines.
def corrector():
    print("Waiting for correction...")
    while True:
        value = yield
        print(f"Correcting: {value ** 2}")

c = corrector()
next(c)     # Initialize the generator
c.send(5)   # Correcting: 25
c.send(10)  # Correcting: 100
Although async/await has replaced this pattern for modern coroutines, understanding send() is useful for debugging and for specific generator communication patterns.
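A classic use of send() is a generator that both receives and returns a value at each step. The running-average sketch below is one such pattern; the name running_average is hypothetical and not part of any library.

```python
def running_average():
    """Receive numbers via send() and yield the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand back the average, wait for the next number
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)            # prime the generator: run to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
avg.close()          # stop the generator explicitly
```

The initial next() call is required because a value can only be sent to a generator that is already paused at a yield.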
Comparison: Generators vs Lists vs Iterators
| Feature | List | Iterator (class) | Generator |
|---|---|---|---|
| Memory | High (everything in RAM) | Low | Very low |
| Re-iterable | Yes | No (single use) | No (single use) |
| Random access | Yes (indexing) | No | No |
| Syntax | [...] | Verbose | Concise |
| Lazy | No | Yes | Yes |
| Can be infinite | No | Yes | Yes |
Each approach has its place. Lists are great for small collections or when you need random access. Iterators are suitable when you need fine-grained control. Generators are the ideal choice for most sequential processing cases, combining simplicity with efficiency.
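The "Re-iterable" row is the one that most often surprises people, so here is a short sketch of it. The Squares class is a hypothetical example of a common workaround: an iterable whose __iter__ is itself a generator function, so every loop gets a fresh generator.

```python
numbers_list = [1, 2, 3]
numbers_gen = (n for n in numbers_list)

list_passes = (sum(numbers_list), sum(numbers_list))
gen_passes = (sum(numbers_gen), sum(numbers_gen))
print(list_passes)  # (6, 6): a list can be traversed repeatedly
print(gen_passes)   # (6, 0): the generator is empty on the second pass


class Squares:
    """Re-iterable and lazy: each call to __iter__ returns a new generator."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        for n in range(self.limit):
            yield n * n


squares = Squares(4)
print(list(squares))  # [0, 1, 4, 9]
print(list(squares))  # [0, 1, 4, 9]
```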
Generators in Practice: Web Scraping Example
Let us look at a practical example that combines multiple concepts. Suppose you need to scrape multiple pages from a website and extract information:
import time
from itertools import islice

def api_pages(base_url, total_pages):
    """Generates page URLs on demand."""
    for i in range(1, total_pages + 1):
        yield f"{base_url}?page={i}"

def download_pages(urls):
    """Simulates download (replace with aiohttp or httpx)."""
    for url in urls:
        time.sleep(0.5)  # Simulates latency
        yield f"Data from {url}"

def extract_data(responses):
    """Extracts relevant information from each response."""
    for response in responses:
        yield {
            "source": response,
            "items": len(response),
            "timestamp": time.time()
        }

urls = api_pages("https://api.example.com/data", 100)
responses = download_pages(urls)
data = extract_data(responses)
# Consumes only 20 pages: islice stops before a 21st page is requested
for item in islice(data, 20):
    print(f"{item['source']} - {item['items']} items")
This pipeline pattern with generators keeps code clean, modular, and efficient. Each stage is independently testable and memory consumption is minimal.
If you work with web scraping, also check our article on Lambda, Map, Filter, and Reduce in Python — functions that integrate perfectly with generators.
Tips and Best Practices
1. Prefer Generators over Large Lists
Whenever you need neither repeated passes over the data nor random access, prefer generators over lists. The memory savings are significant, especially with large volumes, and no time is spent computing values that are never consumed.
2. Use itertools for Composition
The itertools module offers functions like chain(), zip_longest(), islice(), takewhile(), and dropwhile() that compose perfectly with generators. Combining itertools with generator expressions is one of the most expressive ways to write Python.
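Here is a brief sketch of how these functions combine with generators. The naturals() helper is a hypothetical infinite generator written for this example; the itertools functions are used with their standard signatures.

```python
from itertools import chain, dropwhile, islice, takewhile, zip_longest


def naturals():
    """Infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


# takewhile stops an infinite stream once the condition fails
small_squares = list(takewhile(lambda s: s < 50, (n * n for n in naturals())))
print(small_squares)  # [0, 1, 4, 9, 16, 25, 36, 49]

# dropwhile skips the leading values, islice bounds the rest
after_small = list(islice(dropwhile(lambda n: n < 5, naturals()), 3))
print(after_small)  # [5, 6, 7]

# chain concatenates iterables lazily
combined = list(chain([1, 2], (3, 4), "ab"))
print(combined)  # [1, 2, 3, 4, 'a', 'b']

# zip_longest pads the shorter iterable
padded = list(zip_longest("abc", [1, 2], fillvalue=None))
print(padded)  # [('a', 1), ('b', 2), ('c', None)]
```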
3. Be Careful with Side Effects
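Generators work best for pure computation. Because the body runs only when values are requested, side effects such as writing to files or modifying global variables happen at consumption time, not when the generator is created, and they never happen at all for values that are not consumed. If you need side effects, be explicit and document them.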
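The reason side effects are tricky is timing. In the sketch below, logged_squares and the module-level log list are hypothetical names used to show that nothing happens until a value is actually requested.

```python
log = []


def logged_squares(values):
    for v in values:
        log.append(f"processing {v}")  # side effect: mutates a global list
        yield v * v


gen = logged_squares([1, 2, 3])
log_after_creation = list(log)
print(log_after_creation)  # []: creating the generator ran none of the body

first = next(gen)
print(first)  # 1
print(log)    # ['processing 1']: only the consumed value was logged
```

If the caller stops early, the remaining side effects simply never occur, which is easy to overlook when the generator is created far from where it is consumed.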
4. Do Not Overdo It
For small collections (a few dozen items), a simple list is perfectly acceptable. Premature optimization is the root of all evil — use generators when it makes sense, not out of dogma.
References and Further Reading
To continue your studies on generators and iterators, check these reliable sources:
- Official Documentation — Iterators
- Official Documentation — Generators
- PEP 255 — Simple Generators
- PEP 289 — Generator Expressions
- Real Python: Introduction to Python Generators
- Python Wiki: Generators
- itertools — Standard Library
- Yield Expression — Language Reference
Conclusion
Generators and iterators are fundamental tools in every Python developer's arsenal. They allow you to write cleaner, more efficient, and more expressive code, especially when dealing with large data volumes or continuous streams.
The secret lies in lazy evaluation: computing only what is needed, when it is needed. This not only saves memory but also makes code more modular and testable.
Now that you have mastered these concepts, practice! Create your own generators, experiment with yield from, explore the itertools module, and see how these tools can transform the quality of your Python code.