How Iterables, Iterators, and Generators work in plain English

Iteration, iterators, iterables, and generators are concepts in Python that have distinct meanings. Let’s start with iteration, a term common to many programming languages. Iteration means taking items one by one; it’s synonymous with looping. If you count all of the eggs in a carton, you’re iterating through the carton of eggs. An iterable is any object that can be iterated over; strings, lists, dictionaries, and files are all examples of iterables. An iterable object has either an __iter__() method, which returns an iterator, or a __getitem__() method that accepts sequential indexes starting at 0 and raises an IndexError when the indexes are no longer valid.

An iterator object has a next() method (Python 2) or a __next__() method (Python 3). Every time you use a for loop, Python obtains an iterator from the iterable and calls its __next__() method automatically to get each item. The relationship between an iterable and an iterator is that you obtain an iterator from an iterable object. Generators are, according to the Python docs, a simple and powerful tool for creating iterators.

If you know how to write functions in Python, you can easily create your own generators, as they’re in essence functions that use the yield keyword to return data. Every time the __next__() method is called on a generator, it picks up where it left off. Anything that can be done with a generator can also be done with class-based iterators, but an advantage of generators is that they’re compact: the __iter__() and __next__() methods are created automatically. When generators terminate, they automatically raise the StopIteration exception. Let’s step through these concepts bit by bit to gain a better understanding.

Iteration

Below is an example of iterating over an iterable x which is a list of integers:

>>> x = [1, 3, 2, 5, 6, 8]
>>> for y in x:
...     print(y)
... 
1
3
2
5
6
8

To see the attributes that are available on x, type the following in the Python interpreter:

>>> dir(x)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

The first paragraph described the __iter__() and __getitem__() methods of an iterable object. As you can see in the above output, a list has both.
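
Reading the dir() output by eye works, but the same check can be sketched in code. The snippet below is only an illustration, not part of the original walkthrough: the helper describe() and the class Squares are hypothetical names. Squares defines only __getitem__(), to show that the sequence protocol (indexes from 0 until an IndexError) is also enough for a for loop.

```python
def describe(obj):
    """Report which iteration-related methods an object defines."""
    return {
        "__iter__": hasattr(obj, "__iter__"),
        "__getitem__": hasattr(obj, "__getitem__"),
    }


class Squares:
    """Hypothetical class that supports only the sequence protocol."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        # Raising IndexError tells Python the sequence has ended.
        if index < 0 or index >= self.size:
            raise IndexError(index)
        return index * index


print(describe([1, 3, 2]))   # a list defines both methods
print(describe({1, 2}))      # a set defines __iter__ but not __getitem__
print(describe(Squares(4)))  # only __getitem__

# Looping still works because iter() falls back to __getitem__.
squares = [value for value in Squares(4)]
print(squares)
```

A set is included to show that __getitem__() is not required when __iter__() is present.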

The relationship between iterators and iterables in Python

Python has a built-in function known as iter(). Its signature is listed below:

iter(object[, sentinel])

This function returns an iterator object. How the first argument is interpreted depends on whether a second argument is present. If the second argument is missing, the object passed must be a collection that supports the iteration protocol (the __iter__() method) or the sequence protocol (the __getitem__() method with integer arguments starting at 0). If the second argument, the sentinel, is given, the object must be callable. To understand the relationship between iterables and iterators in Python, let’s take a look at some code snippets:

>>> message = ['H', 'e', 'l', 'l', 'o']
>>> dir(message)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

The variable message stores a list, which is an iterable object. Because a list implements the __iter__() method, we can obtain an iterator from it as shown below:

>>> message = iter(message)
>>> message
<list_iterator object at 0x...>
>>> dir(message)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']

Since message is an iterator object it has the __next__() method which allows us to get the next item in the list as shown below:

>>> message.__next__()
'H'
>>> message.__next__()
'e'
>>> message.__next__()
'l'
>>> message.__next__()
'l'
>>> message.__next__()
'o'
>>> message.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 

If __next__() is called when there are no elements left in the iterator, a StopIteration exception is raised.
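
The steps above are roughly what a for loop does on your behalf. The following is a minimal sketch under that assumption; manual_for() is a hypothetical helper name, not something Python provides. The last few lines also illustrate the two-argument form of iter() described earlier, where the first argument is a callable and iteration stops once it returns the sentinel.

```python
def manual_for(iterable):
    """Collect items the way a for loop would fetch them."""
    iterator = iter(iterable)      # step 1: obtain an iterator
    collected = []
    while True:
        try:
            item = next(iterator)  # step 2: request the next item
        except StopIteration:
            break                  # step 3: the iterator is exhausted
        collected.append(item)
    return collected


letters = manual_for(['H', 'e', 'l', 'l', 'o'])
print(letters)

# Two-argument form: iter(callable, sentinel) calls the callable
# repeatedly and stops as soon as it returns the sentinel value.
remaining = [3, 2, 1, 0, 99]
until_zero = list(iter(lambda: remaining.pop(0), 0))
print(until_zero)
```

In the second part the sentinel 0 is consumed but not returned, so the value 99 is never reached and stays in the remaining list.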

How to Create an Iterator Class

To create a class whose objects behave as iterators, you have to implement the __iter__() and __next__() methods. Below is a simple class that implements both of these methods and can therefore be treated like an iterator:

>>> class CreateNums:
...     def __init__(self, count):
...         self.count = count 
...     def __iter__(self):
...         return self
...     def __next__(self):
...         self.count += 1
...         return self.count 
...   

>>> a1 = CreateNums(5)
>>> next(a1)
6
>>> next(a1)
7
>>> next(a1)
8
>>> next(a1)
9
>>> next(a1)
10

Notice that next() is called instead of __next__()? The next() function simply calls __next__(), so they can be used interchangeably but I think next() is easier to type and remember ;).
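
One thing to note is that CreateNums never raises StopIteration, so a for loop over it would never end. Below is a sketch of a bounded variant; the class name CountUpTo and its stop parameter are hypothetical and only here for illustration.

```python
class CountUpTo:
    """Hypothetical bounded counter: produces start+1, start+2, ..., stop."""

    def __init__(self, start, stop):
        self.count = start
        self.stop = stop

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.count >= self.stop:
            # Signal to for loops and next() that nothing is left.
            raise StopIteration
        self.count += 1
        return self.count


counter = CountUpTo(5, 8)
first_pass = list(counter)
second_pass = list(counter)  # the iterator is already exhausted
print(first_pass)
print(second_pass)
```

The second pass comes back empty because an iterator keeps its position; to loop again you would create a new CountUpTo object.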

Generators in Python

Let’s assume that we have a list and that we want to triple the elements in it. What’s one way we can do this?

>>> current_list = [1, 2, 3, 4, 5]
>>> for x in current_list:
...     x *= 3
...     print('x = {}'.format(x))
...
x = 3
x = 6
x = 9
x = 12
x = 15

The above code can be replicated using a generator expression as shown below:

>>> y = (x * 3 for x in range(1, 6))
>>> next(y)
3
>>> next(y)
6
>>> next(y)
9
>>> next(y)
12
>>> next(y)
15
>>> next(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The logic for generator expressions is very similar to that of list comprehensions.

The above code generates the numbers 1 through 5 and multiplies each one by three. However, instead of returning all of the numbers at once, each number is fetched one at a time via the next() function until there are no more elements left. With generators, you can only move forward. The syntactical difference between a generator expression and a list comprehension is that a list comprehension is written in square brackets while a generator expression is written in parentheses. Python provides list, set, and dictionary comprehensions. However, there’s no such thing as a tuple comprehension, because the parenthesized form already denotes a generator expression; to build a tuple, pass a generator expression to tuple(). Below is an example of a generator function:

>>> def generate_fun(n):
...     while True:
...         yield n
...         n += 1

The above code snippet contains an infinite loop. However, instead of running forever or running out of memory, the generator pauses at each yield and returns the next value with each subsequent next() call, as shown below:

>>> a = generate_fun(10)
>>> next(a)
10
>>> next(a)
11
>>> next(a)
12
>>> next(a)
13
>>> next(a)
14
>>> next(a)
15
>>> next(a)
16
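
Because generate_fun never terminates, writing list(generate_fun(10)) would never finish. One way to take a bounded number of values, sketched below, is itertools.islice from the standard library; the variable names are just for illustration.

```python
from itertools import islice


def generate_fun(n):
    # Same generator function as above: counts upward forever.
    while True:
        yield n
        n += 1


numbers = generate_fun(10)
first_three = list(islice(numbers, 3))  # consumes 10, 11, 12
next_value = next(numbers)              # the generator resumes at 13
print(first_three)
print(next_value)
```

The generator keeps its position after the slice, which is the "you can only move forward" behavior in practice.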

A generator function uses the yield keyword to return data. Just as a normal function can have multiple return statements, a generator function can have multiple yield statements. Generators allow you to produce values on the fly, one by one, instead of holding all of them in memory at once. Let’s look at the classic Fibonacci sequence:

def f_nums(n):
    count = 0
    a, b = 0, 1
    while count < n:
        a, b = b, b + a
        count += 1
        print(a, end=' ')

f_nums(15)

1 1 2 3 5 8 13 21 34 55 89 144 233 377 610

The same sequence written as a generator is listed below:

def fib_nums():
    """Yield Fibonacci numbers indefinitely, starting from 0."""
    a, b = 0, 1
    while 1:
        yield a
        a, b = b, a + b

fib = fib_nums()
print(next(fib))
print(next(fib))
print(next(fib))
print(next(fib))
print(next(fib))
print(next(fib))
print(next(fib))

0
1
1
2
3
5
8

There are two changes in the generator function. One, it’s technically an infinite loop, because the while 1 statement is never exited; and two, instead of a print statement there’s a yield statement. If a function has at least one yield statement, it’s automatically a generator function.
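
A generator function does not have to loop forever. The sketch below is a bounded variant under my own assumptions: fib_below and its limit parameter are hypothetical names, not from the example above. It also checks the earlier claim that a generator supplies __iter__() and __next__() on its own.

```python
def fib_below(limit):
    """Hypothetical bounded variant: yield Fibonacci numbers below limit."""
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b
    # Falling off the end of the function ends the generator.


gen = fib_below(50)
print(iter(gen) is gen)          # a generator is its own iterator
print(hasattr(gen, '__next__'))  # __next__() is provided automatically

values = list(gen)               # list() stops when the generator finishes
print(values)
```

Once the loop condition fails, the function simply returns, and Python raises StopIteration for you; list() and for loops handle that exception silently.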
