Iterables in Python are objects and containers that could be stepped through one item at a time, usually using a
for ... in loop. Not all objects can be iterated, for example - we cannot iterate an integer, it is a singular value. The best we can do here is iterate on a range of integers using the
range type which helps us iterate through all integers in the range
Since integers, individualistically, are not iterable, when we try to do a
for x in 7, it raises an exception stating
TypeError: 'int' object is not iterable. So what if, we change the Python's source code and make integers iterable, say every time we do a
for x in 7, instead of raising an exception it actually iterates through the values
[0, 7). In this essay, we would be going through exactly that, and the entire agenda being:
What is a Python iterable?
What is an iterator protocol?
Changing Python's source code and make integers iterable, and
Why it might be a bad idea to do so?
Any object that could be iterated is an Iterable in Python. The list has to be the most popular iterable out there and it finds its usage in almost every single Python application - directly or indirectly. Even before the first user command is executed, the Python interpreter, while booting up, has already created
406 lists, for its internal usage.
In the example below, we see how a list
a is iterated through using a
for ... in loop and each element can be accessed via variable
range is a python type that allows us to iterate on integer values starting with the value
start and going till
end while stepping over
step values at each time.
range is most commonly used for implementing a C-like
for loop in Python. In the example below, the
for loop iterates over a
range that starts from
0, goes till
7 with a step of
1 - producing the sequence
range other iterables are -
dict. Python also allows us to create custom iterables by making objects and types follow the Iterator Protocol.
Iterators and Iterator Protocol
Python, keeping things simple, defines iterable as any object that follows the Iterator Protocol; which means the object or a container implements the following functions
__iter__should return an iterator object having implemented the
__next__should return the next item of the iteration and if items are exhausted then raise a
So, in a gist,
__iter__ is something that makes any python object iterable; hence to make integers iterable we need to have
__iter__ function set for integers.
Iterable in CPython
The most famous and widely used implementation of Python is CPython where the core is implemented in pure C. Since we need to make changes to one of the core datatypes of Python, we will be modifying CPython, add
__iter__ function to an Integer type, and rebuild the binary. But before jumping into the implementation, it is important to understand a few fundamentals.
Every object in Python is associated with a type and each type is an instance of a struct named PyTypeObject. A new instance of this structure is effectively a new type in python. This structure holds a few meta information and a bunch of C function pointers - each implementing a small segment of the type's functionality. Most of these "slots" in the structure are optional which could be filled by putting appropriate function pointers and driving the corresponding functionality.
Among all the slots available, the slot that interests us is the
tp_iter slot which can hold a pointer to a function that returns an iterator object. This slot corresponds to the
__iter__ function which effectively makes the object iterable. A non
NULL value of this slot indicates iterability. The
tp_iter holds the function with the following signature
Integers in Python do not have a fixed size; rather the size of integer depends on the value it holds. How Python implements super long integers is a story on its own but the core implementation can be found at longobject.c. The instance of
PyTypeObject that defines integer/long type is
PyLong_Type and has its
tp_iter slot set to
NULL which asserts the fact that Integers in python are not iterable.
NULL value for
int object not iterable and hence if this slot was occupied by an appropriate function pointer with the aforementioned signature, this could well make any integer iterable.
Now we implement the
tp_iter function on integer type, naming it
long_iter, that returns an iterator object, as required by the convention. The core functionality we are looking to implement here is - when an integer
n is iterated, it should iterate through the sequence
[0, n) with step
1. This behavior is very close to the pre-defined
range type, that iterates over a range of integer values, more specifically a
range that starts at
0, goes till
n with a step of
We define a utility function in
rangeobject.c that, given a python integer, returns an instance of
longrangeiterobject as per our specifications. This utility function will instantiate the
longrangeiterobject with start as
0, ending at the long value given in the argument, and step as
1. The utility function is as illustrated below.
The utility function
PyLongRangeIter_ZeroToN is defined in
rangeobject.c and will be declared in
rangeobject.h so that it can be used across the CPython. Declaration of function in
rangeobject.h using standard Python macros goes like this
The function occupying the
tp_iter slot will receive the
self object as the input argument and is expected to return the iterator instance. Hence, the
long_iter function will receive the python integer object (self) that is being iterated as an input argument and it should return the iterator instance. Here we would use the utility function
PyLongRangeIter_ZeroToN, we just defined, which is returning us an instance of range iterator. The entire
long_iter function could be defined as
Now that we have
long_iter defined, we can place the function on the
tp_iter slot of
PyLong_Type that enables the required iterability on integers.
Once we have everything in place, the entire flow goes like this -
Every time an integer is iterated, using any iteration method - for example
for ... in, it would check the
tp_iter of the
PyLongType and since now it holds the function pointer
long_iter, the function will be invoked. This invocation will return an iterator object of type
longrangeiterobject with a fixed start, index, and step values - which in pythonic terms is effectively a
range(0, n, 1). Hence the
for x in 7 is inherently evaluated as
for x in range(0, 7, 1) allowing us to iterate integers.
Integer iteration in action
Once we build a new python binary with the aforementioned changes, we can see iterable integers in actions. Now when we do
for x in 7, instead of raising an exception, it actually iterates through values
Why it is not a good idea
Although it seems fun, and somewhat useful, to have iterable integers, but it is really not a great idea. The core reason for this is that it makes unpacking unpredictable. Unpacking is when you unpack an iterable and assign it to multiple variables. For example:
a, b = 3, 4 will assign 3 to a and 4 to b. So assigning
a, b = 7 should be an error because there is just one value on the right side and multiple on the left.
Unpacking treats right-hand size as iterable and tries to iterate on it; and now since Integers are iterable the right-hand side, post iteration yields 7 values which the left-hand side has mere 2 variables; Hence it raises an exception
ValueError: too many values to unpack (expected 2).
Things would work just fine if we do
a, b = 2 as now the right-hand side, post iteration, has two values and the left-hand side has two variables. Thus two very similar statements result in two very different outcomes, making unpacking unpredictable.
In this essay, we modified the Python's source code and made integers iterable. Even though it is not a good idea to do so, but it is fun to play around with the code and make changes in our favorite programming language. It helps us get a detailed idea about core python implementation and may pave the way for us to become a Python core developer. This is one of many articles in Python Internals series - How python implements super long integers? and Python Caches Integers.
Source code to this can be found at arpitbbhayani/cpython.
Other articles that you might like
If you liked what you read, consider subscribing to my weekly newsletter at arpitbhayani.me/newsletter were, once a week, I write an essay about programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.
You can always find me browsing through twitter @arpit_bhayani.