This article explores some concepts of iteration in Python – specifically, the concepts of ‘iterables’, ‘generators’ and the keyword ‘yield’.
Iterables are objects that can return an iterator: an object that hands out the iterable’s elements one at a time and remembers how far it has got. A for loop obtains an iterator from the iterable and uses it to traverse the elements. Commonly used iterables are collection objects such as lists, dictionaries and sets.
However, any object can be made iterable by giving it an __iter__() method that returns an iterator. An iterator, in turn, supports two methods, which form the ‘iterator protocol’:
- __iter__() : For an iterator, this returns the object itself. It is what allows the object to be used in a ‘for…in’ statement.
- __next__() : This returns the next element in the container. If there aren’t any elements left, this method raises the ‘StopIteration’ exception.
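As a minimal sketch of the protocol, the built-in functions iter() and next() call __iter__() and __next__() for us. The list and variable names below are made up for illustration.

```python
numbers = [10, 20, 30]       # a list is an iterable
iterator = iter(numbers)     # calls numbers.__iter__() and returns a list iterator

first = next(iterator)       # calls iterator.__next__()
second = next(iterator)
third = next(iterator)

try:
    next(iterator)           # no elements left
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)   # 10 20 30 True
```

This is roughly what a for loop does behind the scenes: it asks for an iterator once, then keeps calling next() until StopIteration is raised.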
Let’s try making our very own iterable. This iterable will be an object of class ‘SquareIndex’. We’ve defined the iterator protocol methods, along with an initializing function inside the class.
To iterate over the iterable, we’ve used a for loop.
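A class along these lines might look like the sketch below. The behaviour is an assumption made for illustration: this hypothetical SquareIndex takes a limit and yields the squares of the indices 0 to limit - 1.

```python
class SquareIndex:
    """Hypothetical iterator that yields the squares of 0 .. limit - 1."""

    def __init__(self, limit):
        self.limit = limit   # assumed parameter: how many squares to produce
        self.index = 0       # current position in the iteration

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop.
        return self

    def __next__(self):
        if self.index >= self.limit:
            raise StopIteration          # signals that nothing is left
        result = self.index ** 2
        self.index += 1
        return result


s = SquareIndex(5)
squares = []
for value in s:
    squares.append(value)

print(squares)   # [0, 1, 4, 9, 16]
```

Because the object keeps its own position, a second loop over the same ‘s’ would produce nothing; a fresh SquareIndex object is needed for each pass.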
To confirm that our object ‘s’ of SquareIndex really is an iterable, we’ve imported the abstract base class ‘Iterable’ from the ‘collections.abc’ module. Testing whether SquareIndex is a subclass of ‘Iterable’ returns True, meaning SquareIndex is a qualified iterable class whose objects are iterables.
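The check can be sketched as follows. The SquareIndex class here is a compact, hypothetical stand-in (it yields squares up to an assumed limit) so that the snippet runs on its own.

```python
from collections.abc import Iterable, Iterator


class SquareIndex:
    """Hypothetical stand-in: yields the squares of 0 .. limit - 1."""

    def __init__(self, limit):
        self.limit = limit
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= self.limit:
            raise StopIteration
        self.index += 1
        return (self.index - 1) ** 2


s = SquareIndex(3)

class_is_iterable = issubclass(SquareIndex, Iterable)   # True: defines __iter__()
object_is_iterable = isinstance(s, Iterable)            # True
object_is_iterator = isinstance(s, Iterator)            # True: also defines __next__()
list_is_iterator = isinstance([1, 2, 3], Iterator)      # False: a list is only iterable

print(class_is_iterable, object_is_iterable, object_is_iterator, list_is_iterator)
```

Note that SquareIndex never inherits from ‘Iterable’. The abstract base class recognises it simply because the required method is present.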
Here are diagrams to demonstrate the control flow in the methods of an iterable object.
That was a lot of work, right? We had to create a class that implemented the methods __iter__() and __next__() and raised StopIteration to indicate that the iteration had no further values.
What if we could replace the class containing __iter__() and __next__() with a single function that returns an iterator (fulfilling the duty of __iter__()) and uses its own loop (fulfilling the duty of __next__())?
Enter the Generator function.
It’s written like a regular function, but it uses the keyword ‘yield’ instead of ‘return’ to hand back a value. Unlike ‘return’, ‘yield’ doesn’t terminate the function when it hands back the value; it only pauses the function. Calling a generator function doesn’t run its body straight away. Instead, it returns a generator object, which is an iterator. Each time the __next__() method of that iterator is called, the function resumes from where it paused and runs until it reaches the next ‘yield’.
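A small sketch can make the pausing and resuming visible. The function name square_index and the events list are invented for this illustration; the list simply records which parts of the function body have run.

```python
events = []   # records which parts of the generator body have executed


def square_index(limit):
    events.append("started")
    for index in range(limit):
        yield index ** 2                          # pause here and hand back a value
        events.append(f"resumed after {index}")   # runs only on the next request
    events.append("finished")


gen = square_index(2)          # nothing in the body has run yet
events_before = list(events)   # []

first = next(gen)              # runs up to the first yield
second = next(gen)             # resumes, then pauses at the second yield
rest = list(gen)               # resumes, the loop ends, the generator finishes

print(first, second, rest)     # 0 1 []
print(events)
```

After the last line, events holds "started", "resumed after 0", "resumed after 1" and "finished", in that order.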
The generator object returned by a generator function acts like a collection that can be iterated over exactly once. Generators use less memory than ordinary collections, which makes them suitable for processing large data sets.
- The reason behind the lower memory use is that ordinary collections store all their elements in memory at the same time, while a generator produces one element at a time and holds on to nothing once that element has been consumed.
- It’s like drawing a hundred circles on a piece of paper – vs. drawing a circle, then erasing it and re-drawing a circle on the same spot again a hundred times.
- This comparison might also clear up why you can only iterate over the elements in a generator function once (at the end, you’d be left with a blank piece of paper)
- Also, a generator can’t be resumed while it is actively running. The generator pauses itself using the ‘yield’ keyword, and only its caller can resume it by calling next(). Trying to resume a generator that is already running raises a ValueError. It is like drawing a circle, and then drawing over it again without erasing it first.
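The points above can be checked with a short experiment. This is only a sketch: the function names are made up, and sys.getsizeof() measures the container object alone (not the integers it refers to), so the exact byte counts vary between Python versions.

```python
import sys


def squares(limit):
    for index in range(limit):
        yield index ** 2


as_list = [index ** 2 for index in range(100_000)]   # all elements exist at once
as_generator = squares(100_000)                      # elements produced on demand

list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)
print(list_size > generator_size)    # True

first_pass = sum(as_generator)       # consumes every element
second_pass = sum(as_generator)      # the generator is exhausted, so this is 0


def self_resuming():
    yield next(me)                   # tries to resume itself while it is running


me = self_resuming()
try:
    next(me)
    message = ""
except ValueError as error:
    message = str(error)             # "generator already executing"

print(second_pass, message)
```

The list can be summed as often as we like, whereas the generator gives up its values only once.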
‘Generator expressions’ are a compact form of generator. They are written like list comprehensions, but with parentheses instead of square brackets, and they produce their elements lazily, just as a ‘generator function’ does.
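For comparison, here is a minimal sketch of a list comprehension next to the equivalent generator expression. The variable names are illustrative.

```python
squares_list = [n ** 2 for n in range(5)]   # builds the whole list immediately
squares_gen = (n ** 2 for n in range(5))    # builds nothing until iterated

print(squares_list)        # [0, 1, 4, 9, 16]
from_gen = list(squares_gen)
print(from_gen)            # [0, 1, 4, 9, 16]

# When a generator expression is the only argument to a function,
# the extra pair of parentheses can be dropped.
total = sum(n ** 2 for n in range(5))
print(total)               # 30
```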
Let’s draw a diagram to illustrate control flow for the following generator function: hightea()
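As a rough aid for following the control flow, here is a hypothetical version of hightea(). Its body and the menu items are invented for illustration and are only an assumption about what such a function could look like.

```python
def hightea():
    """Hypothetical generator: serves one (made-up) menu item per request."""
    yield "tea"          # first next() stops here
    yield "scones"       # second next() resumes above and stops here
    yield "sandwiches"   # third next() stops here; the fourth raises StopIteration


served = []
for item in hightea():   # the for loop calls next() until StopIteration
    served.append(item)

print(served)            # ['tea', 'scones', 'sandwiches']
```

Control bounces back and forth: the loop asks for a value, the function runs to its next ‘yield’, and the loop body then runs with that value.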
The ‘Yield’ Keyword
The function of yield is to:
- Freeze the current state of the function
- Return the current value to the caller of __next__(). In the above example, that caller is the code iterating over ‘hightea()’
Some notes on where it can be used:
- In early versions of Python, the ‘yield’ keyword was not allowed in the ‘try’ clause of a try…finally construct, although it could be used in the ‘finally’ clause. This restriction was lifted in Python 2.5.
- It can only be used inside functions (and such functions are called generator functions)
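The sketch below shows both behaviours in one place: local state surviving between yields, and a ‘yield’ sitting inside the ‘try’ block of a try…finally, which current Python versions (2.5 and later) accept. The names running_total and cleanup_log are made up for the example.

```python
cleanup_log = []


def running_total(values):
    total = 0                     # local state is frozen at each yield
    try:
        for value in values:
            total += value
            yield total           # yield inside the try block of try...finally
    finally:
        cleanup_log.append("closed")   # runs when the generator ends or is closed


gen = running_total([1, 2, 3])
first = next(gen)     # 1
second = next(gen)    # 3: 'total' was remembered across the pause
gen.close()           # stops the generator at its paused yield; finally runs

print(first, second, cleanup_log)   # 1 3 ['closed']
```

Calling close() on a paused generator gives its ‘finally’ clause a chance to run even though the loop never finished.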
There’s an interesting history behind the use of ‘yield’ in Python.
- When it was first incorporated in Python version 2.2, the line ‘from __future__ import generators’ had to be placed at the top of any module using ‘yield’, without which a warning would be triggered.
- When it was first introduced, questions were raised as to why there couldn’t be a new built-in function/syntax change in the place of incorporating a new keyword into the language.
- The reason for introducing a keyword rather than a function was that control flow is better depicted by keywords; and ‘yield’ is a control construct.
- Syntax changes along the lines of replacing ‘yield’ with ‘return and continue’ or ‘return generating’ were suggested; however, it was decided that it was better to clearly specify the action in one keyword rather than have to deduce it from a combination of pre-existing ones.
Hopefully, this post helps clarify the difference between iterables, generators and the keyword ‘yield’.