Iterators and Generators in PHP

Jeroen De Dauw - Oct 18 '17 - - Dev Community

In this post I demonstrate an effective way to create iterators and generators in PHP and provide an example of a scenario in which using them makes sense.

Generators have been around since PHP 5.5, and iterators have been around since the Planck epoch. Even so, a lot of PHP developers do not know how to use them well and cannot recognize situations in which they are helpful. In this blog post I share insights I have gained over the years, which have always drawn an interested response from fellow developers whenever I shared them. The post goes beyond the basics, provides a real-world example, and includes a few tips and tricks. So as not to leave out those unfamiliar with iterators, the post starts with the "What are Iterators" section, which you can safely skip if you can already answer that question.

What are Iterators

PHP has an Iterator interface that you can implement to represent a collection. You can loop over an instance of an Iterator just like you can loop over an array:

function doStuff(Iterator $things) {
    foreach ($things as $thing) { /* ... */ }
}

Why would you bother implementing the Iterator interface rather than just using an array? Let's look at an example.

Imagine you have a directory with a bunch of text files. One of the files contains an ASCII NyanCat (~=[,,_,,]:3). It is the task of our code to find which file the NyanCat is hiding in.

We can get all the files by doing a glob( $path . '*.txt' ) and we can get the contents of a file with file_get_contents. We could just have a foreach going over the glob result that does the file_get_contents. Luckily we realize this would violate separation of concerns and make the "does this file contain NyanCat" logic hard to test, since it would be bound to the filesystem access code. Hence we create one function that gets the contents of the files, and one with our logic in it:

function getContentsOfTextFiles(): array {
    // glob and file_get_contents
}

function findTextWithNyanCat(array $texts) {
    foreach ($texts as $text) { if ( /* ... */ ) { /* ... */ } }
}

function findNyanCat() {
    findTextWithNyanCat(getContentsOfTextFiles());
}

While this approach is decoupled, a big drawback is that now we need to fetch the contents of all files and keep all of that in memory before we even start executing any of our logic. If NyanCat is hiding in the first file, we'll have fetched the contents of all others for nothing. We can avoid this by using an Iterator, as they can fetch their values on demand: they are lazy.

class TextFileIterator implements Iterator {
    /* ... */
    public function current() {
        // return file_get_contents
    }
    /* ... */
}

function findTextWithNyanCat(Iterator $texts) {
    foreach ($texts as $text) { if ( /* ... */ ) { /* ... */ } }
}

function findNyanCat() {
    findTextWithNyanCat(new TextFileIterator());
}
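
To make the skeleton above concrete, here is a minimal sketch of what such an iterator could look like. The constructor argument, the choice to use the file path as the key, and the temporary directory in the usage part are my assumptions for illustration; the original leaves those details out.

```php
<?php

// Hypothetical sketch: lazily provides the contents of each *.txt file in a directory.
class TextFileIterator implements Iterator {
    private $paths;
    private $position = 0;

    public function __construct(string $directory) {
        // Only the file names are collected up front; contents are read on demand.
        $this->paths = glob(rtrim($directory, '/') . '/*.txt') ?: [];
    }

    public function current(): string {
        // The file is read only when the loop actually reaches it.
        $contents = file_get_contents($this->paths[$this->position]);
        if ($contents === false) {
            throw new RuntimeException('Could not read ' . $this->paths[$this->position]);
        }
        return $contents;
    }

    public function key(): string {
        // Assumption: using the path as key lets callers see which file matched.
        return $this->paths[$this->position];
    }

    public function next(): void {
        $this->position++;
    }

    public function rewind(): void {
        $this->position = 0;
    }

    public function valid(): bool {
        return isset($this->paths[$this->position]);
    }
}

// Usage with a throwaway directory (assumed setup, for illustration only).
$dir = sys_get_temp_dir() . '/nyan_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/a.txt', 'nothing here');
file_put_contents($dir . '/b.txt', '~=[,,_,,]:3');

$found = null;
foreach (new TextFileIterator($dir) as $path => $text) {
    if (strpos($text, '~=[,,_,,]:3') !== false) {
        $found = basename($path);
        break; // files after the match are never read
    }
}
echo $found, "\n"; // b.txt

// Clean up the temporary files.
unlink($dir . '/a.txt');
unlink($dir . '/b.txt');
rmdir($dir);
```

Because the loop breaks on the first match, any files after the one containing NyanCat are never opened.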

Our TextFileIterator gives us a nice place to put all the filesystem code, while to the outside just looking like a collection of texts. The function housing our logic, findTextWithNyanCat, does not know that the text comes from the filesystem. This means that if you decide to get texts from the database, you could just create a new DatabaseTextBlobIterator and pass it to the logic function without making any changes to the latter. Similarly, when testing the logic function, you can give it an ArrayIterator.

function testFindTextWithNyanCat() {
    /* ... */
    findTextWithNyanCat(new ArrayIterator(['test text', '~=[,,_,,]:3']));
    /* ... */
}
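
The bodies of the functions above are elided, so here is one possible way to fill them in. The return value (the first matching text, or null) and the strpos check are assumptions on my part, not necessarily what the real code does.

```php
<?php

// Hypothetical body for the logic function: returns the first text that
// contains NyanCat, or null when none of the texts does.
function findTextWithNyanCat(Iterator $texts): ?string {
    foreach ($texts as $text) {
        if (strpos($text, '~=[,,_,,]:3') !== false) {
            return $text;
        }
    }
    return null;
}

// No filesystem involved: ArrayIterator supplies the texts.
$found = findTextWithNyanCat(new ArrayIterator(['test text', 'hello ~=[,,_,,]:3']));
$missing = findTextWithNyanCat(new ArrayIterator(['test text']));

var_dump($found === 'hello ~=[,,_,,]:3'); // bool(true)
var_dump($missing);                       // NULL
```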

I wrote more about basic Iterator functionality in Lazy iterators in PHP and Python and Some fun with iterators. I also blogged about a library that provides some (Wikidata specific) iterators and a CLI tool built around an Iterator. For more on how generators work, see the off-site post Generators in PHP.

PHP's collection type hierarchy

Let's start by looking at PHP's type hierarchy for collections as of PHP 7.1. These are the core types that I think are most important:

[Figure: PHP iterable types]

At the very top we have iterable, the supertype of both array and Traversable. If you are not familiar with this type or are using a version of PHP older than 7.1, don't worry, we don't need it for the rest of this blog post.

Iterator is a subtype of Traversable, and the same goes for IteratorAggregate. The standard library iterator_ functions such as iterator_to_array all take a Traversable. This is important since it means you can give them an IteratorAggregate, even though it is not an Iterator. Later on in this post we'll get back to what exactly an IteratorAggregate is and why it is useful.

Finally we have Generator, which is a subtype of Iterator. That means all functions that accept an Iterator can be given a Generator, and, by extension, that you can use generators in combination with the Iterator classes in the Standard PHP Library such as LimitIterator and CachingIterator.
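
The relationships described above can be checked with a few lines of code. This is just a small sketch; the naturalNumbers function is a made-up example, not something from a library.

```php
<?php

// Hypothetical infinite generator, used only for illustration.
function naturalNumbers(): Generator {
    $n = 1;
    while (true) {
        yield $n++;
    }
}

$numbers = naturalNumbers();

var_dump($numbers instanceof Iterator);    // bool(true)
var_dump($numbers instanceof Traversable); // bool(true)
var_dump(is_iterable([1, 2]));             // bool(true), is_iterable needs PHP 7.1+

// LimitIterator expects an Iterator, so a Generator can be passed in directly.
// The second argument of iterator_to_array (false) discards the keys.
$firstThree = iterator_to_array(new LimitIterator($numbers, 0, 3), false);

echo implode(', ', $firstThree), "\n"; // 1, 2, 3
```

Since the generator is lazy, wrapping it in a LimitIterator is enough to take a finite slice of an otherwise endless sequence.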

IteratorAggregate + Generator = <3

Generators are a nice and easy way to create iterators. Often you'll only loop over them once and not have any problem. However, beware that generators create iterators that are not rewindable, which means that if you loop over them more than once, you'll get an exception.

Imagine the scenario where you pass in a generator to a service that accepts an instance of Traversable:

$aGenerator = function() { /* ... yield ... */ };
$aService->doStuff($aGenerator());

public function doStuff(Traversable $things) {
    foreach ($things as $thing) { /* ... */ }
}

The service class in which doStuff resides does not know it is getting a Generator; it just knows it is getting a Traversable. When working on this class, it is entirely reasonable to iterate through $things a second time.

public function doStuff(Traversable $things) {
    foreach ($things as $thing) { /* ... */ }
    foreach ($things as $thing) { /* ... */ } // Boom if Generator!
}

This blows up if the provided $things is a Generator, because generators are non-rewindable. Note that it does not matter how you iterate through the value. Calling iterator_to_array with $things has the exact same result as using it in a foreach loop. Most, if not all, generators I have written do not use resources or state that inherently prevents them from being rewindable. So the double-iteration issue can be unexpected and seemingly silly.
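
If you want to see the failure for yourself, the following sketch reproduces it outside of any service class. The words function is a made-up example.

```php
<?php

// Hypothetical generator function, used only to demonstrate the problem.
function words(): Generator {
    yield 'So';
    yield 'Much';
}

$things = words();

// The first pass works as expected.
$firstPass = [];
foreach ($things as $thing) {
    $firstPass[] = $thing;
}

// The second pass over the same Generator instance throws.
$error = null;
try {
    foreach ($things as $thing) {
        // never reached
    }
} catch (Exception $e) {
    $error = $e;
}

var_dump($error instanceof Exception); // bool(true)
```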

There is a simple and easy way to get around it though. This is where IteratorAggregate comes in. Classes implementing IteratorAggregate must implement the getIterator() method, which returns a Traversable. Creating one of these is extremely trivial:

class AwesomeWords implements \IteratorAggregate {
    public function getIterator(): \Traversable {
        yield 'So';
        yield 'Much';
        yield 'Such';
    }
}

If you call getIterator, you'll get a Generator instance, just like you'd expect. However, normally you never call this method. Instead you use the IteratorAggregate just as if it was an Iterator, by passing it to code that expects a Traversable. (This is also why usually you want to accept Traversable and not just Iterator.) We can now call our service that loops over the $things twice without any problem:

$aService->doStuff(new AwesomeWords()); // no boom!
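
Here is a self-contained sketch of why this works: each foreach asks the IteratorAggregate for a new iterator, so every loop gets a fresh Generator. The free-standing doStuff function is a stand-in I made up for the service method in the text.

```php
<?php

class AwesomeWords implements IteratorAggregate {
    public function getIterator(): Traversable {
        yield 'So';
        yield 'Much';
        yield 'Such';
    }
}

// Stand-in for the service method: loops over its argument twice.
function doStuff(Traversable $things): array {
    $seen = [];
    foreach ($things as $thing) {
        $seen[] = $thing;
    }
    // The second loop calls getIterator() again and gets a new Generator.
    foreach ($things as $thing) {
        $seen[] = $thing;
    }
    return $seen;
}

$result = doStuff(new AwesomeWords());

echo implode(' ', $result), "\n"; // So Much Such So Much Such
```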

By using IteratorAggregate we did not just solve the non-rewindable problem, we also found a good way to share our code. Sometimes it makes sense to use the code of a Generator in multiple classes, and sometimes it makes sense to have dedicated tests for the Generator. In both cases having a dedicated class and file to put it in is very helpful, and a lot nicer than exposing the generator via some public static function.

For cases where it does not make sense to share a Generator and you want to keep it entirely private, you might need to deal with the non-rewindable problem. For those cases you can use my Rewindable Generator library, which allows making your generators rewindable by wrapping their creation function:

$aGenerator = function() { /* ... yield ... */ };
$aService->doStuff(new RewindableGenerator($aGenerator));
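
The idea behind such a wrapper can be sketched in a few lines. Note that this is not the code of the Rewindable Generator library; the CallableTraversable class below is a hypothetical illustration that uses the IteratorAggregate trick from earlier in this post.

```php
<?php

// Hypothetical class for illustration; NOT the Rewindable Generator library's code.
final class CallableTraversable implements IteratorAggregate {
    private $generatorFunction;

    public function __construct(callable $generatorFunction) {
        $this->generatorFunction = $generatorFunction;
    }

    public function getIterator(): Traversable {
        // Calling the function again creates a fresh Generator for each loop.
        return ($this->generatorFunction)();
    }
}

$words = new CallableTraversable(function () {
    yield 'So';
    yield 'Much';
});

// Iterating twice is fine, since each pass gets its own Generator.
$firstPass = iterator_to_array($words);
$secondPass = iterator_to_array($words);

var_dump($firstPass === $secondPass); // bool(true)
```

The key point is that the wrapper holds on to the function that creates the generator rather than to a Generator instance.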

A real-world example

A few months ago I refactored some code that is part of the Wikimedia Deutschland fundraising codebase. This code gets the filesystem paths of email templates by looking in a set of specified directories.

private function getMailTemplatesOnDisk( array $mailTemplatePaths ): array {
    $mailTemplatesOnDisk = [];

    foreach ( $mailTemplatePaths as $path ) {
        $mailFilesInFolder = glob( $path . '/Mail_*' );
        array_walk( $mailFilesInFolder, function( & $filename ) {
            $filename = basename( $filename ); // this would cause problems w/ mail templates in sub-folders
        } );
        $mailTemplatesOnDisk = array_merge( $mailTemplatesOnDisk, $mailFilesInFolder );
    }

    return $mailTemplatesOnDisk;
}

This code made the class bound to the filesystem, which made it hard to test. In fact, this code was not tested. Furthermore, this code irked me, since I like code to be on the functional side. The array_walk mutates its by-reference variable and the assignment at the end of the loop mutates the return variable.

This was refactored using the awesome IteratorAggregate + Generator combo:

class MailTemplateFilenameTraversable implements \IteratorAggregate {
    private $mailTemplatePaths;

    public function __construct( array $mailTemplatePaths ) {
        $this->mailTemplatePaths = $mailTemplatePaths;
    }

    public function getIterator(): \Traversable {
        foreach ( $this->mailTemplatePaths as $path ) {
            foreach ( glob( $path . '/Mail_*' ) as $fileName ) {
                yield basename( $fileName );
            }
        }
    }
}

The new code is much easier to read and understand, involves no state mutation whatsoever, separates concerns well, and makes the collection-building logic easier to test and to reuse elsewhere.
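
To illustrate the testing benefit, here is a sketch of code that consumes the file names. The findUnusedTemplates function and the template names are made up for this example; the point is only that the consumer accepts a Traversable, so a test can hand it an ArrayIterator instead of touching the filesystem.

```php
<?php

// Hypothetical consumer: reports templates that exist on disk but are not in use.
function findUnusedTemplates(Traversable $templatesOnDisk, array $templatesInUse): array {
    return array_values(
        array_diff(iterator_to_array($templatesOnDisk, false), $templatesInUse)
    );
}

// In a test, an ArrayIterator stands in for MailTemplateFilenameTraversable,
// so no files need to exist. The file names are invented.
$unused = findUnusedTemplates(
    new ArrayIterator(['Mail_Welcome.twig', 'Mail_Goodbye.twig']),
    ['Mail_Welcome.twig']
);

print_r($unused); // contains only Mail_Goodbye.twig
```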

See also: Use cases for PHP generators (off-site post).

Tips and Tricks

Generators can yield key-value pairs:

yield "Iterators" => "are useful";
yield "Generators" => "are awesome";
// [ "Iterators" => "are useful", "Generators" => "are awesome" ]
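
On the consuming side, the keys show up like array keys do. A small sketch, with a made-up opinions function:

```php
<?php

// Hypothetical generator function yielding key-value pairs.
function opinions(): Generator {
    yield 'Iterators' => 'are useful';
    yield 'Generators' => 'are awesome';
}

// The keys are available in foreach, just like with an associative array.
foreach (opinions() as $subject => $opinion) {
    echo "$subject $opinion\n";
}
// Iterators are useful
// Generators are awesome

// iterator_to_array preserves the yielded keys by default.
$asArray = iterator_to_array(opinions());
```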

You can use yield in PHPUnit data providers. You can also yield from an iterable:

yield from [1, 2, 3];
yield from new ArrayIterator([4, 5]);
// 1, 2, 3, 4, 5

// Flattens iterable[] into Generator
foreach ($collections as $collection) {
    yield from $collection;
}
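
One thing to watch out for with yield from: it keeps the keys of the inner iterables, so keys can repeat. That is harmless in a foreach, but iterator_to_array preserves keys by default and later values then overwrite earlier ones. A short sketch, with a made-up numbers function:

```php
<?php

// Hypothetical generator function combining two sources with yield from.
function numbers(): Generator {
    yield from [1, 2, 3];                 // keys 0, 1, 2
    yield from new ArrayIterator([4, 5]); // keys 0, 1 again
}

// Discarding the keys gives all five values.
$values = iterator_to_array(numbers(), false);
echo implode(', ', $values), "\n"; // 1, 2, 3, 4, 5

// Preserving the keys (the default) lets 4 and 5 overwrite 1 and 2.
$keyed = iterator_to_array(numbers());
echo implode(', ', $keyed), "\n"; // 4, 5, 3
```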

Thanks to Leszek Manicki and Jan Dittrich for reviewing this blog post.

Originally posted on my blog as Introduction to Iterators and Generators in PHP.
