In this post I demonstrate an effective way to create iterators and generators in PHP and provide an example of a scenario in which using them makes sense.
Generators have been around since PHP 5.5, and iterators have been around since the Planck epoch. Even so, a lot of PHP developers do not know how to use them well and cannot recognize situations in which they are helpful. In this blog post I share insights I have gained over the years that, whenever I shared them, always got an interested response from fellow developers. The post goes beyond the basics, provides a real-world example, and includes a few tips and tricks. So as not to leave out those unfamiliar with iterators, the post starts with the "What are Iterators" section, which you can safely skip if you can already answer that question.
What are Iterators
PHP has an Iterator interface that you can implement to represent a collection. You can loop over an instance of an Iterator just like you can loop over an array:
function doStuff(Iterator $things) {
foreach ($things as $thing) { /* ... */ }
}
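For reference, implementing Iterator means providing five methods: rewind(), valid(), current(), key() and next(). foreach calls rewind() once and then valid(), current(), key() and next() for each element. The class below is a minimal sketch over a fixed list; the name WordIterator is hypothetical and not part of PHP.

```php
<?php
// Hypothetical minimal Iterator over a list of words.
final class WordIterator implements Iterator
{
    /** @var string[] */
    private $words;
    /** @var int */
    private $position = 0;

    public function __construct(array $words)
    {
        $this->words = array_values($words);
    }

    public function rewind(): void
    {
        // Called once at the start of every foreach.
        $this->position = 0;
    }

    public function valid(): bool
    {
        // foreach stops as soon as this returns false.
        return $this->position < count($this->words);
    }

    public function current(): string
    {
        return $this->words[$this->position];
    }

    public function key(): int
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }
}

foreach (new WordIterator(['foo', 'bar']) as $index => $word) {
    echo "$index: $word\n"; // prints "0: foo", then "1: bar"
}
```

Because rewind() resets the position, this iterator can be looped over any number of times.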
Why would you bother implementing the Iterator interface rather than just using an array? Let's look at an example.
Imagine you have a directory with a bunch of text files. One of the files contains an ASCII NyanCat (~=[,,_,,]:3). It is the task of our code to find which file the NyanCat is hiding in.
We can get all the files with glob( $path . '*.txt' ) and we can get the contents of a file with file_get_contents. We could just have a foreach over the glob result that calls file_get_contents. Luckily we realize this would violate separation of concerns and make the "does this file contain NyanCat" logic hard to test, since it would be bound to the filesystem access code. Hence we create one function that gets the contents of the files and another one with our logic in it:
function getContentsOfTextFiles(): array {
// glob and file_get_contents
}
function findTextWithNyanCat(array $texts) {
foreach ($texts as $text) { if ( /* ... */ ) { /* ... */ } }
}
function findNyanCat() {
findTextWithNyanCat(getContentsOfTextFiles());
}
While this approach is decoupled, a big drawback is that we now need to fetch the contents of all files and keep all of them in memory before we even start executing any of our logic. If NyanCat is hiding in the first file, we'll have fetched the contents of all the others for nothing. We can avoid this by using an Iterator, since iterators can fetch their values on demand: they are lazy.
class TextFileIterator implements Iterator {
/* ... */
public function current() {
// return file_get_contents
}
/* ... */
}
function findTextWithNyanCat(Iterator $texts) {
foreach ($texts as $text) { if ( /* ... */ ) { /* ... */ } }
}
function findNyanCat() {
findTextWithNyanCat(new TextFileIterator());
}
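A fuller sketch of TextFileIterator might look like the following. It assumes a constructor that takes the directory to search (a hypothetical parameter; the snippet above elides it) and uses the file path as the key. glob() only lists file names up front; file_get_contents() runs lazily, one file per current() call.

```php
<?php
// Sketch of TextFileIterator, assuming a hypothetical $directory constructor argument.
final class TextFileIterator implements Iterator
{
    /** @var string[] */
    private $files;
    /** @var int */
    private $position = 0;

    public function __construct(string $directory)
    {
        // Only the file names are collected here; glob() can return false on error.
        $this->files = glob(rtrim($directory, '/') . '/*.txt') ?: [];
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->position < count($this->files);
    }

    public function current(): string
    {
        // The file is read only when its contents are actually requested.
        return (string) file_get_contents($this->files[$this->position]);
    }

    public function key(): string
    {
        // The path as key tells the caller which file a text came from.
        return $this->files[$this->position];
    }

    public function next(): void
    {
        $this->position++;
    }
}
```

If the logic stops at the first match, the remaining files are never read.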
Our TextFileIterator gives us a nice place to put all the filesystem code, while to the outside it just looks like a collection of texts. The function housing our logic, findTextWithNyanCat, does not know that the text comes from the filesystem. This means that if you decide to get texts from the database, you could just create a new DatabaseTextBlobIterator and pass it to the logic function without making any changes to the latter. Similarly, when testing the logic function, you can give it an ArrayIterator.
function testFindTextWithNyanCat() {
/* ... */
findTextWithNyanCat(new ArrayIterator(['test text', '~=[,,_,,]:3']));
/* ... */
}
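To make this concrete, here is one possible body for findTextWithNyanCat. Returning the key of the first matching text (or null) is an assumption; the snippets above leave the result elided. The generator at the end records which texts were produced, showing that iteration stops at the first match.

```php
<?php
const NYAN_CAT = '~=[,,_,,]:3';

// Hypothetical implementation: returns the key of the first text containing NyanCat, or null.
function findTextWithNyanCat(Iterator $texts)
{
    foreach ($texts as $key => $text) {
        if (strpos($text, NYAN_CAT) !== false) {
            return $key; // stop early; later texts are never requested
        }
    }
    return null;
}

// In a test, an ArrayIterator stands in for the filesystem.
$position = findTextWithNyanCat(new ArrayIterator(['test text', '~=[,,_,,]:3']));

// A lazy source that records every text it hands out.
$produced = [];
$lazyTexts = (function () use (&$produced) {
    foreach (['~=[,,_,,]:3', 'never read'] as $text) {
        $produced[] = $text;
        yield $text;
    }
})();
findTextWithNyanCat($lazyTexts);
// $produced contains only the first text
```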
I wrote more about basic Iterator functionality in Lazy iterators in PHP and Python and Some fun with iterators. I also blogged about a library that provides some (Wikidata-specific) iterators and a CLI tool built around an Iterator. For more on how generators work, see the off-site post Generators in PHP.
PHP's collection type hierarchy
Let's start by looking at PHP's type hierarchy for collections as of PHP 7.1. These are the core types that I think are most important:
At the very top we have iterable, the supertype of both array and Traversable. If you are not familiar with this type or are using a version of PHP older than 7.1, don't worry, we don't need it for the rest of this blog post.
Iterator is a subtype of Traversable, and the same goes for IteratorAggregate. The standard library iterator_* functions, such as iterator_to_array, all take a Traversable. This is important since it means you can give them an IteratorAggregate, even though it is not an Iterator. Later on in this post we'll get back to what exactly an IteratorAggregate is and why it is useful.
Finally we have Generator, which is a subtype of Iterator. That means all functions that accept an Iterator can be given a Generator, and, by extension, that you can use generators in combination with the Iterator classes in the Standard PHP Library such as LimitIterator and CachingIterator.
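The hierarchy described above can be checked directly with instanceof. The sketch below uses a hypothetical IteratorAggregate named Numbers and an inline generator; is_iterable() and the iterable type require PHP 7.1 or later.

```php
<?php
// Hypothetical IteratorAggregate used to probe the type hierarchy.
final class Numbers implements IteratorAggregate
{
    public function getIterator(): Generator
    {
        yield 1;
        yield 2;
        yield 3;
    }
}

$numbers = new Numbers();
$generator = (function () {
    yield 'a';
    yield 'b';
    yield 'c';
})();

var_dump($generator instanceof Iterator);  // bool(true): Generator is an Iterator
var_dump($numbers instanceof Traversable); // bool(true)
var_dump($numbers instanceof Iterator);    // bool(false): an IteratorAggregate is not an Iterator
var_dump(is_iterable($numbers));           // bool(true): iterable covers every Traversable

// iterator_to_array() accepts any Traversable, so the IteratorAggregate works.
$allNumbers = iterator_to_array($numbers);

// Because Generator is an Iterator, SPL decorators such as LimitIterator accept it.
$firstTwo = iterator_to_array(new LimitIterator($generator, 0, 2));
```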
IteratorAggregate + Generator = <3
Generators are a nice and easy way to create iterators. Often you'll only loop over them once and not have any problem. However, beware that generators create iterators that are not rewindable: if you loop over one more than once, you'll get an exception.
Imagine the scenario where you pass a generator to a service that accepts an instance of Traversable:
$aGenerator = function() { /* ... yield ... */ };
$aService->doStuff($aGenerator());

public function doStuff(Traversable $things) {
foreach ($things as $thing) { /* ... */ }
}
The service class in which doStuff resides does not know it is getting a Generator; it just knows it is getting a Traversable. When working on this class, it is entirely reasonable to iterate through $things a second time.
public function doStuff(Traversable $things) {
foreach ($things as $thing) { /* ... */ }
foreach ($things as $thing) { /* ... */ } // Boom if Generator!
}
This blows up if the provided $things is a Generator, because generators are non-rewindable. Note that it does not matter how you iterate through the value: calling iterator_to_array with $things has exactly the same result as using it in a foreach loop. Most, if not all, generators I have written do not use resources or state that inherently prevent them from being rewindable, so the double-iteration issue can be unexpected and seemingly silly.
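The failure is easy to reproduce. In this sketch, countTwice() is a hypothetical stand-in for doStuff(); once the first loop has run the generator to completion, both a second foreach and a second iterator_to_array() call throw a plain Exception.

```php
<?php
// Hypothetical stand-in for doStuff(): iterates the same Traversable twice.
function countTwice(Traversable $things): int
{
    $count = 0;
    foreach ($things as $thing) { $count++; }
    foreach ($things as $thing) { $count++; } // throws if $things is a finished Generator
    return $count;
}

$makeGenerator = function () {
    yield 1;
    yield 2;
};

$firstError = null;
try {
    countTwice($makeGenerator());
} catch (Exception $e) {
    $firstError = $e;
    echo get_class($e), ': ', $e->getMessage(), "\n";
}

// iterator_to_array() fails in the same way on an already consumed generator.
$secondError = null;
$generator = $makeGenerator();
iterator_to_array($generator);
try {
    iterator_to_array($generator);
} catch (Exception $e) {
    $secondError = $e;
}
```

An ArrayIterator passed to countTwice() works fine, because its rewind() simply resets its position.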
There is a simple way to get around it, though. This is where IteratorAggregate comes in. Classes implementing IteratorAggregate must implement the getIterator() method, which returns a Traversable. Creating one of these is trivial:
class AwesomeWords implements \IteratorAggregate {
// The return type avoids a deprecation notice on PHP 8.1+ and is valid since PHP 7.0
public function getIterator(): \Generator {
yield 'So';
yield 'Much';
yield 'Such';
}
}
If you call getIterator, you'll get a Generator instance, just like you'd expect. However, normally you never call this method. Instead you use the IteratorAggregate just as if it were an Iterator, by passing it to code that expects a Traversable. (This is also why you usually want to accept Traversable and not just Iterator.) Because every foreach over an IteratorAggregate calls getIterator() again, each loop gets a fresh Generator. We can now call our service that loops over $things twice without any problem:
$aService->doStuff(new AwesomeWords()); // no boom!
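Here is a runnable version of that call. The doStuff() function is a hypothetical stand-in for the service method: it loops twice and collects what it sees. Each foreach triggers its own getIterator() call, so neither loop needs to rewind anything.

```php
<?php
class AwesomeWords implements \IteratorAggregate
{
    public function getIterator(): \Generator
    {
        yield 'So';
        yield 'Much';
        yield 'Such';
    }
}

// Hypothetical stand-in for the service method that iterates twice.
function doStuff(\Traversable $things): array
{
    $seen = [];
    foreach ($things as $thing) { $seen[] = $thing; }
    foreach ($things as $thing) { $seen[] = $thing; } // a fresh Generator, no exception
    return $seen;
}

$words = doStuff(new AwesomeWords());
// $words is ['So', 'Much', 'Such', 'So', 'Much', 'Such']
```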
By using IteratorAggregate we did not just solve the non-rewindable problem; we also found a good way to share our code. Sometimes it makes sense to use the code of a Generator in multiple classes, and sometimes it makes sense to have dedicated tests for the Generator. In both cases having a dedicated class and file to put it in is very helpful, and a lot nicer than exposing the generator via some public static function.
For cases where it does not make sense to share a Generator and you want to keep it entirely private, you might need to deal with the non-rewindable problem. For those cases you can use my Rewindable Generator library, which allows making your generators rewindable by wrapping their creation function:
$aGenerator = function() { /* ... yield ... */ };
$aService->doStuff(new RewindableGenerator($aGenerator));
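The idea behind wrapping the creation function can be sketched with an IteratorAggregate that stores the callable and calls it again for every traversal. This is a hypothetical illustration named GeneratorFactoryAggregate, not the Rewindable Generator library's code or API.

```php
<?php
// Hypothetical sketch, NOT the library's implementation: re-create the generator per traversal.
final class GeneratorFactoryAggregate implements IteratorAggregate
{
    /** @var callable */
    private $createGenerator;

    public function __construct(callable $createGenerator)
    {
        $this->createGenerator = $createGenerator;
    }

    public function getIterator(): Generator
    {
        // Calling the stored function produces a brand-new Generator each time.
        return ($this->createGenerator)();
    }
}

$aGenerator = function () {
    yield 'first';
    yield 'second';
};

$things = new GeneratorFactoryAggregate($aGenerator);
$once = iterator_to_array($things);
$twice = iterator_to_array($things); // works: a second Generator is created
```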
A real-world example
A few months ago I refactored some code that is part of the Wikimedia Deutschland fundraising codebase. This code gets the file names of email templates by looking in a set of specified directories.
private function getMailTemplatesOnDisk( array $mailTemplatePaths ): array {
$mailTemplatesOnDisk = [];
foreach ( $mailTemplatePaths as $path ) {
$mailFilesInFolder = glob( $path . '/Mail_*' );
array_walk( $mailFilesInFolder, function( & $filename ) {
$filename = basename( $filename ); // this would cause problems w/ mail templates in sub-folders
} );
$mailTemplatesOnDisk = array_merge( $mailTemplatesOnDisk, $mailFilesInFolder );
}
return $mailTemplatesOnDisk;
}
This code made the class bound to the filesystem, which made it hard to test. In fact, this code was not tested. Furthermore, this code irked me, since I like code to be on the functional side. The array_walk mutates its by-reference variable and the assignment at the end of the loop mutates the return variable.
This was refactored using the awesome IteratorAggregate + Generator combo:
class MailTemplateFilenameTraversable implements \IteratorAggregate {
// Declared explicitly: dynamic properties are deprecated as of PHP 8.2
private $mailTemplatePaths;
public function __construct( array $mailTemplatePaths ) {
$this->mailTemplatePaths = $mailTemplatePaths;
}
public function getIterator(): \Generator {
foreach ( $this->mailTemplatePaths as $path ) {
foreach ( glob( $path . '/Mail_*' ) as $fileName ) {
yield basename( $fileName );
}
}
}
}
The result: code that is much easier to read and understand, with no state mutation whatsoever, good separation of concerns, easier testing, and reusability of this collection-building code elsewhere.
See also: Use cases for PHP generators (off-site post).
Tips and Tricks
Generators can yield key-value pairs:
yield "Iterators" => "are useful";
yield "Generators" => "are awesome";
// [ "Iterators" => "are useful", "Generators" => "are awesome" ]
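Wrapped in a function, those two yields produce a generator whose keys show up both in foreach and in iterator_to_array(). The function name pairs() is hypothetical.

```php
<?php
// Hypothetical generator function yielding explicit keys.
function pairs(): Generator
{
    yield "Iterators" => "are useful";
    yield "Generators" => "are awesome";
}

foreach (pairs() as $subject => $verdict) {
    echo "$subject $verdict\n"; // prints "Iterators are useful", then "Generators are awesome"
}

// iterator_to_array() keeps the yielded keys by default.
$array = iterator_to_array(pairs());
```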
You can use yield in PHPUnit data providers.
You can yield from an iterable:
yield from [1, 2, 3];
yield from new ArrayIterator([4, 5]);
// yields the values 1, 2, 3, 4, 5 (with keys 0, 1, 2, 0, 1)
// Flattens iterable[] into Generator
foreach ($collections as $collection) {
yield from $collection;
}
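One gotcha: yield from passes the inner keys through unchanged, so the numbers above arrive with keys 0, 1, 2, 0, 1. With iterator_to_array()'s default preserve_keys = true, later duplicates overwrite earlier ones; passing false collects all values. The sketch below shows this and wraps the flattening loop in a function; numbers() and flatten() are hypothetical names.

```php
<?php
function numbers(): Generator
{
    yield from [1, 2, 3];                 // keys 0, 1, 2
    yield from new ArrayIterator([4, 5]); // keys 0, 1 again
}

// Duplicate keys overwrite earlier entries: the result is [4, 5, 3].
$overwritten = iterator_to_array(numbers());
// Ignoring keys keeps every value: [1, 2, 3, 4, 5].
$allValues = iterator_to_array(numbers(), false);

// Hypothetical helper wrapping the flattening loop: iterable[] in, Generator out.
function flatten(iterable $collections): Generator
{
    foreach ($collections as $collection) {
        yield from $collection;
    }
}

$flat = iterator_to_array(flatten([[1, 2], new ArrayIterator([3]), numbers()]), false);
// $flat is [1, 2, 3, 1, 2, 3, 4, 5]
```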
Thanks to Leszek Manicki and Jan Dittrich for reviewing this blog post.
Originally posted on my blog as Introduction to Iterators and Generators in PHP.