Information propagation

Issue with iterators

Iterators are amazing. They allow one to write über efficient code that has little memory overhead and good cache locality. As Shakespeare said though, “all that glitters is not gold”. Iterators, very much like async code, can be viral. What I mean by that is one will often want to chain iterators together from start to finish. An iterator that is based on another iterator though will often be written such that the inner iterator iterates precisely the type the outer iterator needs. A problem with this though is that only the most trivial code is free from errors. An iterator that is expected to be passed to another iterator is left holding errors that it encounters. What is it to do with those errors? Often times if the quantity of errors doesn’t exceed a certain threshold, the iterator will want to resume iteration and log the errors. Here is a quick example in Rust:

struct Foo<I> {
    x: I,
}
struct Bar;
impl<I> Iterator for Foo<I>
where
    I: Iterator<Item = Bar>,
{
    type Item = Result<(), ()>;
    fn next(&mut self) -> Option<Self::Item> {
        self.x.next().map_or_else(|| None, |_| Some(Ok(())))
    }
}
struct Fizz;
impl Iterator for Fizz {
    type Item = ();
    fn next(&mut self) -> Option<Self::Item> {
        // Do some stuff that can surely error.
        // Fizz cannot propagate the error since Foo<I> requires I to only
        // iterate Bars.
        Some(())
    }
}

This problem exists for all the iterators in the chain. If the outer iterator were more altruistic, it would design its inner iterator to iterate results instead. This way any time the inner iterator encounters an error it would be able to propagate that error to the outer iterator. As long as all the iterators in the chain were designed this way, then only the terminating iterator would have to do the work of logging those errors.

Errors are but one kind of “information”. Instead of hard coding a specific type for a successful result, the outer iterator can bind the successful result type to something with the “structure” that it expects. This gives the inner iterator the ability to propagate information the outer iterator does not care about via types.

#![feature(try_trait_v2)]
use core::ops::{ControlFlow::Break, Try};
struct Foo<I> {
    x: I,
}
enum Buzz<E> {
    Inner(E),
}
trait Bar {}
impl<I> Iterator for Foo<I>
where
    I: Iterator,
    I::Item: Try,
    <I::Item as Try>::Output: Bar,
{
    type Item = Result<(), Buzz<I::Item as Try>::Residual>;
    fn next(&mut self) -> Option<Self::Item> {
        self.x.next().map_or_else(
            || None,
            |t| {
                if let Break(e) = t.branch() {
                    Some(Err(Buzz::Inner(e)))
                } else {
                    Some(Ok(()))
                }
            },
        )
    }
}
struct Fizz;
impl Bar for () {}
impl Iterator for Fizz {
    type Item = Result<(), ()>;
    fn next(&mut self) -> Option<Self::Item> {
        // Fizz can use any Bar-like type it wants for a successful
        // result as well as propagate any errors since the bound on I
        // in Foo allows for such.
        Some(Ok(()))
    }
}

This was a strategy that I employed frequently at my last job. Almost anything I did was based on data that originated in a database. That data would be transformed multiple times until eventually the end result was written back to the database. This, along with automatically handling several kinds of database-related errors, was the impetus behind the “bulk writers” that I wrote—which itself was the reason for the SQLServer library.