The Go standard library provides io.Reader, a useful interface for consuming data from various sources, which contains a single method: Read.
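
For reference, the interface is defined in the io package as follows:

type Reader interface {
	Read(p []byte) (n int, err error)
}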

However, for various use cases, one may need more features, such as buffering or the ability to consume data with more control than a fixed number of bytes. bufio.Reader is regularly recommended for that: it provides buffering, peeking, and the ability to read until a single byte acting as a separator.
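
For instance, here is a minimal sketch of reading a line with it, using strings.NewReader as the data source:

br := bufio.NewReader(strings.NewReader("Content-Length: 42\r\n"))
line, err := br.ReadString('\n') // read up to and including the '\n' separator
if err != nil {
	// handle the error, e.g. io.EOF if no '\n' was found
}
fmt.Printf("%q\n", line) // "Content-Length: 42\r\n"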

But in practice, the bufio reader falls short on several points:

  • The basic features are simply not enough. For example, there is no way to read until a multi-byte separator.
  • The internal buffer has a limited size.

The second point is quite annoying: for example, reading up to a separator with ReadSlice fails with ErrBufferFull if the separator has not been found by the time the buffer is full. The caller has to manually copy what was returned, call ReadSlice again, and concatenate the parts until the separator is found. And if the entire data source is consumed without finding the separator, there is no way to unread everything.
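
Here is a minimal sketch of the kind of helper this mechanism forces the caller to write (readLine is just an illustrative name):

func readLine(br *bufio.Reader) ([]byte, error) {
	var line []byte
	for {
		part, err := br.ReadSlice('\n')
		line = append(line, part...)
		if err == nil {
			return line, nil // separator found
		}
		if err != bufio.ErrBufferFull {
			// e.g. io.EOF: the separator was never found, and the data
			// already consumed cannot be pushed back into the reader
			return line, err
		}
		// the internal buffer filled up before the separator was found:
		// copy what was returned and keep reading
	}
}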

This mechanism has the advantage of guaranteeing the caller full control over memory usage. However, it is also very inconvenient for many use cases.

I just published a small package, stream, on GitHub which tries to address these issues. Stream is nothing more than a buffered reader with a set of convenience functions: it is very easy to skip, peek and read based on various criteria, and the internal buffer grows as required.

Of course, when the stream is layered on a data source whose total size could cause memory issues, the caller should insert a limiting reader between the stream and the source.
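
A minimal sketch using io.LimitReader from the standard library; the constructor taking an io.Reader is assumed here, and its exact name (NewStreamReader below) is hypothetical:

limited := io.LimitReader(os.Stdin, 16<<20) // never pull more than 16 MiB from the source
s := stream.NewStreamReader(limited)        // hypothetical constructor; the actual name may differ
s.ReadUntilAndSkip([]byte{'\r', '\n'})      // the internal buffer can now never grow past the limit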

For a simple example, let’s see how one may parse what looks like an HTTP header:

s := stream.NewStreamBytes([]byte("Content-Length: 42\r\n"))
s.ReadUntilByteAndSkip(':') // yields []byte("Content-Length")
s.SkipWhile(func(b byte) bool {
	return b == ' ' || b == '\t'
})
s.ReadUntilAndSkip([]byte{'\r', '\n'}) // yields []byte("42")

I’m currently not entirely sure about both the API and some conventions (e.g. should peek functions return freshly allocated data, or should they point into the internal buffer?), but these abstractions were already useful to me when writing my mbox parser.