Software is not Additive! by Nathaniel Manista

← Back to 2018 talks

Transcript

(Editor's note: transcripts don't do talks justice. This transcript is useful for searching and reference, but we recommend watching the video rather than reading the transcript alone! For a reader of typical speed, reading this will take 15% less time than watching the video, but you'll miss out on body language and the speaker's slides!)

Hi. Good afternoon. Thank you. Thank you, Gary, for having me. I'm Nathaniel. My lecturing partner, Augie, could not be here with us today. He welcomed a new family member just a couple of weeks ago. So he is at home parenting, but he sends his love.

And we composed this together. So when you hear me say we, it's not me being egotistical, or it's not only me being egotistical. Augie's voice is in here too.

We want to tell you that software is not additive. Or at least, it's not additive in the ways that we often kind of assume and sort of let ourselves be led into thinking of. So how many of you have ever used this metaphor to describe the practice of software creation? OK, how many of you have ever used puzzle pieces, or seen puzzle pieces used?

These are pretty good metaphors, because the heart of software creation is about assembling complex aggregates from simpler elements, so they capture that part of it right. Where puzzle pieces and where building blocks go wrong is that if I place one piece on top of a block, I can't place another piece in exactly the same place. But with software, if I expose a function, anybody can call that function, no matter who else is calling it.

So if you want to stick with blocks and you know physics, think of them being made out of bosons rather than fermions, and they can just pile right on up. How many of you remember the Curry-Howard isomorphism from school? OK. How many of you remember the Curry-Howard isomorphism from someplace other than school? OK. It means every program is some mathematical proof of a theorem. Probably not a very interesting theorem, but all the same. Programs are mathematical theorems, and any theorem can be depended upon by arbitrarily many other theorems.

So there's this big pressure of metaphors. There's this big pressure of mindset, saying that software is additive. Saying that the whole is the sum of its parts, and that if you start off the day with a software system that does A and a software system that does B, you can have a software system that does A plus B by lunchtime. That's not always true.

Now, sometimes when we author systems, we deliberately make subtractive decisions. We deliberately hold back. Probably because we're empathizing with the user in some way, probably because we're trying to look forward in some way. Sometimes we accidentally subtract from the systems that we're offering.

So let's dive into a couple of examples. The first is Mercurial. Augie's been involved with it since it started, more or less, and it started following an incident in the Linux kernel community where the Linux kernel was being developed. They had use of a source control system. And kind of overnight, in a story with a lot of gossip and intrigue that we won't get into, they lost access to their source control system. So they needed a new one-- they needed one fast. And both Git and Mercurial came out of this crisis, literally within the same period of days.

Git and Mercurial are similar on the outside, because they were targeted to the same audience and they were intended to serve the same purpose. But they're very different internally. Mercurial offers a data structure called the revlog. The revlog is an append-only array of revisions. It's got a few other nice properties internally-- it guarantees constant-time disk seeks for any revision in the repository. But the important thing about it is that it's an append-only array of revisions, and the thing about an append-only array of revisions is that you get sequential indexing. You get revision numbers.
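(Editor's sketch, not from the talk: a toy illustration of why an append-only array hands out sequential revision numbers for free. `ToyRevlog` and its methods are hypothetical names; Mercurial's real revlog stores deltas and an on-disk index and is considerably more involved.)

```python
import hashlib


class ToyRevlog:
    """A toy append-only log of revisions (hypothetical, not Mercurial's code)."""

    def __init__(self):
        self._revisions = []  # only ever appended to, never reordered

    def append(self, data: bytes) -> int:
        """Store a revision and return its sequential revision number."""
        self._revisions.append(data)
        return len(self._revisions) - 1

    def read(self, rev: int) -> bytes:
        """Look a revision up by its position in the log."""
        return self._revisions[rev]

    def node_id(self, rev: int) -> str:
        """A content-derived hexadecimal identifier for the same revision."""
        return hashlib.sha1(self._revisions[rev]).hexdigest()


log = ToyRevlog()
first = log.append(b"initial commit")
second = log.append(b"fix typo")
```

The position in the array is the revision number, which is exactly the property that a hash-keyed store does not provide on its own.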

So if you're working with Mercurial-- Mercurial users in the audience? If you're working with Mercurial you see sequential numbers for your revisions. And they're local only, so if you and I have been collaborating, my revision 13 isn't the same as your revision 13. But most of what I do when I'm developing isn't collaborative. It isn't social. It isn't pushing

I do a lot of my work solo. And these revision numbers are super helpful. They're friendly to users. And that's why they got to be so popular, so widely used. It's so much nicer working with sequential, time-ordered, numerical integers than hexadecimal numbers, that it turns out they're friendly to developers too. And the Mercurial developers in the intermediate layers of Mercurial took advantage of these too.

Well, now it's 10 years later. Mercurial and Git had their adventure. Mercurial is still around. It's being used in even more ambitious places. It's time for new storage in a few large organizations that are using it. They want to implement storage on top of modern storage systems. By which we mean distributed hash tables, and replicated key-value stores, and other places where sequential integers are not great.

Well, it turns out that Mercurial is having a hard time being Mercurial without these sequential index numbers. So even in the course of implementing these new storage layers, the developers are having to kind of awkwardly, kind of hackily try to bolt on a store of sequential integers for the revisions at a cost of a certain scalability-- not too much. But it's tickling everybody's sense of, oh, we wish we hadn't done this. We wish we didn't have to do this now.

Our second example comes from gRPC. And gRPC Python in particular. gRPC users? Yay, thank you.

We started gRPC in 2014, because we wanted to offer to the world an RPC implementation and protocol as powerful as the one that had been used inside Google for about a decade up to that point. The predecessor system had been very popular, and everybody who wanted it, everybody who enjoyed using it, said, this is so great, we should open source this.

And they tried-- and they tried repeatedly. And it was just so intertwined with Google's proprietary bits that they gave up, and said, OK, it's finally time, let's start over in the open from scratch. So gRPC is the spiritual open sourcing of that earlier system with some extra lessons learned in the intervening years, like HTTP/2. In gRPC, the implementation is structured like this-- there's a full Java stack, there's a full Go stack, and these seven other languages wrap a common core. This has been pretty good for us, because there have been bugs that we've only implemented three times rather than nine.

[LAUGHTER]

So these language wrappings, they're not quite transliterations. They're not completely logic-free, because it's always a bad idea to directly transliterate an API from one language to another. They contain adaptations to the local conventions of the language.

And so for the Python wrapping-- the one on which I've worked-- we wanted the Python API to be Pythonic. So there are context managers, for example, that you get to use. We also considered duck typing to be something that was very desirable for a Python system. We considered it Pythonic. We considered it a design goal.

We wanted code elements in the API to be substitutable by applications for whatever purposes they might want-- things like transports, things like credentials. And the core that contains our logic does offer multiple kinds of transports, multiple kinds of credentials, so we thought we had everything going for us that we needed. And we set about implementing, and we worked for a few months.

And right around the time that we were hanging the front door on gRPC Python, we noticed that we missed this polymorphism-- we missed this duck typing. And the reason why was that the core interface was shaped like this. This is the way C or C++ code looks when the system presenting this API is reserving for itself how something is implemented. So this looks like a type definition. But according to the way I use type theory it's not a type definition, because it doesn't tell me what do I need to implement in order to implement the type-- what sort of behaviors, methods, attributes, what do I need to do to provide an implementation of the type?

This only says there is an implementation of the type. And this kind of tied our hands in a very specific way, so that at the Python layer our API presented to applications contains a class like this. And this class goes on to say, you can't substitute it. It's not duck typable. It's a thing that we pass to you-- you can't swap it out for one of your own. You can't modify it.
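(Editor's sketch, not from the talk and not the gRPC Python source: the slides are not reproduced in this transcript, so this is a hypothetical contrast between a class that only wraps an opaque core handle and an API that accepts any duck-typed object. Every name here, including `OpaqueCredentials`, `connect_opaque`, `connect_duck_typed`, and `get_token`, is invented for illustration.)

```python
class OpaqueCredentials:
    """Exists only to be handed back to the library; no supported interface."""

    def __init__(self, core_handle):
        self._core_handle = core_handle  # stands in for a pointer into the core


def connect_opaque(credentials):
    # The library insists on its own class, so applications cannot substitute.
    if not isinstance(credentials, OpaqueCredentials):
        raise TypeError("credentials must be created by the library")
    return "connected with core handle %r" % (credentials._core_handle,)


def connect_duck_typed(credentials):
    # The duck-typed alternative: anything with a get_token() method will do.
    return "connected with token %r" % (credentials.get_token(),)


class MyCredentials:
    """An application-defined stand-in; works only with the duck-typed call."""

    def get_token(self):
        return "app-token"


duck_result = connect_duck_typed(MyCredentials())
try:
    connect_opaque(MyCredentials())
    opaque_rejected = False
except TypeError:
    opaque_rejected = True
```

The opaque version says only that an implementation exists somewhere; the duck-typed version says what an implementation has to do.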

So we really missed our goal in that sense. And I don't think anybody intended for this to happen. We didn't have a meeting where we sat down and we said, Python wants this and is not going to get it because of these choices in core. And we're all going to sign off on it, and that's great. I think it was something that just sort of slipped by-- that the core system, the subsystem, retained a flexibility. It retained an additivity and a recombinatory reality for itself that it did not expose to its users in its API.

So what we think these examples have in common is a certain lack of foresight. Both of these things happened very early in the projects. So when you're starting a project, reflect on what in your designs ties your hands. Spend 15 minutes trying to look forward. Nobody's perfect-- you're going to make some mistakes, but be more diligent, be cooler under pressure, just be better. That's really useless. You can't walk out of this room, and go back to work on Wednesday, and put that to work.

OK, let's find something that's a little more actionable. Let's look for another similarity between these examples. We think there's a mismatch between the implementation and the API. With Mercurial, a part of the implementation was inadvertently exposed, and it turned out badly. With gRPC, a part of the implementation of a subsystem was not exposed, and it turned out badly. So this sort of suggests to us that you want to consider the implementation of your system and its future changes and evolution separately from the API of your system and its future changes and its evolution.

Pretty deliberately, you not only want to consider them separately, but as an aid to considering them separately, just separate them in physical space. This is what we've done in gRPC Python. Our API is in one file-- that __init__.py file. Our implementation is in channel.py and server.py and a few other files. And it's a very clean separation-- it works.
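(Editor's sketch, not from the talk: one way the separation could look, collapsed into a single listing so it runs on its own. The file names in the comments, the `Greeter` interface, and `create_greeter` are all hypothetical and are not gRPC Python's actual contents.)

```python
import abc


# --- what would live in the API file (e.g. a hypothetical mylib/__init__.py) ---
class Greeter(abc.ABC):
    """The supported interface: says what a Greeter does, not how."""

    @abc.abstractmethod
    def greet(self, name: str) -> str:
        raise NotImplementedError()


# --- what would live in an implementation file (e.g. mylib/_greeter.py) ---
class _CachingGreeter(Greeter):
    """One possible implementation; free to change between releases."""

    def __init__(self):
        self._cache = {}

    def greet(self, name: str) -> str:
        if name not in self._cache:
            self._cache[name] = "Hello, %s!" % name
        return self._cache[name]


# --- glue, back in the API file ---
def create_greeter() -> Greeter:
    """Factory exposed by the API; hides which concrete class is returned."""
    return _CachingGreeter()


greeter = create_greeter()
message = greeter.greet("world")
```

The factory function is the glue code: a few extra lines that let the concrete class change without the API noticing.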

This is a good thing to do [AUDIO OUT] write some glue code. I hope you're not afraid of that. We think that professional software development is not code golf. We think that it is not necessarily true that the best possible expression of a system is the shortest expression of that system. If you disagree, we are fascinated to learn how you're working in this industry.

[LAUGHTER]

And we don't want to maintain your code.

There are going to be some abstraction consequences if you pull your API and your implementation apart. It will be harder to expose in your API concrete classes with encapsulated behavior. We think that's a great thing. We've been advocating for that for years. So that's [AUDIO OUT] a virtue.

By contrast, sometimes I see code reviews, sometimes I see libraries, where a class is included in the API, and it inherits from five different things, and it's got 20 methods. And I know the general target audience of the library, and I can't think of any reason that the class is there, except that the author is just so preciously proud of having implemented such a complicated class. And in one of those examples, the library had to be abandoned after a few years, because it just couldn't change in response to changing circumstances without breaking some part of its audience.

So if you make this partition, you're now set up to not forget to consider your API separately. You now have a structural impediment that you will hit every time. And you'll say to yourself, OK, I made this change for my implementation purposes, do I wish to support that in my API? That's a separate question. I want to entertain it separately. It makes it easier to be stubborn about changes to your API.

So my colleagues know that if they send me a change that only affects the implementation of our system, I'll look at it, I'll tweak a few things. And then when it looks about right, I'll say, that's great, integrate it, push it, merge it, release it. If they send me a change that affects the API, I'll look at it, I'll tweak a few things. And then when that looks good, I'll just sit on it for a few days, let it roll around for a little while, consider other options, consider is this a commitment we want to make going forward?

If you separate your API from your implementation, you have a better opportunity to catch including code elements of a dependency in your API. So let's imagine that you have a method in your API that returns some value, and you have that method implemented in your implementation. And the particular value that your implementation gets comes from some volatile library-- some volatile dependency that was released not too long ago by some guy who doesn't really have a profile picture. There's not really a good release history, the bug tracker is kind of empty, and the function works for you, but should you export that type defined by that dependency in your API? Probably not.

You can add a little bit of glue. You can add a little bit of abstraction. You can set up your API to protect itself and its consumers-- your ultimate users-- from having to depend on that volatile library that might disappear tomorrow, or next week, or left-pad.
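(Editor's sketch, not from the talk: a small example of that glue, assuming a made-up dependency. `VolatileLibResult` and `volatile_lookup` stand in for the third-party package; `LookupResult` and `lookup` are hypothetical names for what the API would export instead.)

```python
from dataclasses import dataclass


# Stand-in for a type defined by a volatile third-party package (hypothetical).
class VolatileLibResult:
    def __init__(self, raw):
        self.raw = raw  # e.g. {"ttl_ms": 1500, "host": "example.com"}


def volatile_lookup(host):
    """Stand-in for the dependency's function that happens to work for us."""
    return VolatileLibResult({"ttl_ms": 1500, "host": host})


# Our API exports a type we define and control.
@dataclass(frozen=True)
class LookupResult:
    host: str
    ttl_seconds: float


def lookup(host: str) -> LookupResult:
    """API function: translate the dependency's type at the boundary."""
    result = volatile_lookup(host)
    return LookupResult(
        host=result.raw["host"],
        ttl_seconds=result.raw["ttl_ms"] / 1000,
    )


answer = lookup("example.com")
```

If the dependency disappears, only the body of `lookup` has to change; callers never saw `VolatileLibResult`.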

[LAUGHTER]

Not every dependency needs that kind of protection. It's theoretically possible that you could write an API that uses only the language's built-in data types like integers and types that you defined in your systems. That's probably overkill. Use the standard library, of course-- use established third-party packages. But don't include in your API things that might disappear tomorrow-- things that might disappear earlier than the projected life of your system.

All right, who's a fan of white box tests? Who's a fan of black box tests? All right, both my hands are up, because I'm a fan of both. If you have separated your API from your implementation, you get to have tests of both, and the tests of your API are your black box tests, because they just hit your API. The tests that hit your implementation are your white box tests, because they get to know about your implementation. Since the behavior of your implementation is a superset of the behavior that your API promises to support, there's probably a lot of overlap, and you should use code sharing mechanisms like parametrized tests and so forth so that you don't duplicate your tests. But you do get to have both.
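(Editor's sketch, not from the talk: black box and white box tests sharing one check. To stay free of any test framework, a plain shared function stands in for a parametrized test here. `Stack`, `_ListStack`, and `create_stack` are hypothetical names.)

```python
import abc


class Stack(abc.ABC):
    """The API."""

    @abc.abstractmethod
    def push(self, item):
        raise NotImplementedError()

    @abc.abstractmethod
    def pop(self):
        raise NotImplementedError()


class _ListStack(Stack):
    """The implementation."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


def create_stack() -> Stack:
    return _ListStack()


def check_last_in_first_out(stack):
    """Behavior the API promises; shared by both kinds of test."""
    stack.push(1)
    stack.push(2)
    assert stack.pop() == 2
    assert stack.pop() == 1


def black_box_test():
    # Touches only the API.
    check_last_in_first_out(create_stack())


def white_box_test():
    # Allowed to know about the implementation.
    stack = _ListStack()
    check_last_in_first_out(stack)
    assert stack._items == []  # internal detail, not promised by the API


black_box_test()
white_box_test()
```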

Once you perform the separation, or even know about the separation, you can do API-driven development. You can author an API, and you can author black box tests that use that API, that cover critical user journeys, things that you really want users of your library or your system to be able to do and for your API to lend a natural shape to them. To lend a certain shape to them to ensure that achieving these things with your API is comfortable.

And if you've written your tests and you've written your API, that's enough to compile. You don't actually have to implement your API. Alternatively, if you've written your tests and you've written your API, and then you script out the interactions of your API, you can even run those tests. You can run those tests and you can discover, oh, wait, this function, it doesn't give the user code the story that I want the user code to have. You can fix that.
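(Editor's sketch, not from the talk: what scripting out the interactions of an API might look like. The API is written first, a scripted stand-in replays canned answers, and a user journey runs against it before any real implementation exists. `KeyValueStore`, `ScriptedStore`, and `journey_save_then_load` are hypothetical names.)

```python
import abc


class KeyValueStore(abc.ABC):
    """The API, authored first."""

    @abc.abstractmethod
    def put(self, key: str, value: str) -> None:
        raise NotImplementedError()

    @abc.abstractmethod
    def get(self, key: str) -> str:
        raise NotImplementedError()


class ScriptedStore(KeyValueStore):
    """Not an implementation: it replays canned answers for one journey."""

    def __init__(self, scripted_answers):
        self._answers = dict(scripted_answers)
        self.calls = []  # records how the journey used the API

    def put(self, key, value):
        self.calls.append(("put", key, value))

    def get(self, key):
        self.calls.append(("get", key))
        return self._answers[key]


def journey_save_then_load(store: KeyValueStore) -> str:
    """A critical user journey, written only against the API."""
    store.put("greeting", "hello")
    return store.get("greeting")


scripted = ScriptedStore({"greeting": "hello"})
loaded = journey_save_then_load(scripted)
```

If the journey reads awkwardly at this stage, the API can still change cheaply, because nothing real has been built behind it.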

This is analogous to what happens in the design space and the user interaction space, when they will mock up a user interface, and they'll script out interactions behind it, and they'll have real users come in and sit and use it, and discover that half of it's unusable. And they save themselves the work of implementing that half.

All right. So I've talked a little bit about glue code. I've talked about the work of separation. I've talked about the work of abstracting certain volatile dependencies. I've talked about work.

What about if you don't need this? What if you're just doing your own thing? It's a project that you just started-- you kicked it off. You've accomplished something in a matter of hours or days, and you put it up on your favorite source control site-- which, I'm guessing is Bitbucket-- and it kind of took on a life of its own. People started using it. People started filing issues. People started submitting patches.

It's growing into a real library. Is it too late? It's not too late. This separation is something you can do after the fact. The fact that your implementation is out there, it presents its own implicit API in its exposed raw code elements-- you can hang a new API on the front of it. And you have to go through a deprecation cycle. You have to tell your users, OK, everybody, there's this new set of code elements that are going to be actually supported ones now that we're a grown-up project. But you can do that.
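(Editor's sketch, not from the talk: one common way to run such a deprecation cycle in Python, using the standard `warnings` module. The function names `fetch`, `do_fetch_raw`, and `_fetch` are hypothetical.)

```python
import warnings


def _fetch(url, timeout):
    """Existing implementation detail behind both the old and new names."""
    return "fetched %s (timeout=%s)" % (url, timeout)


def fetch(url, timeout=10):
    """The new, supported API element."""
    return _fetch(url, timeout)


def do_fetch_raw(url, timeout):
    """Old exposed name, kept working for one deprecation cycle."""
    warnings.warn(
        "do_fetch_raw is deprecated; use fetch instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this function
    )
    return _fetch(url, timeout)


# Existing callers keep working but are told where to move.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = do_fetch_raw("http://example.com", 5)
new_result = fetch("http://example.com", 5)
```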

Well, there's nothing particularly special about placing one API on the front of a system. You can just place another. And so something we've done in gRPC Python-- we've done it, I would say, 3/4 of the way, but had we known this at the beginning we could have done a cleaner job of it-- you can offer one API that's the principal API. It contains all the code elements that beginners need to get up and running, it contains all the code elements that 90% of users are going to need. And you can say, your library is useful, because this principal API is supported in a formal way. We're not going to make breaking changes except at major version numbers. We're going to support things on the order of years.

Over here there's the expert API. It contains a lot more code elements, a little more sophisticated things, a lot more control knobs. And because the people who use that API are self-declared experts, you can break them with every revision. They're grown-ups, they can handle it.

[LAUGHTER]

Something that in the last year or so has kind of been blowing in the wind, it's kind of hit me from a few different directions-- too many times to be coincidence, not so many times that it's yet a trend-- is the importance in software engineering of, when you make a decision, writing it down. It's always been true-- for some reason it's coming into fashion now. So I can think of product decisions that I've seen made that have gone badly because it was not written down what was being decided and who agreed to it.

A friend of ours was telling us about life at the startup and how that's different from the life we've known at Google over the last several years. He said his most frequent question was, where is this written down. And it was never his first question, but it was always his second question. He would discover something that needed an answer, he'd go talk to one of his colleagues. He'd say, that's a great answer, where is it written down? And it never was.

How many of you have gotten badgered about writing a longer commit log message in your Git commits, Mercurial commits? How many of you have gotten in trouble for writing "hacking," or "stuff," or at least gotten a hard time for it? Applied to source control, the wisdom is, your commit log message should contain the what of your commit. Which seems redundant, because the code change contains the what. But you should repeat that-- it is redundant-- in describing the what in your commit message. And you should also describe the why.

And of course, the who in a commit log message is you, plus whoever in your code review system signs off on your code change. I think this emphasis has to do with distinguishing between what's essential about your system and what's incidental. You don't put in your commit log message-- well, I chose greater than or equal to, rather than less than with operands reversed. That's something that's incidental to your system, and has to do with how you felt that day, how you happened to be coding that day, what mood you were in.

If you have separated your API from your implementation, you get this distinction for free. The division falls along the same lines. Your API contains what is essential about your system-- what is necessary about your system. Your implementation just contains what happens to be. Thank you.

[APPLAUSE]