(Editor's note: transcripts don't do talks justice.
This transcript is useful for searching and reference, but we recommend watching the video rather than reading the transcript alone!
For a reader of typical speed, reading this will take 15% less time than watching the video, but you'll miss out on body language and the speaker's slides!)
[APPLAUSE] Can everybody hear me?
Great. OK. So first of all, thank you all for coming to my talk, "SOLID is Solid: Enterprise Principles in OOP Architecture." We're going to be talking about why SOLID is super important for OOP, because it's very important. So there's basically five principles of SOLID. There's single responsibility-- all classes should have one responsibility; open-closed, saying that all classes should be open and/or closed; we have this Liskov substitution principle, which I don't know what that even means; interface segregation principle, all individually segregated; and dependency inversion, you have to flip your dependencies upside-down.
I want to go into detail about all of this, but first, quick show of hands. Who here knows what SOLID is? Oh, that's most of you. Don't need to do that one anymore. It's OK, it's OK, it's OK, I've got a backup in here. I just need to find it real quick. Not that.
Yeah, I think it'll still work. OK. So who knows why SOLID is so important? Or more specifically, why SOLID and not something else? Sure, there's good ideas there, but there's good ideas everywhere, right? Why did SOLID win out and not, say, OMT or Booch Method or Design by Contract? Why SOLID?
It turns out there is a very, very, very specific reason-- this guy. Robert C. Martin is one of the biggest modern influencers in tech. If you've heard of things like Agile or TDD or Clean Code or software craftsmanship, these are all ideas he either pioneered or invented. He is very controversial, I will admit that, but you cannot deny his influence.
In 1995, he read a post on a forum, newsgroup, whatever called, "What are the 10 commandments of OOP?" People were giving very different, often contradictory answers to this. He listed his 11. He went-- off by one error. And then divided them into three groups. Five on design, three on packaging, three on systems.
Later, Michael Feathers realized that the first five could be called SOLID, based on their first letters. And this he pushed-- sorry, Robert Martin pushed in his many, many, many popular bestsellers. And that's it. That's the reason SOLID is everywhere. One person in the right place at the right time with the right audience. And so the industry zigs instead of zags, and what one person believes becomes common wisdom.
We tend to think of history as something in the past, something that ended, a story that's over. A thing we can pick up and examine like an insect trapped in amber. But that's not how the world works. The past flows eternally into the present. History happens and keeps happening. And only by understanding how the world was can we understand where we are now.
My name is Hillel Wayne, and this is a talk on software history. What makes it so valuable and how we can use it to better ourselves as engineers. One thing before I start. I have a lot of sources in this talk. I don't want you to blindly believe what I say. I want you to be able to do the same research yourself. You can go here to see all of the resources on this talk as well as the slides, and if anybody asks me questions after, I'm going to write them down and put them up there, too.
So the past becomes the present infinitely. Everything we do is built upon layer upon layer of abstractions. Some of these abstractions, some of these decisions are years, decades, even centuries old. And they were often decided by people with given constraints and given problems. They picked the best solution they could think of at the time. This is normal. We do the same. This is how engineering works.
But their choices, their decisions live with us. Most often, their reasons do not. We do not have the same context that they did: the world has changed, our companies change, everything changes. We are now faced with what is, without knowing why it is. So what do we do? We make things up.
This happens all the time. We have something that we want to understand about our current world, and we come up with some sort of post hoc justification to explain why it's the way it is. A justification that's quite often wrong, and wrong in ways that make us worse off, ways that are the opposite of the original reasons. Ways that make our products worse, our paradigms worse, our processes worse.
To give even one small example, I mean, look at SOLID. There are great ideas there, I'm going to say that, but if you look online, people essentially treat it as a given: OOP and SOLID are the same thing. If you do OOP, you must do SOLID. If you hate SOLID, you must also hate OOP. But they're not the same thing. They only seem like the same thing because historically, there was one very charismatic, very influential individual who supported that. Without that understanding of the context, of the history, we miss that. We miss that it's just a decision.
There's a lot of power in history, but that's not what I want to convince you of. What I want to do today is show you how it can benefit you. How you can study history to learn more about the work you do, to learn more about the decisions made, to learn more about why you do things. And I honestly think the best way to do that is to show you what it's like to take a question and walk you through the process of researching it. To show you how even the smallest things have a rich history behind them that shapes our world today.
So show of hands, who here has done a linked list question on an interview, like reverse a linked list? Yeah. I see that's a lot of you. Now lower your hand-- keep your hands raised-- lower your hand if you have actually used a linked list on your job. Now-- OK, still see a few hands. Now keep that-- now lower your hand if you have used that linked list on a job and you are not a low level systems programmer. Well, a couple of people. That's kind of cool.
Right. So somebody in the audience just said that some of us are functional programmers. There are actually some really subtle nuances there that I'll get into later. But when I was first interviewing in 2013, I was simply asked to do linked list questions, to reverse linked lists, to do these algorithms in Python. It doesn't even make sense in Python.
And I remember asking the first company why I was supposed to do this. What was the value in it? And they said, oh, we want to see if you have fundamental CS knowledge. Do you know your algorithms, do you know your data structures, et cetera? Then on a different interview for a different company, I got the same question, and I asked the same question. Their reason was completely different. It was, can you take a problem you've never seen before and reason through it and think on your feet?
And this got me suspicious.
These reasons are actually kind of contradictory. Either it tests whether you already have the knowledge, or it tests whether you can reason quickly without the knowledge. So which is it? What does it actually test? Or does it test neither of them? So here's the process I took to look through the history, to figure out the origins of this question and decide why it was actually being asked.
Now one important caveat here. History is big and complicated. This is the process I took. You may disagree with my results and you may disagree with my methods. That's fine. I'm showing you how this works so that you can do it, too. What's important is that we search and we keep searching.
So the first step is to actually pin down the origins. When did it happen? And this is important because we want to find the source, not the post hoc justification. I can easily find things from 2004 that say, oh, ask these questions to test reasoning ability. And if the question comes from 2004, then we're done, we know the reason. But if it comes from earlier, then those people were also post hoc justifying it.
So when does it come from? My first guess was it was the 1990s, roughly when like the dot-com era was booming. So I want to find out if this was actually true. How do I know what questions people were asking at that time for job applications? I ask other people. Turns out that just going up to people who remember stuff and asking them is a pretty great way to learn stuff.
Now it's not perfect, of course. People have fickle memories, we're all fallible, everything gets colored, but nonetheless, a great way to get a lot of information very fast. So my first step was to just go on Twitter and ask a question and see who responded and follow up with people. And what I heard actually surprised me.
By 1990, 1991, people were asking these in interviews, reverse a linked list. But the reason-- tradition. That's what we always did. By the time I was born, linked list questions were already so old, people didn't remember why they were asking them. So we gotta go a bit earlier. I decided to take a different source here and look at employment histories. It turns out that the number of jobs in programming started rapidly increasing around the late 1970s and early-to-mid 1980s.
I then went to my local ACM chapter and talked to some of the people there who interviewed in that era, and that did help me pin down that yes, that was roughly the time these questions started appearing. Not the 1960s, not the early 1970s, but the late 1970s, early-to-mid '80s. And that gives us a when.
Next is figuring out what. What was the context people were living in that would give rise to this kind of question? What were the challenges they were solving? What were they trying to do? And to do this, we need to get more insight into how people were working at that time. We need to get an impression of the culture.
And looking at graphs and talking to people 40 years later just isn't going to cut it. We need to look at the primary sources. Something like a book is a secondary source. It is written after the fact by somebody compiling the information, and these are often very good, very useful, very dense. But they can't give us new insights. Primary sources are the direct artifacts and information from that time, contemporary to it, that we build our theories on.
It could be something like, I don't know, a book from 1978 about interviewing practice that suggests using linked lists to test algorithm knowledge. It doesn't have to be that contrived, of course, it can also be something like jokes, it can be something like emails, letters, postings, arguments, just anything that gives us a snapshot of what people were thinking and doing at that time. The downside to primary sources is that they're very messy. They're often missing, and they are not really written to be read by historians 40 years later. So we have to do work to both find them and interpret them.
My first guess for a primary source was a 1978 interview guide that suggests using linked lists. Unfortunately I was not that lucky. My next idea was to say, OK, people interviewing usually are looking for people with specific skills. They then ask questions related to those skills. So if we get an insight into what people were hiring for, we can get insight into what they need to find out and the kinds of questions they'd ask.
Who here has heard of Usenet? That's good. For those of you who haven't, the best way to sort of think about it-- and this is both very, very broad and also kind of a lie, Usenet is essentially a cross between a forum and an email newsletter. It's broken up into separate topics by newsgroup.
It started in the early 1980s, roughly the same time we're looking for, which makes it very accessible for what we're trying to do. Additionally, it's almost all archived online. I could go to the Living Computers museum thingy in downtown Seattle to maybe get more detailed information, but I don't live in Seattle, I live in Chicago. This is a lot more accessible to me.
So I downloaded Usenet dumps of net.jobs from the Internet Archive, focusing on 1982 to 1986. This covers about 2,000 messages: people looking for jobs, people offering jobs, and people complaining about jobs. This isn't necessarily a perfect slice of the computing world, it only covers people on Usenet, but nonetheless, it does give us a lot of insight.
And then what I did was I sat down and just read all 2,000 messages. And in doing so, I learned two things. The first thing is that, well, people didn't really care all that much about CS degrees. A decent number of places did ask for them, but plenty were fine with an electrical engineering degree, and plenty wanted no degree at all. This does go in line with what we know of the era. The number of CS bachelor's degrees started dropping dramatically starting in the mid-1980s.
And this was because, from the same report that I got this from, more and more companies cared much more about experience at that point than credentials. If you had programmed for five years, you were fine even if you didn't have the degree. And this kind of throws a wrench into the assumption that it was about CS. If it really was about CS knowledge, companies would also be wanting CS degrees, but they really didn't. That would suggest that it's more about thinking quickly, but we can't be sure, we have to keep digging.
Now for the second thing, does anybody want to guess what the most popular language people were looking for was? Shout it out.
C. It was C. One in six job requests was for a C programmer. And actually, over half of them are for Unix programmers. Who knows what the second most popular was?
OK. I've heard a few. I've heard Assembly, Fortran, Pascal. It was actually Pascal. About 7% of jobs wanted that. Third was Fortran. Then all the Lisps combined. So we know people were hiring a lot of C and Pascal programmers at the time. And this is interesting. It gives us both an insight into what people were looking for, and we have that plus our original question, why do people ask linked list questions?
Maybe those two are related. If people were really using questions to find out what they needed from candidates, maybe there's something about C and Pascal that makes linked list questions more useful. And it turns out there is. Let's say you don't have much memory and you want to dynamically allocate memory-- at runtime, add to a data structure to get a bigger data structure.
Now how you do this really depends on the language you're working with. For example, in Fortran-- Fortran 77, I mean-- you can't. Sorry, you can't do it. Everything is set at allocation time, and if it's not big enough, you're kind of stuck. With other languages, like Lisps and Smalltalk, you have pretty strong fundamental primitives to handle this for you. You may need to know some basic algorithms to search for nodes or delete nodes, but a lot of the basic boilerplate is handled by the primitives.
But for C and Pascal, you have arrays and you have structs and that's it. If you want anything more complex, you have to use pointers and allocate the memory yourself. You have to do the manipulations yourself. If you want a dynamically-growing list of values, you have to implement the linked list yourself. If you wanted to delete from that linked list, you have to implement the algorithm yourself.
You had to basically be working with these low-level memory operations day in, day out. And in that context, it becomes obvious that even the most complicated-- well, not the most complicated, but pretty complicated questions about data structures-- linked lists, hash tables, trees, anything-- are not that hard to do if you've already done them 20 times, which you would have if you'd been programming in these languages.
In that context, it's not about do you know CS, and it's not about can you think quickly. A linked list question is testing, first and foremost, have you programmed in C? It'd be like if I asked you, hey, here's a list of employees, find the oldest employee in Brisbane. I'm not testing your knowledge of abstract database normalization or query optimization or relational algebra, I'm testing if you can use SQL.
Now this is a pretty nice theory. I like this theory, but just because I like it doesn't necessarily mean it's correct, right? We have to-- to make sure that this is actually watertight, we should be making predictions from it. We should be able to see if it gives us more insight or it tells us things that we can later verify.
One of the predictions I chose was, if linked lists are more useful in this context for C, we'd see people in the C and Pascal communities talking more about memory manipulation. It turns out that "pointer" ends up being a good shibboleth for that. Even communities around languages without pointers usually use the term when discussing implementations.
So what I did was I took dumps for five languages-- Prolog, Lisp, Smalltalk, C, Pascal. I found contemporary Usenet dumps for all of them, around the same time frame as the jobs, and I grep'd them all for pointers. And from that, we see that for Prolog, Lisp, and Smalltalk, the high-level languages, about 6% to 8% of posts use that term, pointer. For Pascal, it's about 10%, and for C, it's 17%. So that seems like pretty good evidence. I'm lying, it's actually terrible evidence.
See, a very common mistake that people make when they start out-- and even professionals make-- is to trust automated tooling to understand the context, to trust that automated tooling is always accurate. And it's really terrible, because let's take the following post: "Does anybody have a version of Smalltalk that runs on Apollo? I've heard rumors, but have no firm pointers."
Yeah, so it turns out that pointer is also an English word. This sounds kind of silly, but I have seen big, high-profile papers, both on history and on modern research, completely ruined by these kinds of mistakes. We can't really trust grep. We have to go through every single post ourselves and cross off the ones that are using the word in the English sense and not the memory sense.
So I got a coffee, I got a beer, I got a shot of Malört, and I sat down with all these posts and looked through them. So Prolog, Lisp, Smalltalk: after you do this kind of pruning, they go from about 6% to 8% talking about pointers down to about 3%. Mostly implementation details or making fun of C.
And what that sort of implies to me is that these languages probably didn't place a high priority, if any, on lots of memory-manipulating algorithms. On the other hand, Pascal goes from about 10% to 7%. So it does seem to still have a strong use of the term. And C goes from 17% to 16.5%. So from that, we get the impression that in C primarily, and in Pascal to a lesser degree, this kind of memory manipulation, these kinds of simple algorithms, were a very, very common problem and pattern.
Now I like this theory a lot more, and I think that it does have a lot of weight behind it. But to really be thorough, to understand the history better, we should also be looking for the loose ends. We should be tying them up. So what's one loose end here? The big one is that, well, these questions might have had one purpose originally, but they're still around with a new reason. Why did they stick around? Why did people not just stop asking them?
Our hypothesis here predicts that there is someone or something that made these questions persist, that changed the dynamics of the question, that made it about what we say it is now rather than about what it used to be. In this case, there are two someones. The first is Joel Spolsky. He is the co-founder of Stack Overflow. He got his start as a product manager on Excel in 1994. A lot of the stuff being done then on that product was very low-level, a lot of memory manipulation, and that profoundly influenced his view of software.
And we see it threaded through most of his blog posts, through most of his writing. In particular, in 2006, he wrote a post called "The Guerrilla Guide to Interviewing," telling startups and companies how to better interview software engineers. Among other things in it, he says this. It's not-- did I-- did this break-- there we go. "Pointers require a complex form of doubly-indirected thinking that some people just can't do. A lot of 'script jocks' who started programming never learned about pointers, and they could never produce code of the quality you need."
Later he says, "That's the source of all these famous interview questions, like 'reversing a linked list.'" So it isn't a perfect fit. It implies that at some point between the 1980s and the early 2000s, people started rethinking what the question was supposed to be. But he, probably more than anybody else, cemented the question's new meaning. His post was profoundly influential, and it shaped how a lot of people did their hiring. It's worth noting that at his first company, Fog Creek Software, the first product, FogBugz, was mostly written in VB and C#. There wasn't actually a need to do linked list manipulation. But he came from a context where people did that a lot, and it sort of shows.
The final person here is Gayle Laakmann McDowell, in 2008. She wrote Cracking the Coding Interview, based on her experiences both interviewing candidates and being interviewed for low-level programming-- low-level as in C and machine-level systems-- at Apple, Google, and Microsoft. In this, she, more than anybody else, cemented the linked list as a de facto interview question. It turns out that coming up with interview questions is really hard, and a lot of people read her book and said, ah, that's what we'll ask people. And that's one of the big reasons why it persists independent of the original context.
So, to summarize our theory here, linked list questions started out being a simple test of do you know C? Have you programmed C before? It was fairly successful at that and people kept using it. People kept reusing the same questions, but over time, fewer and fewer people were working with that kind of memory manipulation. They were working with languages like Java, C#, Python, Ruby, PHP, et cetera, and this question didn't really mean that much. So people repurposed it and said, it's actually about thinking on your feet or algorithms.
This is actually sort of the opposite of what it was supposed to be. If you can think quickly through the problem without having seen it before, that doesn't show you know C, it shows you can think quickly through a problem. And so it still persists, independent of its original context. Now you may agree with that thesis or you may disagree. Maybe you know something I don't. Maybe you do have that 1975 interviewing manual I tried so hard to find.
But what's important is that this is possible. If you disagree with me, you can do the same searching. You can find the same sources I did. You can research the same history. I didn't need access to the Vatican or special skills or a lot of money. I had an afternoon, some Python scripts, and the internet. That's it.
So I'd like to conclude by turning the question on you. I believe that we have questions about our world, and there are answers out there, and that we can find them. So I want to ask all of you, what are those questions? What are the ones that matter to you? I want to give you some time to think about this, call it 20 seconds or so before I wrap up. This will also give me time to clean up.
Thinking about it?
So people did think about it, right? You weren't just watching me?
OK, so to finish, don't just leave them as questions, seek out the answers. I can't tell you where you'll find them because I don't know where they'll be, but I can give you some suggestions, some of my favorite places to start. I listed them again at this link: things like the Usenet Archives or the C2 Wiki or the ACM Digital Library. And through those, I'm hoping you can find the answers to the questions you have. And with that, I've said what I need to say. My name is Hillel Wayne, thank you so much for listening.