The Wet Codebase by Dan Abramov


Transcript

(Editor's note: transcripts don't do talks justice. This transcript is useful for searching and reference, but we recommend watching the video rather than reading the transcript alone! For a reader of typical speed, reading this will take 15% less time than watching the video, but you'll miss out on body language and the speaker's slides!)

[APPLAUSE] Hi. I learned to drink a lot of water. Hi, my name is Dan Abramov. I work on a JavaScript library called React. This is actually the first conference that I speak at that is not specific to React or JavaScript. So I'm just curious, have any of you ever used React at all? OK. Yeah, a lot of people use React. That's cool. This talk is not about React. You can say it's a talk about something that, if I had a time machine and could go back to my past self, I would tell myself. So it's a talk about the code base far, far away, deep under the sea.

And it's a code base that I worked on a long time ago. And in the code base, there were two different modules, two files. And my colleague and friend was working on a new feature in one of those files. And they noticed that actually that feature, something very similar was already implemented in another file. So they thought, well, why don't I just copy and paste that code because it's pretty much the same thing?

And they asked me to review the code. And I had just read all the books about the best practices. Pragmatic Programmer, Clean Coder, Well Groomed Coder, and I knew that I needed to-- you're not supposed to copy and paste code because it creates a maintenance burden, it's pretty hard to work with. I had just learned this acronym DRY, which stands for don't repeat yourself. And I was like, this looks like a copy paste, so can we DRY it up a little bit?

And so my colleague was like, yeah, sure, I can totally extract that code to a separate module and make those two files depend on that new code. And so an abstraction was born. OK. So when I say abstraction, I mean it doesn't matter which language you're using. It could be a function or a class, a module, a package, something reusable that you can use from different places in your code base.
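Editor's note: the talk shows diagrams rather than code, so here is a minimal TypeScript sketch of what such an extraction might look like. Every name in it (`formatLabel`, `inboxTitle`, `cartTitle`) is hypothetical and only stands in for "two files that now share one helper".

```typescript
// Hypothetical shared helper, extracted from two modules that each
// contained a near-identical copy of this logic.
function formatLabel(name: string, count: number): string {
  return `${name.trim()} (${count})`;
}

// Two call sites, standing in for the two original files.
// Both now depend on the same abstraction.
function inboxTitle(unread: number): string {
  return formatLabel("Inbox", unread);
}

function cartTitle(items: number): string {
  return formatLabel("Cart", items);
}
```

At this stage the abstraction is cheap and easy to follow: one function, two callers, no special cases.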

And so it seems like, this is great. And they lived happily ever after. So let's see how that abstraction evolved. So the next thing that happened, we hadn't looked at that code for a while but then we were working on a new feature and it actually needed something very similar. So let's say that the original abstraction was asynchronous, but we needed something that had pretty much the same exact shape, except it was synchronous.

So we couldn't directly reuse that code anymore, but it also felt really bad to copy and paste it because it's pretty much exactly the same code except it's slightly different. And, well, it looks like we shouldn't repeat ourselves so let's just unify those two parts and make our abstraction a bit fancier so that it can handle this case as well. And we felt really good about it. It is a bit unorthodox, but that's what happens when code meets real life, right? You make some compromises, and at least we didn't have to duplicate the code, because that would be bad, right?

So what happened next is we found out that actually, this new code, this new feature, had a bug in it, and that bug was because we thought that it needed exactly the same code as we have. But actually it needed something slightly different. But we can fix that bug, of course, by adding a special case. So in our abstraction, we can have an if statement. If it's like this particular case, then do something slightly different. Sure. Ship it. Because that happens to every abstraction, right?

And so as we were working with that code, we actually noticed that the original code also had a bug. So those two cases that we thought were the same, they were also slightly different, we just didn't realize it at the time. And so we added another special case. And at this point, this abstraction looks a bit weird and intimidating. So maybe let's make it more generic. Why do we have all those special cases in the abstraction?
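Editor's note: a hedged sketch of how special cases can creep into a shared helper. The `Caller` type, the `formatLabel` function, and both special cases are invented for illustration; the talk does not say what the real bugs were.

```typescript
// Hypothetical: the helper now has to know who is calling it.
type Caller = "inbox" | "cart" | "archive";

function formatLabel(name: string, count: number, caller: Caller): string {
  if (caller === "archive") {
    // Special case 1: the newer feature turned out to need no count at all.
    return name.trim();
  }
  if (caller === "inbox" && count > 99) {
    // Special case 2: one of the original callers was also slightly different.
    return `${name.trim()} (99+)`;
  }
  // The behaviour everyone assumed was shared.
  return `${name.trim()} (${count})`;
}
```

Each branch is small and reasonable on its own, which is why this tends to pass review; the cost is that the helper now encodes knowledge about every call site.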

Let's pull them out from the abstraction to where they belong, in our concrete use cases. So it looks like this. So now our abstraction doesn't know about any concrete cases. It is very generic, very beautiful. Nobody really understands what it represents anymore. Oh, by the way, now that it's parametrized from different places, we need to make sure that all call sites are parametrized.
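Editor's note: one way the "generic" version could look, as a sketch under assumptions. `LabelOptions`, its fields, and the three call sites are hypothetical; the point is only that the special cases have moved out of the helper and into parameters every caller must now supply.

```typescript
// Hypothetical options bag: the helper no longer names any caller,
// but every caller has to describe itself through flags.
interface LabelOptions {
  showCount: boolean;
  cap?: number;
}

function formatLabel(name: string, count: number, options: LabelOptions): string {
  if (!options.showCount) {
    return name;
  }
  const shown =
    options.cap !== undefined && count > options.cap
      ? `${options.cap}+`
      : String(count);
  return `${name} (${shown})`;
}

// Every call site is now parametrized.
const inboxTitle = (unread: number): string =>
  formatLabel("Inbox", unread, { showCount: true, cap: 99 });
const cartTitle = (items: number): string =>
  formatLabel("Cart", items, { showCount: true });
const archiveTitle = (): string =>
  formatLabel("Archive", 0, { showCount: false });
```

Note the `0` passed by `archiveTitle`: the caller has to supply a value the helper will ignore, a small sign that the abstraction no longer matches what the call sites actually need.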

But it was such a gradual progression that at each step it makes sense to the people writing and reviewing the code, so we just left it at that. And some time passed. And so, during that time, some people have left the team, some people have joined the team. There were many fixes. Somebody needed to just do this one small fix here. I don't really know what this thing is supposed to be doing but just fix it up a little bit, add this new feature, improve the metrics. So we ended up with something like this, right?

And again, each of those individual steps kind of made sense. But if you lose track of what you were trying to do originally, you don't really know that you have a cyclical dependency or this weird thing that is growing somewhere to the side just because you don't see the whole picture anymore. And, of course, in real life, that's actually where the story ends because nobody wanted to touch the part of the code base and it just was stagnant for a long time and then somebody rewrote it. And maybe got a promotion. I don't know.

But if we could go back in time, because it's a talk, it's not real life, if we had a time machine we could go back and fix it, right? So I want to go back to the point where the abstraction still made sense. But we had this third case and we really didn't want to duplicate that code even though it needed something slightly different. And we were like, yeah, sure, let's compromise on our abstraction. Make it fancier. So if I from today was there, what I would've told myself is, please inline this abstraction.

And so what I mean by inline, I mean literally take that code and just copy and paste it back to the places that use it. And that creates some duplication but that destroys that potential monster we were in the process of creating. And of course duplication isn't perfect in the long term, but the wrong abstraction is also not perfect in the long term. So we need to balance these two problems. And so the way this helps us is that now if we have a bug here and we realize actually this thing is supposed to do something different, we can just change it. And it doesn't affect any of the other places because it's isolated. And similarly, maybe we get a different bug here and we also change it.
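Editor's note: a minimal sketch of what "inlining" could look like in TypeScript. The three functions and their behaviours are hypothetical; what matters is that each call site owns its own copy and can be fixed in isolation.

```typescript
// Each call site carries its own copy of the formatting logic.
// Fixing one of them cannot change the behaviour of the others.

function inboxTitle(unread: number): string {
  // This copy was free to diverge: it caps the displayed count.
  const shown = unread > 99 ? "99+" : String(unread);
  return `Inbox (${shown})`;
}

function cartTitle(items: number): string {
  // This copy kept the original behaviour.
  return `Cart (${items})`;
}

function archiveTitle(): string {
  // This copy never needed a count in the first place.
  return "Archive";
}
```

There is some duplication left (the `Name (count)` shape appears twice), and that is the trade being made: a little repetition in exchange for call sites that can change independently.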

And I'm not suggesting that you should always copy paste things. In the longer term, maybe you realize that these pieces really stabilized and they make sense. And maybe you pull something out and it might not be the thing that you originally thought was a good abstraction. Might be something different. And a thing like this is as good as it gets in practice. And if I heard this when I was a sweet summer child, I would have said that that's not what they tell us. I heard that copy pasting is really bad.

And I think it's actually a self-perpetuating loop. So what happens is that developers learn best practices from the previous generation and they try to follow them. Because there were concrete problems and concrete solutions that were born out of experience. And so the next generation tries to pass them on. But it's hard to explain all this context and all this trade off, so they just get flattened into these ideas of best practices and anti-patterns.

And so they get taught to the new generation. But if the new generation doesn't understand the trade offs and the reasons they came to these conclusions, they don't have the context to decide when it's actually a bad idea and how far can you stretch this. So they run into their own problems from trying to take these best practices and anti-patterns to extreme. And so they teach the next generation. And maybe this is just you can't break out of this loop and it's just bound to happen over and over again, which is maybe fine.

I think one way to try to break this loop is just when we teach something to the next generation, we shouldn't just be two-dimensional and say here's best practices and anti-patterns. But we should try to explain what is it that you're actually trading away. What are the benefits and what are the costs of this idea? And so when we talk about the benefits of abstraction, of course it has benefits. The whole computer is a huge stack of abstractions. And I think concrete benefits are-- abstractions let you focus on a specific intent, right? Without them, you have this whole thing and you have to keep it all in your head.

But it's actually really nice to be able to focus on a specific layer. Maybe you have several places of code where you send an email and you don't want to know how an email is-- I don't know how emails are being sent. It's a mystery to me that they even arrive. But I can call a function called send email and well, it works most of the time. And it's really nice to be able to focus on it. And of course another benefit is just being able to reuse code written by you or other people and not remember how it actually works.

So if we need something, exactly the same thing that we already use from different places, it's very nice to be able to reuse it. So that's a benefit of abstraction. And abstraction also helps us avoid some bugs. So in the example where we have a bug, maybe we copy pasted something. And that's an argument against copy paste, is we copy pasted something and then we found the bug in one version and we fix it, but then the other version stays broken because we forgot about the copy paste. So that's a good argument for why you'd want to extract something and pull it away.

But when we talk about benefits we should also talk about costs. And so one of these costs is that abstraction creates accidental coupling. And what I mean by that is, so we have these two modules using some abstraction, and then we realize that one of them has a bug. And we have to fix it in the abstraction because that's literally where the code is. But now it's your responsibility to consider all of the other call sites of this abstraction and whether you might have actually introduced a bug in another part of the code base. So that's one cost. Maybe you can live with it. Most of us live with it. But it's a real cost.
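Editor's note: a small sketch of accidental coupling, under invented assumptions. Suppose a hypothetical shared helper `formatAmount` was changed to round to two decimals in order to fix a display bug in invoices. The names `invoiceTotal` and `exchangeRate` are made up; `Number.prototype.toFixed` is standard JavaScript.

```typescript
// Shared helper after the "fix": round to two decimal places.
// The change was made for invoices, but it lives where every caller sees it.
function formatAmount(value: number): string {
  return value.toFixed(2);
}

// The call site the fix was written for. It now behaves correctly.
function invoiceTotal(cents: number): string {
  return "$" + formatAmount(cents / 100);
}

// A second call site that silently depended on full precision.
// Nobody touched this function, yet its output changed.
function exchangeRate(rate: number): string {
  return formatAmount(rate);
}

console.log(invoiceTotal(1999)); // prints "$19.99"
console.log(exchangeRate(0.00731)); // prints "0.01" (precision lost)
```

The fix is correct for the caller that motivated it and wrong for the other one, which is why changing a shared helper means auditing every call site.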

And I think an even more dangerous cost is the extra indirection an abstraction can create. So what I mean by that is that the promise was that I would just be able to focus on this specific layer in my code and not actually care about all the layers. Is that really what happens? I'm sure most of you probably had this bug where you start at one layer, oh, it goes here. And it's like, well, actually, no. You need to understand this layer and this other layer because the bug, it goes across all of those layers. And we have a very limited stack in our heads.

And so what happens is you just get a stack overflow, which is probably why the site was called that way. And so what I see happen a lot is that we try so hard to avoid the spaghetti code that we create this lasagna code where there are so many layers that you don't know what's going on anymore at all. So that's extra indirection. And all of that wouldn't be that bad if abstractions didn't entrench themselves.

So abstraction also creates inertia in your code base. And that's a social factor more than technical. What I've seen happen many times is you start with an abstraction that looks really promising and makes sense to you. And then with time it gets more and more complex. But nobody really has time to refactor or unwind this abstraction, especially if you're a new person on the team. You might think that it would be easier to copy and paste it, but first you don't really know how to do that anymore because you're not familiar with that code. And second you don't want to be the person who just suggests worst practices. Who wants to be the person who says, let's use copy paste here? How long do you think you're going to be on that team?

So you just accept the reality for what it is and keep doing it and hope that this code is not going to be your responsibility anymore soon. And the problem is that even if your team actually agrees that the abstraction is bad and it should be inlined, it might just be too late. So what might happen is that you're familiar with just this concrete usage and you know how to test it. If you unwind the abstraction, you can understand how to verify that change didn't break anything. But maybe there is another team who uses it here and another team who uses it there, and maybe this team has been reorged so there is no team that maintains that code, and you don't really know how to test it anymore. So you just can't make that change even if you want to.

So I really like this tweet. It's a bit hard to read. Easy-to-replace systems tend to get replaced with hard-to-replace systems, which is kind of like the Peter Principle. There's this Peter Principle that everybody in the organization continues rising until they become incompetent and then they can't rise anymore. And it's similar that if something is easy to replace, it will probably get replaced. And then at some point you hit the limit where it's just a mess and nobody understands how it works.

So I'm not saying that you shouldn't create abstractions. That would be a very two-dimensional or one-dimensional takeaway. I'm saying that there are things that, we're going to make mistakes. So how can we actually try to mitigate or reduce the risks from those mistakes? And so one of them that I learned on the React team in particular is to test code that has concrete business value. So what I mean by that is, say we have this a little bit wonky abstraction, but we finally got some time to write some proper tests, because we fixed some bugs and we have a gap before the new half of the year starts and we can fix some things.

So we want to write some unit test coverage for that part. And intuitively, where I would put unit tests is, well, here's the abstraction where the complex code lies. So let's put unit tests to cover that code. And that's actually a bad idea in my opinion, because what happens is that if later you decide that this abstraction was bad and you try to turn it into copy paste, well, guess what happens to your tests? They all fail. And now you're like, well, I guess I'll have to revert that because I don't want to rewrite all my tests. And I don't want to be the person who suggested to decrease the code coverage. So you don't do that.

But if you have a time machine you can go back and you can write your unit tests or integration tests or whatever you want to call them, fad of the day tests, against the code that we actually care about, that this code works against concrete features. And then these tests don't care about your abstraction. So you can inline the abstraction back. You can create five layers of abstraction. The tests will tell you whether this code works. So actually they will guide you to refactor it because they can tell you that your refactoring is in fact a correct one. So testing concrete code is a good strategy.
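Editor's note: a hedged sketch of testing the concrete feature rather than the abstraction. All names are hypothetical, and `checkInboxTitle` is a hand-rolled check rather than any particular test framework's API. The same check passes whether the feature goes through a shared helper or has the logic inlined.

```typescript
// Version A: the feature is built on a shared helper.
function formatLabel(name: string, count: number): string {
  return `${name} (${count})`;
}
function inboxTitleShared(unread: number): string {
  return formatLabel("Inbox", unread);
}

// Version B: the same feature with the helper inlined.
function inboxTitleInlined(unread: number): string {
  return `Inbox (${unread})`;
}

// A feature-level check: it only knows what the inbox title should be,
// not how it is produced, so it survives either refactoring direction.
function checkInboxTitle(inboxTitle: (unread: number) => string): boolean {
  return inboxTitle(0) === "Inbox (0)" && inboxTitle(3) === "Inbox (3)";
}

console.log(checkInboxTitle(inboxTitleShared)); // prints true
console.log(checkInboxTitle(inboxTitleInlined)); // prints true
```

A test written directly against `formatLabel` would have to be deleted or rewritten when the helper is inlined; the feature-level check keeps working and confirms the refactoring preserved behaviour.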

Another one is just to restrain yourself. You see this pull request. You get this itch, like, this looks duplicated. And you're like, no, take a walk. Because if you have this, you might have a high school crush and they are really into the same obscure bands on Last.fm that you're into. That doesn't mean that you have a lot in common and they're going to be a good life partner. So maybe you shouldn't do the same to the code. Just because the structure of these two snippets looks similar, it might just mean that you don't really understand the problem yet. And give it some time to actually show that this is the same problem and not just accidentally similar code.

And finally, I think it's just important that if that happens, if you make a mistake, it should be part of your team culture to be OK with, this abstraction is bad. We need to get rid of it. You should not only add abstractions, but you should also delete them as part of your healthy development process. So that means that it should be OK to leave a comment like this and say, hey, this is getting out of control. Let's spend some time to copy and paste this and later we'll figure out what to do with it.

But there is also a technical component to this. So if your dependency tree looks like this, it might actually be really challenging to inline anything because you're like, well, I have this thing I want to inline but, OK, I can copy it, but there's some mutable shared state that is now being duplicated. And I need to figure out how to rewire all of those dependencies together. And it might not even be feasible. So you just give up. And I don't really have a good solution for this. What I've noticed is that, for some code, you can't really avoid it. For example, in the source code of React itself, we do have a problem like this. Because we try to mutate things for you so you don't have to mutate them. So we have all these interdependencies between modules that can be a bit difficult to think about.

But then what's cool about React, in my opinion, is that it lets you write apps with dependency trees that are more like this. So you have a button component that's used from form, and that form is used from app. And so on like this. And it follows this tree shape. And we have this constraint where data flows only in one direction. So you don't really expect things to get weirdly circular. And what it means is that you're going to make mistakes, you're going to create bad abstractions, but does your technology make it easier for you to get rid of them?
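Editor's note: a rough sketch of the tree shape described here. To stay self-contained it uses plain TypeScript functions that return strings, not the real React API; `Button`, `Form`, and `App` are stand-ins for components, and the string output stands in for rendered UI.

```typescript
// Leaf: depends on nothing else in the app.
function Button(label: string): string {
  return `<button>${label}</button>`;
}

// Middle: depends only on Button, and passes data down to it.
function Form(submitLabel: string): string {
  return `<form>${Button(submitLabel)}</form>`;
}

// Root: depends only on Form. Data flows App -> Form -> Button,
// and nothing lower in the tree reaches back up.
function App(): string {
  return Form("Send");
}

console.log(App()); // prints "<form><button>Send</button></form>"
```

Because each piece only receives inputs from its parent and holds no shared mutable state, inlining `Button` into `Form` is a matter of pasting its body in place of the call.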

Because I think with React components and some other constrained forms of dependency management, you have this nice property where it's usually a matter of copying and pasting things in order to inline them. And so even if you make a bad decision, you can actually undo it before it gets too late. So this is something to consider in both the social and the technology part of it. So don't repeat yourself. DRY is just one of those principles that are probably pretty good ideas.

And there are many good ideas that you might hear about as a developer entering this industry. Or even as somebody who's been doing it for 15 years and then stepping outside for a few months. And we see a lot of evangelism around those things. And that is fine. But I think it's important that when we try to explain what those things do or why they're a good idea, we should always explain what exactly are you trading away and which things led us to that principle or idea. And what is the expiration date for those problems? Because sometimes there is some context that is assumed and that context actually changes but you don't realize that. And so the next generation needs to understand what exactly was traded off and why.

And so my challenge for you is to pick some best practices and anti-patterns that you strongly believe are true, whether from your experience or because somebody told you or because you came up with them, and really try to break them down and deconstruct why you believe these things and what exactly is being traded away. And if you found this talk interesting, you might like these other talks. So All the Little Things by Sandi Metz is an amazing talk that goes into way more detail on these ideas and many others. Minimal API Surface Area is a talk by my colleague, Sebastian, who I learned all of this stuff from. And On the Spectrum of Abstraction is an interesting talk by Cheng Lou, who goes into how abstractions help us trade the power and expressiveness for constraints and how those constraints can actually limit us, but let us do things we wouldn't be able to do otherwise. It's a good talk. And thank you for having me. That's all I have.

[APPLAUSE]