All programmers must learn C and Assembly by Joe Damato


Transcript

(Editor's note: transcripts don't do talks justice. This transcript is useful for searching and reference, but we recommend watching the video rather than reading the transcript alone! For a reader of typical speed, reading this will take 15% less time than watching the video, but you'll miss out on body language and the speaker's slides!)

Hi. So welcome, this talk is called All Programmers Must Learn C and Assembly. Greetings, my name is Joe. I like computers. I once had a blog called timetobleed.com, and while I, generally speaking, always have time to bleed, I don't really have time to blog on my personal blog anymore. But if you want to hear me talk about computers, you can follow me on Twitter, where I'm just Joe Damato.

I'm the CEO and founder of a company called packagecloud. We're a small, bootstrapped startup that makes it easy for developers to create and manage repositories of RPM, Debian, RubyGem, Python, and Java packages. So you should check it out for all of your artifact storage needs.

Anyway, most of the talks I like to give are about tire fires. This talk is a little bit different. But you know, just to keep everything honest, any talk involving computers involves some amount of tires that may or may not be on fire.

And that's fine or whatever, but this talk is a little bit more of a reality check, right? I want to examine some of the actual, palpable reality that programmers have to exist in, and some ways of dealing with our existence, the environment, and so on. And obviously, along the way there are numerous tire fires we're going to try to navigate this boat through.

So I think one way to start this talk about computing, the computing environment, and what we're all subject to is by opening with three questions that I think most programmers have either already thought about or will think through at some point in their career. The first question I would pose is: how do I, as a programmer, prevent myself from going obsolete, right? How do I make sure that the skills I use today, the ones keeping me employable today, will also keep me employable in the future?

Second question, how do I, as a programmer, increase my efficiency over time, right? How do I get better work done in the same amount of time as I did before? How do I get more code deployed and so on? And thirdly, is there a set of tools that I can invest my time and energy in? That I can reuse every time I need to do something different so that I don't need to learn a whole set of new tools every single time?

So my claim is that the answer to all of these questions, and infinitely more questions as far as computing is concerned, is simply: become proficient in C and x86 Assembly. And all of these questions are answered immediately.

And by becoming proficient in C and x86 Assembly, I do not mean that you need to be able to sit down and compute an x86 bootloader off the top of your skull. That's not really what I'm advocating. I'm just saying that becoming familiar with C and Assembly, and getting to the point where you feel comfortable reading it, talking about it, and looking things up about it online, will pay enormous dividends.

And to make the case for that, I need to start with a really important diagram that'll be the fundamental basis for everything else I want to say. So this next diagram is super important. Yeah, so actually this is just a picture of me roasting my computer while saying, greetings. So the actual diagram is this. And this is an amazingly beautiful diagram that I drew; this is very obviously a computer, as you can see.

The bottom layer of this computer is made of hardware, right? The next layer above that is the operating system, and above that are your system libraries, right? And all of this stuff here is implemented in C and Assembly, right?

So you might wonder, why do I care about the stuff below the level I'm operating at, right? I'm not an operating systems programmer, I'm not a systems programmer. Why do I care that all of this stuff is written in C and Assembly? What's the use of looking down here? And so I'll leave you with the only interesting, philosophical quote in my talk, which is, you know, some guy named Churchill said, "The further back you can look, the farther forward you're likely to see." Smart guy.

I think that quote applies to computers and the computing environment, because it doesn't matter whether you're a Ruby programmer, a Python programmer, a Java programmer, a Go programmer, or a whatever programmer, right? You're faced with this environment, and whatever you end up doing, you're just adding on top of it, right?

So if you're a Ruby programmer, what are you adding to this? You're adding another slab of C and Assembly, which is the Ruby VM, and then your slab of Ruby code on top of that, right? So you're still subject to everything that's going on below you. The same is true if you're a Python programmer, right? You're just adding another slab of C and Assembly for your Python VM and then a slab of Python code on top of that. And the same thing if you're a Node.js programmer, right?

It's all the same thing. And that's what I was trying to get everyone to notice with these diagrams: it doesn't really matter what your candy coating is on the outside, because the chewy, chocolatey center is always C and Assembly. It just doesn't matter. And I think there's a really interesting analogy here, right? You can look at actual science and real engineering, right?

If you roll up to Earth and just take a slab out of it, you probably get something like this, where you can actually see the layers of rock on the planet. And you can understand a lot about the evolution of the planet over time, what sort of creatures existed at different periods, and why they went extinct. You learn really interesting things about how the Earth actually works and how it affected the environment and its creatures at different stages of the planet's development, right?

And it doesn't really matter what you put on top of Earth, because all this stuff is still underneath, right? It doesn't matter if you're building the Burj Khalifa in Dubai or whatever, or a barnyard wherever people build barnyards, or just a huge tire fire; underneath you, you're still subject to all of Earth and its plate tectonics and everything else.

And if you fail to take any of that stuff into account, or just don't have the technology to, then you're subject to bad things happening, right? In this case, earthquakes. But by taking into consideration all the things below you before you build something, you can invent really cool stuff, like earthquake-resistant buildings, right? To prevent your stuff from collapsing.

And I think this analogy makes a lot of sense when applied to computing as well, because you have this immutable thing below you, or something that seems immutable, right? Like your system libraries, your operating system, your VM and stuff, and you're building your structure on top of all that.

So hopefully, at this point of the talk you're thinking, this is really dope, right? This is the best talk, ever. I totally agree. But like how do I actually get started, right? Learning all of this stuff that I've never messed with before is really complicated, right? There's like lots of different pathways I can turn down, how do I actually begin to compute some C and Assembly? And to answer this question, I have prepared another really important diagram that I think will really, really help me explain one way that I like to think about it. This diagram's a little bit shocking so please prepare yourselves.

This is a freshly hatched human being. This is actually my cousin's kid, his name's Luciano. And he has one of those names that it's hard to say without embellishing it a little bit, Luciano. Anyway, the dope thing about my little homie, Luciano, is that he didn't just come into the universe with a cigar in his mouth and sit at the kitchen table and say, yo, ma, where's the pasta e fagioli at? He actually had to learn Italian-American New Jersey English word by word.

And he learned it because other people, like his mom, his dad, me, were all around him talking, saying stuff to him, right? And he was like sort of confronted with all this and he saw all these patterns, right? They imprinted on his brain and like slowly but surely he started using sounds and words to slowly build up to get to the point where he can actually speak Italian-American New Jersey English, right? It just takes time and it takes a lot of looking at stuff and saying weird things, right? Babies say weird stuff until they learn how to speak, right?

And I think the same is true for learning anything in computing, especially things that people typically perceive as difficult, namely operating systems, system libraries, C, Assembly, and so on. So, taking this analogy back to computing, I think there are two really important things you should read a lot of if you want to get to the point where you feel comfortable messing with your operating system and all the layers below you. And those two things are just man pages and source code.

So if you're new to computing or if you don't compute on Linux or whatever, man pages are just like the documentation that you can read on your command line for various tools and various aspects of your operating system. And source code, obviously, is important. You can read it in lots of different ways, right? You can go on the internet, you can read stuff off GitHub, you can download source code.

Just a tip: I use Debian and Ubuntu systems, so the way I like to read source code is, if I want to read the source for something, I just run apt-get source with the package name, and the tools fetch and unpack the source, with the distribution's patches applied, the same way it was prepared before being compiled for my computer. That way I can make sure I'm looking at the source for the thing I'm actually curious about.

So that's cool, right? But there's lots of source code to read, and lots of man pages to read. If I want to invest in the future of my own ability to compute, where do I put the first penny of this investment? Where do I get started if this is what I care about? And so I posit (and this is just an opinion) that if you want to get into systems programming and knowing a bunch of dope stuff about computers, you should put your first penny in the jar for ptrace and ptrace-related tools, like strace, ltrace, gdb, and so on.

Why do I think ptrace is the best place to start? Because if we look at this diagram of a computer system, learning about ptrace gives you a specific slice of it. It gives you this slice right here, in orange. And this is an important slice to be able to take a bite out of, because no matter what you put above it, you're going to be able to understand how that thing interacts with everything below it, right?

So if you're a Ruby programmer, let's say, your extra layers just get added on the top. So if you write a Ruby program, for example, that opens a file, or reads from a file, or writes data to a file, being able to take a bite out of your computer right here in the middle, where this orange part is, will reflect everything your code is actually doing in Ruby land. And the same is true for Python code, or Node.js code, or whatever, right?
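To make that concrete, here is a minimal C sketch of roughly what "open a file, write to it, read it back" turns into at the system-call layer. The function name `roundtrip_file` is hypothetical, and the exact calls strace reports (for example `openat` rather than `open`) depend on your libc, kernel, and language runtime, so treat the comments as illustrative rather than exact.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write `msg` to `path`, read it back into `buf`, and return the number of
 * bytes read (or -1 on error). Each libc wrapper below is a thin layer over
 * one system call, which is roughly what strace would show for this code. */
ssize_t roundtrip_file(const char *path, const char *msg, char *buf, size_t bufsize)
{
    assert(bufsize > 0);

    /* open(2): typically shows up as openat(...) in strace output */
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    /* write(2) */
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* lseek(2): rewind so the read starts at the beginning of the file */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    /* read(2): leave room for a terminating NUL */
    ssize_t n = read(fd, buf, bufsize - 1);
    close(fd); /* close(2) */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

Running a program that calls this under `strace -e trace=openat,write,lseek,read,close` should show roughly one line per call, which is the same layer where a Ruby or Python program's file I/O shows up.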

So if you're convinced that this is a good place to start, how do you actually begin this journey? I would recommend that you begin by learning about strace and ltrace, right? Begin by reading the man page for strace. Read it again. Read it like a million times. Then read the man page for ltrace, and read that a million times. Then start using both of them and trace everything in sight, right? So if you're developing a bunch of programs on your development workstation or whatever, just trace all that shit. And just keep tracing until you get tired of tracing stuff.

And the thing is, is that when you use these tools, you're going to get confronted with a huge amount of output and it's going to be really confusing and really overwhelming. But that's the same way that Luciano feels when he's trying to learn how to speak, right? He's confronted with all these patterns, and all these words, and none of it means anything to him, right? But like slowly over time, he's going to see these patterns emerge, and he's going to copy them himself, and he's going to be able to speak.

And the same is going to happen to you, right? You just run strace, look at the system call output, and keep doing that for long enough, and eventually it's going to make sense to you. And the good thing for you is that, unlike Luciano, you can actually read, which means that if you want to learn what a particular thing in strace output means, you can just read the man page for it, or go Google it, or whatever, right?

And so, in my opinion, if you're convinced that learning about your computer is important and you want to do it, that's honestly the best way to start out. But I would go a step further and say: don't just learn how to use the tool, learn how the tool actually works, right? Don't stop at just learning how to use strace and ltrace. Don't stop at just using them every single day or whatever. Actually figure out how strace traces system calls. How does ltrace trace library calls? How does this actually work?

And the answer to that question is ptrace, right? And ptrace is a really, really complicated piece of the Linux machinery. And I very, very strongly recommend that everyone try to read the man page for ptrace, at least once. You're probably going to need to read it at least 100 times. It's really, really complicated because it exposes a huge amount of really important interfaces to a really complex part of the computer system. And a lot of programs and a lot of pieces of your operating environment actually rely on ptrace existing, which I'll get to in a second.

But anyway, you should read the man page for ptrace like a million times, minimum. And then read a program that uses it, and a program that uses it is a program like strace. You can just run apt-get source strace to get the source code for strace and actually read that shit.
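To get a feel for how strace works before reading its source, here is a heavily simplified C sketch of the core ptrace loop, assuming Linux: the child calls `PTRACE_TRACEME` and execs the target, and the parent resumes it with `PTRACE_SYSCALL` so that it stops at every system-call entry and exit. It only counts system calls; real strace also decodes registers and arguments, follows forks, and handles many more edge cases. The function name `count_syscalls` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with `argv` under ptrace and return how many system calls it
 * made, or -1 on error. PTRACE_SYSCALL stops the tracee twice per system
 * call (entry and exit), so we count those stops and halve the total. */
long count_syscalls(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced by the parent, then exec the target.
         * The successful exec delivers a SIGTRAP that stops the child. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        execv(path, argv);
        _exit(127); /* only reached if execv failed */
    }

    int status;
    /* Wait for the post-exec stop */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Mark syscall stops as SIGTRAP | 0x80 so they differ from real signals */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }

    long stops = 0;
    int sig = 0;
    for (;;) {
        /* Resume until the next syscall entry/exit, forwarding any pending signal */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) < 0)
            return -1;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;                /* a syscall-entry or syscall-exit stop */
        else
            sig = WSTOPSIG(status); /* a real signal: deliver it on resume */
    }
    /* The final exit_group never returns, so it has an entry stop but no exit stop */
    return (stops + 1) / 2;
}
```

ptrace can be restricted by Yama (`/proc/sys/kernel/yama/ptrace_scope`), seccomp profiles, or container runtimes, so a sketch like this may fail inside some sandboxes even though the code is fine.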

So why ptrace? Well, like I said before, a lot of things rely on ptrace. strace, ltrace, and gdb, the debugger, all rely on ptrace. The Linux runtime dynamic linker relies on ptrace, too, right? The way the linker lets other programs, like debuggers, know that libraries have been loaded is a really weird trick built around ptrace. Something called libthread_db relies on ptrace, too.
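The "weird trick" is the dynamic linker's rendezvous structure, `r_debug`: the linker keeps the list of loaded objects in it, and a debugger reads that list out of the traced process with ptrace and plants a breakpoint at the address in `r_brk`, a function the linker calls whenever the list changes. Here is a minimal sketch, assuming glibc (which exports the structure as `_r_debug` via `<link.h>`) and a dynamically linked program, that walks the same list from inside the process instead of through ptrace. The function name `print_loaded_objects` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Walk the dynamic linker's list of loaded objects (the same list a debugger
 * reads remotely with ptrace) and print each object's load base and name.
 * Returns the number of objects found. */
int print_loaded_objects(void)
{
    int count = 0;
    for (struct link_map *map = _r_debug.r_map; map != NULL; map = map->l_next) {
        /* The main executable usually has an empty name */
        const char *name = (map->l_name && map->l_name[0]) ? map->l_name
                                                           : "(main program)";
        printf("%#lx %s\n", (unsigned long)map->l_addr, name);
        count++;
    }
    return count;
}
```

On a typical glibc system this lists the main program, the vDSO, libc, and the dynamic linker itself.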

Libthread_db is what makes it possible to debug multi-threaded programs, right? If you ever wondered, yo, how does gdb know that there's threads in my program? And how do I debug multi-threaded programs? Libthread_db has all the answers to that, right?

So if you ever wonder that and you want to know how do you debug multi-threaded programs? Read the source for libthread_db and you'll see that it relies on ptrace, right? So all these things are hovering around ptrace and that's why I think ptrace is a really important place to invest time and energy understanding.

But look, I'm not going to lie to you. When you open this Pandora's box and go down this rabbit hole to figure out how all this stuff works, your life is forever different. You will never be this cat lying in a bed of lavender enjoying its environment. Everything's going to be pretty, pretty bad when you open up that box. And I'm going to show you some of the things you'll find in there shortly. It's pretty gnarly.

And that's just like history, right? It's just like Earth. The history of Earth isn't super dope, right? A lot of bad things have happened. But you know, at this point I assume there's probably at least one functional programmer in the room who's thinking, yeah, this is a really cool story, Joe, but I'm really nice with the monads and they doesn't afraid of side effects.

And so I would say a few things to someone who would say that. And one of those things is this isn't your part of the show, so take a seat. And then the second thing is that's cool, compute your Haskell, compute your OCaml, do whatever you want. Because at the end of the day, you still end up with an Assembly program anyway.

And truthfully, it doesn't really matter; I'm just trying to keep the computing flowing. And that's what you should be doing, too. So program in whatever you want. At the end of the day, understanding C and Assembly is going to help you, regardless of whether you're a Haskell programmer, or a Ruby programmer, or a Java programmer, or whatever.

OK. So now that I've explained why I think these layers of the computer are important and what you should take away from them, I think it will be useful to show a couple of examples of things you can find just by using a tool like strace. So I'm going to show you two really interesting bugs that you could find with strace alone, and that actually have ripple effects up the stack into higher-level programming languages.

So the first one is a bug that... it's just a little story time, and these are the main points of my story. OK: on Amazon EC2, it turns out that there is a set of system calls that run about 77% slower than they do on actual real computers. The reason why this happens is actually kind of complicated, but the gist of the story is that gettimeofday is one of those system calls.

Now gettimeofday, as its name implies, is a system call that a programmer can use to get the time of day. And programs use this system call regularly, for everything from log messages with a timestamp, to constructing SQL queries with a timestamp, to anything that involves timing or performance. gettimeofday is used a lot.
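For a sense of how ordinary these calls are, here is a minimal C sketch of the log-timestamp case: one `gettimeofday` call per formatted line. The function name `log_timestamp` and the output format are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Format the current wall-clock time as "YYYY-MM-DD HH:MM:SS.uuuuuu" (UTC),
 * the kind of timestamp a logger calls gettimeofday(2) for on every line.
 * Returns 0 on success, -1 on failure. */
int log_timestamp(char *out, size_t outsize)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return -1;

    struct tm tm;
    if (gmtime_r(&tv.tv_sec, &tm) == NULL)
        return -1;

    char secs[32];
    if (strftime(secs, sizeof secs, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        return -1;

    /* Append microseconds, zero-padded to six digits */
    int n = snprintf(out, outsize, "%s.%06ld", secs, (long)tv.tv_usec);
    return (n > 0 && (size_t)n < outsize) ? 0 : -1;
}
```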

And it's used so much that the Linux kernel actually added an entire feature, called the vDSO, that allows a few system calls, including gettimeofday, to run much, much faster than every other system call. They did this because people were calling these system calls so much. To understand how the vDSO works, remember that a system call is essentially a function call into the kernel, and you can imagine that calling into the kernel is really expensive.
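You can check that the vDSO really is mapped into every process: the kernel passes its address in the auxiliary vector under `AT_SYSINFO_EHDR`, which glibc 2.16 and later exposes through `getauxval(3)`. The same region shows up as `[vdso]` in `/proc/self/maps`. A minimal Linux-only sketch; `vdso_base` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Return the address where the kernel mapped the vDSO into this process,
 * or 0 if there is none. */
unsigned long vdso_base(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    if (base != 0) {
        /* The vDSO is a tiny ELF shared object, so it starts with the ELF magic */
        const unsigned char *p = (const unsigned char *)base;
        assert(p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F');
        printf("vDSO mapped at %#lx\n", base);
    }
    return base;
}
```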

And so a bunch of kernel programmers were like, yo, wouldn't it be dope if we could just move these system calls out of the kernel and into each program? That way, when a program needs to call gettimeofday infinity times, it can do it much, much more inexpensively than calling into the kernel. And that's what the vDSO is; it allows that to happen, right?

And so if you run strace on a program, you shouldn't see any gettimeofday system calls, because the vDSO is preventing those system calls from happening, right? It increases the efficiency of your program by avoiding this expensive thing, which is making a system call.
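A rough way to see the difference without strace is to time the ordinary libc `gettimeofday`, which is normally served by the vDSO, against the same call forced into the kernel with `syscall(SYS_gettimeofday, ...)`. This is a Linux-only microbenchmark sketch with a hypothetical function name; the numbers are noisy and depend on the CPU, kernel, and clocksource.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds elapsed on the monotonic clock */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost in nanoseconds of one gettimeofday() call, either through
 * the normal libc path (which uses the vDSO when the kernel allows it) or
 * forced into the kernel with syscall(SYS_gettimeofday, ...). */
double gettimeofday_cost_ns(int force_syscall, int iterations)
{
    assert(iterations > 0);
    struct timeval tv;
    long long start = now_ns();
    for (int i = 0; i < iterations; i++) {
        if (force_syscall)
            syscall(SYS_gettimeofday, &tv, NULL); /* always a real system call */
        else
            gettimeofday(&tv, NULL);              /* usually served by the vDSO */
    }
    return (double)(now_ns() - start) / iterations;
}
```

Where the vDSO is working, the first number is typically much smaller than the second. If the two are close, that's a hint that ordinary `gettimeofday` calls are ending up in the kernel, which is the situation strace exposed on EC2.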

It turns out that, for a bunch of reasons, this is impossible to do on EC2, so if you look at strace output on EC2 for any program that does anything with timing information, you will see gettimeofday in it. So if you ever see gettimeofday in strace output on any computer, something is wrong, because this entire feature, the vDSO, was added to Linux specifically to prevent that from happening.

And I stumbled across this because a buddy of mine sent me some strace output. And he was like, yo, help me figure out what my program's doing. And as I scrolled through it, I was like, yo, hold up. Why is gettimeofday here?

So anyway, you can read more about exactly why this happens at this URL. I posted the slides online, so if you just check the Deconstruct hashtag you'll find them. The URL is bit.ly/aws-ec2-slow, where you can read the technical deep dive on exactly why this is impossible on EC2.

And that affects every language, right? Any language will use gettimeofday to do timing information, whether it's Ruby, Java, Python, just doesn't matter, right? So like everyone is affected by that. If you're running code on EC2, you're affected by that.

OK, so my next story time is another strace story, and this one pertains to Java. There's a function you can call in Java called getCurrentThreadCpuTime. All this function is supposed to do is return the amount of CPU time that a Java thread has consumed.

And the way it's implemented is with something called POSIX clocks. And it turns out that in Linux kernels prior to 2.6.12, there was no support in the kernel for the POSIX clocks it needed. So any time something wanted to compute a thread's CPU time, it needed to open a bunch of files, read a bunch of data out of them, convert that data into another format, and do a bunch of math. Only then could it compute the amount of CPU time the thread had consumed.
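To make "open a bunch of files, read data, convert, do math" concrete, here is a minimal sketch of the kind of fallback a runtime has without per-thread CPU clocks: read this thread's `/proc/self/task/<tid>/stat`, parse the `utime` and `stime` fields (measured in clock ticks), and convert to nanoseconds. This illustrates the approach, not the JVM's actual code, and the function name is hypothetical. Besides the cost of the file I/O, the result only has clock-tick granularity (often 10 ms).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Old-style thread CPU time: open this thread's stat file in /proc, read it,
 * parse utime and stime (in clock ticks), and convert to nanoseconds.
 * Returns -1 on error. */
long long thread_cpu_ns_via_proc(void)
{
    char path[64];
    long tid = syscall(SYS_gettid);
    snprintf(path, sizeof path, "/proc/self/task/%ld/stat", tid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char buf[1024];
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name (field 2) is in parentheses and may contain spaces,
     * so start parsing after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;

    /* Fields 3..13 are skipped (state, ppid, pgrp, session, tty_nr, tpgid,
     * flags, minflt, cminflt, majflt, cmajflt); 14 is utime, 15 is stime. */
    unsigned long utime, stime;
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &utime, &stime) != 2)
        return -1;

    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    assert(ticks_per_sec > 0);
    return (long long)(utime + stime) * (1000000000LL / ticks_per_sec);
}
```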

And so it turns out that someone was running some Java code that was calling getCurrentThreadCpuTime. And they were like, yo, this is really, really slow. What is happening here? They straced it and saw that every time they thought they were just getting the time, they were actually opening like infinity files, reading infinity gigabytes of data out of them, and then doing infinity amounts of math.

And it turns out that doing that is really, really slow, much slower than just making a single call into the kernel. That ended up getting fixed in kernel 2.6.12, which added actual support for POSIX clocks. Then the JVM was updated to take advantage of that, and programs built on top of it, like performance monitoring programs, could now call getCurrentThreadCpuTime without tanking the CPU on the system.
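With POSIX clocks, the same question is a single call. Here is a minimal C sketch, assuming Linux with glibc; the function names are hypothetical and this is not the JVM's source. `CLOCK_THREAD_CPUTIME_ID` gives the calling thread's CPU time, and `pthread_getcpuclockid` gets the CPU-time clock of any thread in the process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Thread CPU time via POSIX clocks: one clock_gettime(2) call, no files */
long long thread_cpu_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Same idea for an arbitrary thread: ask pthreads for that thread's CPU clock */
long long thread_cpu_ns_of(pthread_t thread)
{
    clockid_t cid;
    struct timespec ts;
    if (pthread_getcpuclockid(thread, &cid) != 0 || clock_gettime(cid, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

On glibc older than 2.34 you may need to link with `-pthread` for `pthread_getcpuclockid`.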

And you know, all of that was computed just from someone submitting strace output from a really slow-running Java program. If you want to read more about this, you can check it out at bit.ly/Java-slow. And I think this idea's pretty interesting, right?

I think there's an idea here which basically-- I feel like programmers sort of perpetuate these memes and like these myths about x or y being slow or x or y being bad without actually having data behind it, right? I've heard a lot of people say over the years like Java's slow or whatever and I don't know how that started. I don't know where that came from. Maybe this thing was like part of that whole idea that Java could potentially be slow.

But there's another, bigger thing: a much more complicated bug, one that you could solve just by reading code, that I want to get into now. I've tried to explain this bug before, and usually when I go into this story, it's like the beach scene in Saving Private Ryan, where everyone's laid out on the beach or whatever. That's usually what happens when I start telling the story, because it's gnarly.

So anyway, there's this meme in computing that threads are slow; I've heard it repeated on various news websites. Or this other one: context switches are expensive, or context switches are slow. Or: the only way to scale an application is with event-driven programming, right? All of that is 100% universally, completely false.

And actually it's so false that anytime anyone ever says context switches are expensive, context switches are slow, you can only scale programs with events, you can find me curled up like a piece of fried shrimp in the back of a room somewhere convulsing. So please stop saying that.
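Rather than repeating the meme, you can measure. Here is a minimal C microbenchmark sketch (function name hypothetical) that creates and joins threads one at a time and reports the average cost per pair. It deliberately measures only one narrow thing: because each thread exits before the next starts, glibc can reuse cached thread stacks, so it says nothing about a program with thousands of live threads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Thread body that does nothing */
static void *noop(void *arg) { return arg; }

/* Create and join `n` threads one at a time and return the average
 * nanoseconds per create+join pair, or -1.0 on failure. */
double thread_create_join_ns(int n)
{
    assert(n > 0);
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, noop, NULL) != 0)
            return -1.0;
        if (pthread_join(t, NULL) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec) * 1e9
                   + (end.tv_nsec - start.tv_nsec);
    return elapsed / n;
}
```

On glibc older than 2.34, link with `-pthread`.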

It turns out that I have a pretty good theory as to how this meme started, and why people seem to think that event-driven programming is more scalable than threads. I think there's an actual answer to this. I think I've identified the culprit, and I believe the culprit is XFree86.

So this is a really gnarly story. I don't know how many of you are familiar with XFree86. If you're not, it's not a big deal. XFree86 is just a program people used on Linux for a while for drawing UIs. Essentially, it's the thing that displays your window manager and how you get GUIs on Linux.

So anyway, a fix introduced for XFree86 completely broke threads on Linux, really badly, for seven years. It took seven years after this change went in for someone to figure out all the different parts that were damaged and fix them, right? So threads were broken for a really long time. And over those seven years, a lot of people were trying to build scalable applications and write research papers, and they were measuring stuff, and people were like, yo, threads are slow, right?

And I'm going to tell the story of the feature that was added for XFree86 and how it evolved until it was fixed. I hope it's not too boring. All right, so the first story time, in a series of story times to explain this bug, is that three really important things happened at the same time in computing history. The first is that 64-bit computers were invented, right?

And when 64-bit computers were invented, you have to keep in mind that up until that point, everyone had been computing their opcodes so that their programs only worked in 32-bit, right? Everyone was like, yo, 32-bit is the only thing in the universe, all my programs are 32-bit. Then 64-bit came out, right, and a lot of 32-bit programs started misbehaving because there were all these assumptions that were built into the source. And those assumptions suddenly became invalidated when they ran on 64-bit computers.

So that's thing number one. Simultaneously, XFree86, the window managing thing, has its own module loading system. There are a lot of ways on Linux to dynamically load code while programs are running, right? So why does XFree86 have its own way? It turns out XFree86 has its own way because it is so old that it existed before there was a way for programs to dynamically load code at runtime.
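For comparison, this is roughly what the now-standard way of loading code at runtime looks like on Linux: `dlopen(3)` and `dlsym(3)`. A minimal sketch, assuming glibc (where the math library is `libm.so.6`); the function name is hypothetical. On glibc older than 2.34 you may need to link with `-ldl`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Load libm at runtime, look up cos() by name, and call it.
 * Returns -2.0 on failure (cos never returns that). */
double call_cos_dynamically(double x)
{
    void *lib = dlopen("libm.so.6", RTLD_NOW);
    if (lib == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -2.0;
    }
    /* dlsym returns void *; this assignment idiom is the one POSIX suggests
     * for converting it to a function pointer */
    double (*cos_fn)(double);
    *(void **)&cos_fn = dlsym(lib, "cos");
    double result = cos_fn ? cos_fn(x) : -2.0;
    dlclose(lib);
    return result;
}
```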

So they built their own runtime loader. And there were a bunch of XFree86 modules that people wanted to use that were built on 32-bit computers. Since they were built on 32-bit computers, they started failing on 64-bit computers, because they had all these assumptions built into the code. So you would try to run your window manager and it would just segfault, because those modules weren't designed for a 64-bit world.

And all of this stuff revolves around a system call called mmap. mmap is just a system call your program can use to get a bunch of RAM from the kernel, right? And we're going to show how mmap is at the center of threads breaking for seven years.
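In C, asking the kernel for memory with mmap looks something like this minimal sketch (function name hypothetical): a private, anonymous, zero-filled mapping with no file behind it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for `size` bytes of zero-filled, private, anonymous memory.
 * Returns NULL on failure (mmap itself reports failure with MAP_FAILED). */
void *get_ram(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The memory is returned to the kernel with `munmap(p, size)`.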

So on June 29, 2001, the story starts off. Basically, someone had this idea. A kernel programmer was like, OK, look, we have all this code that still thinks it's running on 32-bit computers, and it's exploding now that we have 64-bit computers. We can either make every single person fix their code, or we can just provide a way for programs to allocate memory that makes it look like they're still running on a 32-bit computer.

And so the decision was made to add a way to allocate memory using mmap that'll look sort of the same as if you were running on a 32-bit computer. Essentially tricking all of these programs so that they would stop segfaulting.

So this commit message went into the kernel. It just says: this adds a new mmap flag to force mappings into the low 32-bit address space. The flag that was added is MAP_32BIT, and it's useful for XFree86's ELF loader, right? So this is like, we're going to add this, and it will fix the problem. Later, on November 11, 2002, someone commits something to XFree86: OK, we have this new thing we can use in the kernel. Let's start using this knob to allocate memory in such a way that it won't cause our modules to explode anymore.

Then fast forward just a couple of months, to January 4th, 2003. This is when the train really, really starts to go off the rails with this new MAP_32BIT knob. This commit goes into the kernel: make MAP_32BIT only map the first 31 bits, because it is usually used to map a small code model. So what the hell does that mean?

It turns out there are a few different ways to store programs on the computer's hard drive and read them back out. One way of storing programs is a specification called ELF, which explains how you store your program on disk, right? And there are different ways of arranging ELF programs depending on what you're trying to do.

One of those ways is something called the small code model. And the small code model says that if you're going to use this way of running programs, you have to map all of your memory into the first 31 bits of the address space. So the kernel developers were like, OK, we want to support this use case. Let's keep the name MAP_32BIT the same, but have it only map 31 bits. This is definitely fine.
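You can check the "31 bits, not 32" behavior yourself. This sketch (hypothetical function name; `MAP_32BIT` exists only on x86-64) maps one page with `MAP_32BIT` and reports whether it landed below 2 GiB, that is, within the low 31 bits of the address space.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map one page with MAP_32BIT and report where it landed.
 * Returns 1 if it landed below 2 GiB (the low 31 bits), 0 if not,
 * and -1 if MAP_32BIT is unavailable here or the mmap failed. */
int map32bit_lands_low(void)
{
#ifdef MAP_32BIT
    const size_t len = 4096;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int low = (uintptr_t)p + len <= (UINT64_C(1) << 31);
    munmap(p, len);
    return low;
#else
    return -1; /* MAP_32BIT is x86-64 only */
#endif
}
```

On x86-64 this should return 1; on other architectures the flag isn't defined and the sketch returns -1.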

Shortly after, on February 12th, 2003, someone reports: yo, context switching is really slow. When I try to make system calls, it's much slower now than it was before. What happened? And after some investigation, they realized what the problem was.

And on March 4, 2003, a change went into glibc. The comment on the change was this: for Linux x86-64 we have one extra requirement: the stack must be in the first 32 bits, otherwise we can't use threads. I'm paraphrasing a little bit, because otherwise we'd have to explain a lot of weird stuff.

So people were basically saying that threads need to be-- thread memory has to be in the first 32-bits otherwise, threads won't work, right? And that you know, that went in and things were fine for a while. A few months later, on May 9, 2003, they decided OK, but what if we can't get memory in that first 32-bits, right? What if we're creating a bunch of threads, we suddenly run out of memory in the first 32-bits, what are we going to do next?

So they decided to add a retry mechanism, right? And so they changed the comment, they changed the code. They said OK, we prefer to have memory allocated in the low 32-bits since this allows faster context switches. They also added a retry mechanism to go through and if you fail to allocate memory, try again to allocate memory just anywhere else, right? So the strategy was give me memory that I think I can context switch into fast. If you can't give me that, just give me whatever you've got.
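The "give me low memory, else give me whatever you've got" strategy can be sketched like this. It is modeled on the description above, not copied from glibc; the function name is hypothetical, and `MAP_32BIT` is only available on x86-64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: first try to get a thread stack from the low region
 * (MAP_32BIT, x86-64 only); if that fails, retry with no placement
 * restriction. Returns NULL on failure. */
void *alloc_stack_prefer_low(size_t size)
{
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK;
    void *p = MAP_FAILED;

#ifdef MAP_32BIT
    /* First choice: memory we believe we can context switch into faster */
    p = mmap(NULL, size, prot, flags | MAP_32BIT, -1, 0);
#endif
    if (p == MAP_FAILED)
        /* Fallback: "just give me whatever you've got" */
        p = mmap(NULL, size, prot, flags, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

The fallback keeps thread creation working once the low region is full, but every allocation still pays for the failed `MAP_32BIT` attempt first.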

And the important thing about this change to the comment and the code is that the justification for using this knob changed completely, right? Initially it was: threads don't work unless we use this knob. Then it was: threads will be faster if we use this knob. And those are two very, very different things.

So we fast forward like five years. On August 13th, 2008, an infamous thing was posted to the Linux kernel mailing list. I call it the Pardo Report. Someone named Pardo posted it, and they said, yo, mmap is slow on map_32bit allocation failure, sometimes causing thread creation to run about three orders of magnitude slower. So in one case, creating a new thread goes from about 35,000 cycles up to 25 million cycles, which is less than 100 threads a second, right? So things slow down tremendously, about 1,000 times.

So what actually happened? Why did this slow down over 1,000 times? It's interesting, right? Because the map_32bit thing we just looked at is what glibc started using for speed, right? The justification was: use this flag so we can get memory in a certain region, because that's faster for us. But that map_32bit flag, as we saw earlier, only maps 31 bits, right? And 31 bits, it turns out, is only half as much address space as 32 bits.

So what was actually happening was that this person had consumed the entire 31-bit region, which is not that hard to do. 31 bits is only 2 gigabytes of address space, and the kernel only handed out the 1 gigabyte between the 1 GB and 2 GB marks for map_32bit requests. Once that was used up, subsequent allocations fell back to the retry mechanism that was added to glibc. So basically, this person was creating a bunch of threads, and the thread stacks consumed all of that low memory, right?

And then the retry mechanism kicked in. But every new thread still asked for map_32bit memory first, and each of those doomed requests caused a linear search in the kernel before failing. It turns out that as you add more threads, that linear search gets bigger, and bigger, and bigger. And linear search of large lists is bad and slow, which is pretty dope, because the thing that was added for speed, map_32bit plus its retry mechanism, is the same thing that made it 1,000 times slower. This is fine.

So how was all this fixed, right? We got into this situation because we added a knob called map_32bit. We changed the meaning of the knob without changing its name. And then we used a weird retry mechanism that didn't actually work, because no one understood what the original intention of the knob was or how it had changed internally, right? So how do we fix this? The kernel answer to this question is: fix it by adding another knob. This is fine.

So the new knob that was added is called map_stack, and you're supposed to use it anytime you allocate a thread stack on Linux. The definition is: use map_stack when you need to allocate thread stacks. So what does map_stack actually do? It doesn't do anything. It's completely ignored. It's just there as a thing people can pass to make them think it's doing something. This is fine.

So if they're adding that, then they can probably remove map_32bit, right? Wrong. The map_32bit flag is there forever, because it was already exposed, and programs, like XFree86, were already built assuming it would exist. And the weirdness of map_32bit, i.e., that it only maps 31 bits, also remains forever. This is fine. And this is why computing is a complete tire fire, right? Because as time goes on, people keep adding new things, and it's really hard to keep track of all these moving pieces.

Anyway, I wrote out this entire timeline; you can check it out in the slides online. Hopefully you could follow me through that. If you couldn't, or you have any questions about this, just run up on me later or whatever. Find me and I'll hit you up about it.

Anyway, I just want to leave you with this parting thought: learning C and Assembly means that you can debug literally anything built on a modern-day computer, right? And bugs like the ones I just showed have ripple effects in the programming languages built on top of them, and in our understanding and the assumptions we make based on them.

So keep that in mind next time someone runs up to you and says, yo, events are a lot faster than threads, or context switches are slow. Maybe they're under that impression because of this weird map_32bit thing that existed over the course of, you know, nearly eight years. Anyway, thanks.

[APPLAUSE]