Deconstruct, Seattle, WA - Thu & Fri, Apr 23-24 2020

← Back to 2019 talks

Transcript

(Editor's note: transcripts don't do talks justice. This transcript is useful for searching and reference, but we recommend watching the video rather than reading the transcript alone! For a reader of typical speed, reading this will take 15% less time than watching the video, but you'll miss out on body language and the speaker's slides!)

[APPLAUSE] Hello, everyone. My name is Ramsey Nasser. I am a game designer, an artist, and an educator, and this talk is titled "A Personal Computer for Children of All Cultures." I want to talk to you about a problem in computer programming that I've been thinking about for some years now, and what I think we might be able to do about it.

So this talk is partially based on an essay of the same title that I wrote last year for a collection of essays titled Decolonising the Digital: Technology as Cultural Practice. It's available online, and parts of it go into more detail than I have time for here, so you should check it out if you have a chance. I want to thank Josh Harle for inviting me into that collection and for supporting me, along with [INAUDIBLE].

So the title is a response to the seminal 1972 Alan Kay paper "A Personal Computer For Children Of All Ages." In it, Kay describes the Dynabook, a programmable mobile personal computer that would actually go on to inspire the form factors of modern laptops and tablet computers. It was ahead of its time in a lot of ways. Prominently featured in the paper is the story of Beth and Jimmy, pictured here, learning physics through programming and reprogramming a video game together.

Kay championed the idea that computer programming could be more than an engineering tool: a tool to learn by doing and to empower young creative minds. That thinking would go on to inspire, directly or indirectly, the Smalltalk, Squeak, Scratch, Processing, and Arduino projects, and the work of Bret Victor, if you're familiar with it. And as an artist and an educator, programming as an empowering, educational, expressive craft is something that has mattered a lot to me.

And this paper and the work it inspired have been very formative for me. But there's a problem with the story it tells. It doesn't take much looking around to realize that every programming system in use today is based on words and punctuation taken from the English language. In order to use any modern programming tool, some knowledge of English is a requirement.

In order to use any of these tools most effectively, actual proficiency in English is unavoidable. This favors programmers natively familiar with English over others and makes a truly equitable programming experience impossible. This isn't addressed in the Alan Kay paper, and it doesn't generally come up in contemporary conversations on programming language design.

So in 2012, during a fellowship at the Eyebeam Art and Technology Center in New York, I started exploring this apparent bias by making a programming language named [ARABIC]. My goal was to provide a programming experience entirely in my native Arabic. [ARABIC] is basically a Scheme interpreter that uses Arabic words in place of English words. It's a boring language by design. This is a screenshot of a REPL session implementing and running the algorithm that computes the Fibonacci sequence, which is something you're required to do when you make a programming language.

I wanted to explore what a non-English programming experience might look like, and really, I wanted to understand why such a thing didn't exist yet. [INAUDIBLE]'s talk earlier this morning mentioned just throwing yourself into things, and that's the best way I learn. So this is how I got into language design.

So [ARABIC] succeeds superficially in that it functions as a language, it provides Arabic keywords, and it rejects non-Arabic characters in general. But ultimately, it's unable to consume libraries, APIs, and SDKs because they weren't written in Arabic. As a result [ARABIC] and languages like [ARABIC] will never really be more than toys.

So thinking about the failure of the [ARABIC] project brought me to a bigger question: what does a programming experience that does not privilege one culture over all others even look like? I spent a lot of time thinking about character encodings, but ultimately my research brought me to focus on names in programming languages.

Names are already understood to be a challenge. The famous Phil Karlton quote goes, "There are only two hard things in computer science: cache invalidation and naming things." I think the latter part is true, not just because picking a good name is hard, but because names are a major vector by which culture (in practice, English written culture) becomes permanently embedded into programming systems.

To explain what I mean, let's pick apart an example. This is an excerpt from an OpenGL graphics tutorial in C. The full program spawns a window and renders a starfield in about 300 lines of code. The specifics are not terribly important; just note that nearly every line is full of English words.

So what would it take to translate this example into a non-English programming language? Is that even possible? Not all of the English words are the same kind of thing. We can put them into a few categories and analyze each in turn. The first category is what programming language theory calls keywords.

Most languages provide some basic set of functionality that the user cannot create or modify: usually things like loops and conditional statements. That syntax and functionality is typically provided via keywords. These are tokens that are given special treatment by the interpreter or the compiler, which is literally looking for these tokens and treating them differently from other tokens.

As a programmer, you have no agency over keywords. They're just baked into the language. To change them, you'd need to make a new programming language, which is hard but not an insurmountable task.

In fact, that's exactly what [ARABIC] does: it provides you with non-English keywords. So this is a problem we can address, one that is at least somewhat solvable.
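To make the point about keywords concrete, here is a minimal Python sketch of a toy reader whose keywords come from a swappable lookup table. The tables, including the Arabic choices ("define", "if"), are illustrative assumptions, not the actual keywords of the language described in this talk.

```python
# Toy reader: keywords are just entries in a lookup table that the reader
# treats specially. Both tables are hypothetical, for illustration only.
ENGLISH_KEYWORDS = {"define": "DEFINE", "if": "IF"}
ARABIC_KEYWORDS = {"عرف": "DEFINE", "إذا": "IF"}  # "define", "if"


def tokenize(source: str, keywords: dict) -> list:
    """Split an s-expression into (kind, value) tokens."""
    tokens = []
    for word in source.replace("(", " ( ").replace(")", " ) ").split():
        if word in keywords:
            tokens.append(("KEYWORD", keywords[word]))  # special treatment
        elif word in ("(", ")"):
            tokens.append(("PAREN", word))
        else:
            tokens.append(("SYMBOL", word))  # ordinary identifier
    return tokens


english = tokenize("(define x 1)", ENGLISH_KEYWORDS)
arabic = tokenize("(عرف x 1)", ARABIC_KEYWORDS)
print(english == arabic)  # True: same program, different keyword spellings
```

Swapping the table changes the language's surface vocabulary without touching anything downstream of the reader, which is why keywords are a bounded problem.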

The second category isn't a formal term, but for this talk I want to call these local identifiers. These are the names that programmers assign to their own functions, variables, and data structures.

So as a programmer you do have a choice over what these words are going to be, although historically languages would impose restrictions on what constituted a legal identifier. Usually that was the ASCII character set, so you were limited to Latin characters.

But if you're making a new language to provide new keywords anyway, you may as well relax that restriction and allow people to use Unicode characters from any language they want. In fact, many modern languages do this: Swift, C#, and JavaScript, and I'm sure there are others, all allow Unicode identifiers. So right now, today, you could write Arabic function names in any of these languages, and there's nothing stopping you.
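The talk names Swift, C#, and JavaScript; Python 3 behaves the same way (PEP 3131), so the claim is easy to check. The Arabic function and parameter names below ("area of the rectangle", "width", "height") are illustrative choices.

```python
# Python 3 accepts non-ASCII identifiers (PEP 3131), so these Arabic names are legal.
def مساحة_المستطيل(العرض, الارتفاع):
    """Return the area of a rectangle: width times height."""
    return العرض * الارتفاع


area = مساحة_المستطيل(3, 4)
print(area)  # 12; note that `print` is still an English name from the standard library
```

The local names are Arabic, but the moment the code calls into the standard library it is back to English names chosen by someone else.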

So keywords and local identifiers, these are things that could be addressed by redesigning the programming languages themselves. Things get a lot trickier when we get to the third category, which for this talk, I want to call external identifiers. These are identifiers that you yourself did not write, but you still have to use, nonetheless.

These are names provided by libraries, SDKs, and frameworks: by all the useful code that other people wrote and that you build your software on. In the case of this example, the libraries are OpenGL and GLUT, which are graphics libraries.

Modern programming is not possible without code sharing like this, but because of the way languages are built today, using a library means you have to use the exact names the library author originally gave its declarations in order to invoke them. Even if your language supports keywords in your native language, even if your language supports Unicode identifiers, if the author of a library you're using chose English names, which is true of every software library I've ever seen, you have to use the exact same English names that they chose in your own program.

So designing a new programming language is a bounded problem. It's hard, but you can do it. Tackling the entire ecosystem of libraries that are currently in circulation is not. Think of every library you've ever used. I don't know how many tens of thousands or hundreds of thousands or millions of libraries are in circulation today, some going back decades.

Think of the operating system SDKs. If you want to write POSIX programs, if you want to write Win32 programs, these are all libraries exporting English-language names that you have to use if you want to build on top of that software. That's because names in programming are not separate from the things that they name. In a strange way, names in programming become an intrinsic part of the thing that they name.

This is visible in a dump of the machine code and symbol table of the GLUT library that my example uses. You can see that the names the library's authors assigned to the functions are visible in the binary itself, alongside the machine code that makes up the bodies of the functions that actually run. These names are used by compiler toolchains, particularly the part of the toolchain called the linker, to look up code objects and stitch together a working program.

If you use different names or the wrong names, the toolchain isn't going to find the correct code, and you're going to get a linker error. Your program isn't going to build. I'm showing you examples in C because it's fairly low-level, but this is true of every programming system that I've worked with.
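The linker's dependence on exact names can be modeled in a few lines. This is a toy sketch, not how any real linker is implemented: the symbol table and its addresses are made up, though `glutInit` and `glutCreateWindow` are real GLUT function names. Resolution is an exact string match, so a translated name is simply not found.

```python
class LinkerError(Exception):
    pass


# Hypothetical symbol table exported by a compiled library: name -> code address.
# The addresses are invented for illustration.
GLUT_SYMBOLS = {"glutInit": 0x1000, "glutCreateWindow": 0x1040}


def link(references: list, symbols: dict) -> dict:
    """Resolve each name a program references by exact string lookup."""
    resolved = {}
    for name in references:
        if name not in symbols:
            raise LinkerError(f"undefined reference to `{name}'")
        resolved[name] = symbols[name]
    return resolved


print(link(["glutInit", "glutCreateWindow"], GLUT_SYMBOLS))

try:
    link(["تهيئة"], GLUT_SYMBOLS)  # Arabic for "initialization": same intent, wrong name
except LinkerError as err:
    print(err)
```

A wrapper could map the Arabic name onto `glutInit`, but the lookup underneath would still be keyed on the English string, which is the point the talk makes next.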

Even if you build a wrapper or a translation layer around these names, the names the original author chose are still the real ones as far as the computer is concerned. It's almost as if, when it was decided that "podium" would be the English word for podiums, the word's sound and spelling became part of the molecular structure of the object, and you couldn't extract one from the other. This is very much how programming works.

This is not an indictment of names themselves, and it's not a mistake that programming is full of names. Computer programming is an exercise in managing enormous amounts of complexity with a relatively tiny brain. Programming languages give us tools designed for our human minds as an interface into the vast complexities of computer science.

And one of the most important of these tools is the ability to name things. Try to imagine programming using memory offsets instead of function names, and what a nightmare that would be. Some people who work on very low-level embedded systems do a little bit of this, but there's a limit to the scale and correctness of a system you can build if you're trying to keep mappings of linear memory in your head, as opposed to just using meaningful names.

So names are important for human cognition, and names get baked into code in a way that forces you to use names chosen by others in your own code. These two realities combine into a cultural and political problem because there's no such thing as a neutral name. Absent from most conversations about names in programming is how deeply cultural the act of naming something is. At the very least, a textual name is in one writing system and not any others. So at the very least, it carries with it the written culture of the person who assigned that name.

Historically, naming a territory was part of the spoils of war. That's still visible today in the dozen-odd cities named Alexandria between Egypt and Afghanistan left over from Alexander the Great's violent march across the Middle East. There's a story of lasting violence and war in the names of every one of these cities.

My own name was deliberately chosen by my immigrant parents, as they imagined my future before I was even born, to be pronounceable both in the West, where I'm called Ramsey, and in my native Lebanon, where it's pronounced Ramzi. This is my mom and me as a baby, where I'm already skeptical about computers.

This is a story of hope for a better life in the name that my parents chose for me. Naming is a deeply human act that records history and language. It can be poetic and beautiful and violent, and just about anything but neutral.

So what do we do? Names are what give the human mind a fighting chance to comprehend and manage the vast complexities of computing. Modern programming is only possible by building on existing systems, which means using names chosen by others in any new code that you write. Since every name carries with it the assumptions and worldview of the person who assigned it, every programmer today is forced into familiarity with the written culture of programmers past.

So how do we build a programming experience that's not like this, one that doesn't favor one written culture over all others? I admit it's hard to even imagine what an alternative might look like, and it's taken me time to even arrive at a sketch of a solution. I think there's a particularly scary kind of oppression that robs you of even an imaginary liberation. But I do have a sketch that I want to share with you in my remaining time. First, I'd like to enumerate what features an acceptable solution would have, and their implications.

So in my view, an acceptable solution must, number one, allow everyone to use meaningful names. Now, remember that what counts as a meaningful name is different for different people: it's going to be in a different language for different people. The implication is that constructs in this system must support multiple names for themselves. If everything had to have a single name, then someone's name would win out over someone else's, and we're back to the status quo.

Feature two: an acceptable solution must support global collaboration. Code written in India, for example, must be usable by programmers in France without violating point one. Currently this is possible: you can write code and import code written anywhere in the world, but only because we force everyone to learn English. And that's exactly what we're trying to get away from.

Point three follows somewhat by definition from the stated goal of the system: an acceptable solution must not privilege one culture over others. The implication is that we can't treat one written culture as the real one and translate to and from it, because, again, that's not an equitable programming experience. If this looks hard, it is. This is a tall order, and I'm not going to pretend that it's not.

Here's the basic idea. I think we need to decouple human-friendly readable names from machine-friendly canonical names, where the canonical name becomes the real thing that your compiler toolchain uses to stitch together a correct program, and the readable names exist for human thought and human convenience.

This is not a terribly new or groundbreaking idea. There are versions of this-- no pun intended-- in a variety of systems, most prominently in git. So in a git repository, the canonical names for things are their hashes. All right?

History is encoded as hashes. References are encoded as hashes. And in this case, they're all content hashes, and hashes have a bunch of really interesting sort of formal properties, but they're hard to remember. And they're hard to say out loud.

So git separately allows programmers to maintain branches and tags, which are readable names that reference hashes and can be mutated and moved around. In this example, there's a git history where the master branch is pointing to f30ab, and testing and HEAD are pointing to 87ab2.

Some of these tags and branches can be shared. Typically, in an open-source project, your master branch would be publicly viewable, and it would be the "this is where the project is at" branch. Others might be kept local: in a local git repository, your HEAD typically refers to the branch, and so the commit, that you're working against in your working directory.

So this is the kind of decoupling I mean. One of these names is canonical; the other is not, and it can be shared or kept private in a very fluid way.
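Git's split between canonical and readable names can be sketched in Python. Git names a blob by the SHA-1 of a `blob <size>` header, a NUL byte, and the content (what `git hash-object` computes); commits are hashed the same way with a `commit` header. The `refs` dictionary is a simplification of what lives under `.git/refs`, and pointing branches at blobs rather than commits is a shortcut for brevity.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to file contents: SHA-1 over a header plus the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Canonical names: content hashes, fixed by the bytes themselves.
v1 = git_blob_id(b"version one\n")
v2 = git_blob_id(b"version two\n")

# Readable names: a separate, mutable mapping (real branches point at commits).
refs = {"master": v1, "testing": v1}
refs["testing"] = v2  # moving a branch renames nothing canonical

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's empty blob
```

Renaming or moving `testing` never changes `v1` or `v2`; only the human-facing mapping moves.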

There are already programming languages investigating an approach like this, not for cultural reasons, but because if you build your programming language around these ideas, you get interesting benefits for distributed computing and granular version control. There's a programming language called Unison, whose website is unisonweb.org, that is exploring this. Thomas Getgood is a Clojure developer who has built a Clojure implementation of something similar to the Unison idea in a project called XYZZY.

So it's in the air. People are thinking about this already for programming languages-- again, not for cultural reasons. This is the sketch of it in my head, and I'm intentionally loose on some details because I'm not sure how most of this would work.

But I want you to imagine this as more of a virtual machine that supports multiple programming languages than a single programming language that everyone would use. In that way, it's more like the Common Language Runtime or the JVM, if you're familiar with those technologies. Each language built on top of it could expose keywords and local identifiers in whatever natural language the programmer wants. The core system's job would be to solve the problem of external identifiers and libraries.

How do we reuse each other's code independent of the names that we assigned to our declarations? Also, in this slide and the next few slides, because there are a bunch of hashes, I've given each hash a consistent and unique color to make it a little easier to follow, but that's all the colors mean.

So when you compile a declaration in this system, three things happen. First, the binary that the compiler produces is hashed, and that hash becomes the declaration's canonical name. So draw rectangle's canonical name is 3026, draw triangle's canonical name is b2e4, and so on. Importantly, the name that the programmer used is not part of the hash, just the resulting compiled code.
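The first step, hashing the compiled output rather than the name, might look like this minimal sketch. The `Declaration` type and the byte strings standing in for compiled code are hypothetical, and SHA-256 is an assumed choice of hash function; the talk does not specify one.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    readable_name: str  # what the programmer typed; deliberately not hashed
    compiled: bytes     # stand-in for the compiler's output


def canonical_name(decl: Declaration) -> str:
    """The canonical name is a hash of the compiled code alone."""
    return hashlib.sha256(decl.compiled).hexdigest()


english = Declaration("draw_rectangle", b"\x01rect-body")
arabic = Declaration("ارسم_مستطيلا", b"\x01rect-body")  # "draw a rectangle"
other = Declaration("draw_triangle", b"\x02tri-body")

print(canonical_name(english) == canonical_name(arabic))  # True
print(canonical_name(english)[:4])  # abbreviated for display, like the slides
```

Because `readable_name` never enters the hash, the same compiled code gets the same canonical name no matter what anyone calls it.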

The second thing that happens is that the hash, the compiled binary, and any dependencies get published to a globally visible distributed hash table. That's the arrow pointing to Africa. In an earlier version of this slide it was a cloud, but my girlfriend got mad at me because the cloud is a really stupid icon, and she's right. So it goes out into the world.

So here we see that draw house has its own hash but depends on two other things, so that whole bundle gets published together. And finally, the name the programmer used is associated with the hash in a separate data structure, a dictionary. These dictionaries might be published, or you might keep them local.

It's kind of like git branches and tags. It's kind of up to you. It could be that downloading a library just means downloading its dictionary because then your VM will know which hashes to go out and grab because all the hashes are globally visible. So I want to walk through what an asynchronous kind of global collaboration might look like under this scheme.
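Publishing by hash and keeping names in a separate dictionary can be sketched together. In this sketch a plain Python dict stands in for the distributed hash table (a real prototype might use something like IPFS), and `publish` and `fetch_closure` are hypothetical helpers, not an existing API.

```python
import hashlib
from typing import Optional

# Stand-in for the globally visible distributed hash table: hash -> entry.
GLOBAL_STORE: dict = {}


def publish(compiled: bytes, deps: list) -> str:
    """Store compiled code under its hash, together with its dependencies' hashes."""
    h = hashlib.sha256(compiled).hexdigest()
    GLOBAL_STORE[h] = {"compiled": compiled, "deps": list(deps)}
    return h


def fetch_closure(h: str, seen: Optional[set] = None) -> set:
    """Every hash a VM would need to download to run `h`: itself plus transitive deps."""
    seen = set() if seen is None else seen
    if h not in seen:
        seen.add(h)
        for dep in GLOBAL_STORE[h]["deps"]:
            fetch_closure(dep, seen)
    return seen


rect = publish(b"<rectangle code>", [])
tri = publish(b"<triangle code>", [])
house = publish(b"<house code>", [rect, tri])

# Readable names live in a separate dictionary, which may be published or kept local.
english = {"draw_rectangle": rect, "draw_triangle": tri, "draw_house": house}

# "Downloading a library" can mean downloading just its dictionary;
# the hashes tell the VM what else to fetch.
needed = fetch_closure(english["draw_house"])
print(len(needed))  # 3
```

The store never sees a readable name; the dictionary is the only place English appears, and it could be swapped for any other.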

This slide builds on the previous one, so you see that draw rectangle and draw triangle have the same hashes. But imagine with me, if you will, that somewhere in the Middle East, someone runs a workshop using this system and this little drawing library. They acquire or write a dictionary that provides Arabic names for the same functions for their students.

So draw rectangle, if it's visible to you up there, has the Arabic name [ARABIC], and draw triangle has the Arabic name [ARABIC]. Importantly, we're not really translating to and from English. The core thing here is the hash.

So what we're doing is giving a new name to a thing that already exists, in the same way that a table is called "table" in English but [ARABIC] in Arabic, and that's not necessarily a translation. On the slide, I'm rendering new function declarations in the middle and what they might output at the bottom.
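A second dictionary adds names without translating anything: both dictionaries map straight to the same canonical hashes. In this sketch the abbreviated hashes are the ones quoted from the slides, and the Arabic names are illustrative, since the transcript does not record the ones shown on stage.

```python
# Abbreviated canonical names quoted from the slides.
RECT, TRI = "3026", "b2e4"

# Two independent dictionaries; neither is derived from the other.
dictionaries = {
    "en": {"draw_rectangle": RECT, "draw_triangle": TRI},
    "ar": {"ارسم_مستطيلا": RECT, "ارسم_مثلثا": TRI},  # "draw a rectangle", "draw a triangle"
}


def names_for(canonical: str) -> dict:
    """All readable names each dictionary gives one canonical entry."""
    return {
        lang: [name for name, h in names.items() if h == canonical]
        for lang, names in dictionaries.items()
    }


print(names_for(RECT))
```

There is no "source" language in this structure: removing the English dictionary would leave the Arabic one fully functional, and vice versa.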

So imagine that a student writes their own draw house function. The Arabic in the middle says [ARABIC], which is "define draw house." It gets its own hash, 89c34, and you can see that it depends on the red and green hashes from the previous slide and this one.

Now that same student-- imagine with me-- elaborates on that and writes a new function that is [ARABIC], draw a city, and that hashes to e51, that purple hash, and you see that it depends on their draw house function. It could be that you have some kind of looping construct and are calling that function in a loop to plop down houses within a radius. This step could happen entirely in their native language. They don't need to know or care that somewhere down the dependency tree there are hashes for which there also exists English names or names in other languages. This experience is entirely an Arabic one.

Finally, imagine a third student somewhere else in the world, maybe someone bilingual who knows both English and Arabic, as some programmers are told to be, who sees that this function has been written and decides to incorporate it into their own draw map function, which gets this dark yellow hash and depends on e51. The system could allow that dependency fluidly, because there's nothing special about whether a function was written in one language or another. It all compiles into this actually neutral hash land.

This is the kind of back and forth collaboration that I think is crucial for a system like this to have. It's not that things are being written in English and then flowing downwards to other languages. It's that their code can really be written independently in separate languages and flow between people without passing through some written culture that acts as a gatekeeper. And English and Arabic here are not special, of course, this could be any two languages. These are just the two that I know.
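The collaboration walked through above can be sketched end to end. In this sketch each hypothetical `define` call resolves dependency names through one programmer's own dictionary and then records only hashes; folding the dependency hashes into the new hash is an assumption standing in for a compiled binary that refers to its callees by hash. All names and helpers here are illustrative.

```python
import hashlib

STORE: dict = {}  # hash -> list of dependency hashes; no readable names anywhere


def define(dictionary: dict, name: str, compiled: bytes, dep_names: list) -> str:
    """Resolve deps through *this* dictionary, then store the entry by hash only."""
    deps = [dictionary[n] for n in dep_names]
    h = hashlib.sha256(compiled + "".join(deps).encode()).hexdigest()
    STORE[h] = deps
    dictionary[name] = h
    return h


ar: dict = {}  # the workshop students' Arabic dictionary
en: dict = {}  # a bilingual programmer's English dictionary elsewhere

define(ar, "ارسم_مستطيلا", b"<rect>", [])
define(ar, "ارسم_مثلثا", b"<tri>", [])
house = define(ar, "ارسم_بيتا", b"<house>", ["ارسم_مستطيلا", "ارسم_مثلثا"])  # "draw a house"
city = define(ar, "ارسم_مدينة", b"<city>", ["ارسم_بيتا"])  # "draw a city"

# The bilingual programmer names the same hash in English and builds on it.
en["draw_city"] = city
map_hash = define(en, "draw_map", b"<map>", ["draw_city"])

print(STORE[map_hash] == [city])  # True: the dependency is a hash, not a word
```

The Arabic dictionary never learns the name `draw_map`, and the English one only needs the single entry it uses; the store links everything by hash.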

So that's the sketch as I have it so far. I hope that was somewhat coherent. I really tried to get a demo working in time, but it turns out building this is really hard. I do have a prototype based on WebAssembly and IPFS that I'm playing around with.

And a system like this couldn't be retrofitted onto existing systems. You couldn't consume code that has to be looked up by the English language names. So it constitutes something of a reset button, and adopting it would be a major undertaking, which is true for any nonincremental tool. And if I'm being honest, I don't really expect people to massively refactor their workflows for a system that only gives you a cultural benefit. That's not enough of an offering to give most working programmers.

But if a system like this does enable other interesting things like distributed computing properties or granular version control, you might be able to sneak a cultural fix into a system that gets adopted for its more formal technical properties. Even then this doesn't solve everything. International collaboration across language barriers is hard and fundamentally not a technical problem. I found out before I came on stage that the talks are being live transcribed, and I'm very curious like what happened with the Arabic words that I said. This is a hard thing to do.

But the current state of programming is one where we force everything into a single language, and we can't even have a conversation about what anything else might look like. So I don't know where this is headed or if any of these ideas are the right ones, but I keep working on this because the alternative is to give up and tell people to please learn English if they want to be good programmers, and that's just not fucking good enough. So thank you.

[APPLAUSE]