(Editor's note: transcripts don't do talks justice.
This transcript is useful for searching and reference, but we recommend watching the video rather than reading the transcript alone!
For a reader of typical speed, reading this will take 15% less time than watching the video, but you'll miss out on body language and the speaker's slides!)
[APPLAUSE] [INAUDIBLE] fullscreen.
Present fullscreen. Present fullscreen. Next window. Present notes. Oh. Present notes. Talon sleep.
So you might have noticed that I talk to my computer, and I'm here to explain myself.
Before I do that, I have a quick disclaimer: I have written this presentation using voice, and I will also be navigating the presentation using voice. And there's some live demo going on, and you might be surprised to know that this is not the normal acoustic environment that I work in. So all I have to say is that this is probably going to be a little bit less accurate than I normally get in my day-to-day, so just keep that in mind as the presentation goes along. Go next.
So who am I? My name's Emily. You can find me on GitHub as Tuesday 2shea or on Twitter as @yomilly. I write code for Fastly. Go next. So Fastly, we help make the internet go fast, and for Fastly, I work on the platform for delivering core CDN configurations and-- go next, go next-- I write code, mostly Perl, using my voice. Go next.
So I said I was here to explain myself. But why? Why am I doing this? Go next. So I have struggled with RSI. If you're not familiar, it's repetitive strain injury caused by overuse and repetitive movement, and it's pain felt in your muscles, nerves, or tendons. In the fall of 2017, I started suffering from pretty severe RSI and it significantly impaired my ability to type in both of my arms. So-- go next.
First I tried taking breaks, I did a lot of yoga stretching, I wore the wrist braces, I did massage, I did acupuncture, chiropractor. Go next. I tried pain creams, anti-inflammatories, I was taking like four Aleve a day. I went through rounds of occupational and physical therapy. I got trigger point injections, which sounds just as exciting as it-- which was just as exciting as it sounds. Go next.
I had an ergonomic evaluation. So I had someone come look at my workstation and completely change like all the things I was doing, make sure I'm at the right angles, and I switched to a fully standing desk. I got a left-hand mouse to match my right-hand mouse so I could split the load between my arms. Go next.
I got into specialty keyboards. So I first started with a split keyboard, and then I eventually moved to the top-right keyboard here, the Mitosis, which helped me move load from my fingers to my thumbs, and your thumbs have a lot more strength. So this did help, but I still, over time, was feeling limited. Go next.
Even with all the modifications, taking like a lot of anti-inflammatories a day, which was not really sustainable, and the split mouse, standing desk, and my specialty keyboard that I can only get in a group buy, which has also got its challenges, I was only able to type maybe 30 to 45 minutes straight, and I'd have to take long breaks in between. And in a day, I could probably get two to three hours of typing.
And even though I was getting two to three hours, it wasn't good quality of work time. It was-- I was still in a lot of pain, a lot of interruptions-- it's like your body's having this like alert that's just going crazy and just interrupting you all day long, which is pretty annoying. So not a fun time. Go next.
Which led me to start thinking about some alternatives. I started thinking, can I do some of my work by voice? If I can use my voice instead of my arms, if I can just completely like remove my arms from the equation, I can get by some of the restrictions that I was feeling, and maybe-- even if by voice it was clumsy and difficult, it might be better than what I was doing, so let's check it out. Go next. Go next.
So I was expecting when I started that writing Perl by voice was going to go something like this. And I'm going to show you a video. Go next. This is a gentleman on YouTube that goes by the handle scrubadub1, and he used Windows Vista speech recognition to write Perl. And so this is what I was expecting, and I'll show you how it goes. Video play.
- Open. Open parentheses. Go to beginning of document. Delete O. Press O. Go to end of document. Press Caps Lock. Info. Delete info. Press Caps Lock. Press Caps Lock. Info. Delete info. Press capital I. Delete I. Press capital I. Delete I. Press capital I. Press Caps Lock. Press Caps Lock. Delete i scroll this conflict.
Delete scroll this conflict.
Delete adult scrolls conflict for.
Delete adult scrolls conflict for delete adult scrolls conflict for.
Open parenthesis dollar sign, string, comma, dollar sign, times, close parenthesis, equals, at input, sum equal. Correct the goals as imports. Equal sign, at input semicolon. Equal sign, at sign, input semicolon. 1, OK. Delete. Backspace. Backspace. Backspace. Enter. Correct string. 1, OK. Print, dollar sign, string, x. Dollar sign x. Correct print. Print. Print. Print. Print. 2, OK. Correct ex. Times. X. 9, OK. Lowercase the x. Semicolon. Enter.
So I forgot to mention that this is actually a 10-minute long video. I've cut it so it didn't take up all the time, but if you want more of that, that's on YouTube.
Go next. So when I started looking at voice, what I was surprised to find was that there were people that were doing it, they were writing code using voice, and it didn't work like that, it was much better. And so I'm about to show you a video of me writing the same code that scrubadub1 was writing, except using the toolset that I use on a daily basis. So you'll get to see a little bit of the difference. And I also intentionally left some errors in there so you can see that it's quick to go back and correct errors. Go next. Video play.
- Base code, dash, push, semicolon, Enter. At sign, phase input, op equals. Op input, all caps, info, push. Semicolon, enter. Phrase close, args. All caps, input. Push, semicolon, enter. Args, dollar 3 string, comma, space, dollar phrase, times, push, op equals. Snipline. Print dollar phrase string, comma, space. Dollar phrase times push op equals, at sign, phrase input. Semicolon, go up, go left second, delete word. All caps, info. Go down, slap. Phrase print, space, dollar phrase, string, perl, times dollar phrase times, semicolon.
Go next. Go next. OK, so how does that all work? How did I even write that? What's going on? I'm here to explain. Go next. Go next. So the technology I'm using, I have a microphone so you can hear. I've got this like dual mic situation going on. This one feeds into my laptop. It's an XLR mic, a cardioid mic meant to only pick up noise from one direction. And then I'm running Dragon Dictation. So Dragon Dictation's been around for a long time. It's mostly been used for dictating prose. So I'm writing an email, I'm writing a document, but it hasn't really been used for writing code.
The technology that I am excited to tell you about today and that makes writing Perl possible is Talon. Go next. So what is Talon? Talon is a hands-free input technology. It runs on top of Dragon. So it uses Dragon's voice engine, but I can programmatically configure, using Talon's API, everything in my toolset for whatever I need.
So it's very flexible. I can have a voice command mapped to a Python function, right? It's free. So the author of Talon, he's unfortunately also a sufferer of RSI, so he's really committed to the cause of making this technology free, and I'm very thankful for that. There is a Patreon if people are feeling generous and want to support the project.
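Editor's note: the idea of a voice command mapped to a Python function can be pictured as a lookup table from spoken phrases to callables. The sketch below is plain Python and deliberately does not use Talon's real API; `press`, `COMMANDS`, `handle`, the phrase "save quit", and the key bound to "go next" are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Stand-in for keystrokes sent to the focused application.
typed: List[str] = []


def press(key: str) -> None:
    """Pretend to press a key by recording it."""
    typed.append(key)


def save_and_quit() -> None:
    """Hypothetical command: the Vim save-and-quit key sequence."""
    for key in ["escape", ":", "w", "q", "enter"]:
        press(key)


# Spoken phrase -> Python function to run when that phrase is recognized.
# Illustrative only; this is not how Talon itself registers commands.
COMMANDS: Dict[str, Callable[[], None]] = {
    "go next": lambda: press("right"),  # assumed key for "next slide"
    "save quit": save_and_quit,
}


def handle(phrase: str) -> bool:
    """Run the function bound to a phrase; report whether one was found."""
    action = COMMANDS.get(phrase)
    if action is None:
        return False
    action()
    return True
```

Because each phrase points at an ordinary function, a command can do anything Python can do, from pressing one key to running a whole sequence.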
This talk is not about Talon itself, this is about how I use Talon, and all of the demos you'll see today are tools written using Talon's API-- they're not Talon itself. Go next. So let's start with the ABC's or the ASDF's. So how do you just hit basic keys on the keyboard? Go next. So you'd think, if I want to hit the letter A, you just say A. But there's a problem with that. The alphabet, as it's pronounced, a lot of the letters sound very similar. So this is a diagram I'll show you.
So A, H, J, K sound very similar. B, C, D, E, G, P, T, V, Z. Those all could be confused. I'm sure everyone here has been on the phone with tech support somewhere and you're spelling your name, M as in Mike, I as in ice cream or something, because you don't want to be misunderstood and it's really easy to-- for pronouncing the alphabet to be misunderstood. So we can't just pronounce the alphabet. Go next. Go next.
There's already a system that solved the problem of accuracy in terms of the letters. So I could use the phonetic alphabet. The phonetic alphabet, it solves the problem of not being misunderstood. But I don't want to say November every day a million times. It's very long, so while this would be accurate, it wouldn't be very efficient. So I have my own alphabet. Go next.
And this is my alphabet that I use. Each word is very short and phonetically different enough from the rest of the letters that they're not as likely to get confused with each other. So go next. I'm now at the scary portion of my presentation, the live demo. So I'm going to just demo a few basic things for you, how I spell words, how I combine letter keys with the modifiers control, command, things like that, and I'll show you. Talon demo.
Harp each look, look, odd, space. Whale, odd, red, look, dip. Talon sleep. So that's how I spell. And then if I want to get-- so most of the command keys are just Control, Command, Shift, they are what you think they are. And I can combine those, I can chain commands together to get like keyboard shortcuts. So once you get the alphabet and then your Control, Command, et cetera, you have pretty good coverage of like what a keyboard can do if you know a lot of keyboard shortcuts. Talon mode. Command air. Command cap. Go down. Command vest. Oh. Enter. Command vest. Talon sleep.
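Editor's note: for readers without the slides, the spelling and shortcut demo above can be approximated in a few lines of plain Python. The letter words below are only the ones that can be inferred from the demos in this transcript, not the speaker's full alphabet, and the `cmd-a` output notation plus the names `spell` and `chord` are assumptions made for this sketch.

```python
from typing import List

# Letter words inferred from the demos in this talk (a partial alphabet).
ALPHABET = {
    "air": "a", "cap": "c", "dip": "d", "each": "e", "far": "f",
    "harp": "h", "look": "l", "mad": "m", "near": "n", "odd": "o",
    "quench": "q", "red": "r", "sun": "s", "trap": "t", "vest": "v",
    "whale": "w",
}
SPECIAL = {"space": " "}
# Spoken modifier -> short name used in the output notation (assumed).
MODIFIERS = {"command": "cmd", "control": "ctrl", "shift": "shift"}


def spell(utterance: str) -> str:
    """Turn a run of letter words into the text they spell."""
    cleaned = utterance.lower().replace(",", " ").replace(".", " ")
    out: List[str] = []
    for word in cleaned.split():
        if word in ALPHABET:
            out.append(ALPHABET[word])
        elif word in SPECIAL:
            out.append(SPECIAL[word])
        else:
            raise ValueError(f"unknown letter word: {word!r}")
    return "".join(out)


def chord(utterance: str) -> str:
    """Turn e.g. 'command air' into a shortcut such as 'cmd-a'."""
    *mods, key = utterance.lower().split()
    return "-".join([MODIFIERS[m] for m in mods] + [ALPHABET[key]])
```

Feeding the spoken demo "Harp each look, look, odd, space. Whale, odd, red, look, dip." to `spell` gives `hello world`, and `chord("command air")` gives the select-all shortcut in this notation.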
So from there, I don't want to spell everything all day long, so I need a way to get words. So I have a couple of commands that will drop me back into dictating spoken word. So I'll demo that. Talon mode. Enter. Phrase hello world. Talon sleep. So what if I have words but I want it formatted, right? I've got-- I need it snake case or I need kebab or camel, or I want my text like all caps, right?
So I have a set of formatters that are really handy to help me format that text. I'll demo that. Talon mode. Enter. All caps, deconstruct is awesome, enter. Kebab, deconstruct is awesome. Oh. Enter. Kebab, deconstruct is awesome, enter. Pack title, deconstruct is awesome. Talon sleep. For the Perl folks who might be here.
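Editor's note: a formatter takes the dictated words and joins them in a particular style. The sketch below is an assumption about what such formatters might look like in plain Python, not the speaker's actual code; in particular, treating "pack title" as a Perl package name joined with `::` is a guess based on the remark about Perl folks.

```python
from typing import Callable, Dict, List


def all_caps(words: List[str]) -> str:
    return " ".join(w.upper() for w in words)


def kebab(words: List[str]) -> str:
    return "-".join(w.lower() for w in words)


def snake(words: List[str]) -> str:
    return "_".join(w.lower() for w in words)


def camel(words: List[str]) -> str:
    # First word lowercase, every later word capitalized.
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def pack_title(words: List[str]) -> str:
    # Assumed meaning: Perl package style, e.g. Foo::Bar.
    return "::".join(w.capitalize() for w in words)


FORMATTERS: Dict[str, Callable[[List[str]], str]] = {
    "all caps": all_caps,
    "kebab": kebab,
    "snake": snake,
    "camel": camel,
    "pack title": pack_title,
}


def apply(formatter: str, phrase: str) -> str:
    """Format a dictated phrase with the named formatter."""
    return FORMATTERS[formatter](phrase.split())
```

With these definitions, `apply("kebab", "deconstruct is awesome")` produces `deconstruct-is-awesome`.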
So there's a unique problem with working by voice in that there are words in the English language that are spelled differently but have the same pronunciation. So how do I disambiguate-- they're called homophones, how do I disambiguate them on the fly as I'm working? So I have some tools to do that-- ooh, how did we get there? Next window. Next window. Tab. Enter. Talon sleep. Talon mode. Phrase kernel. Select words. Phones. Phones bite. Phones bite. Talon sleep.
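Editor's note: one way to picture the homophone tool is as a table of word groups that sound alike, with a command that swaps the selected word for the next spelling in its group. This is a minimal sketch under that assumption; the groups listed and the function names are illustrative, and the speaker's real tool may behave differently (for example by showing a pick list).

```python
from typing import Dict, List

# Each group holds words that are pronounced the same (illustrative data).
HOMOPHONE_GROUPS: List[List[str]] = [
    ["kernel", "colonel"],
    ["bite", "byte", "bight"],
]

# Index every word to the group it belongs to.
GROUP_OF: Dict[str, List[str]] = {
    word: group for group in HOMOPHONE_GROUPS for word in group
}


def alternatives(word: str) -> List[str]:
    """Other spellings that sound the same as `word`."""
    return [w for w in GROUP_OF.get(word, []) if w != word]


def next_homophone(word: str) -> str:
    """Cycle to the next spelling in the group, wrapping around."""
    group = GROUP_OF.get(word)
    if group is None:
        return word  # no known homophones, leave the word alone
    return group[(group.index(word) + 1) % len(group)]
```

Calling `next_homophone` repeatedly on a selected word walks through every spelling in its group and then returns to the first.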
OK, so a couple more things to live demo for you. Repetition. So as you saw in the demo from scrubadub1, he at some point was saying backspace, backspace, backspace, backspace. And that doesn't take very long to get really annoying. So you need a way to repeat commands efficiently.
So naturally I want to say a command and then say how many times to do it, but I can't just say, delete five, because there's no way to disambiguate between delete one time and the number five or delete five times. So I need a different system to like represent the number so that I can get the repetition I want, and you could say, well, delete twice, delete thrice, but what's after thrice? I don't know. I don't want to know, because I-- something I'd have to learn, I don't-- you know.
But there's another way to represent numbers, and I don't use them very often, so I just decided that's what I'll do. And the ordinal representation works well enough, so I use ordinal numbers to do repetition. So Talon mode. Enter. Emoji monocle third. Emoji snail second. Talon sleep.
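Editor's note: the ordinal trick works because "third" cannot be confused with the number "three" as an argument. A minimal sketch of the parsing step follows; it assumes a trailing ordinal gives the total number of times the command runs, which the talk does not state explicitly, and `expand` is a hypothetical helper name.

```python
from typing import List

ORDINALS = {
    "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
}


def expand(utterance: str) -> List[str]:
    """Split off a trailing ordinal and repeat the command that many times.

    Assumption: the ordinal is the total run count, so "third" means
    the command executes three times in all.
    """
    words = utterance.lower().split()
    if len(words) > 1 and words[-1] in ORDINALS:
        command = " ".join(words[:-1])
        return [command] * ORDINALS[words[-1]]
    return [" ".join(words)]
```

A plain number stays part of the command, so `expand("delete five")` is a single command while `expand("delete fifth")` is five deletes.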
And then another thing you might run into is you need some words that your voice engine doesn't know about already. So I have some custom vocabulary, maybe there's some terminology in my company that I don't want to spell over and over again, right? So I can actually add custom vocabulary to the voice engine through Talon. There's a way to-- if the voice engine can guess the pronunciation well enough, then I can add it and it'll just work. So I'll demo a few words that I ran into that I added in, and these are all custom vocabulary.
Talon mode. Enter. Phrase upsert. Enter. Phrase undef. Enter. Phrase mojolicious. Enter. Phrase perltidy. Enter. Talon sleep. Go next. OK, scary part's over. [LAUGHS]
So now that I've shown you some of the basics of working by voice, I have a longer demo for you in which I will show you more closely how I would work on a daily basis. So I'm actually building a feature using Talon that I'm then going to use after I've built it. So it's like an emoji like web view that will help me pick-- search for emojis and pick them. So go next.
The caveat with this-- so someone asked me if I had sped up the video, and I did not. This is just me hitting record and recording myself doing this demo. It is faster and it is like a little more rehearsed than what I would do on a normal basis, but my point in showing you this today is not to show you how I work or like the inner workings of how I think, but to showcase the tool. So video play.
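Editor's note: the search half of the feature described above boils down to filtering a table of emoji names by the spoken word. The sketch below is a guess at that logic in plain Python; the `EMOJI_NAMES` sample data and the `emoji_search` signature are assumptions, and the web view that displays the results is left out.

```python
from typing import Dict, List, Tuple

# Tiny sample table; a real one would hold the full emoji short-name list.
EMOJI_NAMES: Dict[str, str] = {
    "crescent_moon": "🌙",
    "full_moon": "🌕",
    "dog": "🐶",
    "evergreen_tree": "🌲",
    "snail": "🐌",
}


def emoji_search(name: str) -> List[Tuple[str, str]]:
    """Return (short name, emoji) pairs whose short name contains `name`."""
    return sorted(
        (key, char) for key, char in EMOJI_NAMES.items() if name in key
    )
```

The matches would then be handed to a picker so that a follow-up command such as "pick 2" can choose one by position.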
- Cd talon emily. Git checkout new. Snake emoji search. Enter. subble dot enter. Diffie up. Go file phrase emoji. Enter. Page. Spring 1 8, push, delete start tab, dubquote, phrase emoji search, space dragon words. Push colon space phrase search comma. Spring 1 4 slap slap. State def. Phrase search. Paren. Mad. Push colon. Slap. Phrase name op equals. Snake extract word, paren, mad, slap. Phrase emoji sun op equals. Snake selection picker. Undo. Phrase set paren, square. Snake emoji names, square. Phrase key go right. Dot phrase char, go right. Go left space, for in. Phrase key, go right fourth.
Snake emoji name dot phrase keys. Paren. Go right space. Phrase if name in key. Slap. Undo. Stone. Delete word phrase name in. Space. Slap. Pop, jump. Snake selection picker. Args. Phrase title equals dubquote title emoji smear. Go left sun, smear, comma space. Phrase template equals dubquote phrase picker dot harp trap mad look smear comma space. Phrase data equals phrase emoji sun. Jump start, spring 3, slap, from import, dot phrase utils push. Snake extract word, slap. From import, dot phrase picker, push.
Snake selection picker. Go down, push. Comma space. Snake emoji short names. Delete word. Snake emoji short names. Space phrase as space. Snake emoji names. Save. Go file, phrase picker, enter. Spring 3 slap. In bracket in percent. Pad. Phrase for item in data. Enter. Tag table row, undo. Slap. Tag table row. Tag class count. Enter. Tag table data. tag class pick. Undo. Undo. Tag class pick. Title pick, space. Slap. Tag table data.
In bracket second pad phrase item. Go down slap. In bracket, in percent, space. Each near dip far odd red space. Save. Emoji search moon. Cancel. Spring 6. Jump fourth. Emoji search sound. Pick 2. Space save. Emoji search dog. Emoji search tree. Cancel. Focus iTerm. Run git status. Git add dot enter. Run git commit. Sit. Phrase add space. Delete, delete. Deep deep space. Phrase emoji searching.
Escape, colon, whale quench, enter.
Phrase emoji tab. Focus from. New window. Open talon configs. Link, air mad. Tab, tab, tab. Go down third. Enter. Link. Sun mad. Escape, page, link, dip.
So I have a command that will like do all the Vim, save, quit, but my co-workers really loved the whale quench, so--
So I left it just for you. [LAUGHS] Go next. Challenges. I want to spend a few minutes just quickly talking about some of the challenges with working by voice. Go next. The learning curve can be a bit steep. It is like learning a crazy group-buy keyboard, except you're using a different part of your body and now you're hearing yourself while you're working. So it can be a little bit tricky to pick it up, and I do think that-- and this can also be very exacerbated by injury. If you have a hard time typing but you need to use a computer in order to learn how to do this, the bootstrapping can be kind of tricky.
But I do think that the learning curve will get better. This is still-- some of this technology is pretty new. Like Talon was new in 2018, so tools and conventions and things are still being built and formed. So over time, I think it will get better. Go next.
Tools with poor accessibility. So more often than not, when I run into a roadblock or something frustrating in my daily work, it's not a limitation of my tools, it's more a limitation of, I'm using a site or an app that doesn't have the accessibility support that I would like. And this is not what this talk is about, but there are a lot of really great talks about accessibility, and I would really love it if everyone here could like spend some time like learning about them and build the apps that you would want if you were at some point needing the accessibility features. Go next.
Voice strain. So before I had to take care of my hands, make sure I'm not straining my hands, but what happens if you talk to your computer for eight hours a day? You sort of have to start treating your voice the same as you would your hands if you're typing. So make sure you're drinking a lot of water, and some people hire voice coaches to-- and specifically make sure you're not talking too loud or too soft so that you're really being healthy about how you use your voice. Go next.
Open offices. So I'm not here to tell you open offices are good or bad. I'm just going to say that adding a microphone is a challenge. And it's a two-way problem. It's not just that I'm generating noise that people might be curious about or distracted by, it's that all the noise around me is going to feed into my microphone and potentially get in the way of me working. So it's a two-way problem. There are solutions, and I'm going to talk a little bit about a few of them. Go next.
Stenomask. So they use these in courts a lot, I guess. I had never heard of them before going down the voice route. They are-- it's like a microphone, but it seals around your mouth. And so you can talk into it and it'll pick up everything you say and block out everything else, and it will also keep what you're saying private. I know some people in the voice community are using these and they find them very useful. So go next.
Another solution, I talked to at least one person who had his company buy him like one of these fancy pods. So it's like a glass thing that you go sit in and you work in there and it has some, like, acoustic considerations. So it's another solution. Go next. Go next. There's remote work. So for me, I am lucky enough to work at a company and on a team that is very supportive of remote working. And so I really enjoy working at home from my couch or whatever-- I don't need to be near the computer anymore, I just need the cord. So I can lay on the floor, it's great. So remote working for me, I love it. Go next.
Not so big challenges. So what things was I expecting to be difficult and actually weren't? So supporting specific programming languages. I get a lot of questions about how difficult it is for this language, this language. It turns out, once you get functionality of the basic keys, you pretty much can do anything a keyboard can do, and you can make optimizations for different languages to get faster, but it's not like, oh, I need to write Go tomorrow, so let me go spend some hours with my toolset to make sure I can, you know? It doesn't quite work that way.
But I can do fancy things like, here's a set of voice commands that I want available when I'm working in a Python file, or here's a set of commands that I want working if I'm in a-- if I'm in a Perl file or whatever, and those won't collide and that's really cool. Go next. Oh, last slide. Last slide. I forgot. Building tools with Talon, very easy. So you saw a little demo of it, but I can-- as I'm working, I can iterate, oh, I need a thing. It'd be really handy to have this thing. I can open up my Python files, add some tools, save it, it's loaded automatically into Python and-- or in Talon, and I can just go. So that was also a thing that was surprisingly easy. Go next.
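Editor's note: the per-language command sets mentioned above avoid collisions because only the set matching the current file is active. The sketch below shows one way that scoping could work in plain Python, keyed on file extension; it is not Talon's context mechanism, and every phrase, expansion, and function name here is a hypothetical example.

```python
import os
from typing import Dict, Optional

# Commands that are always available (expansion text is assumed).
GLOBAL_COMMANDS: Dict[str, str] = {"op equals": " = "}

# The same phrase can mean different things in different languages.
COMMANDS_BY_EXTENSION: Dict[str, Dict[str, str]] = {
    ".py": {"state function": "def ", "state import": "import "},
    ".pl": {"state function": "sub ", "state import": "use "},
}


def active_commands(filename: str) -> Dict[str, str]:
    """Global commands plus the set for this file's language, if any."""
    extension = os.path.splitext(filename)[1].lower()
    merged = dict(GLOBAL_COMMANDS)
    merged.update(COMMANDS_BY_EXTENSION.get(extension, {}))
    return merged


def expand_command(phrase: str, filename: str) -> Optional[str]:
    """Text a phrase expands to in the given file, or None if inactive."""
    return active_commands(filename).get(phrase)
```

Here "state function" yields `def ` in a `.py` file and `sub ` in a `.pl` file, and it is simply not recognized anywhere else.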
So why? I want to stop here, take a pause. Why am I talking to my computer? Why am I here talking to you about talking to my computer? I would be doing a disservice to the tech and to folks like myself if I didn't reiterate the impact that this kind of technology has on people who might otherwise not be able to use a keyboard or a mouse. It, for me, meant the difference between I'm looking at a potentially career-limiting, career-ending situation, an unsustainable situation where I'm taking too many medications and just in a lot of pain in general, and I was able to turn that around quite dramatically, and now I get to be here telling you all about talking to computers.
This can be a really impactful technology, and yeah, so go next. Takeaways. Take care of your bodies and appreciate what they can do for you. They're quite amazing instruments, and I think just spending some time and just appreciating that is good for all of us. People are using speech to write code in real life today. This is not like in the future this may happen. I was surprised to learn there are already people who've been doing this, and I found them, and I found that it works way better than I thought, so this is happening now.
Remember your non-keyboard users when you're building things. Not everyone is using a keyboard and mouse, and don't lock out certain users by relying on mouse-only features or not building in keyboard shortcuts, things like that. And the last point I will make to you today is that the future might actually be more diverse than just the keyboard and mouse model.
We've been relying heavily on touch control for computers for a long time now, but it feels like there might be some changes coming along, especially with the popularity of things like voice assistants like Siri, Alexa, home automation. There's-- and not just voice, there are companies right now working on trying to actually read your neurological signals. So you wear devices and then it'll read your neurological signals to control your computer. So I just-- I think it's a really good time that we take a step back and look at what limitations-- what biases do we have because of the keyboard and mouse that we're relying on. Go next.
Thank you. Go next.
Go next. Go next. By the way, we're hiring. Go next. Go next. Here are some useful links for you if you want to learn more. Talon Voice, that's a link to the Talon software. My Talon-- all my configuration is public on GitHub. So if you're curious about what that looks like, it's there. The Talon Community, there's also a collection of tools that people have built, so you can find that there. People who are sort of new and willing to start Talon, we point them there. And then the Patreon for Talon link if you're feeling generous and want to support the cause, so thank you all.
I think I just--