These didn’t track as AI-generated at first…and then I tried to read the text — THE STANFORD PRESERIBENT. You can see the whole set on Bluesky (if you have access).
The four members of the Beatles, assisted by machine learning technology, come together one last time to record a song together, working off of a demo tape recorded by John Lennon in the 70s.
The long mythologised John Lennon demo was first worked on in February 1995 by Paul, George and Ringo as part of The Beatles Anthology project but it remained unfinished, partly because of the impossible technological challenges involved in working with the vocal John had recorded on tape in the 1970s. For years it looked like the song could never be completed.
But in 2022 there was a stroke of serendipity. A software system developed by Peter Jackson and his team, used throughout the production of the documentary series Get Back, finally opened the way for the uncoupling of John’s vocal from his piano part. As a result, the original recording could be brought to life and worked on anew with contributions from all four Beatles.
Ok, this is a little bit bonkers: HeyGen’s Video Translate tool will convert videos of people speaking into videos of them speaking one of several different languages (incl. English, Spanish, Hindi, and French) with matching mouth movements. Check out their brief demo of Marques Brownlee speaking Spanish & Tim Cook speaking Hindi or this video of a YouTuber trying it out:
The results are definitely in the category of “indistinguishable from magic”.
Photographs have always been an imperfect reproduction of real life — see the story of Dorothea Lange’s Migrant Mother or Ansel Adams’ extensive darkroom work — but the seemingly boundless alterations offered by current & future AI editing tools will allow almost anyone to turn their photos (or should I say “photos”) into whatever they wish. In this video, Evan Puschak briefly explores what AI-altered photos might do to our memories.
I was surprised he didn’t mention the theory that when a past experience is remembered, that memory is altered in the human brain — that is, the “very act of remembering can change our memories”. I think I first heard about this on Radiolab more than 16 years ago. So maybe looking at photos extensively altered by AI could extensively alter those same memories in our brains, actually making us unable to recall anything even remotely close to what “really” happened. Fun!
But also, one could imagine this as a powerful way to treat PTSD, etc. Or to brainwash someone! Or an entire populace… Here’s Hannah Arendt on constantly being lied to:
If everybody always lies to you, the consequence is not that you believe the lies, but rather that nobody believes anything any longer. This is because lies, by their very nature, have to be changed, and a lying government has constantly to rewrite its own history. On the receiving end you get not only one lie — a lie which you could go on for the rest of your days — but you get a great number of lies, depending on how the political wind blows. And a people that no longer can believe anything cannot make up its mind. It is deprived not only of its capacity to act but also of its capacity to think and to judge. And with such a people you can then do what you please.
This is the incredible and interesting and dangerous thing about the combination of our current technology, the internet, and mass media: “a lying government” is no longer necessary — we’re doing it to ourselves and anyone with sufficient motivation will be able to take advantage of people without the capacity to think and judge.
P.S. I lol’d too hard at his deadpan description of “the late Thanos”. RIP, big fella.
Artist and filmmaker Paul Trillo made Thank You For Not Answering, an artful experimental short film, using a suite of AI tools. From a piece describing how Trillo made the film:
Trillo demonstrated the process to me during a Zoom call; in seconds, it was possible to render, for example, a tracking shot of a woman crying alone in a softly lit restaurant. His prompt included a hash of S.E.O.-esque terms meant to goad the machine into creating a particularly cinematic aesthetic: “Moody lighting, iconic, visually stunning, immersive, impactful.” Trillo was enthralled by the process: “The speed in which I could operate was unlike anything I had experienced.” He continued, “It felt like being able to fly in a dream.” The A.I. tool was “co-directing” alongside him: “It’s making a lot of decisions I didn’t.”
I know, I know. Too much Wes Anderson. Too much AI. But there is something in my brain, a chemical imbalance perhaps, and I can’t help but find this reimagining of the Lord of the Rings in Anderson’s signature style funny and charming. Sorry but not sorry.
Expanding on his previous thoughts on the relationship between AI and capitalism — “I tend to think that most fears about A.I. are best understood as fears about capitalism” — Ted Chiang offers a useful metaphor for how to think about AI: as a management-consulting firm like McKinsey.
So, I would like to propose another metaphor for the risks of artificial intelligence. I suggest that we think about A.I. as a management-consulting firm, along the lines of McKinsey & Company. Firms like McKinsey are hired for a wide variety of reasons, and A.I. systems are used for many reasons, too. But the similarities between McKinsey — a consulting firm that works with ninety per cent of the Fortune 100 — and A.I. are also clear. Social-media companies use machine learning to keep users glued to their feeds. In a similar way, Purdue Pharma used McKinsey to figure out how to “turbocharge” sales of OxyContin during the opioid epidemic. Just as A.I. promises to offer managers a cheap replacement for human workers, so McKinsey and similar firms helped normalize the practice of mass layoffs as a way of increasing stock prices and executive compensation, contributing to the destruction of the middle class in America.
A former McKinsey employee has described the company as “capital’s willing executioners”: if you want something done but don’t want to get your hands dirty, McKinsey will do it for you. That escape from accountability is one of the most valuable services that management consultancies provide. Bosses have certain goals, but don’t want to be blamed for doing what’s necessary to achieve those goals; by hiring consultants, management can say that they were just following independent, expert advice. Even in its current rudimentary form, A.I. has become a way for a company to evade responsibility by saying that it’s just doing what “the algorithm” says, even though it was the company that commissioned the algorithm in the first place.
Good stuff — I especially enjoyed the mini You’re Wrong About on the Luddites — do read the whole thing.
No matter which side you come down on in the debate about using AI tools like Stable Diffusion and Midjourney to create digital art, this video of an experienced digital artist explaining how he uses AI in his workflow is worth a watch. I thought this comment was particularly interesting:
I see the overall process as a joint effort with the AI. I’ve been a traditional artist for 2 decades, painting on canvas. And in the last five years I’ve been doing a lot of digital art. So from that part of myself, I don’t feel threatened at all.
I feel this is an opportunity. An opportunity for many new talented people to jump on a new branch of art that is completely different from the one that we have already in digital art and just open up new ways of being creative.
I’m not going to make a habit of posting AI-generated video and photography here (mainly because most of it is not that interesting) but Pepperoni Hug Spot is just too perfect a name for a pizza place to pass up. And it’s got Too Many Cooks vibes.
Well this is some bizarre good fun — turns out that the campy goofiness of Star Wars and the campy seriousness of high fashion make for a pretty good combination.
[Yesterday I spent all day answering reader questions for the inaugural Kottke.org Ask Me Anything. One of them asked my opinion of the current crop of AI tools and I thought it was worth reprinting the whole thing here. -j]
Q: I would love to know your thoughts on AI, and specifically the ones that threaten us writers. I know you’ve touched on it in the past, but it seems like ChatGPT and the like really exploded while you were on sabbatical. Like, you left and the world was one way, and when you returned, it was very different. —Gregor
A: I got several questions about AI and I haven’t written anything about my experience with it on the site, so here we go. Let’s start with two facts:
1. ChatGPT moved me to tears.
2. I built this AMA site with the assistance of ChatGPT. (Or was it the other way around?)
Ok, the first thing. Last month, my son skied at a competition out in Montana. He’d (somewhat inexplicably) struggled earlier in the season at comps, which was tough for him to go through and for us as parents to watch. How much do we let him figure out on his own vs. how much support/guidance do we give him? This Montana comp was his last chance to get out there and show his skills. I was here in VT, so I texted him my usual “Good luck! Stomp it!” message the morning of the comp. But I happened to be futzing around with ChatGPT at the time (the GPT-3.5 model) and thought, you know, let’s punch this up a little bit. So I asked ChatGPT to write a good luck poem for a skier competing at a freeski competition at Big Sky.
In response, it wrote a perfectly serviceable 12-line poem in couplets that was on topic, made narrative sense, and rhymed. And when I read the last line, I burst into tears. So does that make ChatGPT a soulful poet of rare ability? No. I’ve thought a lot about this and here’s what I think is going on: I was primed for an emotional response (because my son was struggling with something really important to him, because I was feeling anxious for him, because he was doing something potentially dangerous, because I haven’t seen him too much this winter) and ChatGPT used the language and methods of thousands of years of writing to deliver something a) about someone I love, and b) in the form of a poem (which is often an emotionally charged form) — both of which I had explicitly asked for. When you’re really in your feelings, even the worst movie or the cheesiest song can resonate with you and move you — just the tiniest bit of narrative and sentiment can send you over the edge. ChatGPT didn’t really make me cry…I did.
But still. Even so. It felt a little magical when it happened.
Now for the second part. I would say ChatGPT (mostly the new GPT-4 model), with a lot of hand-holding and cajoling from me, wrote 60-70% of the code (PHP, JavaScript, CSS, SQL) for this AMA site. And we easily did it in a third of the time it would have taken me by myself, without having to look something up on Stack Overflow every four minutes or endlessly consulting CSS and PHP reference guides or tediously writing tests, etc. etc. etc. In fact, I never would have even embarked on building this little site-let had ChatGPT not existed…I would have done something much simpler and more manual instead. And it was a *blast*. I had so much fun and learned so much along the way.
I’ve also been using ChatGPT for some other programming projects — we whipped the Quick Links into better shape (it can write Movable Type templating code…really!) and set up direct posting of the site’s links to Facebook via the API rather than through Zapier (saving me $20/mo in the process). It has really turbo-charged my ability to get shit done around here and has me thinking about all sorts of possibilities.
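If you’re curious what the direct-posting piece looks like, here’s a rough sketch in Python (the site itself runs on PHP, and the page ID and access token below are placeholders, not anything real):

```python
# Rough sketch of posting a link to a Facebook Page via the Graph API
# (illustrative only; the real site code is PHP, and PAGE_ID and
# PAGE_TOKEN are placeholders, not real credentials).
import requests

PAGE_ID = "123456789"   # hypothetical Page ID
PAGE_TOKEN = "EAAB..."  # hypothetical Page access token

def post_link(message, link):
    """Publish a link post to the Page's feed and return the API response."""
    resp = requests.post(
        f"https://graph.facebook.com/v17.0/{PAGE_ID}/feed",
        data={"message": message, "link": link, "access_token": PAGE_TOKEN},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # contains the new post's id on success

if __name__ == "__main__":
    post_link("Today's Quick Link", "https://example.com/some-interesting-thing")
```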
I keep using the word “we” here because coding with ChatGPT — and this is where it starts to feel weird in an uncanny valley sort of way — feels like a genuine creative collaboration. It feels like there is a “someone” on the other side of that chat, a something that’s really capable but also needs a lot of hand-holding. Just. Like. Me. There’s a back and forth. We both screw up and take turns correcting each other’s mistakes. I ask it please and tell it thank you. ChatGPT lies to me; I gently and non-judgmentally guide it in a more constructive direction (as you would with a toddler). It is the fucking craziest weirdest thing and I don’t really know how to think about it.
There have only been a few occasions in my life when I’ve used or seen some new technology that felt like magic. The first time I wrote & ran a simple BASIC program on a computer. The first time I used the web. The first time using a laptop with wifi. The first time using an iPhone. Programming with ChatGPT over the past few weeks has felt like magic in the same way. While working on these projects with ChatGPT, I can’t wait to get out of bed in the morning to pick up where we left off last night (likely too late last night), a feeling I honestly have not consistently felt about work in a long time. I feel giddy. I feel POWERFUL.
That powerful feeling makes me uneasy. We shouldn’t feel so suddenly powerful without pausing to interrogate where that power comes from, who ultimately wields it, and who it will benefit and harm. The issues around these tools are complex & far-reaching and I’m still struggling to figure out what to think about it all. I’m persuaded by arguments that these tools offer an almost unprecedented opportunity for “helping humans be creative and express themselves” and that machine/human collaboration can deepen our understanding and appreciation of the world around us (as has happened with chess and go). I’m also persuaded by Ted Chiang’s assertion that our fears of AI are actually about capitalism — and we’ve got a lot to fear from capitalism when it comes to these tools, particularly given the present dysfunction of US politics. There is just so much potential power here and many people out there don’t feel uneasy about wielding it — and they will do what they want without regard for the rest of us. That’s pretty scary.
Powerful, weird, scary, uncanny, giddy — how the hell do we collectively navigate all that?
(Note: ChatGPT didn’t write any of this, nor has it written anything else on kottke.org. I used it once while writing a post a few weeks ago, basically as a smart thesaurus to suggest adjectives related to a topic. I’ll let you know if/when that changes — I expect it will not for quite some time, if ever. Even in the age of Ikea, there are still plenty of makers of handcrafted furniture around, and in the same way, I suspect the future availability of cheap, good-enough AI writing/curation will likely increase the demand for and value of human-produced goods.)
In a piece about how the pace of improvement in the current crop of AI products is vastly outstripping the ability of society to react/respond to it, Ezra Klein uses this cracker of a phrase/concept: “the difficulty of living in exponential time”.
I find myself thinking back to the early days of Covid. There were weeks when it was clear that lockdowns were coming, that the world was tilting into crisis, and yet normalcy reigned, and you sounded like a loon telling your family to stock up on toilet paper. There was the difficulty of living in exponential time, the impossible task of speeding policy and social change to match the rate of viral replication. I suspect that some of the political and social damage we still carry from the pandemic reflects that impossible acceleration. There is a natural pace to human deliberation. A lot breaks when we are denied the luxury of time.
But that is the kind of moment I believe we are in now. We do not have the luxury of moving this slowly in response, at least not if the technology is going to move this fast.
Covid, AI, and even climate change (e.g. the effects we are seeing after 250 years of escalating carbon emissions)…they are all moving too fast for society to make complete sense of them. And it’s causing problems and creating opportunities for schemers, connivers, and confidence tricksters to wreak havoc.
In this final installment of Everything is a Remix, Kirby Ferguson offers his perspective on image generation with AI, how it compares to human creativity, and what its role will be in the future. In watching the part about the anxiety in the creative community about these image generators, I was reminded of what Ted Chiang has said about fears of technology actually being fears of capitalism.
It’s capitalism that wants to reduce costs and reduce costs by laying people off. It’s not that like all technology suddenly becomes benign in this world. But it’s like, in a world where we have really strong social safety nets, then you could maybe actually evaluate sort of the pros and cons of technology as a technology, as opposed to seeing it through how capitalism is going to use it against us.
I agree with Ferguson that these AI image generators are, outside the capitalist context, useful and good for helping humans be creative and express themselves. Tools like Midjourney, DALL-E, and Stable Diffusion allow anyone to collaborate with every human artist who has ever existed, all at once. Like, just think about how powerful this is: normal people who have ideas but lack technical skills can now create imagery. Is it art? Perhaps not in most cases, but some of it will be. If the goal is to get more people to be able to more easily express and exercise their creativity, these image generators fulfill that in a big way. But that’s really scary — power always is.
In 2020, before the current crop of large language models (LLMs) like ChatGPT and Bing, Emily Bender and Alexander Koller wrote a paper on their limitations called Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In the paper, Bender and Koller describe an “octopus test” as a way of thinking about what LLMs are capable of and what they aren’t. A recent profile of Bender by Elizabeth Weil for New York magazine (which is worth reading in its entirety) summarizes the octopus test thusly:
Say that A and B, both fluent speakers of English, are independently stranded on two uninhabited islands. They soon discover that previous visitors to these islands have left behind telegraphs and that they can communicate with each other via an underwater cable. A and B start happily typing messages to each other.
Meanwhile, O, a hyperintelligent deep-sea octopus who is unable to visit or observe the two islands, discovers a way to tap into the underwater cable and listen in on A and B’s conversations. O knows nothing about English initially but is very good at detecting statistical patterns. Over time, O learns to predict with great accuracy how B will respond to each of A’s utterances.
Soon, the octopus enters the conversation and starts impersonating B and replying to A. This ruse works for a while, and A believes that O communicates as both she and B do — with meaning and intent. Then one day A calls out: “I’m being attacked by an angry bear. Help me figure out how to defend myself. I’ve got some sticks.” The octopus, impersonating B, fails to help. How could it succeed? The octopus has no referents, no idea what bears or sticks are. No way to give relevant instructions, like to go grab some coconuts and rope and build a catapult. A is in trouble and feels duped. The octopus is exposed as a fraud.
The paper’s official title is “Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data.” NLU stands for “natural-language understanding.” How should we interpret the natural-sounding (i.e., humanlike) words that come out of LLMs? The models are built on statistics. They work by looking for patterns in huge troves of text and then using those patterns to guess what the next word in a string of words should be. They’re great at mimicry and bad at facts. Why? LLMs, like the octopus, have no access to real-world, embodied referents. This makes LLMs beguiling, amoral, and the Platonic ideal of the bullshitter, as philosopher Harry Frankfurt, author of On Bullshit, defined the term. Bullshitters, Frankfurt argued, are worse than liars. They don’t care whether something is true or false. They care only about rhetorical power — if a listener or reader is persuaded.
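To make “using those patterns to guess what the next word should be” a bit more concrete, here’s a toy sketch of the idea in Python: a bigram counter that always picks the most frequent next word. It’s to a real LLM what a paper airplane is to a 747, but the basic move (predict the next token from the ones before it) is the same.

```python
# Toy next-word predictor: count which word follows which in a training text,
# then always guess the most frequent follower. Real LLMs use neural networks
# over subword tokens, but the underlying task (predict what comes next from
# what came before) is the same.
from collections import Counter, defaultdict

def train(text):
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(model, word):
    counts = model.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

corpus = "the cat sat on the mat and the cat slept on the rug"
model = train(corpus)
print(predict_next(model, "the"))  # -> "cat" ("cat" follows "the" most often)
print(predict_next(model, "on"))   # -> "the"
```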
The point here is to caution against treating these AIs as if they are people. Bing isn’t in love with anyone; it’s just free-associating from an (admittedly huge) part of the internet.
This isn’t an exact analogue, but I have a car that can drive itself under certain circumstances (not Tesla’s FSD) and when I turn self-drive on, it feels like I’m giving control of my car to a very precocious 4-year-old. Most of the time, this incredible child pilots the car really well, better than I can, really — it keeps speed, lane positioning, and distance to forward traffic very precisely — so much so that you want to trust it as you would a licensed adult driver. But when it actually has to do something that requires making a tough decision or thinking, it will either give up control or do something stupid or dangerous. You can’t ever forget the self-driver is like a 4-year-old kid mimicking the act of driving and isn’t capable of thinking like a human when it needs to. You forget that and you can die. (This has the odd and (IMO) under-appreciated effect, when self-drive is engaged, of shifting your role from operating the car to babysitting the operator of the car. Doing a thing and watching something else do a thing so you can take over when it screws up are two very different things, and I think that until more people realize that, it’s going to keep causing unnecessary accidents.)
In the New Yorker, Ted Chiang suggests thinking of ChatGPT as a lossy compression of all the text on the web:

What I’ve described sounds a lot like ChatGPT, or most any other large-language model. Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
Reframing the technology in that way turns out to be useful in thinking through some of its possibilities and limitations:
There is very little information available about OpenAI’s forthcoming successor to ChatGPT, GPT-4. But I’m going to make a prediction: when assembling the vast amount of text used to train GPT-4, the people at OpenAI will have made every effort to exclude material generated by ChatGPT or any other large-language model. If this turns out to be the case, it will serve as unintentional confirmation that the analogy between large-language models and lossy compression is useful. Repeatedly resaving a jpeg creates more compression artifacts, because more information is lost every time. It’s the digital equivalent of repeatedly making photocopies of photocopies in the old days. The image quality only gets worse.
Indeed, a useful criterion for gauging a large-language model’s quality might be the willingness of a company to use the text that it generates as training material for a new model. If the output of ChatGPT isn’t good enough for GPT-4, we might take that as an indicator that it’s not good enough for us, either.
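Chiang’s photocopy-of-a-photocopy point is easy to try for yourself. Here’s a small sketch (assuming the Pillow imaging library and any JPEG you have lying around) that re-encodes an image over and over and measures how far it drifts from the original; how fast it degrades depends on the image and the quality setting, but you never get information back.

```python
# Generational loss, per Chiang's photocopy analogy: re-encode a JPEG
# repeatedly and measure how far each generation has drifted from the
# original. Assumes Pillow is installed and "original.jpg" is any JPEG.
import io
from PIL import Image, ImageChops

def reencode(img, quality=75):
    """Save the image as a JPEG in memory and load it back (one generation)."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def mean_abs_diff(a, b):
    """Average per-channel pixel difference between two same-sized RGB images."""
    diff = ImageChops.difference(a, b).getdata()
    return sum(sum(px) for px in diff) / (len(diff) * 3)

original = Image.open("original.jpg").convert("RGB")
copy = original
for generation in range(1, 51):
    copy = reencode(copy)
    if generation in (1, 10, 25, 50):
        print(f"generation {generation}: drift from original = {mean_abs_diff(original, copy):.2f}")
```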
Just about everything on the web is on TikTok, and going viral there too, so it shouldn’t be a surprise that people who’ve been laid off are there too, trying to figure out what it all means.
Part of me is cynical about this. You mean that as people, we’re so poorly defined without our jobs that our only recourse is to grind out some content about it? But on the other side of the coin, making content is what human beings do. Other animals use tools, but do they make content? Apart from some birds, probably not.
My favorite TikTok layoff video is by Atif Memon, a cloud engineer who offers a clear-eyed appraisal of her situation:
“At the company offsite, we celebrated our company tripling its revenue in a year. A month later, we are so poor! Who robbed us?”
“Even if ChatGPT can take away our jobs, they’ll have to get in line behind geopolitics and pandemic and shareholders and investors. I lost my job because the investors of the company were not sure it will become 400x in the coming year. ‘How will we go to Mars?’ Someone else lost their job because the investors thought ‘Hmm, if this other company can lay off 12k people and still work as usual, shouldn’t we also try?’”
“Artificial intelligence can never overtake human paranoia and human curiosity. AI can only do what human beings have been doing. Only humans can do what no human has done before.”
A lot to chew on in four minutes.
Update: Apparently this is not native to TikTok, but was posted to YouTube by a comedian, Aiyyo Shraddha. It really is a perfect TikTok story! The TikTok video is a ripoff of her original.
Google Research has released a new generative AI tool called MusicLM. MusicLM can generate new musical compositions from text prompts, which can either describe the music to be played (e.g., “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls”) or be more emotional and evocative (“Made early in his career, Matisse’s Dance, 1910, shows a group of red dancers caught in a collective moment of innocent freedom and joy, holding hands as they whirl around in space. Simple and direct, the painting speaks volumes about our deep-rooted, primal human desire for connection, movement, rhythm and music”).
As the last example suggests, since music can be generated from just about any text, anything that can be translated/captioned/captured in text, from poetry to paintings, can be turned into music.
It may seem strange that so many AI tools are coming to fruition in public all at once, but at Ars Technica, investor Haomiao Huang argues that once the basic AI toolkit reached a certain level of sophistication, a confluence of new products taking advantage of those research breakthroughs was inevitable:
To sum up, the breakthrough with generative image models is a combination of two AI advances. First, there’s deep learning’s ability to learn a “language” for representing images via latent representations. Second, models can use the “translation” ability of transformers via a foundation model to shift between the world of text and the world of images (via that latent representation).
This is a powerful technique that goes far beyond images. As long as there’s a way to represent something with a structure that looks a bit like a language, together with the data sets to train on, transformers can learn the rules and then translate between languages. Github’s Copilot has learned to translate between English and various programming languages, and Google’s Alphafold can translate between the language of DNA and protein sequences. Other companies and researchers are working on things like training AIs to generate automations to do simple tasks on a computer, like creating a spreadsheet. Each of these are just ordered sequences.
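To make that text-to-image “translation” concrete, here’s what it looks like from the outside with one of the open latent-diffusion models (a sketch using the Hugging Face diffusers library; it assumes torch, a GPU, and a downloaded checkpoint, and isn’t specific to anything in Huang’s piece):

```python
# The text-to-image "translation" from the outside: Stable Diffusion encodes
# the prompt with a text model, denoises in a latent space, then decodes the
# latent into pixels. Assumes the diffusers and torch packages, a GPU, and
# the public runwayml/stable-diffusion-v1-5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a lighthouse on a cliff at dusk, moody lighting, 35mm film"
image = pipe(prompt).images[0]
image.save("lighthouse.png")
```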
The other thing that’s different about the new wave of AI advances, Huang says, is that they’re not especially dependent on huge computing power at the edge. So AI is rapidly becoming much more ubiquitous than it’s been… even if MusicLM’s sample set of tunes still crashes my web browser.
Neural Radiance Fields (NeRFs) are a relatively new technique for generating well-lit, complex 3D views from 2D images. If you’ve seen behind-the-scenes looks at how image/motion capture is traditionally done, you know how time-consuming and resource intensive it can be. As this video from Corridor Crew shows, NeRFs change the image capture game significantly. The ease with which they play around with the technology to produce professional-looking effects in very little time is pretty mind-blowing. (via waxy)
I’m still trying to wrap my mind around it all. There seems to be a correlation between how Alejandro’s work was absorbed and referred to by subsequent filmmakers and how his work was ingested and metabolized by computer programming. But these two things are not the same. I want to say that influence is not the same thing as algorithm. But looking at these images, how can I be sure?
It’s hard to find many shortcomings in the software. It can’t render text. And like many painters and sculptors throughout history, it has trouble getting hands right. I’m nitpicking here. The model contains multitudes. It has scanned the collected works of thousands upon thousands of photographers, painters and cinematographers. It has a deep library of styles and a facility with all kinds of image-making techniques at its digital fingertips. The technology is jaw-dropping. And it concerns me greatly.
Using AI image processing software, Hidreley Diao creates photorealistic portraits of familiar cartoon characters. The one of Moe from The Simpsons is kind of amazing — he’s got the look of a long-time character actor who’s developed so much depth over the years that he starts getting bigger roles and everyone’s like, this guy is actually kind of enigmatic and attractive and fantastic.
In the video above, a bunch of game-playing AI bots are pitted against each other in an attempt to find the best strategy for the game. No word on whether the bots had any fun playing the game.
I mean, I would go to town on some Orb Crumpets. And don’t these sound delicious?!
Original Cool Ranch Cheese and Dried Cranberry Oatmeal — all the wholesome, cheesy oatmeal with a choice of mild, sweet or salty!
Ingredis Fiberwaste Cream Cheese Cheerios — kids grab a box and put them in their mouths, making fun flavors taste even better !!! !!! !!! !!!
Fiberwaste is probably an element in many American grocery items, so kudos for this brave truth in advertising on the part of our robot friend. (via waxy)
With nearly instant reaction times, superhuman button tapping frequency, and an inability to fatigue, an AI called StackRabbit can play Tetris better than any human player. But how much better? Well, it can play all the way to the end of the game, which…did you know Tetris ended? I didn’t. But before that happens, it plays flawlessly through hundreds of levels while the game itself is throwing up weirdo color schemes and scores from random places in its memory — the game’s creators didn’t imagine anyone or anything would get anywhere close to these levels. Also, I got surprisingly anxious watching this — it was just so fast with so much constant peril! (via waxy)
There is a moment at the end of the film’s second act when the artist David Choe, a friend of Bourdain’s, is reading aloud an e-mail Bourdain had sent him: “Dude, this is a crazy thing to ask, but I’m curious,” Choe begins reading, and then the voice fades into Bourdain’s own: “…and my life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?” I asked Neville how on earth he’d found an audio recording of Bourdain reading his own e-mail. Throughout the film, Neville and his team used stitched-together clips of Bourdain’s narration pulled from TV, radio, podcasts, and audiobooks. “But there were three quotes there I wanted his voice for that there were no recordings of,” Neville explained. So he got in touch with a software company, gave it about a dozen hours of recordings, and, he said, “I created an A.I. model of his voice.” In a world of computer simulations and deepfakes, a dead man’s voice speaking his own words of despair is hardly the most dystopian application of the technology. But the seamlessness of the effect is eerie. “If you watch the film, other than that line you mentioned, you probably don’t know what the other lines are that were spoken by the A.I., and you’re not going to know,” Neville said. “We can have a documentary-ethics panel about it later.”
Per this GQ story, Neville got permission from Bourdain’s estate:
We fed more than ten hours of Tony’s voice into an AI model. The bigger the quantity, the better the result. We worked with four companies before settling on the best. We also had to figure out the best tone of Tony’s voice: His speaking voice versus his “narrator” voice, which itself changed dramatically over the years. The narrator voice got very performative and sing-songy in the No Reservations years. I checked, you know, with his widow and his literary executor, just to make sure people were cool with that. And they were like, Tony would have been cool with that. I wasn’t putting words into his mouth. I was just trying to make them come alive.
As a post hoc ethics panel of one, I’m gonna say this doesn’t appeal to me, but I bet this sort of thing becomes common practice in the years to come, much like Errol Morris’s use of reenactment in The Thin Blue Line. A longer and more nuanced treatment of the issue can be found in Justin Hendrix’s interview of Sam Gregory, who is an “expert on synthetic media and ethics”.
There’s a set of norms that people are grappling with in regard to this statement from the director of the Bourdain documentary. They’re asking questions around consent, right? Who consents to someone taking your voice and using it? In this case, the voiceover of a private email. And what if that was something that, if the person was alive, they might not have wanted. You’ve seen that commentary online, and people saying, “This is the last thing Anthony Bourdain would have wanted for someone to do this with his voice.” So the consent issue is one of the things that is bubbling here. The second is a disclosure issue, which is, when do you know that something’s been manipulated? And again, here in this example, the director is saying, I didn’t tell people that I had created this voice saying the words and I perhaps would have not told people unless it had come up in the interview. So these are bubbling away here, these issues of consent and disclosure.
I have an unusually good memory, especially for symbols, words, and text, but since I don’t use regular expressions (ahem) regularly, they’re one of those parts of computer programming and HTML/EPUB editing that I find myself relearning over and over each time I need it. How did something this arcane but powerful even get started? Naturally, its creators were trying to discover (or model) artificial intelligence.
That’s the crux of this short history of “regex” by Buzz Andersen over at “Why is this interesting?”
The term itself originated with mathematician Stephen Kleene. In 1943, neuroscientist Warren McCulloch and logician Walter Pitts had just described the first mathematical model of an artificial neuron, and Kleene, who specialized in theories of computation, wanted to investigate what networks of these artificial neurons could, well, theoretically compute.
In a 1951 paper for the RAND Corporation, Kleene reasoned about the types of patterns neural networks were able to detect by applying them to very simple toy languages—so-called “regular languages.” For example: given a language whose “grammar” allows only the letters “A” and “B”, is there a neural network that can detect whether an arbitrary string of letters is valid within the “A/B” grammar or not? Kleene developed an algebraic notation for encapsulating these “regular grammars” (for example, a*b* in the case of our “A/B” language), and the regular expression was born.
Kleene’s work was later expanded upon by such luminaries as linguist Noam Chomsky and AI researcher Marvin Minsky, who formally established the relationship between regular expressions, neural networks, and a class of theoretical computing abstraction called “finite state machines.”
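In modern terms, Kleene’s a*b* notation is exactly the kind of pattern you’d hand to a regex engine today. A quick sketch of checking strings against that “A/B” grammar in Python:

```python
# Kleene's toy "A/B" language: any number of a's followed by any number of b's.
# A string is in the language if and only if it matches the regular expression a*b*.
import re

AB_GRAMMAR = re.compile(r"a*b*")

def in_language(s):
    return AB_GRAMMAR.fullmatch(s) is not None

for s in ["", "aaab", "abab", "bba"]:
    print(repr(s), in_language(s))
# ''     True   (zero a's and zero b's is allowed)
# 'aaab' True
# 'abab' False  (an "a" can't come after a "b")
# 'bba'  False
```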
This whole line of inquiry soon falls apart, for reasons both structural and interpersonal: Pitts, McCulloch, and Jerome Lettvin (another early AI researcher) have a big falling out with Norbert Wiener (of cybernetics fame), Minsky writes a book (Perceptrons) that throws cold water on the whole simple neural network as model of the human mind thing, and Pitts drinks himself to death. Minsky later gets mixed up with Jeffrey Epstein’s philanthropy/sex trafficking ring. The world of early theoretical AI is just weird.
But! Ken Thompson, one of the creators of UNIX at Bell Labs, comes along and starts using regexes for text editor searches in 1968. And renewed takes on neural networks come along in the 21st century that give some of that older research new life for machine learning and other algorithms. So, until Skynet/global warming kills us all, it all kind of works out? At least, intellectually speaking.
A new iOS app called Brickit has been developed to breathe new life into your old Lego pile. Just dump your bricks out into a pile and the app will analyze what Lego bricks you have, what new creations you can build with them, and provide you with detailed build instructions. It can even guide you to find individual pieces in the pile. View a short demo — I’m assuming they’re using some sort of AI/machine learning to do this?
My kids have approximately a billion Legos at my house, so I downloaded Brickit to try it out. The process is a little slow and you need to do a little bit of pre-sorting (by taking out the big pieces and spreading your pile out evenly), but watching the app do its thing is kinda magical. When I have more time later, I’m definitely going to go back and try to build some of the ideas it found for me. (via @marcprecipice)
In 1715, a significant chunk of Rembrandt’s masterpiece The Night Watch, including a 2-foot-wide swath from the left side of the painting, was lopped off in order to fit the painting in a smaller space. (WTF?!) Using a contemporary copy of the full scene painted by Gerrit Lundens and an AI program for getting the colors and angles right, the Rijksmuseum has “restored” The Night Watch, augmenting the painting with digital printouts of the missing bits. The uncropped Rembrandt is shown above and here is Lundens’s version:
I’m not an expert on art, but the 1715 crop and the shift of the principal characters from right-of-center to the center appear to have radically altered the whole feel of the painting.
With the addition especially on the left and the bottom, an empty space is created in the painting where they march towards. When the painting was cut [the lieutenants] were in the centre, but Rembrandt intended them to be off-centre marching towards that empty space, and that is the genius that Rembrandt understands: you create movement, a dynamic of the troops marching towards the left of the painting.
Writer Ted Chiang (author of the fantastic Exhalation) was recently a guest on the Ezra Klein Show. The conversation ranged widely — I enjoyed his thoughts on superheroes — but his comments on capitalism and technology seem particularly relevant right now. From the transcript:
I tend to think that most fears about A.I. are best understood as fears about capitalism. And I think that this is actually true of most fears of technology, too. Most of our fears or anxieties about technology are best understood as fears or anxiety about how capitalism will use technology against us. And technology and capitalism have been so closely intertwined that it’s hard to distinguish the two.
Let’s think about it this way. How much would we fear any technology, whether A.I. or some other technology, how much would you fear it if we lived in a world that was a lot like Denmark or if the entire world was run sort of on the principles of one of the Scandinavian countries? There’s universal health care. Everyone has child care, free college maybe. And maybe there’s some version of universal basic income there.
Now if the entire world operates according to — is run on those principles, how much do you worry about a new technology then? I think much, much less than we do now. Most of the things that we worry about under the mode of capitalism that the U.S. practices, that is going to put people out of work, that is going to make people’s lives harder, because corporations will see it as a way to increase their profits and reduce their costs. It’s not intrinsic to that technology. It’s not that technology fundamentally is about putting people out of work.
It’s capitalism that wants to reduce costs and reduce costs by laying people off. It’s not that like all technology suddenly becomes benign in this world. But it’s like, in a world where we have really strong social safety nets, then you could maybe actually evaluate sort of the pros and cons of technology as a technology, as opposed to seeing it through how capitalism is going to use it against us. How are giant corporations going to use this to increase their profits at our expense?
And so, I feel like that is kind of the unexamined assumption in a lot of discussions about the inevitability of technological change and technologically-induced unemployment. Those are fundamentally about capitalism and the fact that we are sort of unable to question capitalism. We take it as an assumption that it will always exist and that we will never escape it. And that’s sort of the background radiation that we are all having to live with. But yeah, I’d like us to be able to separate an evaluation of the merits and drawbacks of technology from the framework of capitalism.
Echoing some of his other thoughts during the podcast, Chiang also wrote a piece for the New Yorker the other day about how the singularity will probably never come.
In the latest episode of the Vox series Glad You Asked, host Joss Fong looks at how racial and other kinds of bias are introduced into massive computer systems and algorithms, particularly those that work through machine learning, that we use every day.
Many of us assume that tech is neutral, and we have turned to tech as a way to root out racism, sexism, or other “isms” plaguing human decision-making. But as data-driven systems become a bigger and bigger part of our lives, we also notice more and more when they fail, and, more importantly, that they don’t fail on everyone equally. Glad You Asked host Joss Fong wants to know: Why do we think tech is neutral? How do algorithms become biased? And how can we fix these algorithms before they cause harm?
How is Tesla’s Full Self-Driving system coming along? Perhaps not so good. YouTuber AI Addict took the company’s FSD Beta 8.2 for a drive through downtown Oakland recently and encountered all sorts of difficulties. The video’s chapter names should give you some idea: Crosses Solid Lines, Acting Drunk, Right Turn In Wrong Lane, Wrong Way!!!, Near Collision (1), and Near Collision (2). They did videos of drives in SF and San Jose as well.
I realize this is a beta, but it’s a beta being tested by consumers on actual public roads. While I’m sure it works great on their immaculate test track, when irregularities in your beta can easily result in the death or grave injury of a pedestrian, cyclist, or other motorist (several times over the course of a 30-minute drive), how can you consider it safe to release to the public in any way? It seems like Level 5 autonomy is going to be difficult to manage under certain road conditions. (via @TaylorOgan)