It is very easy to get ChatGPT to emit a series of words such as “I am happy to see you.” There are many things we don’t understand about how large language models work, but one thing we can be sure of is that ChatGPT is not happy to see you. A dog can communicate that it is happy to see you, and so can a prelinguistic child, even though both lack the capability to use words. ChatGPT feels nothing and desires nothing, and this lack of intention is why ChatGPT is not actually using language. What makes the words “I’m happy to see you” a linguistic utterance is not that the sequence of text tokens it is made up of is well formed; what makes it a linguistic utterance is the intention to communicate something.
Humans are the first and, to our knowledge, only entities on Earth to develop general intelligence, which has allowed us to dominate and alter the planet in a way and at a speed that no other entity has managed. Now, some people are working towards building an artificial general intelligence. So what happens when humans are matched or even far outclassed by this new general intelligence?
Such an intelligence explosion might lead to a true superintelligent entity. We don’t know what such a being would look like, what its motives or goals would be, or what would go on in its inner world. We could be as laughably stupid to a superintelligence as squirrels are to us, unable to even comprehend its way of thinking.
This hypothetical scenario keeps many people up at night. Humanity is the only example we have of an animal becoming smarter than all others – and we have not been kind to what we perceive as less intelligent beings. AGI might be the last invention of humanity.
YouTuber Nerrel takes James Cameron to task for releasing 4K remasters of Aliens and True Lies that have been, well, ruined by using AI to clean them up.
The best 4k releases tend to follow a pretty simple template: clean and scan the negative, repair any obvious signs of damage, and restore the colors to match the original grading, with as little meddling beyond that as possible. The process should not be about modernizing the style or forcing film to look like digital video. 35mm film was capable of incredible picture quality, and 4k is the first home format capable of delivering most of that detail – that should be enough. A well done 4k is like having a pristine copy of the original negative to watch in your own home, with the full data from that celluloid – grain and detail alike – digitally preserved forever. And that’s the problem with deep learning algorithms – they can’t preserve details. They make their best guess about what an object is supposed to be, then pull new details out of their digital assholes and smear them across the screen.
If Hollywood and one of its best directors don’t care enough about their movies to do them right, how are they supposed to convince us to care about their movies?
Anyway, here’s Nilay Patel on the limitations of AI and where humans shine:
But these models in their most reductive essence are just statistical representations of the past. They are not great at new ideas.
And I think that the power of human beings sort of having new ideas all the time, that’s the thing that the platforms won’t be able to find. That’s why the platforms feel old. Social platforms like enter a decay state where everyone’s making the same thing all the time. It’s because we’ve optimized for the distribution, and people get bored and that boredom actually drives much more of the culture than anyone will give that credit to, especially an A.I. developer who can only look backwards.
Later he talks more specifically about why curation will grow more important in a world inundated with aggressively mid AI content:
And the idea is, in my mind at least, that those people who curate the internet, who have a point of view, who have a beginning and middle, and an end to the story they’re trying to tell all the time about the culture we’re in or the politics we’re in or whatever. They will actually become the centers of attention and you cannot replace that with A.I. You cannot replace that curatorial function or that guiding function that we’ve always looked to other individuals to do.
And those are real relationships. I think those people can stand in for institutions and brands. I think the New York Times, you’re Ezra Klein, a New York Times journalist means something. It appends some value to your name, but the institution has to protect that value. I think that stuff is still really powerful, and I think as the flood of A.I. comes to our distribution networks, the value of having a powerful individual who curates things for people, combined with a powerful institution who protects their integrity actually will go up. I don’t think that’s going to go down.
Yeah, exactly. Individuals and groups of like-minded people making things for other people – that stuff is only going to grow more valuable as time goes on. The breadth and volume offered by contemporary AI cannot provide this necessary function right now (and IMO, for the foreseeable future).
And finally, I wanted to share this exchange:
EZRA KLEIN: You said something on your show that I thought was one of the wisest, single things I’ve heard on the whole last decade and a half of media, which is that places were building traffic thinking they were building an audience. And the traffic, at least in that era, was easy, but an audience is really hard. Talk a bit about that.
NILAY PATEL: Yeah first of all, I need to give credit to Casey Newton for that line. That is something – at The Verge, we used to say that to ourselves all the time just to keep ourselves from the temptations of getting cheap traffic. I think most media companies built relationships with the platforms, not with the people that were consuming their content.
I never focused on traffic all that much, mainly because for a small site like kottke.org, there wasn’t a whole lot I could do, vis-à-vis Google or Facebook, to move the needle that much. But as I’ve written many times, switching to a reader-supported model in 2016 with the membership program has just worked so well for the site because it allows me to focus on making something for those readers – that’s you! – and not for platforms or algorithms or advertisers. I don’t have to “pivot to video”; instead I can do stuff like comments and [new thing coming “soon”] that directly benefit and engage readers, which has been really rewarding.
Perhaps the platform era caused us to lose track of what a Web site was for. The good ones are places you might turn to several times per day or per week for a select batch of content that pointedly is not everything. Going there regularly is a signal of intention and loyalty: instead of passively waiting for social feeds to serve you what to read, you can seek out reading materials – or videos or audio – from sources you trust. If Twitter was once a sprawling Home Depot of content, going to specific sites is more like shopping from a series of specialized boutiques.
I’m going to get slightly petty here for a sec and say that these “back to the blog / back to the web” pieces almost always ignore the sites that never gave up the faith in favor of “media” folks inspired by the former. It’s nice to see the piece end with a mention of Arts & Letters Daily, still bloggily chugging along since 1998. /salty
Data artist Robert Hodgin recently created a feedback loop between Midjourney and ChatGPT-4 – he prompted MJ to create an image of an old man in a messy room wearing a VR headset, asked ChatGPT to describe the image, then fed that description back into MJ to generate another image, and did that 10 times. Here was the first image:
And here’s one of the last images:
Recursive art like this has a long history – see Alvin Lucier’s I Am Sitting in a Room from 1969 – but Hodgin’s project also hints at the challenges facing AI companies seeking to keep their training data free of material created by AI. Ted Chiang has encouraged us to “think of ChatGPT as a blurry jpeg of all the text on the Web”:
It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
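For the technically curious, Hodgin’s loop is only a few lines of logic. Here’s a minimal sketch in Python; since Midjourney has no public API, generate_image and describe_image below are hypothetical placeholders standing in for “render this prompt as an image” and “describe this image in words” (e.g. with a vision-capable model):

```python
def generate_image(prompt: str) -> bytes:
    """Hypothetical placeholder for an image generator (Midjourney has no public API)."""
    return f"<image rendered from: {prompt}>".encode()

def describe_image(image: bytes) -> str:
    """Hypothetical placeholder for a vision model describing the image."""
    return f"a photo matching {image.decode()}"

prompt = "an old man in a messy room wearing a VR headset"
for step in range(10):                 # ten round trips, as in Hodgin's project
    image = generate_image(prompt)     # text -> image
    prompt = describe_image(image)     # image -> text, fed back in as the next prompt
    print(f"step {step + 1}: {prompt[:100]}")
```

Each pass feeds one model’s lossy output into the other, which is exactly why the tenth image drifts so far from the first.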
For the last few weeks, I’ve been listening to the audiobook of Brian Merchant’s history of the Luddite movement, Blood in the Machine: The Origins of the Rebellion Against Big Tech. In it, Merchant argues the Luddites were at their core a labor movement against capitalism and compares them to contemporary movements against big tech and media companies. Merchant writes in the Atlantic:
The first Luddites were artisans and cloth workers in England who, at the onset of the Industrial Revolution, protested the way factory owners used machinery to undercut their status and wages. Contrary to popular belief, they did not dislike technology; most were skilled technicians.
At the time, some entrepreneurs had started to deploy automated machines that unskilled workers β many of them children β could use to churn out cheap, low-quality goods. And while the price of garments fell and the industrial economy boomed, hundreds of thousands of working people fell into poverty. When petitioning Parliament and appealing to the industrialists for minimum wages and basic protections failed, many organized under the banner of a Robin Hood-like figure, Ned Ludd, and took up hammers to smash the industrialists’ machines. They became the Luddites.
He goes on to compare their actions to tech publication writers’ strikes, the SAG-AFTRA & WGA strikes, the Authors Guild lawsuit against AI companies, and a group of masked activists “coning” self-driving cars. All this reminds me of Ted Chiang’s quote about AI:
I tend to think that most fears about A.I. are best understood as fears about capitalism. And I think that this is actually true of most fears of technology, too. Most of our fears or anxieties about technology are best understood as fears or anxiety about how capitalism will use technology against us. And technology and capitalism have been so closely intertwined that it’s hard to distinguish the two.
Their codas could be orders of magnitude more ancient than Sanskrit. We don’t know how much meaning they convey, but we do know that they’ll be very difficult to decode. Project CETI’s scientists will need to observe the whales for years and achieve fundamental breakthroughs in AI. But if they’re successful, humans could be able to initiate a conversation with whales.
This would be a first-contact scenario involving two species that have lived side by side for ages. I wanted to imagine how it could unfold. I reached out to marine biologists, field scientists who specialize in whales, paleontologists, professors of animal-rights law, linguists, and philosophers. Assume that Project CETI works, I told them. Assume that we are able to communicate something of substance to the sperm whale civilization. What should we say?
One of the worries about whale/human communication is the potential harm a conversation might cause.
Cesar Rodriguez-Garavito, a law professor at NYU who is advising Project CETI, told me that whatever we say, we must avoid harming the whales, and that we shouldn’t be too confident about our ability to predict the harms that a conversation could cause.
The sperm whales may not want to talk. They, like us, can be standoffish even toward members of their own species – and we are much more distant relations. Epochs have passed since our last common ancestor roamed the Earth. In the interim, we have pursued radically different, even alien, lifeways.
A team of three students were able to virtually “unroll” a 2000-year-old papyrus scroll that was carbonized during the eruption of Mount Vesuvius in Herculaneum, thereby winning the grand prize in the Vesuvius Challenge. These scrolls (there are hundreds of them) are little more than “lumps of carbonized ash”; this Wikipedia entry helpfully summarizes their fate:
Due to the eruption of Mount Vesuvius in 79 AD, bundles of scrolls were carbonized by the intense heat of the pyroclastic flows. This intense parching took place over an extremely short period of time, in a room deprived of oxygen, resulting in the scrolls’ carbonization into compact and highly fragile blocks. They were then preserved by the layers of cement-like rock.
Using high-resolution CT scans of the scrolls, machine learning, and computer vision techniques, the team was able to read the text inside one of the scrolls without actually unrolling it. I am stunned by how much text they were able to recover from these blackened documents – take a look at this image:
There was one submission that stood out clearly from the rest. Working independently, each member of our team of papyrologists recovered more text from this submission than any other. Remarkably, the entry achieved the criteria we set when announcing the Vesuvius Challenge in March: 4 passages of 140 characters each, with at least 85% of characters recoverable. This was not a given: most of us on the organizing team assigned a less than 30% probability of success when we announced these criteria! And in addition, the submission includes another 11 (!) columns of text – more than 2000 characters total.
If you’re interested, it’s fascinating to read through the whole thing to see just how little they were working with compared to how much they were able to recover. And the best part is, all the contest submissions are open source, so researchers will be able to build on each other’s successes. (via waxy.org)
OpenAI unveiled their prototype video generator called Sora. It does text-to-video and a ton more. Just check out the videos here and here – I literally cannot believe what I’m seeing.
In June 2021 (pre The Bear), New Yorker cartoonist Zoe Si coached Ayo Edebiri through the process of drawing a New Yorker cartoon. The catch: neither of them could see the other’s work in progress. Super entertaining.
I don’t know about you, but Si’s initial description of the cartoon reminded me of an LLM prompt:
So the cartoon is two people in their apartment. One person has dug a hole in the floor, and he is standing in the hole and his head’s poking out. And the other person is kneeling on the floor beside the hole, kind of like looking at him in a concerned manner. There’ll be like a couch in the background just to signify that they’re in a house.
Just for funsies, I asked ChatGPT to generate a New Yorker-style cartoon using that prompt. Here’s what it came up with:
Oh boy. And then I asked it for a funny caption and it hit me with: “I said I wanted more ‘open space’ in the living room, not an ‘open pit’!” Oof. ChatGPT, don’t quit your day job!
Over the weekend, I listened to this podcast conversation between the psychologist & philosopher Alison Gopnik and writer Ted Chiang about using children’s learning as a model for developing AI systems. Around the 23-minute mark, Gopnik observes that care relationships (child care, elder care, etc.) are extremely important to people but are nearly invisible in economics. And then Chiang replies:
One of the ways that conventional economics sort of ignores care is that for every employee that you hire, there was an incredible amount of labor that went into that employee. That’s a person! And how do you make a person? Well, for one thing, you need several hundred thousand hours of effort to make a person. And every employee that any company hires is the product of hundreds of thousands of hours of effort. Which, companies… they don’t have to pay for that!
They are reaping the benefits of an incredible amount of labor. And if you imagine, in some weird kind of theoretical sense, if you had to actually pay for the raising of everyone that you would eventually employ, what would that look like?
It’s an interesting conversation throughout – recommended!
Labyrinth and its many variants generally consist of a box topped with a flat wooden plane that tilts across an x and y axis using external control knobs. Atop the board is a maze featuring numerous gaps. The goal is to move a marble or a metal ball from start to finish without it falling into one of those holes. It can be a… frustrating game, to say the least. But with ample practice and patience, players can generally learn to steady their controls enough to steer their marble through to safety in a relatively short timespan.
CyberRunner, in contrast, reportedly mastered the dexterity required to complete the game in barely 5 hours. Not only that, but researchers claim it can now complete the maze in just under 14.5 seconds – over 6 percent faster than the existing human record.
CyberRunner was capable of solving the maze even faster, but researchers had to stop it from taking shortcuts it found in the maze. (via clive thompson)
These didn’t track as AI-generated at first…and then I tried to read the text – THE STANFORD PRESERIBENT. You can see the whole set on Bluesky (if you have access).
The four members of the Beatles, assisted by machine learning technology, come together one last time to record a song together, working off of a demo tape recorded by John Lennon in the 70s.
The long mythologised John Lennon demo was first worked on in February 1995 by Paul, George and Ringo as part of The Beatles Anthology project but it remained unfinished, partly because of the impossible technological challenges involved in working with the vocal John had recorded on tape in the 1970s. For years it looked like the song could never be completed.
But in 2022 there was a stroke of serendipity. A software system developed by Peter Jackson and his team, used throughout the production of the documentary series Get Back, finally opened the way for the uncoupling of John’s vocal from his piano part. As a result, the original recording could be brought to life and worked on anew with contributions from all four Beatles.
Ok, this is a little bit bonkers: HeyGen’s Video Translate tool will convert videos of people speaking into videos of them speaking one of several different languages (incl. English, Spanish, Hindi, and French) with matching mouth movements. Check out their brief demo of Marques Brownlee speaking Spanish & Tim Cook speaking Hindi or this video of a YouTuber trying it out:
The results are definitely in the category of “indistinguishable from magic”.
Photographs have always been an imperfect reproduction of real life – see the story of Dorothea Lange’s Migrant Mother or Ansel Adams’ extensive darkroom work – but the seemingly boundless alterations offered by current & future AI editing tools will allow almost anyone to turn their photos (or should I say “photos”) into whatever they wish. In this video, Evan Puschak briefly explores what AI-altered photos might do to our memories.
I was surprised he didn’t mention the theory that when a past experience is remembered, that memory is altered in the human brain – that is, the “very act of remembering can change our memories”. I think I first heard about this on Radiolab more than 16 years ago. So maybe looking at photos extensively altered by AI could extensively alter those same memories in our brains, actually making us unable to recall anything even remotely close to what “really” happened. Fun!
But also, one could imagine this as a powerful way to treat PTSD, etc. Or to brainwash someone! Or an entire populace… Here’s Hannah Arendt on constantly being lied to:
If everybody always lies to you, the consequence is not that you believe the lies, but rather that nobody believes anything any longer. This is because lies, by their very nature, have to be changed, and a lying government has constantly to rewrite its own history. On the receiving end you get not only one lie – a lie which you could go on for the rest of your days – but you get a great number of lies, depending on how the political wind blows. And a people that no longer can believe anything cannot make up its mind. It is deprived not only of its capacity to act but also of its capacity to think and to judge. And with such a people you can then do what you please.
This is the incredible and interesting and dangerous thing about the combination of our current technology, the internet, and mass media: “a lying government” is no longer necessary – we’re doing it to ourselves and anyone with sufficient motivation will be able to take advantage of people without the capacity to think and judge.
P.S. I lol’d too hard at his deadpan description of “the late Thanos”. RIP, big fella.
Artist and filmmaker Paul Trillo made Thank You For Not Answering, an artful experimental short film, using a suite of AI tools. The end credits of the film read:
Trillo demonstrated the process to me during a Zoom call; in seconds, it was possible to render, for example, a tracking shot of a woman crying alone in a softly lit restaurant. His prompt included a hash of S.E.O.-esque terms meant to goad the machine into creating a particularly cinematic aesthetic: “Moody lighting, iconic, visually stunning, immersive, impactful.” Trillo was enthralled by the process: “The speed in which I could operate was unlike anything I had experienced.” He continued, “It felt like being able to fly in a dream.” The A.I. tool was “co-directing” alongside him: “It’s making a lot of decisions I didn’t.”
I know, I know. Too much Wes Anderson. Too much AI. But there is something in my brain, a chemical imbalance perhaps, and I can’t help but find this reimagining of the Lord of the Rings in Anderson’s signature style funny and charming. Sorry but not sorry.
Expanding on his previous thoughts on the relationship between AI and capitalism – “I tend to think that most fears about A.I. are best understood as fears about capitalism” – Ted Chiang offers a useful metaphor for how to think about AI: as a management-consulting firm like McKinsey.
So, I would like to propose another metaphor for the risks of artificial intelligence. I suggest that we think about A.I. as a management-consulting firm, along the lines of McKinsey & Company. Firms like McKinsey are hired for a wide variety of reasons, and A.I. systems are used for many reasons, too. But the similarities between McKinsey – a consulting firm that works with ninety per cent of the Fortune 100 – and A.I. are also clear. Social-media companies use machine learning to keep users glued to their feeds. In a similar way, Purdue Pharma used McKinsey to figure out how to “turbocharge” sales of OxyContin during the opioid epidemic. Just as A.I. promises to offer managers a cheap replacement for human workers, so McKinsey and similar firms helped normalize the practice of mass layoffs as a way of increasing stock prices and executive compensation, contributing to the destruction of the middle class in America.
A former McKinsey employee has described the company as “capital’s willing executioners”: if you want something done but don’t want to get your hands dirty, McKinsey will do it for you. That escape from accountability is one of the most valuable services that management consultancies provide. Bosses have certain goals, but don’t want to be blamed for doing what’s necessary to achieve those goals; by hiring consultants, management can say that they were just following independent, expert advice. Even in its current rudimentary form, A.I. has become a way for a company to evade responsibility by saying that it’s just doing what “the algorithm” says, even though it was the company that commissioned the algorithm in the first place.
Good stuff – I especially enjoyed the mini You’re Wrong About on the Luddites – do read the whole thing.
No matter which side you come down on in the debate about using AI tools like Stable Diffusion and Midjourney to create digital art, this video of an experienced digital artist explaining how he uses AI in his workflow is worth a watch. I thought this comment was particularly interesting:
I see the overall process as a joint effort with the AI. I’ve been a traditional artist for 2 decades, painting on canvas. And in the last five years I’ve been doing a lot of digital art. So from that part of myself, I don’t feel threatened at all.
I feel this is an opportunity. An opportunity for many new talented people to jump on a new branch of art that is completely different from the one that we have already in digital art and just open up new way of being creative.
I’m not going to make a habit of posting AI generated video and photography here (mainly because most of it is not that interesting) but Pepperoni Hug Spot is just too perfect a name for a pizza place to pass up. And it’s got Too Many Cooks vibes.
Well this is some bizarre good fun β turns out that the campy goofiness of Star Wars and the campy seriousness of high fashion make for a pretty good combination.
[Yesterday I spent all day answering reader questions for the inaugural Kottke.org Ask Me Anything. One of them asked my opinion of the current crop of AI tools and I thought it was worth reprinting the whole thing here. -j]
Q: I would love to know your thoughts on AI, and specifically the ones that threaten us writers. I know you’ve touched on it in the past, but it seems like ChatGPT and the like really exploded while you were on sabbatical. Like, you left and the world was one way, and when you returned, it was very different. –Gregor
A: I got several questions about AI and I haven’t written anything about my experience with it on the site, so here we go. Let’s start with two facts:
ChatGPT moved me to tears.
I built this AMA site with the assistance of ChatGPT. (Or was it the other way around?)
Ok, the first thing. Last month, my son skied at a competition out in Montana. He’d (somewhat inexplicably) struggled earlier in the season at comps, which was tough for him to go through and for us as parents to watch. How much do we let him figure out on his own vs. how much support/guidance do we give him? This Montana comp was his last chance to get out there and show his skills. I was here in VT, so I texted him my usual “Good luck! Stomp it!” message the morning of the comp. But I happened to be futzing around with ChatGPT at the time (the GPT-3.5 model) and thought, you know, let’s punch this up a little bit. So I asked ChatGPT to write a good luck poem for a skier competing at a freeski competition at Big Sky.
In response, it wrote a perfectly serviceable 12-line poem in rhyming couplets that was on topic and made narrative sense. And when I read the last line, I burst into tears. So does that make ChatGPT a soulful poet of rare ability? No. I’ve thought a lot about this and here’s what I think is going on: I was primed for an emotional response (because my son was struggling with something really important to him, because I was feeling anxious for him, because he was doing something potentially dangerous, because I haven’t seen him too much this winter) and ChatGPT used the language and methods of thousands of years of writing to deliver something a) about someone I love, and b) in the form of a poem (which is often an emotionally charged form) – both of which I had explicitly asked for. When you’re really in your feelings, even the worst movie or the cheesiest song can resonate with you and move you – just the tiniest bit of narrative and sentiment can send you over the edge. ChatGPT didn’t really make me cry…I did.
But still. Even so. It felt a little magical when it happened.
Now for the second part. I would say ChatGPT (mostly the new GPT-4 model), with a lot of hand-holding and cajoling from me, wrote 60-70% of the code (PHP, Javascript, CSS, SQL) for this AMA site. And we easily did it in a third of the time it would have taken me by myself, without having to look something up on Stack Overflow every four minutes or endlessly consulting CSS and PHP reference guides or tediously writing tests, etc. etc. etc. In fact, I never would have even embarked on building this little site-let had ChatGPT not existed…I would have done something much simpler and more manual instead. And it was a *blast*. I had so much fun and learned so much along the way.
I’ve also been using ChatGPT for some other programming projects – we whipped the Quick Links into better shape (it can write Movable Type templating code…really!) and set up direct posting of the site’s links to Facebook via the API rather than through Zapier (saving me $20/mo in the process). It has really turbo-charged my ability to get shit done around here and has me thinking about all sorts of possibilities.
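(For the curious, “posting via the API” boils down to a single HTTP call to Facebook’s Graph API. Here’s a rough sketch in Python rather than the site’s actual PHP; the page ID, access token, API version, and field values below are all placeholders.)

```python
import requests

# Minimal sketch: publish a link to a Facebook Page's feed via the Graph API.
# PAGE_ID, PAGE_ACCESS_TOKEN, and the API version are placeholders -- the real
# values come from a Facebook app that has permission to post to the page.
PAGE_ID = "YOUR_PAGE_ID"
PAGE_ACCESS_TOKEN = "YOUR_PAGE_ACCESS_TOKEN"

resp = requests.post(
    f"https://graph.facebook.com/v19.0/{PAGE_ID}/feed",
    data={
        "message": "Quick Link: something worth reading",
        "link": "https://kottke.org/",
        "access_token": PAGE_ACCESS_TOKEN,
    },
)
resp.raise_for_status()
print(resp.json())  # the ID of the new post, if it worked
```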
I keep using the word “we” here because coding with ChatGPT – and this is where it starts to feel weird in an uncanny valley sort of way – feels like a genuine creative collaboration. It feels like there is a “someone” on the other side of that chat, a something that’s really capable but also needs a lot of hand-holding. Just. Like. Me. There’s a back and forth. We both screw up and take turns correcting each other’s mistakes. I ask it please and tell it thank you. ChatGPT lies to me; I gently and non-judgmentally guide it in a more constructive direction (as you would with a toddler). It is the fucking craziest weirdest thing and I don’t really know how to think about it.
There have only been a few occasions in my life when I’ve used or seen some new technology that felt like magic. The first time I wrote & ran a simple BASIC program on a computer. The first time I used the web. The first time using a laptop with wifi. The first time using an iPhone. Programming with ChatGPT over the past few weeks has felt like magic in the same way. While working on these projects with ChatGPT, I can’t wait to get out of bed in the morning to pick up where we left off last night (likely too late last night), a feeling I honestly have not consistently felt about work in a long time. I feel giddy. I feel POWERFUL.
That powerful feeling makes me uneasy. We shouldn’t feel so suddenly powerful without pausing to interrogate where that power comes from, who ultimately wields it, and who it will benefit and harm. The issues around these tools are complex & far-reaching and I’m still struggling to figure out what to think about it all. I’m persuaded by arguments that these tools offer an almost unprecedented opportunity for “helping humans be creative and express themselves” and that machine/human collaboration can deepen our understanding and appreciation of the world around us (as has happened with chess and go). I’m also persuaded by Ted Chiang’s assertion that our fears of AI are actually about capitalism – and we’ve got a lot to fear from capitalism when it comes to these tools, particularly given the present dysfunction of US politics. There is just so much potential power here and many people out there don’t feel uneasy about wielding it – and they will do what they want without regard for the rest of us. That’s pretty scary.
Powerful, weird, scary, uncanny, giddy – how the hell do we collectively navigate all that?
(Note: ChatGPT didn’t write any of this, nor has it written anything else on kottke.org. I used it once while writing a post a few weeks ago, basically as a smart thesaurus to suggest adjectives related to a topic. I’ll let you know if/when that changes – I expect it will not for quite some time, if ever. Even in the age of Ikea, there are still plenty of handcrafted furniture makers around, and in the same way, I suspect the future availability of cheap good-enough AI writing/curation will likely increase the demand and value for human-produced goods.)
In a piece about how the pace of improvement in the current crop of AI products is vastly outstripping the ability of society to react/respond to it, Ezra Klein uses this cracker of a phrase/concept: “the difficulty of living in exponential time”.
I find myself thinking back to the early days of Covid. There were weeks when it was clear that lockdowns were coming, that the world was tilting into crisis, and yet normalcy reigned, and you sounded like a loon telling your family to stock up on toilet paper. There was the difficulty of living in exponential time, the impossible task of speeding policy and social change to match the rate of viral replication. I suspect that some of the political and social damage we still carry from the pandemic reflects that impossible acceleration. There is a natural pace to human deliberation. A lot breaks when we are denied the luxury of time.
But that is the kind of moment I believe we are in now. We do not have the luxury of moving this slowly in response, at least not if the technology is going to move this fast.
Covid, AI, and even climate change (e.g. the effects we are seeing after 250 years of escalating carbon emissions)…they are all moving too fast for society to make complete sense of them. And it’s causing problems and creating opportunities for schemers, connivers, and confidence tricksters to wreak havoc.
In this final installment of Everything is a Remix, Kirby Ferguson offers his perspective on image generation with AI, how it compares to human creativity, and what its role will be in the future. In watching the part about the anxiety in the creative community about these image generators, I was reminded of what Ted Chiang has said about fears of technology actually being fears of capitalism.
It’s capitalism that wants to reduce costs and reduce costs by laying people off. It’s not that like all technology suddenly becomes benign in this world. But it’s like, in a world where we have really strong social safety nets, then you could maybe actually evaluate sort of the pros and cons of technology as a technology, as opposed to seeing it through how capitalism is going to use it against us.
I agree with Ferguson that these AI image generators are, outside the capitalist context, useful and good for helping humans be creative and express themselves. Tools like Midjourney, DALL-E, and Stable Diffusion allow anyone to collaborate with every previous human artist that has ever existed, all at once. Like, just think about how powerful this is: normal people who have ideas but lack technical skills can now create imagery. Is it art? Perhaps not in most cases, but some of it will be. If the goal is to get more people to be able to more easily express and exercise their creativity, these image generators fulfill that in a big way. But that’s really scary – power always is.
In 2020, before the current crop of large language models (LLMs) like ChatGPT and Bing, Emily Bender and Alexander Koller wrote a paper on their limitations called Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In the paper, Bender and Koller describe an “octopus test” as a way of thinking about what LLMs are capable of and what they aren’t. A recent profile of Bender by Elizabeth Weil for New York magazine (which is worth reading in its entirety) summarizes the octopus test thusly:
Say that A and B, both fluent speakers of English, are independently stranded on two uninhabited islands. They soon discover that previous visitors to these islands have left behind telegraphs and that they can communicate with each other via an underwater cable. A and B start happily typing messages to each other.
Meanwhile, O, a hyperintelligent deep-sea octopus who is unable to visit or observe the two islands, discovers a way to tap into the underwater cable and listen in on A and B’s conversations. O knows nothing about English initially but is very good at detecting statistical patterns. Over time, O learns to predict with great accuracy how B will respond to each of A’s utterances.
Soon, the octopus enters the conversation and starts impersonating B and replying to A. This ruse works for a while, and A believes that O communicates as both she and B do – with meaning and intent. Then one day A calls out: “I’m being attacked by an angry bear. Help me figure out how to defend myself. I’ve got some sticks.” The octopus, impersonating B, fails to help. How could it succeed? The octopus has no referents, no idea what bears or sticks are. No way to give relevant instructions, like to go grab some coconuts and rope and build a catapult. A is in trouble and feels duped. The octopus is exposed as a fraud.
The paper’s official title is “Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data.” NLU stands for “natural-language understanding.” How should we interpret the natural-sounding (i.e., humanlike) words that come out of LLMs? The models are built on statistics. They work by looking for patterns in huge troves of text and then using those patterns to guess what the next word in a string of words should be. They’re great at mimicry and bad at facts. Why? LLMs, like the octopus, have no access to real-world, embodied referents. This makes LLMs beguiling, amoral, and the Platonic ideal of the bullshitter, as philosopher Harry Frankfurt, author of On Bullshit, defined the term. Bullshitters, Frankfurt argued, are worse than liars. They don’t care whether something is true or false. They care only about rhetorical power – if a listener or reader is persuaded.
The point here is to caution against treating these AIs as if they are people. Bing isn’t in love with anyone; it’s just free-associating from an (admittedly huge) part of the internet.
This isn’t an exact analogue, but I have a car that can drive itself under certain circumstances (not Tesla’s FSD) and when I turn self-drive on, it feels like I’m giving control of my car to a very precocious 4-year-old. Most of the time, this incredible child pilots the car really well, better than I can, really – it keeps speed, lane positioning, and distance to forward traffic very precisely – so much so that you want to trust it as you would a licensed adult driver. But when it actually has to do something that requires making a tough decision or thinking, it will either give up control or do something stupid or dangerous. You can’t ever forget the self-driver is like a 4-year-old kid mimicking the act of driving and isn’t capable of thinking like a human when it needs to. You forget that and you can die. (This has the odd and (IMO) under-appreciated effect, when self-drive is engaged, of shifting your role from operator of the car to babysitting the operator of the car. Doing a thing and watching something else do a thing so you can take over when they screw up are two very different things, and I think that until more people realize that, it’s going to keep causing unnecessary accidents.)
What I’ve described sounds a lot like ChatGPT, or most any other large-language model. Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
Reframing the technology in that way turns out to be useful in thinking through some of its possibilities and limitations:
There is very little information available about OpenAI’s forthcoming successor to ChatGPT, GPT-4. But I’m going to make a prediction: when assembling the vast amount of text used to train GPT-4, the people at OpenAI will have made every effort to exclude material generated by ChatGPT or any other large-language model. If this turns out to be the case, it will serve as unintentional confirmation that the analogy between large-language models and lossy compression is useful. Repeatedly resaving a jpeg creates more compression artifacts, because more information is lost every time. It’s the digital equivalent of repeatedly making photocopies of photocopies in the old days. The image quality only gets worse.
Indeed, a useful criterion for gauging a large-language model’s quality might be the willingness of a company to use the text that it generates as training material for a new model. If the output of ChatGPT isn’t good enough for GPT-4, we might take that as an indicator that it’s not good enough for us, either.
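Chiang’s photocopy-of-a-photocopy analogy is easy to try for yourself. Here’s a minimal sketch using Python and the Pillow imaging library – photo.jpg stands in for whatever image you have lying around, and the quality settings are arbitrary:

```python
from PIL import Image

# Generation loss, the digital way: repeatedly re-encode the same image as a
# JPEG and reload the lossy copy each time. Information discarded by one save
# is never recovered by the next, so artifacts accumulate across generations.
img = Image.open("photo.jpg").convert("RGB")   # any photo you have on hand

for generation in range(50):
    quality = 70 if generation % 2 else 80     # small changes between saves compound the loss
    img.save("copy.jpg", format="JPEG", quality=quality)
    img = Image.open("copy.jpg").convert("RGB")  # reload the degraded copy

img.save("generation_50.jpg", format="JPEG", quality=75)
# Compare photo.jpg with generation_50.jpg: the accumulated blockiness is the
# visual analogue of training one lossy model on another lossy model's output.
```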
I tend to think that most fears about A.I. are best understood as fears about capitalism. And I think that this is actually true of most fears of technology, too. Most of our fears or anxieties about technology are best understood as fears or anxiety about how capitalism will use technology against us. And technology and capitalism have been so closely intertwined that it’s hard to distinguish the two.
Let’s think about it this way. How much would we fear any technology, whether A.I. or some other technology, how much would you fear it if we lived in a world that was a lot like Denmark or if the entire world was run sort of on the principles of one of the Scandinavian countries? There’s universal health care. Everyone has child care, free college maybe. And maybe there’s some version of universal basic income there.
Now if the entire world operates according to – is run on those principles, how much do you worry about a new technology then? I think much, much less than we do now.
Just about everything on the web is on TikTok, and going viral there too, so it shouldn’t be a surprise that people who’ve been laid off are there too, trying to figure out what it all means.
Part of me is cynical about this. You mean that as people, we’re so poorly defined without our jobs that our only recourse is to grind out some content about it? But on the other side of the coin, making content is what human beings do. Other animals use tools, but do they make content? Apart from some birds, probably not.
My favorite TikTok layoff video is by Atif Memon, a cloud engineer who offers a clear-eyed appraisal of her situation:
“At the company offsite, we celebrated our company tripling its revenue in a year. A month later, we are so poor! Who robbed us?”
“Even if ChatGPT can take away our jobs, they’ll have to get in line behind geopolitics and pandemic and shareholders and investors. I lost my job because the investors of the company were not sure [it] will become 400x in the coming year. ‘How will we go to Mars?’ Someone else lost their job because the investors thought ‘Hmm, if this other company can lay off 12k people and still work as usual, shouldn’t we also try?’”
“Artificial intelligence can never overtake human paranoia and human curiosity. AI can only do what human beings have been doing. Only humans can do what no human has done before.”
A lot to chew on in four minutes.
Update: Apparently this is not native to TikTok – it was made by a comedian, Aiyyo Shraddha, and posted to YouTube; the TikTok version is a ripoff. It really is a perfect TikTok story!
Google Research has released a new generative AI tool called MusicLM. MusicLM can generate new musical compositions from text prompts that either describe the music to be played (e.g., “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls”) or are more emotional and evocative (“Made early in his career, Matisse’s Dance, 1910, shows a group of red dancers caught in a collective moment of innocent freedom and joy, holding hands as they whirl around in space. Simple and direct, the painting speaks volumes about our deep-rooted, primal human desire for connection, movement, rhythm and music”).
As the last example suggests, since music can be generated from just about any text, anything that can be translated/captioned/captured in text, from poetry to paintings, can be turned into music.
It may seem strange that so many AI tools are coming to fruition in public all at once, but at Ars Technica, investor Haomiao Huang argues that once the basic AI toolkit reached a certain level of sophistication, a confluence of new products taking advantage of those research breakthroughs was inevitable:
To sum up, the breakthrough with generative image models is a combination of two AI advances. First, there’s deep learning’s ability to learn a “language” for representing images via latent representations. Second, models can use the “translation” ability of transformers via a foundation model to shift between the world of text and the world of images (via that latent representation).
This is a powerful technique that goes far beyond images. As long as there’s a way to represent something with a structure that looks a bit like a language, together with the data sets to train on, transformers can learn the rules and then translate between languages. Github’s Copilot has learned to translate between English and various programming languages, and Google’s Alphafold can translate between the language of DNA and protein sequences. Other companies and researchers are working on things like training AIs to generate automations to do simple tasks on a computer, like creating a spreadsheet. Each of these are just ordered sequences.
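That text-to-latent-to-image round trip Huang describes is now an off-the-shelf thing. Here’s a minimal sketch using Hugging Face’s diffusers library and a public Stable Diffusion checkpoint – the model name, prompt, and settings are just one reasonable choice, and you’d want a GPU for it to be quick:

```python
import torch
from diffusers import StableDiffusionPipeline

# The pipeline bundles the pieces described above: a text encoder that maps the
# prompt into the model's internal "language," a diffusion model that denoises
# a latent representation under that conditioning, and a decoder that turns
# the final latent back into pixels.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")   # use .to("cpu") and drop torch_dtype for a (much slower) CPU run

image = pipe("a group of red dancers holding hands in a ring, in the style of Matisse").images[0]
image.save("dancers.png")
```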
The other thing that’s different about the new wave of AI advances, Huang says, is that they’re not especially dependent on huge computing power at the edge. So AI is rapidly becoming much more ubiquitous than it’s been… even if MusicLM’s sample set of tunes still crashes my web browser.
Neural Radiance Fields (NeRF) is a relatively new technique that generates well-lit, complex 3D views from 2D images. If you’ve seen behind-the-scenes looks at how image/motion capture is traditionally done, you know how time-consuming and resource-intensive it can be. As this video from Corridor Crew shows, NeRF changes the image capture game significantly. The ease with which they play around with the technology to produce professional-looking effects in very little time is pretty mind-blowing. (via waxy)
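(For a rough sense of what a “radiance field” actually is: at its core, a NeRF is just a small neural network that maps a 3D point and viewing direction to a color and a density, and a volume renderer integrates those values along camera rays to form an image. Here’s a conceptual sketch in PyTorch – real NeRFs also positionally encode their inputs and are trained so that rendered rays match the input photos – not the Corridor Crew pipeline:)

```python
import torch
import torch.nn as nn

# Conceptual sketch only: the heart of a NeRF is an MLP mapping a 3D position
# plus viewing direction (x, y, z, theta, phi) to a color and a density.
# A volume renderer integrates these values along each camera ray to produce
# pixels; training adjusts the MLP so rendered rays match the input photos.
class TinyNeRF(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, sigma)
        )

    def forward(self, pos_dir: torch.Tensor) -> torch.Tensor:
        return self.net(pos_dir)

points = torch.rand(1024, 5)       # sample points along a batch of camera rays
rgb_sigma = TinyNeRF()(points)     # what the volume renderer would consume
print(rgb_sigma.shape)             # torch.Size([1024, 4])
```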