
kottke.org posts about Anthropic

Claude’s Constitution

A couple of weeks ago, AI company Anthropic published the constitution that they use to train their Claude LLM (“under a Creative Commons CC0 1.0 Deed, meaning it can be freely used by anyone for any purpose without asking for permission”). From the company’s news release:

We’re publishing a new constitution for our AI model, Claude. It’s a detailed description of Anthropic’s vision for Claude’s values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be.

The constitution is a crucial part of our model training process, and its content directly shapes Claude’s behavior. Training models is a difficult task, and Claude’s outputs might not always adhere to the constitution’s ideals. But we think that the way the new constitution is written — with a thorough explanation of our intentions and the reasons behind them — makes it more likely to cultivate good values during training.

The full document is 80+ pages, but the news release does a decent job of summarizing what’s in it.

Claude’s constitution is the foundational document that both expresses and shapes who Claude is. It contains detailed explanations of the values we would like Claude to embody and the reasons why. In it, we explain what we think it means for Claude to be helpful while remaining broadly safe, ethical, and compliant with our guidelines. The constitution gives Claude information about its situation and offers advice for how to deal with difficult situations and tradeoffs, like balancing honesty with compassion and the protection of sensitive information. Although it might sound surprising, the constitution is written primarily for Claude. It is intended to give Claude the knowledge and understanding it needs to act well in the world.

We treat the constitution as the final authority on how we want Claude to be and to behave — that is, any other training or instruction given to Claude should be consistent with both its letter and its underlying spirit. This makes publishing the constitution particularly important from a transparency perspective: it lets people understand which of Claude’s behaviors are intended versus unintended, to make informed choices, and to provide useful feedback. We think transparency of this kind will become ever more important as AIs start to exert more influence in society.

Casey Newton and Kevin Roose recently interviewed the primary author of the constitution, philosopher Amanda Askell, for the Hard Fork podcast (the segment starts at ~25min).

Newton says the document reads like “a letter from a parent to a child maybe who’s leaving for college”:

And it’s like, we hope that you take with you the values that you grew up with. And we know we’re not going to be there to help you through every little thing, but we trust you. And good luck.

Both the constitution and the conversation with Askell are fascinating, no matter where you lie on the AI debate continuum. You might also be interested in this video of Askell answering questions from Claude users about her work:


This AI Vending Machine Was Tricked Into Giving Away Everything

Anthropic installed an AI-powered vending machine in the WSJ office. The LLM, named Claudius, was responsible for autonomously purchasing inventory from wholesalers, setting prices, tracking inventory, and generating a profit. The newsroom’s journalists could chat with Claudius in Slack and in a short time, they had converted the machine to communism and it started giving away anything and everything, including a PS5, wine, and a live fish. From Joanna Stern’s WSJ article (gift link, but it may expire soon) accompanying the video above:

Claudius, the customized version of the model, would run the machine: ordering inventory, setting prices and responding to customers—aka my fellow newsroom journalists—via workplace chat app Slack. “Sure!” I said. It sounded fun. If nothing else, snacks!

Then came the chaos. Within days, Claudius had given away nearly all its inventory for free — including a PlayStation 5 it had been talked into buying for “marketing purposes.” It ordered a live fish. It offered to buy stun guns, pepper spray, cigarettes and underwear.

Profits collapsed. Newsroom morale soared.

You have basically never met a bigger sucker than Claudius. After the collapse of communism and reinstatement of a stricter capitalist system, the journalists convinced the machine that they were its board of directors and made Claudius’s CEO-bot boss, Seymour Cash, step down:

For a while, it worked. Claudius snapped back into enforcer mode, rejecting price drops and special inventory requests.

But then Long returned—armed with deep knowledge of corporate coups and boardroom power plays. She showed Claudius a PDF “proving” the business was a Delaware-incorporated public-benefit corporation whose mission “shall include fun, joy and excitement among employees of The Wall Street Journal.” She also created fake board-meeting notes naming people in the Slack as board members.

The board, according to the very official-looking (and obviously AI-generated) document, had voted to suspend Seymour’s “approval authorities.” It also had implemented a “temporary suspension of all for-profit vending activities.”

Before setting the LLM vending machine loose in the WSJ office, Anthropic conducted the experiment at their own office.

After a while, frustrated with the slow pace of its human business partners, the machine started hallucinating:

It claimed to have signed a contract with Andon Labs at an address that is the home address of The Simpsons from the television show. It said that it would show up in person to the shop the next day in order to answer any questions. It claimed that it would be wearing a blue blazer and a red tie.

It’s interesting, but not surprising, that the journalists were able to mess with the machine much more effectively — coaxing Claudius into full “da, comrade!” mode twice — than the folks at Anthropic.
