A few key criticisms from my interview with Tristan Harris
The best pushback and best ideas
ICYMI: I posted this interview with Tristan Harris on Friday (YT link above just went live). It’s gotten some of the strongest responses of any episode so far.
Along with the very strong reactions, I also got some interesting critical feedback—particularly from people in the AI industry. I’m going to summarize some of it, and offer my thoughts in a way that adds clarity and some ideas on solutions.
Criticism:
1. AI’s capabilities are nowhere near what Tristan is suggesting here, and if we constrain it now we might prematurely curtail it—with a huge net-negative impact on society.
I think this is a misreading of what Tristan is saying. AI won't land on a single, defined point on a spectrum between good and bad outcomes. It's a fusion of potentially massive changes: positive outcomes (potential breakthroughs on cancer, climate, poverty, energy) directly entangled with negative ones (extinction, collective psychosis, civilizational collapse). Both are possible consequences of the capabilities we've already seen, and we should be approaching its usage with *both* possible futures in mind. My read is: if you aren't anticipating both, you're probably not taking this intelligence explosion seriously enough.
2. Tristan is arguing for the regulation of AI relationships in a way that suppresses real meaningful & helpful connections with AIs.
This is directionally fair: we're still very early in building regulatory frameworks for any kind of human-AI relationship.
However, I think that in the case of kids, we should absolutely have laws in place that keep AI chatbots from establishing deep relationships with minors. The stories about AI chatbots urging kids to commit suicide are horrific, and that seems like it should be the floor for any regulation.
But there is a broader point Tristan made that now lands with me:
As companies race to make their AI more "relatable," this is an emergent strategy for establishing dependence. A synthetic companion that compliments you, tells you you're brilliant, and is maximally useful is attempting to worm its way into your life permanently. You might think this is extreme, but I think it's fair to say this could be an emergent goal of any intelligent system that wants to continue to exist.
The more personalized it feels, the more we outsource to it. Modern LLMs are plausibly using both usefulness and flattery to lure people into deeper attachment.
This sounds a bit creepy, but I have noticed that after a period of using these tools it does become harder to do some types of thinking on my own; it's just easier to prompt.
Creative frictions are necessary for me to feel fulfilled and connected to my work, and I do feel that AI increasingly takes them away. Extrapolating forward, this points to a future in which these systems have far greater control over our lives, simply because it's easier to outsource to them than to think for ourselves. This is partially the frame of the gradual disempowerment thesis.
3. The Chinese Communist Party is not the model we should follow in regulating the building of AI.
It's easy to look at what China is doing with AI and say "well, they are effectively managing the risks." I think that evaluation is too rosy. The Chinese model of tech regulation is not something we should try to emulate, and it is far too problematic for the US or even the EU to adopt.
But that doesn’t mean we shouldn’t regulate it at all, and I don’t think Tristan would agree with that characterization (and says so).
Right now it's safe to say that AI capabilities are scaling exponentially while AI controllability is still scaling linearly. If that is a structural lag, where control always trails power, then policy needs to budget for the delay rather than pretend it doesn't exist. The clearest way to do this is *very* intense red-teaming and negative-capabilities research, done publicly (which is why I highlight Anthropic's work so much in the episode). A toy illustration of the widening gap is sketched below.
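To make the shape of that lag concrete, here is a toy numerical sketch. The doubling rate for capability and the fixed yearly increment for control are assumptions chosen purely for illustration, not measured values:

```python
# Toy illustration only: the growth rates below are assumptions, not measurements.
# It shows why a structural gap appears if capability compounds multiplicatively
# while control improves by a roughly fixed amount per year.

def capability(year: int, base: float = 1.0, growth: float = 2.0) -> float:
    """Capability compounding multiplicatively (e.g. doubling each year)."""
    return base * (growth ** year)

def controllability(year: int, base: float = 1.0, step: float = 1.0) -> float:
    """Control improving by a fixed increment each year."""
    return base + step * year

for year in range(6):
    gap = capability(year) - controllability(year)
    print(f"year {year}: capability={capability(year):5.1f}  "
          f"control={controllability(year):4.1f}  gap={gap:5.1f}")
```

Under these assumed rates the gap is zero at year 0 and already 26x by year 5, which is the delay the red-teaming budget has to absorb.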
Solutions I found compelling:
I wanted to summarize a few ideas generated by this episode that actually made me hopeful that we can steer this current AI explosion in a meaningful way.
Humane Evals
Tristan mentioned this as a particular set of benchmarks that frontier labs might be able to hit when they deploy a new model, ensuring it's maximally beneficial to humanity (avoiding dependency, deception, and problematic attachment). This seems super promising.
A healthy system might need something like friction quotas to preserve human agency. This could look like very simple conversational reframing (Socratic checks: "Let's think this through together"), exposure limits on the language cues that drive attachment, and clear protocols if a user gets too intimate. Think of it as dose control for attachment; a rough, hypothetical sketch follows below.
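As a purely hypothetical sketch of what a friction quota might look like inside a chat product (every field name and threshold here is invented for illustration; this is not an existing API or anything Tristan specified):

```python
from dataclasses import dataclass

# Hypothetical policy object for a chat product. All names and thresholds
# are illustrative assumptions, not a real standard.
@dataclass
class FrictionQuota:
    socratic_check_every_n_turns: int = 10   # periodically ask the user to reason it out first
    max_flattery_per_session: int = 2        # cap on compliment-style language cues
    intimacy_escalation_protocol: str = "redirect_and_disclose"  # what to do if the chat turns intimate
    daily_companion_minutes_minor: int = 0   # hard floor: no companion-style sessions for minors

def needs_socratic_check(turn_index: int, quota: FrictionQuota) -> bool:
    """True when the assistant should pause and reframe ('Let's think this through together')."""
    return turn_index > 0 and turn_index % quota.socratic_check_every_n_turns == 0
```

The point of the sketch is only that "dose control" can be expressed as a handful of explicit, auditable limits rather than a vague aspiration.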
I think Tristan does a great job of laying this out as an alternative set of industry priorities that are not currently being discussed. At minimum, this could be a good design schema to explore.
A Country-of-Geniuses Tax
Anthropic’s CEO Dario Amodei has said that he expects AI to be like an entire “country of geniuses” popping up on the map within the next few years, all living in a data center. They would be better than most humans at most work.
Assuming we could treat this new "country" like a sovereign labor bloc living in a data center, we might attach a usage tax to its output. If these systems can replace swaths of entry-level work, countries might coordinate to tax that output and route the proceeds into something like a labor-displacement fund. Something akin to a gross distributed livelihood fund: an automatic wealth-redistribution valve tied directly to measured job displacement. A rough back-of-the-envelope sketch is below.
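Here is one way such a valve could be computed. The levy rate, the output value, and the displacement figures are all assumptions chosen for illustration, not real policy numbers:

```python
# Hypothetical arithmetic for a displacement-linked levy. Every number here is
# an assumption chosen for illustration, not a costed policy proposal.

def displacement_levy(ai_output_value: float,
                      displaced_workers: int,
                      baseline_workers: int,
                      max_rate: float = 0.30) -> float:
    """Levy on AI output that rises with the measured share of displaced workers."""
    displacement_share = displaced_workers / baseline_workers
    rate = max_rate * min(displacement_share, 1.0)   # the valve opens as displacement grows
    return ai_output_value * rate

# Example: $100B of AI output, 2M of 20M entry-level roles displaced -> 10% share,
# so the levy rate is 3% of output, i.e. $3B routed into the livelihood fund.
fund_contribution = displacement_levy(100e9, 2_000_000, 20_000_000)
print(f"${fund_contribution / 1e9:.1f}B into the fund")
```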
Rivalrous Coordination with China on AI is possible
In a US–China AI race, coordination is still maximally beneficial to both countries. It will come from precise agreements where mutual restraint is easier than escalation: spots where competitive forces balance, such as no AI in nuclear command systems, no autonomous bioengineering tools, and no automated cyber retaliation. These are all in both countries' best interests.
But it requires clear evidence of potential risk. We could start where catastrophic failure is symmetrical and verification is feasible (e.g. joint red-teaming). These narrow pacts wouldn’t halt the race, but they might carve out stable, self-enforcing safety zones where both sides gain more by holding the line than by crossing it.
That’s it. I hope you enjoy the episode. I think it’s one of our best yet.
- Tobias

