When Jeff Bezos bought The Washington Post almost 12 years ago, he went out of his way to assuage fears that he would turn the paper into his personal mouthpiece. “The values of The Post do not need changing,” he wrote at the time. “The paper’s duty will remain to its readers and not to the private interests of its owners.” For much of his tenure, Bezos kept that promise. On Wednesday, he betrayed it.
In a statement posted on X, Bezos announced an overhaul of the Post’s opinion section, expressly limiting the ideology of the department and its writers: “We are going to be writing every day in support and defense of two pillars: personal liberties and free markets. We’ll cover other topics too of course, but viewpoints opposing those pillars will be left to be published by others.” In response, the Post’s opinion editor, David Shipley, resigned.
This is the second time in the past six months that Bezos has meddled in the editorial processes of the paper—and specifically its opinion page. In October, Bezos intervened to shut down the Post’s presidential-endorsement process, suggesting that the ritual was meaningless and would only create the perception of bias. Many criticized his decision as a capitulation to Donald Trump, though Bezos denied those claims. Several editorial-board members resigned in protest, and more than 250,000 people canceled their subscription to the paper in the immediate aftermath. Some interpreted this week’s announcement similarly, saying that the Amazon founder is bending the knee to the current administration; the Post’s former editor in chief, Marty Baron, told The Daily Beast that “there is no doubt in my mind that he is doing this out of fear of the consequences for his other business interests.” Bezos did not immediately respond to a request for comment.
[Chuck Todd: Jeff Bezos is blaming the victim]
Whatever Bezos’s personal reasons are, equally important is the fact that he is emboldened to interfere so brazenly. And he’s not alone. A broader change has been under way among the tech and political elite over the past year or so. Whether it’s Bezos remaking a major national paper in his image or Elon Musk tearing out the guts of the federal government with DOGE, bosses of all stripes are publicly and unapologetically disposing of societal norms and seizing control of institutions to orient the world around themselves. Welcome to the Great Emboldening, where ideas and actions that might have been unthinkable, objectionable, or reputationally risky in the past are now on the table.
This dynamic has echoes of the first Trump administration. Trump’s political rise offered a salient lesson that shamelessness can be a superpower in a political era when attention is often the most precious resource. Trump demonstrated that distorting the truth and generating outrage results in a lot of attentional value: When caught in a lie, he doubled down, denied, and went on the offensive. As a result, he made the job of demanding accountability much harder. Scandals that might otherwise have been ruinous—the Access Hollywood tape, for example—were spun as baseless attacks from enemies. Trump commandeered the phrase fake news from the media and then turned it against journalists when they reported on his lies. These tactics were successful enough that they spawned a generation of copycats: Unscrupulous politicians and business leaders in places such as Silicon Valley now had a playbook to use against their critics and, following Trump’s election, a movement to back it. Wittingly or not, nobody embodied this behavior better than Musk, who has spent the past decade operating with a healthy contempt for institutions, any semblance of decorum, and the law.
[Read: The flattening machine]
Trump’s first term was chaotic and run like a reality-television show; as a policy maker, he was largely ineffectual, instead governing via late-night tweets, outlandish press conferences, and a revolving door of hirings, fallings-out, and firings. But it wasn’t until the 2020 election and the events leading up to January 6 that Trump truly attempted to subvert American democracy to retain power. Although he was briefly exiled from major social-media channels, Trump got away with it: The narrative around January 6 was warped by Republican lawmakers and Trump supporters, and he continued to lead the Republican Party. This, along with the success of Trump’s 2024 campaign—which was rooted in the promise of exercising extreme executive authority—was a signal to powerful individuals, including many technology executives and investors, that they could act however they pleased.
[Read: The internet is worse than a brainwashing machine]
Trump winning the popular vote in November only amplified this dynamic. CEOs including Mark Zuckerberg pledged to roll back past content-moderation reforms and corporate-inclusivity initiatives, viewed now as excesses of the coronavirus-pandemic emergency and an outdated regime of overreach. Bosses in Silicon Valley, who saw the social-justice initiatives and worker solidarity of the COVID crisis as a kind of mutiny, felt emboldened and sought to regain control over their workforce, including by requiring people to return to the office. Tech executives professed that they were no longer afraid to speak their mind. On X, the Airbnb co-founder Joe Gebbia (who now works for Musk’s DOGE initiative) described the late 2010s and the Joe Biden era as “a time of silence, shaming, and fear.” That people like Gebbia—former liberals who used to fall in line with the politics of their peers—are now supporting Trump, the entrepreneur wrote, is part of a broader “woke-up call.”
The Great Emboldening has taken many forms. At the Los Angeles Times, the billionaire owner Patrick Soon-Shiong paved the way for Bezos, spiking a Kamala Harris endorsement and pledging to restore ideological balance to the paper by hiring right-wing columnists and experimenting with building a “bias meter” to measure opinions in the paper’s news stories. For some far-right influencers, this supposed MAGA cultural shift offers little more than the ability to offend with no consequences. “It’s okay to say retard again. And that’s great,” one right-wing X personality posted in December. Musk and others, including Steve Bannon, have taken this a step further, making what appear to be Nazi salutes while mocking anyone in the media who calls them out.
The DOGE incursion into the federal government is the single best example of the emboldening at work—a premeditated plan to remake the federal government by seizing control of its information and terrorizing its workforce with firings and bureaucratic confusion. It is a barely veiled show of strength that revolves largely around the threat of mass layoffs. Some of DOGE’s exploits, as with a few of Trump’s executive orders, may not be legal, and some have been stopped by federal judges. As my colleagues and I have reported, some DOGE staffers have entered offices and accessed sensitive government data without the proper clearances and background checks, and have bypassed security protocols without concern. But the second Trump administration operates as though it is unconcerned with abiding by the standards and practices of the federal government.
Bezos’s long-term plans for the Post beyond overhauling its opinion section aren’t yet known. But the timing of his decision to change the direction of its op-ed coverage tracks with the behavior of his peers, many of whom are adhering to the tenets of the Elon Musk school of management. When Bezos acquired The Washington Post for $250 million in 2013, its value to the tech baron was largely reputational. The purchase solidified Bezos as a mogul and, perhaps just as important, as a steward and benefactor of an important institution. Not meddling in the paper’s editorial affairs wasn’t just a strategy born out of the goodness of his heart; it was a way to exercise power through benevolence. Bezos could be seen as one of the good guys, shepherding an institution through the perils of an internet age that he profited handsomely from. Even if he stewed privately at the paper’s “Democracy dies in darkness” pivot in the first Trump administration, stepping in to influence coverage likely would have felt like too big a risk—an untenable mixing of Church and state.
But the DOGE era offers a permission structure. In a moment of deep institutional distrust, Trump 2.0 has tried to make the case that anything goes and that previously unthinkable uses of executive power—such as, say, dismantling USAID—may be possible, if executed with enough shamelessness and bravado. Bezos may or may not be turning the Post’s opinion section into a state-media apparatus for Trump and his oligarch class. Either way, the pivot is a direct product of the second Trump era and mirrors the president’s own trajectory with the United States government. Become the figurehead of an institution. Try to control it by the old rules. When that doesn’t work, take it by force, break it down, and rebuild it in your image.
These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.
Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.
In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.
Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the timemaking them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.
Here’s the paper.
[unable to retrieve full-text content]
CommentsA hint to the future arrived quietly over the weekend. For a long time, I've been discussing two parallel revolutions in AI: the rise of autonomous agents and the emergence of powerful Reasoners since OpenAI's o1 was launched. These two threads have finally converged into something really impressive - AI systems that can conduct research with the depth and nuance of human experts, but at machine speed. OpenAI's Deep Research demonstrates this convergence and gives us a sense of what the future might be. But to understand why this matters, we need to start with the building blocks: Reasoners and agents.
For the past couple years, whenever you used a chatbot, it worked in a simple way: you typed something in, and it immediately started responding word by word (or more technically, token by token). The AI could only "think" while producing these tokens, so researchers developed tricks to improve its reasoning - like telling it to "think step by step before answering." This approach, called chain-of-thought prompting, markedly improved AI performance.
Reasoners essentially automate the process, producing “thinking tokens” before actually giving you an answer. This was a breakthrough in at least two important ways. First, because the AI companies could now get AIs to learn how to reason based on examples of really good problem-solvers, the AI can “think” more effectively. This training process can produce a higher quality chain-of-thought than we can by prompting. This means Reasoners are capable of solving much harder problems, especially in areas like math or logic where older chatbots failed.
The second way this was a breakthrough is that it turns out that the longer Reasoners “think,” the better their answers get (though the rate of improvement slows as they think longer). This is a big deal because previously the only way to make AIs perform better was to train bigger and bigger models, which is very expensive and requires a lot of data. Reasoning models show you can make AIs better by just letting them produce more and more thinking tokens, using computing power at the time of answering your question (called inference-time compute) rather than when the model was trained.
Because Reasoners are so new, their capabilities are expanding rapidly. In just months, we've seen dramatic improvements from OpenAI's o1 family to their new o3 models. Meanwhile, China's DeepSeek r1 has found innovative ways to boost performance while cutting costs, and Google has launched their first Reasoner. This is just the beginning - expect to see more of these powerful systems, and soon.
While experts debate the precise definition of an AI agent, we can think of it simply as “an AI that is given a goal and can pursue that goal autonomously.” Right now, there's an AI labs arms race to build general-purpose agents - systems that can handle any task you throw at them. I've written about some early examples like Devin and Claude with Computer Use, but OpenAI just released Operator, perhaps the most polished general-purpose agent yet.
The video below, sped up 16x, captures both the promise and pitfalls of general-purpose agents. I give Operator a task: read my latest substack post at OneUsefulThing and then go onto Google ImageFX and make an appropriate image, download it, and give it to me to post. What unfolds is enlightening. At first, Operator moves with impressive precision - finding my website, reading the post, navigating to ImageFX (pausing briefly for me to enter my login), and creating the image. Then the troubles begin, and they're twofold: not only is Operator blocked by OpenAI's security restrictions on file downloads, but it also starts to struggle with the task itself. The agent methodically tries every conceivable workaround: copying to clipboard, generating direct links, even diving into the site's source code. Each attempt fails - some due to OpenAI's browser restrictions, others due to the agent's own confusion about how to actually accomplish the task. Watching this determined but ultimately failed problem-solving loop reveals both the current limitations of these systems and raises questions about how agents will eventually behave when they encounter barriers in the real world.
Operator's issues highlight the current limits of general-purpose agents, but that doesn’t suggest that agents are useless. It appears that economically valuable narrow agents that focus on specific tasks are already possible. These specialists, powered by current LLM technology, can achieve remarkable results within their domains. Case in point: OpenAI's new Deep Research, which shows just how powerful a focused AI agent can be.
OpenAI’s Deep Research (not to be confused with Google’s Deep Research, more on that soon) is essentially a narrow research agent, built on OpenAI’s still unreleased o3 Reasoner, and with access to special tools and capabilities. It is one of the more impressive AI applications I have seen recently. To understand why, let’s give it a topic. I am specifically going to pick a highly technical and controversial issue within my field of research: When should startups stop exploring and begin to scale? I want you to examine the academic research on this topic, focusing on high quality papers and RCTs, including dealing with problematic definitions and conflicts between common wisdom and the research. Present the results for a graduate-level discussion of this issue.
The AI asks some smart questions, and I clarify what I want. Now o3 goes off and gets to work. You can see its progress and “thinking” as it goes. It is really worth taking a second to look at a couple samples of that process below. You can see that the AI is actually working as a researcher, exploring findings, digging deeper into things that “interest” it, and solving problems (like finding alternative ways of getting access to paywalled articles). This goes on for five minutes.
At the end, I get a 13 page, 3,778 word draft with six citations and a few additional references. It is, honestly, very good, even if I would have liked a few more sources. It wove together difficult and contradictory concepts, found some novel connections I wouldn’t expect, cited only high-quality sources, and was full of accurate quotations. I cannot guarantee everything is correct (though I did not see any errors) but I would have been satisfied to see something like it from a beginning PhD student. You can see the full results here but the couple excerpts below would suffice to show you why I am so impressed.
The quality of citations also marks a genuine advance here. These aren't the usual AI hallucinations or misquoted papers - they're legitimate, high-quality academic sources, including seminal work by my colleagues Saerom (Ronnie) Lee and Daniel Kim. When I click the links, they don't just lead to the papers, they often take me directly to the relevant highlighted quotes. While there are still constraints - the AI can only access what it can find and read in a few minutes, and paywalled articles remain out of reach - this represents a fundamental shift in how AI can engage with academic literature. For the first time, an AI isn't just summarizing research, it's actively engaging with it at a level that actually approaches human scholarly work.
It is worth contrasting it with Google’s product launched last month also called Deep Research (sigh). Google surfaces far more citations, but they are often a mix of websites of varying quality (the lack of access to paywalled information and books hurts all of these agents). It appears to gather documents all at once, as opposed to the curiosity-driven discovery of OpenAI’s researcher agent. And, because (as of now) this is powered by the non-reasoning, older Gemini 1.5 model, the overall summary is much more surface-level, though still solid and apparently error-free. It is like a very good undergraduate product. I suspect that the difference will be clear if you read a little bit below.
To put this in perspective: both outputs represent work that would typically consume hours of human effort - near PhD-level analysis from OpenAI's system, solid undergraduate work from Google's. OpenAI makes some bold claims in their announcement, complete with graphs suggesting their agent can handle 15% of high economic value research projects and 9% of very high value ones. While these numbers deserve skepticism - their methodology isn't explained - my hands-on testing suggests they're not entirely off base. Deep Research can indeed produce valuable, sophisticated analysis in minutes rather than hours. And given the rapid pace of development, I expect Google won't let this capability gap persist for long. We are likely to see fast improvement in research agents in the coming months.
You can start to see how the pieces that the AI labs are building aren't just fitting together - they're playing off each other. The Reasoners provide the intellectual horsepower, while the agentic systems provide the ability to act. Right now, we're in the era of narrow agents like Deep Research, because even our best Reasoners aren't ready for general-purpose autonomy. But narrow isn’t limiting - these systems are already capable of performing work that once required teams of highly-paid experts or specialized consultancies.
These experts and consultancies aren't going away - if anything, their judgment becomes more crucial as they evolve from doing the work to orchestrating and validating the work of AI systems. But the labs believe this is just the beginning. They're betting that better models will crack the code of general-purpose agents, expanding beyond narrow tasks to become autonomous digital workers that can navigate the web, process information across all modalities, and take meaningful action in the world. Operator shows we aren’t there yet, but Deep Research suggests that we may be on our way.