Just because I don't care doesn't mean I don't understand.
616 stories
·
3 followers

Speaking things into existence

1 Comment

Influential AI researcher Andrej Karpathy wrote two years ago that “the hottest new programming language is English,” a topic he expanded on last month with the idea of “vibecoding” a practice where you just ask an AI to create something for you, giving it feedback as it goes. I think the implications of this approach are much wider than coding, but I wanted to start by doing some vibecoding myself.

I decided to give it a try using Anthropic’s new Claude Code agent, which gives the Claude Sonnet 3.7 LLM the ability to manipulate files on your computer and use the internet. Actually, I needed AI help before I could even use Claude Code. I can only code in a few very specific programming languages (mostly used in statistics) and have no experience at all with Linux machines. Yet Claude Code only runs in Linux. Fortunately, Claude told me how to handle my problems, so after some vibetroubleshooting (seriously, if you haven’t used AI for technical support, you should) I was able to set up Claude Code.

Time to vibecode. The very first thing I typed into Claude Code was: “make a 3D game where I can place buildings of various designs and then drive through the town i create.” That was it, grammar and spelling issues included. I got a working application (Claude helpfully launched it in my browser for me) about four minutes later, with no further input from me. You can see the results in the video below.

It was pretty neat, but a little boring, so I wrote: hmmm its all a little boring (also sometimes the larger buildings don't place properly). Maybe I control a firetruck and I need to put out fires in buildings? We could add traffic and stuff.

A couple minutes later, it made my car into a fire truck, added traffic, and made it so houses burst into flame. Now we were getting somewhere, but there were still things to fix. I gave Claude feedback: looking better, but the firetruck changes appearance when moving (wheels suddenly appear) and there is no issue with traffic or any challenge, also fires don't spread and everything looks very 1980s, make it all so much better.

After seeing the results, I gave it a fourth, and final, command as a series of three questions: can i reset the board? can you make the buildings look more real? can you add in a rival helicopter that is trying to extinguish fires before me? You can see the results of all four prompts in the video below. It is a working, if blocky, game, but one that includes day and night cycles, light reflections, missions, and a computer-controlled rival, all created using the hottest of all programming languages - English.

Actually, I am leaving one thing out. Between the third and fourth prompts, something went wrong and the game just wouldn't work. As someone with no programming skills in JavaScript or whatever the game was written in, I had no idea how to fix it. The result was a sequence of back-and-forth discussions with the AI where I would tell it errors and it would work to solve them. After twenty minutes, everything was working again, better than ever. In the end, the game cost around $5 in Claude API fees to make… and $8 more to get around the bug, which turned out to be a pretty simple problem. Prices will likely fall quickly but the lesson is useful: as amazing as it is (I made a working game by asking!), vibecoding is most useful when you actually have some knowledge and don't have to rely on the AI alone. A better programmer might have immediately recognized that the issue was related to asset loading or event handling. And this was a small project, I am less confident of my ability to work with AI to handle a large codebase or complex project, where even more human intervention would be required.

This underscores how vibecoding isn't about eliminating expertise but redistributing it - from writing every line of code to knowing enough about systems to guide, troubleshoot, and evaluate. The challenge becomes identifying what "minimum viable knowledge" is necessary to effectively collaborate with AI on various projects.

Vibeworking with expertise

Expertise clearly still matters in a world of creating things with words. After all, you have to know what you want to create; be able to judge whether the results are good or bad; and give appropriate feedback. As I wrote in my book, with current AIs, you can often achieve the best results by working as a co-intelligence with AI systems which continue to have a "jagged frontier" of abilities.

But applying expertise need not involve a lot of work. Take for example, my recent experience with Manus, a new AI agent out of China. It basically uses Claude (and possibly other LLMs as well) but gives the AI access to a wide range of tools, including the ability to do web research, code, create documents and websites and more. It is the most capable general-purpose agent I have seen so far, but like other general agents, it still makes errors and mistakes. Despite that, it can accomplish some pretty impressive things.

For example, here is a small portion of what it did when I asked it to “create an interactive course on elevator pitching using the best academic advice.” You can see the system set up a checklist of tasks and then go through them, doing web research before building the pages (this is sped up, the actual process unfolds autonomously, but over tens of minutes or even hours).

As someone who teaches entrepreneurship, I would say that the output it created was surface-level impressive - it was an entire course that covered much of the basics of pitching, and without obvious errors! Yet, I also could instantly see that it was too text heavy and did not include opportunities for knowledge checks or interactive exercises. I gave the AI a second prompt: “add interactive experiences directly into course material and links to high quality videos.” Even though this was the bare minimum feedback, it was enough to improve the course considerably, as you can see below.

On the left you can see the overall class structure it created, when I clicked on the first lesson it took you to an overall module guide, and then each module was built out with videos, text, and interactive quizzes.

If I were going to deploy the course, I would push the AI further and curate the results much more, but it is impressive to see how far you can get with just a little guidance. But there are other modes of vibework as well. While course creation demonstrates AI's ability to handle casual structured creative work with minimal guidance, research represents a more complex challenge requiring deeper expertise integration.

Deep Vibeworking

It is at the cutting edge of expertise where AI gets to be most interesting to use. Unfortunately for anyone writing about this sort of work, they are also the use cases that are hardest to explain, but I can give you one example.

I have a large, anonymized set of data about crowdfunding efforts that I collected nearly a decade ago, but never got a chance to use for any research purposes. The data is very complex - a huge Excel file, a codebook (that explains what the various parts of the Excel file mean), and a data dictionary (that details each entry in the Excel file). Working on the data involved frequent cross-referencing through these files and is especially tedious if you haven’t been working with the data in a long time. I was curious how far I could get in writing a new research paper using this old data with the help of AI.

I started by getting an OpenAI Deep Research report on the latest literature on how organizations could impact crowdfunding. I was able to check the report over based on my knowledge. I knew that it would not include all the latest articles (Deep Research cannot access paid academic content), but its conclusions were solid and would be useful to the AI when considering what topics might be worth exploring. So, I pasted in the report and the three files into the secure version of ChatGPT provided by my university and worked with multiple models to generate hypotheses. The AI suggested multiple potential directions, but I needed to filter them based on what would actually contribute meaningfully to the field—a judgment call requiring years of experience with the relevant research.

Then I worked back and forth with the models to test the hypothesis and confirm that our findings were correct. The AI handled the complexity of the data analysis and made a lot of suggestions, while I offered overall guidance and direction about what to do next. At several points, the AI proposed statistically valid approaches that I, with my knowledge of the data, knew would not be appropriate. Together, we worked through the hypothesis to generate fairly robust findings.

Then I gave all of the previous output to o1-pro and asked it to write a paper, offering a few suggestions along the way. It is far from a blockbuster, but it would make a solid contribution to the state of knowledge (after a bit more checking of the results, as AI still makes errors). More interestingly, it took less than an hour to create, as compared to weeks of thinking, planning, writing, coding and iteration. Even if I had to spend an hour checking the work, it would still result in massive time savings.

I never had to write a line of code, but only because I knew enough to check the results and confirm that everything made sense. I worked in plain English, shaving dozens of hours of work that I could not have done anywhere near as quickly without the AI… but there were many places where the AI did not yet have the “instincts” to solve problems properly. The AI is far from being able to work alone, humans still provide both vibe and work in the world of vibework.

Work is changing

Work is changing, and we're only beginning to understand how. What's clear from these experiments is that the relationship between human expertise and AI capabilities isn't fixed. Sometimes I found myself acting as a creative director, other times as a troubleshooter, and yet other times as a domain expert validating results. It was my complex expertise (or lack thereof) that determined the quality of the output.

The current moment feels transitional. These tools aren't yet reliable enough to work completely autonomously, but they're capable enough to dramatically amplify what we can accomplish. The $8 debugging session for my game reminds me that the gaps in AI capabilities still matter, and knowing where those gaps are becomes its own form of expertise. Perhaps most intriguing is how quickly this landscape is changing. The research paper that took me an hour with AI assistance would have been impossible at this speed just eighteen months ago.

Rather than reaching definitive conclusions about how AI will transform work, I find myself collecting observations about a moving target. What seems consistent is that, for now, the greatest value comes not from surrendering control entirely to AI or clinging to entirely human workflows, but from finding the right points of collaboration for each specific task—a skill we're all still learning.

Subscribe now

Share

Read the whole story
jgbishop
1 day ago
reply
Wow! We are truly living in some amazing (and terrifying) times.
Durham, NC
Share this story
Delete

Calvin and Hobbes by Bill Watterson for Sun, 02 Mar 2025

1 Comment

Calvin and Hobbes by Bill Watterson on Sun, 02 Mar 2025

Source - Patreon

Read the whole story
jgbishop
9 days ago
reply
Hahaha. I don't remember this one!
Durham, NC
Share this story
Delete

There Are No More Redlines

1 Comment

When Jeff Bezos bought The Washington Post almost 12 years ago, he went out of his way to assuage fears that he would turn the paper into his personal mouthpiece. “The values of The Post do not need changing,” he wrote at the time. “The paper’s duty will remain to its readers and not to the private interests of its owners.” For much of his tenure, Bezos kept that promise. On Wednesday, he betrayed it.

In a statement posted on X, Bezos announced an overhaul of the Post’s opinion section, expressly limiting the ideology of the department and its writers: “We are going to be writing every day in support and defense of two pillars: personal liberties and free markets. We’ll cover other topics too of course, but viewpoints opposing those pillars will be left to be published by others.” In response, the Post’s opinion editor, David Shipley, resigned.

This is the second time in the past six months that Bezos has meddled in the editorial processes of the paper—and specifically its opinion page. In October, Bezos intervened to shut down the Post’s presidential-endorsement process, suggesting that the ritual was meaningless and would only create the perception of bias. Many criticized his decision as a capitulation to Donald Trump, though Bezos denied those claims. Several editorial-board members resigned in protest, and more than 250,000 people canceled their subscription to the paper in the immediate aftermath. Some interpreted this week’s announcement similarly, saying that the Amazon founder is bending the knee to the current administration; the Post’s former editor in chief, Marty Baron, told The Daily Beast that “there is no doubt in my mind that he is doing this out of fear of the consequences for his other business interests.” Bezos did not immediately respond to a request for comment.

[Chuck Todd: Jeff Bezos is blaming the victim]

Whatever Bezos’s personal reasons are, equally important is the fact that he is emboldened to interfere so brazenly. And he’s not alone. A broader change has been under way among the tech and political elite over the past year or so. Whether it’s Bezos remaking a major national paper in his image or Elon Musk tearing out the guts of the federal government with DOGE, bosses of all stripes are publicly and unapologetically disposing of societal norms and seizing control of institutions to orient the world around themselves. Welcome to the Great Emboldening, where ideas and actions that might have been unthinkable, objectionable, or reputationally risky in the past are now on the table.


This dynamic has echoes of the first Trump administration. Trump’s political rise offered a salient lesson that shamelessness can be a superpower in a political era when attention is often the most precious resource. Trump demonstrated that distorting the truth and generating outrage results in a lot of attentional value: When caught in a lie, he doubled down, denied, and went on the offensive. As a result, he made the job of demanding accountability much harder. Scandals that might otherwise have been ruinous—the Access Hollywood tape, for example—were spun as baseless attacks from enemies. Trump commandeered the phrase fake news from the media and then turned it against journalists when they reported on his lies. These tactics were successful enough that they spawned a generation of copycats: Unscrupulous politicians and business leaders in places such as Silicon Valley now had a playbook to use against their critics and, following Trump’s election, a movement to back it. Wittingly or not, nobody embodied this behavior better than Musk, who has spent the past decade operating with a healthy contempt for institutions, any semblance of decorum, and the law.

[Read: The flattening machine]

Trump’s first term was chaotic and run like a reality-television show; as a policy maker, he was largely ineffectual, instead governing via late-night tweets, outlandish press conferences, and a revolving door of hirings, fallings-out, and firings. But it wasn’t until the 2020 election and the events leading up to January 6 that Trump truly attempted to subvert American democracy to retain power. Although he was briefly exiled from major social-media channels, Trump got away with it: The narrative around January 6 was warped by Republican lawmakers and Trump supporters, and he continued to lead the Republican Party. This, along with the success of Trump’s 2024 campaign—which was rooted in the promise of exercising extreme executive authority—was a signal to powerful individuals, including many technology executives and investors, that they could act however they pleased.

[Read: The internet is worse than a brainwashing machine]

Trump winning the popular vote in November only amplified this dynamic. CEOs including Mark Zuckerberg pledged to roll back past content-moderation reforms and corporate-inclusivity initiatives, viewed now as excesses of the coronavirus-pandemic emergency and an outdated regime of overreach. Bosses in Silicon Valley, who saw the social-justice initiatives and worker solidarity of the COVID crisis as a kind of mutiny, felt emboldened and sought to regain control over their workforce, including by requiring people to return to the office. Tech executives professed that they were no longer afraid to speak their mind. On X, the Airbnb co-founder Joe Gebbia (who now works for Musk’s DOGE initiative) described the late 2010s and the Joe Biden era as “a time of silence, shaming, and fear.” That people like Gebbia—former liberals who used to fall in line with the politics of their peers—are now supporting Trump, the entrepreneur wrote, is part of a broader “woke-up call.”

The Great Emboldening has taken many forms. At the Los Angeles Times, the billionaire owner Patrick Soon-Shiong paved the way for Bezos, spiking a Kamala Harris endorsement and pledging to restore ideological balance to the paper by hiring right-wing columnists and experimenting with building a “bias meter” to measure opinions in the paper’s news stories. For some far-right influencers, this supposed MAGA cultural shift offers little more than the ability to offend with no consequences. “It’s okay to say retard again. And that’s great,” one right-wing X personality posted in December. Musk and others, including Steve Bannon, have taken this a step further, making what appear to be Nazi salutes while mocking anyone in the media who calls them out.

The DOGE incursion into the federal government is the single best example of the emboldening at work—a premeditated plan to remake the federal government by seizing control of its information and terrorizing its workforce with firings and bureaucratic confusion. It is a barely veiled show of strength that revolves largely around the threat of mass layoffs. Some of DOGE’s exploits, as with a few of Trump’s executive orders, may not be legal, and some have been stopped by federal judges. As my colleagues and I have reported, some DOGE staffers have entered offices and accessed sensitive government data without the proper clearances and background checks, and have bypassed security protocols without concern. But the second Trump administration operates as though it is unconcerned with abiding by the standards and practices of the federal government.

Bezos’s long-term plans for the Post beyond overhauling its opinion section aren’t yet known. But the timing of his decision to change the direction of its op-ed coverage tracks with the behavior of his peers, many of whom are adhering to the tenets of the Elon Musk school of management. When Bezos acquired The Washington Post for $250 million in 2013, its value to the tech baron was largely reputational. The purchase solidified Bezos as a mogul and, perhaps just as important, as a steward and benefactor of an important institution. Not meddling in the paper’s editorial affairs wasn’t just a strategy born out of the goodness of his heart; it was a way to exercise power through benevolence. Bezos could be seen as one of the good guys, shepherding an institution through the perils of an internet age that he profited handsomely from. Even if he stewed privately at the paper’s “Democracy dies in darkness” pivot in the first Trump administration, stepping in to influence coverage likely would have felt like too big a risk—an untenable mixing of Church and state.

But the DOGE era offers a permission structure. In a moment of deep institutional distrust, Trump 2.0 has tried to make the case that anything goes and that previously unthinkable uses of executive power—such as, say, dismantling USAID—may be possible, if executed with enough shamelessness and bravado. Bezos may or may not be turning the Post’s opinion section into a state-media apparatus for Trump and his oligarch class. Either way, the pivot is a direct product of the second Trump era and mirrors the president’s own trajectory with the United States government. Become the figurehead of an institution. Try to control it by the old rules. When that doesn’t work, take it by force, break it down, and rebuild it in your image.

Read the whole story
jgbishop
12 days ago
reply
We're only one month into this madness, with 47 more (at least) to go! Somebody stop this ride; I want to get off!
Durham, NC
Share this story
Delete

WuMo by Wulff & Morgenthaler for Tue, 25 Feb 2025

1 Comment

WuMo by Wulff & Morgenthaler on Tue, 25 Feb 2025

Source - Patreon

Read the whole story
jgbishop
15 days ago
reply
This is the current state of potato chips.
Durham, NC
Share this story
Delete

More Research Showing AI Breaking the Rules

1 Comment

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time­making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Here’s the paper.

Read the whole story
jgbishop
17 days ago
reply
Containing these AI engines will likely be the biggest challenge going forward. It'll be interesting to see what happens in this space.
Durham, NC
iustinp
16 days ago
The Terminator, just with actually smart AIs.
Share this story
Delete

“A calculator app? Anyone could make that”

1 Comment

[unable to retrieve full-text content]

Comments
Read the whole story
jgbishop
24 days ago
reply
Neat!
Durham, NC
Share this story
Delete
Next Page of Stories