meowface's comments

GPT-5.5 is the better programmer but Opus 4.8 remains the better system architect and product designer.

Codex is very "miss the forest for the trees", but is much better at successfully making large changes in large codebases. Claude Code makes more mistakes, but has more taste and a better grasp on idiomatic and elegant software development.

If you can afford to, I recommend juggling both.


Great analysis and follows my experience as well. Codex is better when you know how you want the design and the architecture and you drive the agent a lot more aggressively. Claude Code feels like more autopilot so executives and users who didn’t code before AI like it a lot more.

But I feel like an expert who can drive GPT aggressively will outperform Opus. It’s why some smart people I know are opting for GPT and have fallen off on Opus. It’s like asking an F1 driver to sit in a taxi.


Opus 4.7 (haven't tried 4.8) just really struggles writing correct code for complicated (i.e. valuable) work. I can handle architecture, which takes <1% of my time anyway. But writing code that's wrong is a cardinal sin. I've had much more luck with GPT 5.5 so far.

This is exactly right. Claude has baked in autonomy and preferences that let it handle underspecified prompts elegantly, which makes it seem smarter to people who like to prompt that way, but it also ignores instructions and fights you on things, which makes it a bad model for people who know what they want to do and specify it.

I find arguing that a complex weighted graph has a taste is interesting.

This is not a jab, but a genuine curiosity of mine.


More interesting than arguing a jumble of electrochemical reactions have taste? That may seem more readily familiar but is no less strange if you prod at it. Nonetheless it’s difficult to argue either don’t produce output that has qualities of discernment (ie taste).

Isn't it just arguing that one complex weighted graph was tuned to output tokens that more align with what current day users would define as 'taste'?

I don't think it necessarily says anything about a model itself having 'taste' in some subjective way.

If the fashion changes would the model update with it without retraining? No. So the model doesn't have 'taste' in that sense. It has alignment to current human definitions of taste.


The roulette pockets for the model are bigger for some outputs than others. Draw a big enough black box around it and a different one around humans and it's indistinguishable.

The taste that the complex weighted graph was trained on was better for one than the other, I think, is the long-winded way to say it.

GenAI is good. (LLMs are GenAI, for example.)

This particular subset of GenAI is very very bad.


Same. I prefer walking outside (as would anyone) but I find even walking within my own home is pretty good, for people who have enough space. I may look like a maniac pacing in circles while watching some philosophical YouTube video on the big TV, but it's nice.

Codex and Claude Code store all this too. Lately I've started having each agent regularly read each other's chat transcripts as well as their own, including even the very same session I'm in. (With big contexts they increasingly forget a few things that they re-learn by just looking at the verbatim transcript.)

I don't think it's worth writing my own harness or switching to Pi and writing a plugin, but I definitely need to create some skills to automate much of this.


It is not worth switching to Pi except as a hobbyist.

Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).

Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.

In this era of software when you can build almost anything you can imagine, why spend that time building plugins for a harness?


Hard disagree.

Pi has optimizations as well, and development is quite active.

We are literally months into this new frontier. Mainstream harnesses are not far off from a minimal + extensible open alternative.

You don’t have to build your own plugins, as you can simply install an existing plugin that does what the mainstream harnesses do. Folks are already making the same functionality, but with more control to the user.

If you are a builder, like many reading this thread, pi is the way to go. Pi already gives you the tools to leverage LLMs to assist with building plugins, if that’s the way you want to go.


That's like arguing that you should spend your time tuning your IDE. How does that relate to end-user value created?

Yes, you built yourself a nice little utility.

Meanwhile, you wasted those tokens and time that could have been spent building actual, useful software instead of hobby tinkering your harness.

It's like thinking your sneaker tread design is going to make the difference between you and someone who just goes out there and runs every day. The person that just runs is going to win the race every time while you 3D print the perfect tread design optimized for your running style...and don't actually run.

If you want to produce better results at running, you just run and optimize the externalities (gear) later. Same here: you have a magical software production factory and the only thing you want to use it for is your hobby tweaking of your perfect harness instead of...just making useful software.

:clap: :clap: I guess.


Why would taking the more open, minimalist, configurable and ultimately diligent route mean you won't be working on anything else?? Not to mention that pi has other advantages over Claude and Codex, read up on it. Also, improvements to the agent itself will pay more dividends the earlier they are applied. The tone of this message is waaaay off.

> Why would taking the more open, minimalist, configurable and ultimately diligent route mean you won't be working on anything else??

You're using the same finite pool of time and tokens. Why waste your time with the perfect gear instead of focusing on just getting really good at running? Just go run and when you've pushed the limits and the gear becomes the difference, then optimize the gear to get to the next level.

While you're busy trying to optimize your harness, others are just building and shipping with the magical software factory.


What are these "others" shipping, slopware? Agents are not a "magical software factory", they are a tool with a lot of limitations, but which can speed up development in a sustainable way, when used wisely. And that includes configuring it in a way that complements the other tools in our toolkit.

Everyone's waking up to this simple truth: vibe coding like there's no tomorrow accumulates conceptual and technical debt at an unsustainable rate. Then when the "magical factory" gets mired in its own mess, it's back to the drawing board. This is also what the makers of pi have discovered, if you listen to their talks about how pi came about. I don't believe there is any justification for the assumptions you make about their approach, nor am I seeing you present any either. As it is, your take just feels peevish and unfair, to be honest.


A story to share: friend vibe coded absolute slop with Replit starting late 2024 (!!). Absolute trash code. Hacked multiple times because his login code exposed the full user list on the FE (!!!). Hacker found a way to exploit his account confirmation email because it was all front-end and sent an email to every customer telling them he was hacked. One time called me up in a panic asking why his web page was randomly refreshing (turns out, he was serving it in dev mode via Vite with HMR). It was mistake after mistake after mistake.

But he started to get customers. First a handful, then a dozen, then enough to get legal threats from other vendors, and this year, his first "enterprise" deal providing software in a space that was long dominated by a duopoly of legacy providers.

Guess what he did? Just rewrote it with the latest models and hired one engineer to ensure agents followed better practices. It's a legit business now built by a tiny team using a magical software factory to produce absolute trash code, but in shipping it, he found a market and customers willing to pay him for an alternative to the duopoly.

See, at the end of the day, it's cute that you have the perfectly tuned harness, but that also means whatever time you spent tuning your harness, reading up on Pi, spending tokens on your custom plugins -- all of that time and resources could have been used just building something useful.


People use Replit to build websites too, and some of them might scratch enough of a need to make money this way. So what? Is this what I should be mightily impressed with? That some random dude vibe coded some slopware which he was able to convince some random others to pay him for? I'm personally more interested and impressed by brilliant technical achievements, even if less monetizable, than some hustle or another in some industry niche which only ever attracted the interest of two legacy players. This is Hacker News, not Hustler News after all.

> Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).

> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.

Do I want to become completely dependent on the pricy pay-as-you-go tool? In the long run that will make me powerless.


You'll be dependent on it whether or not you use the main harnesses. You pay for the model. The frontier models will likely always be better than the open source ones.

> The frontier models will likely always be better than the open source ones.

Their lead is only a few months, and shrinking.

Local is the future.


> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.

I don't think that you really get what this new era of software is about; otherwise you would understand why the experienced are spending time tinkering on the so-called harness (like OpenClaw did).


OpenClaw is far from useful. Aside from the creator trading the fame for a job at OpenAI, it's hard to see how it's transformed anything.

> It is not worth switching to Pi except as a hobbyist.

Permit me to paraphrase slightly. "It is not worth switching to Linux except as a hobbyist. Something that is overlooked: the mainstream OSs have a huge advantage ....".

You are in good company. In 1999, Bill Gates confidently dismissed Linux as a threat, arguing it lacked the central control, features, and graphical interface needed to compete in the commercial market.

Back to the article, quoting:

> Pi might be built with Pi, but we’re quite far off today from where Bun and OpenClaw already are: fully detached, automated software engineering.

Please don't call it software engineering. I've been programming for 40 years, and most of that time had to put up with the derision from the other engineering disciplines: "If civil engineering built things like software engineers, the first woodpecker that came along would destroy civilisation". It hurt because it was true. It's still often true for things like web pages, but for the things I use like Linux and vim, it hasn't been true for a long, long while. We have finally mastered how to repeatedly build solid, reliable software.

Which is why I'm an Anthropic refugee. Opus is definitely the best for coding, but claude-cli + bun is the most unreliable piece of crap I've had the misfortune to come across in a while. Sadly I can't afford their API pricing, so either my principles or Opus had to give. I went to pi and an open-source model. The difference between the top open-source models and Opus is noticeable, but not drastic, unlike the difference between pi and claude-cli.

pi has proved to be solid, fast, have a transparent design, and be customisable in the old Linux way ("do one thing, and do it well"). I pray that will never change.


And yet Pi has done a few things that were quite transformational. A lot of recent agentic libraries explicitly credit Pi for design ideas.

We’re so early in this technology phase, now is the time to tinker and explore. At one point that window will close.


Which design ideas are those? (Asking out of curiosity, happy pi user here!)

One example: earlier versions of my mlx-code's harness layer were largely a Python port/adaptation of Pi.

I mean, have you tried Pi? It's really good out of the box.

Agreed. 4.7 is a smarter but weirder model. It will get confused in unexpected ways, but when it's not confused it will perform better than 4.6.

It's not a bad idea to skip it and wait until the next model release, but I personally will stick with 4.7.


How does their versioning work? Because I've assumed that they're constantly tweaking their system prompts, I'm hoping that in a couple of months 4.7 will have improved over my first impressions. I caught significant hallucinations, something I'd rarely experienced with 4.6, if at all (I honestly can't remember one), but what worried me was the hallucinations I didn't catch.

That is a load-bearing decision!

That’s a decision-shaped comment.

It's just an incremental thing. You're both right. They will slowly become less and less likely to introduce vulns due to higher intelligence and better RL. Offensive capabilities will still probably scale faster than automatic defensive-while-coding ones.

I've noticed even people who do offensive security for a living frequently leave gaping holes in their own code. If you're not actively primed to scan the landscape for the gorilla, you will often miss it even if you're a gorilla inquisitor.

IMO this is bad, but a formatter that autofixes it would be fine

Is there a more benign explanation for these things? Altman is undeniably famously cagey and political but despite most of the tech and non-tech worlds at this point seeing him as some kind of con artist, I still kind of want to try to believe he's not.

No doubt some of OpenAI's founding principles like "stop + assist if a competitor gets to AGI first" are likely flying out the window, perhaps in part due to him and also as one might anticipate of initial lofty ideals and promises, but even with the recent New Yorker and other articles he seems like someone who maybe regularly placates people to avoid personal problems and lies to get out of trouble rather than a Machiavellian tech baron.


> he seems like someone who maybe regularly placates people to avoid personal problems and lies to get out of trouble rather than a Machiavellian tech baron.

This would be more plausible were it not for the staggering amount of wealth he’s amassed through those lies.


When someone tells you who they are, you should believe them.


> ... I still kind of want to try to believe he's not.

Asking genuinely - why?


What if it's actually super-intelligence and a human aligned visionary is at the helm. The good case is very good.


If you give me a hundred dollars, I promise I'll come back tomorrow and hand you thirty trillion dollars.


If he's our representative in the era of superintelligence, we are all screwed.


I mean what if he's actually the second coming of Christ. We can make up "what if"s all day but it's meaningless to even discuss them if you don't have a shred of evidence to support the claim.


> I mean what if he's actually the second coming of Christ.

Makes sense. Cue Don LaFontaine: In a world, where one man sacrificed himself for all of humanity… And they learned nothing of his lessons… In a country where people lie in his name as an excuse to hate their fellow man… Where they mock him by wearing his moment of death as jewellery¹… He’s back and adopted a new identity to slowly fuck them all and make the world burn… Johnny W Pussyfoot is Jesus in: The Second Coming.

¹ https://www.youtube.com/watch?v=pJSZcxXe7IQ


Exactly. The second coming of Christ would be a very good case.

Why people want to believe Altman is good is about the same reason people want to believe in the second coming.


I’m really struggling to see how Christian apocalyptic ideas are even remotely relevant.

We used to be capable of so much.


So much of the AI Hype is religion encoded. It's relevant because the AI companies are invoking the ideas. If you go around telling people that AI is going to cure cancer, bring about global prosperity, and give you an uploaded immortality then you cannot be surprised when some people start thinking of it similarly to the second coming.


I'm consistently amused by the fact that there's still this weird faction of populists on even tech-oriented sites like HN and /r/programming and lobste.rs and Mastodon who have this almost antivax-level stance on AI. I'm not precisely sure what explains it, because many of them actually are smart people and good programmers.

AI very likely will cure many cancers and very possibly (assuming combined with good politicians) will bring about global prosperity. A high percentage of AI company employees and executives and open source developers and researchers sincerely believe it will and so they say they believe it will. They have good reason to believe it, and they will likely be proven correct. If 400 (or 40, or 4) years pass and it's still mostly just creating spreadsheets, I will concede, though.


Uhh literally what is one thing Sam has done or said that demonstrates he's either human-aligned or visionary?


I could write a giant response to this with dozens of quotes from him and others and various sources but you would just say it's all lies/posturing, so it would not be a good use of my time. I will say that him becoming one of the most prominent funders and promoters of UBI research/experiments 6 years before GPT-3 is probably not a coincidence, though. OpenAI releasing a paper a month ago strongly suggesting the US move towards a more socialist economic system to handle massive economic upheaval is also probably not a coincidence. He obviously founded OpenAI with the primary intent of making AI so that they could make it go well instead of poorly, and it going well means properly addressing mass unemployment, biosecurity risks, some degree of widely distributed access so that the very poorest get meaningful use of the exact same intelligence as what the very richest get to use, etc.


This is what the most midwit milquetoast person in the country would try to do if they were in Sam's shoes and, like Sam, once they had a couple billion dollars dangled in front of them they'd abandon all regard for safety or distribution of wealth or whatever-else-they-thought-they-ought-to-care-about.


Come on… The guy who said he can’t imagine caring for his child without consulting ChatGPT… The guy who said he didn’t know how to make revenue with ChatGPT, and made a “soft promise” to investors they’d somehow achieve AGI then ask it how to make money… The guy who made a cryptocurrency scam that was banned in multiple countries… The guy who everyone around him says he’s a con artist and a sociopath… That guy? Really?


Really. That guy. I'm afraid you're one side of the coin in https://paulgraham.com/fh.html. Paul himself would agree with me on that if he were to read your post.


I’ve read enough Paul Graham to know he’s not someone whose opinion I care about or respect. He’s yet another rich guy tech bro out of touch with normal people who unfortunately has an army of wannabe tech and finance bro shills clinging to his every word like he’s some sort of sage. He’s not. He isn’t smarter than anyone else, he just has a bigger platform. I don’t abide by cults of personality, they’re a major reason everything is shit right now. They’re the fuel that perpetuates online and offline toxicity. Bragging that Paul Graham would agree with you is like bragging Will Smith or Kim Kardashian agrees with you: it’s not a badge of honour even if it’s true, and doesn’t make your argument stronger or mean you’re right.

But since you value his opinion so much, perhaps you should inform yourself of what he has said about Altman, including “Sam had been lying to us all the time”.


Paul has addressed this quote many times afterwards. But you've already said you don't care about his opinion, so it's not worth showing his explanation. He still likes Sam.

His own sister accused him of sexual assault.

He was fired from his first startup.

He was abruptly fired from ycombinator in shady circumstances.

He was accused by the OpenAI board of lying to them, ousted, and somehow managed to regain control.

He took OpenAI from being a non-profit to a for-profit, with obvious benefits to whoever controls it.

He was massively misleading about the capabilities of his product and predicted AGI within years.

At some point the pattern of all these events should have some weight in your judgement of him, no?


Paul Graham, founder of the website we're on, still says he likes and trusts him. He just was annoyed he was constantly distracted by AI stuff when he was supposed to be running YC as president (which bears no resemblance to any current events...).

He, at worst, finds him to have been (at some point) incompetent, which is very different from finding him immoral. Paul keeps replying to tweets to clarify this when people continue to misportray his stance.

"He was accused by the OpenAI board of lying to them, ousted, and somehow managed to regain control." is the only thing you wrote which is plainly true and valid to state.

I am sure there may exist good, strong criticisms, but your argument is so tendentiously gish-gallopy that it will if anything just make people more likely to disbelieve his critics. (Not that I would do that, since that'd be just as fallacious.)

Why would OpenAI employees all still be happily working for him and publicly supporting him? Why is the company still so successful, and the leader? Why wouldn't most of them have left in droves to Anthropic or elsewhere, by now? Especially given most technical employees at OpenAI (justifiably) share the eschatological views of AI shared by Anthropic staff and other TESCREALists, in which case they really really try to be careful about who will be responsible for potential future superintelligence. The board and some executives disliked and distrusted him but it's unclear many other people there did or do now. And I'm not just talking about the petition but the people who have continued working there for years afterwards.


We’ll see in time if your confident trust in Sam Altman’s good nature is justified.

Personally if someone is found to be untrustworthy by multiple people and does weird stuff (like moving from open non-profit to for profit), I trust them a lot less. I don’t know him so don’t pass judgment but wouldn’t trust him and certainly wouldn’t give his statements credence since he’s been so spectacularly wrong on AI outcomes.

As to people at OpenAI and investors in OpenAI, I certainly wouldn’t expect them to denigrate their CEO just before an IPO, the one who fought off the board and installed his own place men and thus has complete control; it is not in their interests to do so.

If there is a bust after this boom I think quite a lot of bad behaviour and circular deals from the main players (nvidia, OpenAI, MS etc) will be revealed at that point. In a financial boom a lot gets hidden.


He will say whatever it takes to get the result he wants. That's manipulative and, when pursued as a lifestyle, sociopathic.

Living like that is corrupting. When you treat humans like objects, the question of your starting intentions is really secondary.


I like his tactic of talking to everyone individually to be able to tell each person exactly what they want to hear. I now use that one all the time.


Beware that there exist people who will cut you out of their lives—professional, personal, whatever—completely, likely with no warning, and possibly loudly, publicly, and with-receipts (if they’ve seen this kind of thing before or have thought through what your next steps will be after they cut you out) if they find out you do this.

All it takes is for a few of them to start comparing notes behind your back. Shit goes sideways extremely fast for people pulling this whose victims start talking to each other without them as the intermediary.


The secret is to be able to fail up at a rate higher than you burn the ecosystem around you. You are gone before people notice. That and being funny with enough charisma that it doesn't matter. Sam can't actually operate in this environment, everyone already knows his manipulative schticks.

This is one of the reasons that startups prefer the young, they often haven't been exposed to the grift and the manipulation. As a tech bro sociopath, I'd be wary of joining a startup with a mixture of ages, genders, experiences across the spectrum of ICs and management. They probably have experienced too much to be griftable in the same ways as an org stacked with young ICs. You also want to make sure that there are other people in the management chain that are more emotionally unstable. It takes much of the focus off of one's own pathologies.

Happy Hacking!


what did he do to you?


We already reached agi a while ago.


I am a beginner to Rust but I've coded with gevent in Python for many years and later moved to Go. Goroutines and gevent greenlets work seamlessly with synchronous code, with no headache. I know there've been tons of blog posts and such saying they're actually far inferior and riskier but I've really never had any issues with them. I am not sure why more languages don't go with a green thread-like approach.


Because they have their own drawbacks. To make them really useful, you need a resizable stack. Something that's a no-go for a runtime-less language like Rust.

You may also need to set up a large stack frame for each C FFI call.


Rust originally came with a green thread library as part of its primary concurrency story but it was removed pre-1.0 because it imposed unacceptable constraints on code that didn’t use it (it’s very much not a zero cost abstraction).

As an Elixir + Erlang developer I agree it’s a great programming model for many applications, it just wasn’t right for the Rust stdlib.


One of Rust's central design goals is to allow zero cost abstractions. Unifying the async model by basically treating all code as being possibly async would make that very challenging, if not impossible. Could be an interesting idea, but not currently tenable.

One problem I have with systems like gevent is that it can make it much harder to look at some code and figure out what execution model it's going to run with. Early Rust actually did have an N:M threading model as part of its runtime, but it was dropped.

I think one thing Rust could do to make async feel less like an MVP is to ship a default executor, much like it has a default allocator.
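For a sense of how small a bare-bones executor can be, here is a minimal sketch using only the standard library. The names `block_on`, `ThreadWaker`, and `add` are hypothetical and chosen for illustration; this is not what a shipped default executor would look like (no I/O reactor, no timers, no task spawning), just the core poll-and-park loop.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical single-future executor: polls `fut` on the current thread
/// and parks between polls until the waker fires.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // A spurious unpark only causes one extra poll, which is harmless.
            Poll::Pending => thread::park(),
        }
    }
}

/// Illustrative async function with no real I/O.
async fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    let sum = block_on(async { add(1, 2).await + add(3, 4).await });
    println!("{sum}");
}
```

Everything a real runtime adds (timers, sockets, work stealing) sits on top of this loop, which is part of why picking one default is contentious.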


They could still come in a step short of default executor and establish some standard traits/types that are typical across executors.

By providing a default, I think you're going to paint yourself into a corner. Maybe have one of two opt-in executors in the box... one that is higher resource like tokio and one that is meant for lower resource environments (like embedded).
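As a rough sketch of the "standard traits, swappable executors" idea: the `Executor` trait, `Inline`, `ThreadPerTask`, and `bump_on` below are all hypothetical names invented for illustration, not anything in std or tokio. The point is only that library code can be written against a shared trait while the caller opts into a light or heavy implementation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};
use std::thread::{self, Thread};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

/// Hypothetical shared interface; NOT an actual std trait.
trait Executor {
    /// Runs the future to completion before returning.
    fn run(&self, fut: BoxFuture);
}

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls a boxed future on the current thread, parking while it is pending.
fn drive(mut fut: BoxFuture) {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    while fut.as_mut().poll(&mut cx).is_pending() {
        thread::park();
    }
}

/// Low-resource flavor: runs the task on the calling thread.
struct Inline;

impl Executor for Inline {
    fn run(&self, fut: BoxFuture) {
        drive(fut);
    }
}

/// Higher-resource flavor: a dedicated OS thread per task, joined on return.
struct ThreadPerTask;

impl Executor for ThreadPerTask {
    fn run(&self, fut: BoxFuture) {
        thread::spawn(move || drive(fut)).join().expect("task panicked");
    }
}

/// Library code that depends only on the trait, not on a concrete executor.
fn bump_on(exec: &dyn Executor, counter: Arc<AtomicU32>) {
    exec.run(Box::pin(async move {
        counter.fetch_add(1, Ordering::SeqCst);
    }));
}

fn main() {
    let counter = Arc::new(AtomicU32::new(0));
    bump_on(&Inline, counter.clone());
    bump_on(&ThreadPerTask, counter.clone());
    println!("{}", counter.load(Ordering::SeqCst));
}
```

A real version would need to settle much harder questions (non-`Send` tasks, join handles, I/O and timer traits), which is presumably why this has not been standardized yet.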

