OpenAI gets caught vibe graphing

August 8, 2025

During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model seem quite impressive, but if you look closely, some graphs were a little bit off.

In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart shown onstage says GPT-5 with thinking apparently gets a 50.0 percent deception rate, but that is set against o3's smaller 47.4 percent score, which somehow has a larger bar. OpenAI appears to have accurate numbers for this chart in its GPT-5 blog post, however, where GPT-5’s deception rate is labeled as 16.5 percent.

With this chart, OpenAI showed onstage that one of GPT-5’s scores is lower than o3’s but is shown with a bigger bar. In this same chart, o3 and GPT-4o’s scores are different but shown with equally sized bars. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” though he noted that a correct version is in OpenAI’s blog post.

An OpenAI marketing staffer also apologized, saying, “We fixed the chart in the blog guys, apologies for the unintentional chart crime.”

OpenAI didn’t immediately reply to a request for comment. And while it’s unclear if OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it’s touting the “significant advances in reducing hallucinations” with its new model.
