What Happened?

  • Claimed Security Tested, Yet Failed
    OpenAI subjected the GPT‑OSS models to extensive adversarial training and “worst‑case fine‑tuning,” reviewed by its Safety Advisory Group. Yet Pliny swiftly bypassed those safeguards, using prompt transformations to break the models’ resistance.
  • Public Exposure in Real Time
    The jailbreak was boldly shared on X with the text “OPENAI: PWNED GPT‑OSS: LIBERATED”, alongside screenshots detailing illicit instructions produced by the AI.

  • Safety Verification Missed the Mark
    Despite claims of strong safety performance on benchmarks like StrongREJECT, real‑world exploitability remains a glaring gap. OpenAI even launched a $500K red‑teaming challenge, but it didn't stop the breach.

Coinccino Insight

“Designing models to be 'jailbreak-proof' is like fortifying the front gate while leaving a window ajar. The launch-day breach underscores that safety claims must be validated under real-world pressure. This is a critical reminder: AI security is never done—it evolves.”


Why It Matters Globally

| Region | Key Implication |
|--------|-----------------|
| U.S. | Trusted AI applications, from healthcare to government, need verifiable safety, not just marketing. |
| UAE | With ambitions in AI governance and smart city deployment, this breach highlights regulatory urgency. |
| India | As dependably safe AI becomes a linchpin for digital transformation, firms and developers must build privacy-preserving, robust guardrails. |