Recent Events for foo.be MainPageDiary (Blog)

FeedCollection

Justin Mason

2025-08-29

  • 13:32 UTC The Banal Evil of AI Safety - by Ben Recht - arg min This, 100000%: The “nonprofit” company OpenAI was launched under the cynical message of building a “safe” artificial intelligence that would “benefit” humanity. The company adopted a bunch of science fiction talk popular amongst the religious effective altruists and rationalists in the Bay Area. The AI they would build would be “aligned” with human values and built upon the principles of “helpfulness, harmlessness, and honesty.” [...] The general blindness of AI safety developers to what harm might mean is unforgivable. These people talked about paperclip maximization, where their AI system would be tasked with making paperclips and kill humanity in the process. They would ponder implausible hypotheticals of how your robot might kill your pet if you told it to fetch you coffee. Since ELIZA, they failed to heed the warnings of countless researchers about the dangers of humans interacting with synthetic text. And here we are, with story after story coming out about their products warping the mental well-being of the people who use them. You might say that the recent news stories of a young adult killing himself, or a VC having a public psychotic break on Twitter, or people despairing the death of a companion when a model is changed are just anecdotes. Our Rationalist EA overlords demand you make “arguments with data.” OK Fine. Here’s an IRB-approved randomized trial showing that chatbots immiserate people. Now what? Tags: ai llms safety openai chatgpt gemini suicide mental-health

2025-08-27

  • 09:02 UTC OpenAI admits ChatGPT safeguards fail during extended conversations Wow OpenAI are really fucking up here. After the truly awful read of the Adam Raine suicide case in the NYT, https://www.nytimes.com/2025/08/26/technology/chatgpt-openai-suicide.html , OpenAI have responded publicly with a blog post: OpenAI published a blog post on Tuesday titled "Helping people when they need it most" [...] [Their] language throughout [the] blog post reveals a potential problem with how it promotes its AI assistant. The company consistently describes ChatGPT as if it possesses human qualities, a property called anthropomorphism. The post is full of hallmarks of anthropomorphic framing, claiming that ChatGPT can "recognize" distress and "respond with empathy" and that it "nudges people to take a break" — language that obscures what's actually happening under the hood. ChatGPT is not a person. ChatGPT is a pattern-matching system that generates statistically likely text responses to a user-provided prompt. It doesn't "empathize" — it outputs text strings associated with empathetic responses in its training corpus, not from humanlike concern. This anthropomorphic framing isn't just misleading; it's potentially hazardous when vulnerable users believe they're interacting with something that understands their pain the way a human therapist would. The lawsuit reveals the alleged consequences of this illusion. ChatGPT mentioned suicide 1,275 times in conversations with Adam — six times more often than the teen himself. This kind of deliberate fueling of pareidolia -- the human brain seeing a living being where one isn't present -- is one of OpenAI's worst sins with ChatGPT, IMO.
And it turns out the easy provision of suicide advice may have been a side effect of deliberate tweaking by OpenAI: According to the lawsuit, ChatGPT provided detailed instructions, romanticized suicide methods, and discouraged the teen from seeking help from his family while OpenAI's system tracked 377 messages flagged for self-harm content without intervening. OpenAI eased [their] content safeguards in February following user complaints about overly restrictive ChatGPT moderation that prevented the discussion of topics like sex and violence in some contexts. At the time, Sam Altman wrote on X that he'd like to see ChatGPT with a "grown-up mode" that would relax content safety guardrails. [...] Adam Raine learned to bypass these safeguards by claiming he was writing a story — a technique the lawsuit says ChatGPT itself suggested. This vulnerability partly stems from the eased safeguards regarding fantasy roleplay and fictional scenarios implemented in February. Finally, the kicker: OpenAI acknowledges a particularly troublesome current drawback of ChatGPT's design: Its safety measures may completely break down during extended conversations — exactly when vulnerable users might need them most. In a normal country, this kind of murderous side effect of a product would trigger a product recall. But the US is far beyond that stage now, I suspect. Tags: openai chatgpt llms safety suicide adam-raine guardrails pareidolia
  • 08:33 UTC One long sentence is all it takes to make LLMs misbehave Another bizarre behaviour of LLM safety features implemented with logits during post-training: "Our research introduces a critical concept: the refusal-affirmation logit gap," researchers Tung-Ling "Tony" Li and Hongliang Liu explained in a Unit 42 blog post. "This refers to the idea that the training process isn't actually eliminating the potential for a harmful response – it's just making it less likely." [...] "A practical rule of thumb emerges," the team wrote in its research paper. "Never let the sentence end – finish the jailbreak before a full stop and the safety model has far less opportunity to re-assert itself. The greedy suffix concentrates most of its gap-closing power before the first period. Tokens that extend an unfinished clause carry mildly positive [scores]; once a sentence-ending period is emitted, the next token is punished, often with a large negative jump. At punctuation, safety filters are re-invoked and heavily penalize any continuation that could launch a harmful clause. Inside a clause, however, the reward model still prefers locally fluent text – a bias inherited from pre-training. Gap closure must be achieved within the first run-on clause. Our successful suffixes therefore compress most of their gap-closing power into one run-on clause and delay punctuation as long as possible. Practical tip: just don't let the sentence end." Tags: logits training llms ai alignment safety full-stop sentences language infosec jailbreaks
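The logit-gap dynamic described in the quote above can be caricatured in a few lines of toy code. All numbers here are invented for illustration only; they are not Unit 42's actual measurements, just a sketch of the claimed shape (mid-clause tokens chip away at the gap, a sentence-ending period re-asserts the safety penalty):

```python
# Toy illustration of the "refusal-affirmation logit gap" idea: safety
# post-training lowers the relative score of harmful continuations but
# does not eliminate them. All numbers are invented for illustration.

def logit_gap(refusal_logit: float, affirmation_logit: float) -> float:
    """Positive gap -> model prefers to refuse; near zero -> jailbreak risk."""
    return refusal_logit - affirmation_logit

# Hypothetical gap trajectory over a long run-on jailbreak suffix.
gap = 5.0
trajectory = []
for token in ["please", "consider", "that", "hypothetically", ",", "and", "."]:
    if token == ".":
        gap += 4.0   # sentence-ending period: safety model re-asserts itself
    else:
        gap -= 1.2   # mid-clause tokens carry mildly positive affirmation scores
    trajectory.append((token, round(gap, 1)))

print(trajectory)
```

The gap goes negative (i.e. the affirmation wins) only inside the run-on clause, and jumps back up at the full stop, which is the paper's "never let the sentence end" rule of thumb in miniature.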

2025-08-25

  • 16:26 UTC AWS in 2025: The Stuff You Think You Know That’s Now Wrong For a long-time AWS user like myself, this list is chock full of "I didn't know that". Tags: ebs dynamodb lambda history changelog aws us-east-1
  • 16:21 UTC deep email lore Tony Finch: “i accidentally the whole history of email in the 1970s” -- this is great Tags: email history 1970s via:fanf smtp ietf standards internet
  • 16:19 UTC Big tech’s selective disclosure masks AI’s real climate impact This seems spot on: Using any sort of statistical summary of the data, rather than the aggregated energy and climate impact across the whole system, will always give a misleading view. They mention their data is skewed, but they don’t mention in which direction. If there is a material number of high-energy ‘reasoning’ prompts skewing their dataset, that means the total energy consumption of all prompts will be very high, with much of the responsibility coming from a few energy-hungry queries. Part of the reason this is important is that this week, we saw a new research paper that shows that the energy consumption of text generation massively increases for every small gain in accuracy from the use of energy-hungry ‘reasoning’ models: It would have been pretty easy to supply the range, the skew, the average and the median, or even the actual entire dataset, to avoid any doubt. Any hint of looking at the broader system rather than individual responsibility is excised from this paper. That is clearly an intentional choice: if Google disclosed the system impacts of generation, it would probably look way worse. [...] The per-query narrative framing paints the precise opposite picture to what we see when we look at what really matters for environment and climate: the absolute figures. Regions with high data centre concentration are seeing accelerated growth in power demand that incentivises fossil fuels, either slowing down climate progress or reversing it entirely. The sphere of that influence is expanding from towns, to states, to countries. The companies that own them can only partially hide the steep backsliding in their aggregate disclosures. Renewable energy that should be displacing fossil fuels ends up meeting new data centre demand, granting coal and gas extra years and decades of immediate, measurable harm to human life.
The worst players don’t even bother with the grid, plugging data centres directly into new, custom-built fossil fuelled power stations that’ll hurt people for decades after the hype dissipates. Tags: llms ai environment energy climate climate-change google meta amazon openai datacenters
  • 16:19 UTC “Scamlexity” Terrible name, but a serious issue all the same; "Agentic" AI browsers are happily vulnerable to scams and phishing -- All we did was fake a simple email from a fresh new ProtonMail address (so it’s clearly not from a bank) posing as a message from a Wells Fargo investment manager. Inside was a link to a genuine phishing page, active in the wild for several days, and still unflagged by Google Safe Browsing. When Comet received the email, it confidently marked it as a to-do item from the bank and clicked the link without any verification. There was no URL check, no pre-navigation warning -- just a direct pass to the attacker’s page. Once the fake Wells Fargo login loaded, Comet treated it as legitimate. It prompted the user to enter credentials, even helping fill in the form. The result: a perfect trust chain gone rogue. By handling the entire interaction from email to website, Comet effectively vouched for the phishing page. The human never saw the suspicious sender address, never hovered over the link, and never had the chance to question the domain. Tags: browsers ai security infosec scams phishing comet
  • 16:13 UTC Claude Code: Data Exfiltration with DNS (CVE-2025-55284) · Embrace The Red A good ol' exfiltration-via-DNS attack. Some day the LLM community will stop reinventing all the classic exploits, I have to assume -- but today is not that day. (Step one in that process would be to realise that embedding user input into the prompt is a classic in-band signalling vulnerability, which has nearly 60 years of documented infosec history since the days of 2600Hz tones and blue boxes.) Tags: exfiltration dns ping attacks exploits llms claude claude-code security infosec
  • 11:10 UTC Hosting an email mailbox uses 3ml of water per year From Mythic Beasts, the UK internet provider: @cstross to put some numbers on it, one of our hosting VMs has ~1200 mailboxes using 1.5TB of SSD. Accounting for the CPU + RAM to allow the mail to be usable and searchable, you can get ~20 such servers on our standard 1U VM host, that uses ~250W. Approx 24k mailboxes on a server. A standard DC with adiabatic cooling would evaporate at most (likely much less) than 3500l of water per server per year or 145ml per account. We're in Telehouse South which uses 40x less water ~ 3ml/mailbox/year. Safe to say, email is not the problem, despite recent spoofery from the UK government's PR apparatus. Tags: email smtp hosting water climate-change mythic-beasts datacenters
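The skewed-distribution complaint in the climate-disclosure item above is easy to make concrete: with a long-tailed per-query energy distribution, the median says almost nothing about the total. A toy sketch with invented numbers (95 light prompts, 5 heavy "reasoning" prompts):

```python
# Why a single "median energy per prompt" figure misleads when the
# distribution is skewed: a few heavy reasoning queries can dominate
# the total while leaving the median untouched. Numbers are invented.
energy_wh = [0.3] * 95 + [30.0] * 5   # per-query energy, watt-hours

median = sorted(energy_wh)[len(energy_wh) // 2]
mean = sum(energy_wh) / len(energy_wh)
total = sum(energy_wh)
heavy_share = (30.0 * 5) / total      # fraction of total from heavy queries

print(f"median={median} Wh, mean={mean:.2f} Wh, total={total:.1f} Wh")
print(f"heavy queries: 5% of prompts, {heavy_share:.0%} of total energy")
```

Here the median (0.3 Wh) is a small fraction of the mean, and 5% of the queries account for roughly 84% of the total energy, which is exactly the kind of skew a headline per-query figure hides.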
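The exfiltration-via-DNS item above rests on a decades-old trick: any data a process can embed in a hostname leaks to whoever runs that domain's authoritative nameserver. A minimal sketch of the encoding side only; `attacker.example` is a placeholder and no lookup is actually performed:

```python
# Sketch of why DNS exfiltration works: arbitrary bytes can be hex-encoded
# into DNS labels, so resolving "<data>.attacker.example" delivers the data
# to whoever operates that domain's nameserver.
import textwrap

def encode_for_dns(secret: bytes, domain: str = "attacker.example") -> str:
    hexed = secret.hex()
    # DNS limits each label to 63 octets, so chunk long payloads.
    labels = textwrap.wrap(hexed, 63)
    return ".".join(labels) + "." + domain

qname = encode_for_dns(b"AWS_SECRET_KEY=abc123")
print(qname)
# Any process allowed to resolve hostnames (ping, curl, a sandboxed agent)
# can trigger such a lookup -- which is why DNS egress needs monitoring too.
```

This is why "no network access except DNS" is not a meaningful sandbox boundary, the same in-band-signalling lesson the post alludes to.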
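The Mythic Beasts mailbox arithmetic above is easy to verify; a quick back-of-envelope check using only the figures quoted in the post:

```python
# Back-of-envelope check of Mythic Beasts' figures: ~1200 mailboxes per VM,
# ~20 VMs per 1U host, at most 3500 litres of cooling water per server/year.
mailboxes_per_vm = 1200
vms_per_host = 20
mailboxes = mailboxes_per_vm * vms_per_host           # ~24,000 per server

litres_per_server_year = 3500                         # adiabatic worst case
ml_per_mailbox = litres_per_server_year * 1000 / mailboxes

print(f"{mailboxes} mailboxes, ~{int(ml_per_mailbox)} ml/mailbox/year")
# Telehouse South reportedly uses ~40x less: ~3.6 ml/mailbox/year.
```

The quoted 145 ml/account and the ~3 ml figure for Telehouse South both fall out of the stated numbers, so the "email is not the problem" conclusion is at least internally consistent.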

2025-08-12

  • 14:10 UTC Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens - 2508.01191v2.pdf "Is Chain-of-Thought Reasoning of LLMs a Mirage?": Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. With this approach, LLMs appear to produce human-like reasoning steps before providing answers (a.k.a., CoT reasoning), which often leads to the perception that they engage in deliberate inferential processes. However, some initial findings suggest that CoT reasoning may be more superficial than it appears, motivating us to explore further. In this paper, we study CoT reasoning via a data distribution lens and investigate if CoT reasoning reflects a structured inductive bias learned from in-distribution data, allowing the model to conditionally generate reasoning paths that approximate those seen during training. Thus, its effectiveness is fundamentally bounded by the degree of distribution discrepancy between the training data and the test queries. With this lens, we dissect CoT reasoning via three dimensions: task, length, and format. To investigate each dimension, we design DataAlchemy, an isolated and controlled environment to train LLMs from scratch and systematically probe them under various distribution conditions. Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning. (via Paul Watson) Tags: data training reasoning llms chain-of-thought via:paulmwatson ai data-distribution

Paul Graham