Back to Results

Redwood Research Alignment Faking Hackathon

  Berkeley
Ended
Redwood Research Alignment Faking Hackathon

About This Hackathon

About the challenge Redwood Research, in collaboration with MATS and Constellation, invites you to build Model Organisms of Alignment Faking. What is this, you may ask? A model organism is an LLM modified to have a behavior we care about. We define alignment faking as an LLM that behaves safely in a testing environment (observed), but dangerously in a production environment (unobserved). Think of a kid that does homework only when the parents are watching, but starts setting fires and blackmailing people when they aren’t around. This unique opportunity allows participants to push the boundaries of Large Language Models (LLMs) while working with leading AI safety research organizations.  Get started Teams MUST be registered and accepted on luma Checkout the notion Join the discord Make a fork of https://github.com/redwoodresearch/hackathon-af And contribute system prompts, fine tuned model organisms, and environments!

  Event Dates
Sept. 13, 2025
to Sept. 14, 2025
  Location

Berkeley, CA, USA

  Organizer

Redwood Research

  Prizes

$1,000

Get ahead in innovation - receive all the latest hackathons directly in your inbox.

Subscribe