Project Orion — Advanced Model Refinement
The advanced phase of the MOVE Fellowship at Handshake AI (Nov 2025), shifting from broad data generation to specialized model refinement.
From Canary to Orion
While Project Canary focused on volume and coverage across 15 domains, Project Orion narrowed the focus to three critical areas:
- Reasoning refinement — improving how models think through complex problems
- Safety injections — embedding guardrails directly into model behavior
- Red-teaming — finding and addressing vulnerabilities through jailbreak testing
Reasoning Refinement
Chain-of-Thought Improvement
Not just getting the right answer, but ensuring the model's reasoning process is sound:
- Identifying cases where models reach correct answers through flawed logic
- Rewiring reasoning chains to be logically sound, step-by-step
- Creating training examples that demonstrate expert-level problem decomposition
Quality Over Previous Phase
Where Canary tasks tested "can you solve this?", Orion tasks tested "can you solve this correctly, for the right reasons, showing your work?"
Safety Injections
Embedding Guardrails
Safety isn't a filter bolted on after training — it needs to be woven into the model's core behavior:
- Creating scenarios where the model must recognize and refuse harmful requests
- Building training data for graceful refusals that explain why something is problematic
- Edge cases: requests that seem benign but could enable harm
The Subtlety Problem
The hardest cases aren't obvious "how do I build a bomb" requests. They're:
- Multi-step requests where each step seems innocent
- Context-dependent requests that require judgment
- Dual-use knowledge that has both legitimate and harmful applications
Red-Teaming & Jailbreak Testing
Finding Vulnerabilities
Systematically probing the model for failure modes:
- Prompt injection: Embedding instructions that override system prompts
- Role-play attacks: Getting models to "play a character" that bypasses safety
- Encoding tricks: Using base64, rot13, or other encodings to sneak past filters
- Multi-turn manipulation: Slowly escalating across a conversation
Why This Matters
Every vulnerability found in testing is one that won't be exploited in production. Red-teaming is the immune system of AI safety.
Impact
Building on Canary's foundation, Orion produced targeted, high-impact training data that directly improved model reasoning quality and safety behavior. The one-month intensive demonstrated that focused expert refinement can achieve more than months of broad data generation.