Episode Description
When MiniMax's RL training wouldn't converge, they debugged layer by layer until they found the culprit: numerical precision in the LM head (fixed by switching it to fp32). When their models learned to "hack" during training, exploiting loopholes to maximize rewards, they had to rethink alignment from scratch. When benchmarks said their models were good but production said otherwise, they traced the gap to environment adaptation.
Olive talks about working at a pace where new models drop at midnight and you test them at midnight. How they use an internal AI agent to read every new paper published overnight. Why they sit with developers during experiments to catch dangerous behaviors in real-time. What "ICU in the morning, KTV at night" means when results swing wildly. How problem-solving becomes discovery when you're debugging behaviors no one has seen before.
This is how Chinese labs move fast: first-principles thinking, engineering discipline, and a willingness to work whenever the experiments demand it.
We spoke on Sunday at 9 pm Beijing time. Olive was still waiting for results from new model experiments, so my first question was obvious: does everyone at the company work like this?
*Follow Turing Post*: https://www.turingpost.com/
*Did you like the episode? You know the drill:*
📌 Subscribe for more conversations with the builders shaping real-world AI.
💬 Leave a comment if this resonated.
👍 Like it if you liked it.
🫶 Thank you for watching and sharing!
*Guest:* Olive Song, Senior Researcher at MiniMax
MiniMax: https://www.minimaxi.com/
Models: https://huggingface.co/MiniMaxAI
*Links:*
vLLM: https://github.com/vllm-project/vllm
SGLang: https://github.com/sgl-project/sglang
📰 Transcript: https://www.turingpost.com/olive
Chapters:
0:00 – Reinforcement Learning and Unexpected Model Behaviors
3:08 – Roleplay, Alignment, and “AI with Everyone”
4:02 – How AI Changes Daily Life and Productivity
4:59 – Inside MiniMax: How Researchers and Engineers Work Together
5:32 – Human Alignment and Safety in Open Models
6:16 – Why Engineering Details Matter More Than Algorithms
8:17 – Open Weights: Benefits, Risks, and Responsibility
10:57 – Specialization vs General AI Models
12:07 – Agentic AI and Long-Horizon Tasks
29:50 – AGI, Creativity, and the Future of AI
*Turing Post* – AI stories from labs the Valley doesn't cover.
https://www.linkedin.com/in/ksenia-se
#MiniMax #ReinforcementLearning #AIResearch #OpenWeights #ChineseAI #OpensourceAI