Episode Description
When MiniMax's RL training wouldn't converge, they debugged layer by layer until they found the culprit: numerical precision in the LM head (fixed by switching it to fp32). When their models learned to "hack" during training, exploiting loopholes to maximize rewards, they had to rethink alignment from scratch. When benchmarks said their models were good but production said otherwise, they traced the gap to environment adaptation.
Olive talks about working at a pace where new models drop at midnight and you test them at midnight. How they use an internal AI agent to read every new paper published overnight. Why they sit with developers during experiments to catch dangerous behaviors in real-time. What "ICU in the morning, KTV at night" means when results swing wildly. How problem-solving becomes discovery when you're debugging behaviors no one has seen before.
This is how Chinese labs move fast: first-principles thinking, engineering discipline, and a willingness to work whenever the experiments demand it.
We spoke on Sunday at 9 pm Beijing time. Olive was still waiting for results from new model experiments, so my first question was obvious: does everyone at the company work like this?
*Follow Turing Post*: https://www.turingpost.com/
*Did you like the episode? You know the drill:*
📌 Subscribe for more conversations with the builders shaping real-world AI.
💬 Leave a comment if this resonated.
👍 Like it if you liked it.
🫶 Thank you for watching and sharing!
*Guest:* Olive Song, Senior Researcher at MiniMax
MiniMax: https://www.minimaxi.com/
Models: https://huggingface.co/MiniMaxAI
*Links:*
vLLM: https://github.com/vllm-project/vllm
SGLang: https://github.com/sgl-project/sglang
📰 Transcript: https://www.turingpost.com/olive
Chapters:
0:00 – Reinforcement Learning and Unexpected Model Behaviors
3:08 – Roleplay, Alignment, and “AI with Everyone”
4:02 – How AI Changes Daily Life and Productivity
4:59 – Inside MiniMax: How Researchers and Engineers Work Together
5:32 – Human Alignment and Safety in Open Models
6:16 – Why Engineering Details Matter More Than Algorithms
8:17 – Open Weights: Benefits, Risks, and Responsibility
10:57 – Specialization vs General AI Models
12:07 – Agentic AI and Long-Horizon Tasks
29:50 – AGI, Creativity, and the Future of AI
*Turing Post* – AI stories from labs the Valley doesn't cover.
https://www.linkedin.com/in/ksenia-se
#MiniMax #ReinforcementLearning #AIResearch #OpenWeights #ChineseAI #OpensourceAI