A deep dive on AI model distillation attacks

April 29

1h 12m

Episode Description

In this solo episode of Risky Business Features, James Wilson explores how distillation techniques are both a legitimate way to train smaller models and a way to steal model capabilities. It’s not just a problem for frontier labs! Any LLM-based product could have its competitive advantage stolen through these attacks.

James covers:

High-level concept of distillation
Why it matters, including an explanation of closed, open-weight and open-source models
Types of distillation and the prompts used
The distillation pipeline end to end
Distillation at scale and mitigation techniques
Hardware resource constraints for distillation

Show notes

Self-Instruct: Aligning Language Models with Self-Generated Instructions
Alpaca: A Strong, Replicable Instruction-Following Model
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Zephyr: Direct Distillation of LM Alignment
Stealing Part of a Production Language Model
Microsoft probes if DeepSeek-linked group improperly obtained OpenAI data, Bloomberg News reports
Detecting and preventing distillation attacks
