Anti-Self-Distillation for LLM Reasoning

Anti-Self-Distillation for LLM Reasoning - Detailed Analysis & Overview

Anti-Self-Distillation for LLM Reasoning

In this AI Research Roundup episode, Alex discusses the paper: '

OPSD: Faster LLM Reasoning via Self-Distillation

In this AI Research Roundup episode, Alex discusses the paper: '

TrOPD: Stable LLM Reasoning Distillation

In this AI Research Roundup episode, Alex discusses the paper: 'Trust Region On-Policy

Knowledge Distillation: How LLMs train each other

In this video, we break down knowledge

SDPG: Better LLM Reasoning with Self-Distilled RL

In this AI Research Roundup episode, Alex discusses the paper: '

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Native Parallel Reasoner:

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

SwS:

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Frankie Liu will present: https://openreview.net/forum?id=4OsgYD7em5 --- we need YOU to volunteer to do rapid-fire recaps and ...

Why Self-Distillation Is Taking Over LLM Post-Training (w/ the Researchers Behind It)

In this video, we sit down with Jonas Hübotter (ETH Zurich) and Idan Shenfeld (MIT) to break down

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why (May 2026)

Title: Unmasking On-Policy

Predict LLM Self-Distillation Before Training

In this AI Research Roundup episode, Alex discusses the paper: 'A Predictive Law for On-Policy

Reinforcement Learning via Self-Distillation: Solving the Credit Assignment Problem

Can AI learn more from a "Why" than a "No"? Explore how

Self-Distilled RLVR: Stable LLM Training Method

In this AI Research Roundup episode, Alex discusses the paper: '