Friday, February 27, 2026

New method could increase the efficiency of LLM training

MIT Researchers Develop a New Method to Increase Efficiency in Training Large Language Models

In the rapidly evolving field of artificial intelligence, Large Language Models (LLMs) play a pivotal role. They are designed to solve complex tasks by breaking them down into smaller, more manageable steps, and are particularly adept at demanding work such as advanced programming and multi-stage planning. However, training these reasoning models carries a hefty price tag in computation and energy. Researchers at MIT and elsewhere have now found a way to use computational downtime to speed up the training of reasoning models.

Using Downtime to Accelerate Training

The new method involves training a smaller, faster model – known as the “drafter” – to predict the outputs of the larger reasoning LLM. The larger model then verifies the drafter’s predictions in a single pass, reducing the amount of work the reasoning model has to do serially and thereby speeding up training. The system’s efficiency lies in its ability to adaptively train and deploy the smaller model, calling on it only when some processors are idle. Computational resources that would otherwise have been wasted are instead used to accelerate training without incurring additional overhead.

Improved Speed and Accuracy

In tests across a range of LLMs, the method roughly doubled training speed while maintaining the same level of accuracy. This could significantly reduce the cost and improve the energy efficiency of developing advanced LLMs for applications such as predicting financial trends or detecting risks in power grids.

Efficiency is the Key

“People want models that can handle more complex tasks. But if that is the goal of model development, then we need to prioritize efficiency. We found a lossless solution to this problem and then developed a full-stack system that can deliver pretty dramatic speedups in practice,” says Qinghao Hu, a postdoctoral researcher at MIT and co-lead author of a paper on the technique.

He was joined on the paper by co-lead author Shang Yang, a doctoral student in electrical engineering and computer science (EECS); Junxian Guo, an EECS doctoral student; senior author Song Han, an associate professor of EECS, member of the Research Laboratory of Electronics, and distinguished scientist at NVIDIA; as well as others at NVIDIA, ETH Zurich, the MIT-IBM Watson AI Lab, and the University of Massachusetts Amherst.

Overcoming the Training Bottleneck

One of the main challenges in training reasoning LLMs is the time-consuming process of generating multiple candidate responses, called “rollout,” which can take up to 85 percent of the execution time required for RL training. Because every processor in the batch must finish its responses before training can proceed to the next step, a few processors stuck on very long responses leave the rest sitting idle while they wait.
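The cost of this long-tail effect is easy to quantify: every worker is occupied until the slowest rollout finishes, so wasted GPU-time is the gap between that wall-clock span and the actual work done. The response lengths below are invented purely for illustration.

```python
# Toy illustration of the rollout "long tail": the batch is done only when
# the slowest worker finishes, so everyone else accumulates idle time.

def idle_fraction(lengths):
    """Fraction of total worker time spent idle when all workers must wait
    for the longest rollout before the next RL step can begin."""
    longest = max(lengths)            # batch finishes when the straggler does
    busy = sum(lengths)               # actual useful work across workers
    total = longest * len(lengths)    # worker-time elapsed until batch ends
    return (total - busy) / total

# Seven workers finish quickly; one long response stalls the whole batch.
lengths = [100, 120, 110, 130, 105, 115, 125, 800]
print(round(idle_fraction(lengths), 2))  # → 0.75
```

In this contrived example a single straggler leaves roughly three quarters of the worker time idle, which is the downtime TLT sets out to reclaim.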

“Our goal was to convert this idle time into acceleration without wasting costs,” Hu adds.

Introducing an Adaptive Solution

To address this issue, the researchers developed a flexible system called “Taming the Long Tail,” or TLT. This system includes an adaptive drafter trainer, which uses free time on idle processors to train the drafter model on the fly, and an adaptive rollout engine, which manages speculative decoding to automatically select the optimal strategy for each new input batch.
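The core scheduling idea — give idle processors drafter-training work instead of letting them wait — can be sketched as follows. This is a simplified illustration of the concept, not the TLT system itself; the function and job names are invented.

```python
# Sketch of the idle-time idea behind TLT: workers with no rollout left to
# generate are assigned drafter-training steps instead of waiting.

def schedule(rollout_queue, idle_workers):
    """Assign pending rollouts to idle workers; any leftover workers are
    given drafter-training work so no processor sits idle."""
    assignments = {}
    for worker in idle_workers:
        if rollout_queue:
            assignments[worker] = ("rollout", rollout_queue.pop(0))
        else:
            assignments[worker] = ("train_drafter", None)
    return assignments

# Two prompts remain, but four GPUs are free: two generate rollouts,
# two spend the gap improving the drafter.
plan = schedule(["prompt_a", "prompt_b"], ["gpu0", "gpu1", "gpu2", "gpu3"])
print(plan)
```

Because drafter training runs only in gaps that would otherwise be wasted, the speedup comes without extra hardware cost — matching the “lossless” framing in the quote above.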

The drafter model is also designed to be lightweight so it can be trained quickly. By reusing some components of the reasoning model training process, TLT is able to achieve further speedup gains.

Future Prospects

The researchers plan to integrate TLT into more training and inference frameworks and to find new reinforcement learning applications that could be accelerated with this approach. “As reasoning continues to be the main workload driving the demand for compute, Qinghao’s TLT does a great job of addressing the computational bottleneck in training these reasoning models. I think this method will be very helpful in the context of efficient AI computation,” says Han.

This groundbreaking research was funded by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT Amazon Science Hub, the Hyundai Motor Company, and the National Science Foundation.
