Large Language Models (LLMs) have shown impressive capabilities in handling various reasoning tasks expressed in natural language, such as math word problems and code generation. However, as the complexity of these tasks increases, LLMs often struggle with errors, hallucinations, and inconsistencies due to their auto-regressive nature. These challenges are particularly pronounced in multi-step reasoning tasks, where the need for more logical, deliberative “System 2” thinking becomes crucial. Traditional methods like Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) have aimed to align LLM outputs with human expectations, but they often require extensive expertise and computational resources.

Researchers from Skywork AI and Nanyang Technological University have introduced Q*, a framework designed to enhance the multi-step reasoning capabilities of LLMs through deliberative planning. Q* formalizes LLM reasoning as a Markov Decision Process (MDP) and employs heuristic search methods, like A* search, to guide LLMs in selecting the most promising next steps efficiently. The framework uses a sophisticated architecture that includes offline reinforcement learning, best sequence selection from rollouts, and completion using stronger LLMs for estimating optimal Q-values. These methods allow Q* to learn from training data without task-specific modifications, making it adaptable to various reasoning tasks.

Q* has demonstrated significant performance improvements across multiple reasoning tasks. For instance, it increased the accuracy of Llama-2-7b to 80.8% on the GSM8K dataset and improved Llama-2-7b and DeepSeekMath-7b to 55.4% accuracy on the MATH dataset. In code generation, Q* enhanced CodeQwen1.5-7b-Chat to 77.0% accuracy on the MBPP dataset. These results highlight the framework’s effectiveness in enhancing LLM performance, outperforming traditional methods and some closed-source models. By offering a robust deliberation framework, Q* significantly improves LLMs’ ability to tackle complex multi-step reasoning tasks.

Source.

TOP STORIES

Unauthorized Users Breach Anthropic's Mythos Cybersecurity Tool
Unauthorized users have gained access to Anthropic’s Mythos, raising security concerns …
Clarifai Deletes 3 Million Photos Amid FTC Investigation Over Data Use
Clarifai has deleted millions of photos from OkCupid amid an FTC investigation into data misuse …
Nvidia's AI Revolution - The Vera Rubin Platform and Future Demand
Nvidia’s Vera Rubin platform is set to revolutionize AI inference with unmatched performance …
Tim Cook's Departure - A Strategic Shift in Apple's AI Landscape
Apple’s leadership transition highlights a strategic focus on silicon for AI innovation …
Tim Cook's Departure Marks a New Era for Apple's AI Strategy
Apple’s leadership changes signal a strategic shift towards AI and silicon innovation …
New Tennessee Law on AI and Mental Health - A Step Forward or Backward?
Tennessee’s new law restricts AI claims in mental health but may create loopholes …

latest stories