OpenAGI - Your Codes Reflect!
Industry Research
Rewriting the AI landscape
Exploring the foundations and frontiers of Large Language Models and Artificial General Intelligence through a curated collection of ground-breaking research.
Featured Research Papers
Key Findings on ML Architecture & Reasoning from Frontier Research.
Sapient Intelligence
Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori
Demonstrates that computational architecture fundamentally constrains reasoning capability—a constraint that parameter scaling alone cannot overcome.
Max Planck Institute for Informatics, Google, Peking University
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, Jan Eric Lenssen, Liwei Wang, Federico Tombari, Bernt Schiele
Addresses inefficiencies in scaling by treating model parameters as learnable tokens and using cross-attention.
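A minimal sketch of the idea above, assuming a PyTorch-style module: the fixed projection is replaced by cross-attention from input tokens to a set of learnable key/value "parameter tokens", so capacity can grow by appending more such tokens. The class name, dimensions, and initialization are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterAttention(nn.Module):
    """Cross-attention from input tokens to learnable parameter tokens.

    Illustrative sketch only: a fixed weight matrix is replaced by key/value
    "parameter tokens" that the input attends over, so capacity can be
    extended by appending more parameter tokens.
    """
    def __init__(self, d_model: int, num_param_tokens: int):
        super().__init__()
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, d_model) * 0.02)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); scores over the parameter tokens
        scores = x @ self.param_keys.T / self.param_keys.shape[-1] ** 0.5
        weights = F.softmax(scores, dim=-1)
        return weights @ self.param_values

layer = ParameterAttention(d_model=64, num_param_tokens=128)
out = layer(torch.randn(2, 10, 64))  # (2, 10, 64)
```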
FAIR at Meta
François Fleuret
Extends standard decoders through a conditional Variational Autoencoder (VAE) framework to learn explicit latent random variables.
Meta FAIR
Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, et al.
Shifts from treating code as static text to learning what code does through observation-action trajectories in execution environments.
Meta, Polytechnique Montréal
Mahmoud Assran, Adrien Bardes, David Fan, Quentin Garrido, Yann LeCun, et al.
Combines internet-scale passive observation with minimal interaction data to build world models for understanding, prediction, and planning.
Apple
Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar
Questions whether Large Reasoning Models (LRMs) actually solve complex problems or merely create an appearance of reasoning.
MIT
Alex L. Zhang, Tim Kraska, Omar Khattab
Proposes a framework where LLMs programmatically decompose arbitrarily long context through recursive calls using a Python REPL.
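A schematic of the recursive pattern described above. The llm() call is a placeholder for a real model API, and the fixed character-count split is an illustrative decomposition rule; the paper instead lets the model itself choose how to decompose the context through a Python REPL.

```python
def llm(prompt: str) -> str:
    """Placeholder for a real LLM call; an assumption, not the paper's code."""
    raise NotImplementedError

def recursive_answer(question: str, context: str, max_chars: int = 8_000) -> str:
    # Base case: the context fits in a single model call.
    if len(context) <= max_chars:
        return llm(f"Context:\n{context}\n\nQuestion: {question}")
    # Recursive case: split the context, answer each piece, then synthesize.
    mid = len(context) // 2
    partials = [
        recursive_answer(question, context[:mid], max_chars),
        recursive_answer(question, context[mid:], max_chars),
    ]
    return llm(
        "Combine these partial answers into one final answer.\n"
        f"Question: {question}\nPartial answers:\n" + "\n".join(partials)
    )
```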
Unified Synthesis: Seven Papers Converge on One Truth
The era of monolithic, single-pass computation is ending. Scaling through decomposition, recursion, and hierarchical reasoning is the frontier.
1 Depth/Recursion Beats Width
- HRM: Recurrent depth outperforms static Transformer depth on reasoning tasks.
- RLMs: Recursive decomposition scales beyond fixed context windows.
- Tokenformer: Flexible architecture enables efficient progressive scaling.
2 World Models as Foundation
- CWM: Execution traces teach code semantics through world dynamics.
- V-JEPA 2: Video dynamics teach physical world understanding.
- RLMs: Contextual understanding requires interacting with the world.
3 Latent Structure & Planning
- Free Transformer: Explicit latents improve reasoning decision points.
- V-JEPA 2: Representation-space planning enables zero-shot transfer.
- RLMs: Emergent decomposition strategies discovered autonomously.
4 The Complexity Ceiling
- Illusion of Thinking: Hard limits on compositional complexity.
- HRM: Recurrent depth addresses but doesn't solve universal limits.
- RLMs: Decomposition handles beyond-ceiling problems via recursion.
5 Problem-Centric Design
- RLMs: Ask "How should the model understand the context?" rather than "How should we decompose the task?"
- CWM: Models understand code via observation, not symbolic manipulation.
- Delegate decomposition strategy to the model, don't prescribe it.
The Next Frontier
Tomorrow's AI combines HRM-style recurrence, world-model grounding, and RLM-style recursive decomposition.
Practical Implications for ML Infrastructure
Architecture
- Depth/recursion over width
- Flexible parameter interaction
- Latent problem structure
Training
- Mid-training world modeling
- Inference-time scaling targets
- Complexity-graded datasets
Systems
- Asynchronous inference engines
- Unbounded prefix caching
- Recursive depth hyperparameters
Agents
- Design agents as context decomposers
- World models + Recursion
- Hybrid Symbolic-Neural coupling
Language Models Research
Fundamental research on Language Models optimization, adaptation, and inference.
# Inference & Decoding Optimization (5 Papers)
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
In-depth study of factors affecting speculative decoding performance at scale.
Yaniv Leviathan, Matan Kalman, Yossi Matias
Accelerates autoregressive inference by using a small draft model to predict future tokens.
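A simplified sketch of the draft-then-verify loop. draft_model and target_model are placeholder callables returning one greedy token each; the published method verifies all draft tokens in a single batched target pass and uses rejection sampling so the output distribution exactly matches the target model, which this greedy sketch does not attempt.

```python
def speculative_decode(prefix, draft_model, target_model, k=4, steps=32):
    """Greedy sketch of speculative decoding.

    draft_model(tokens) -> next token (cheap);
    target_model(tokens) -> next token (expensive).
    Draft tokens are accepted while they match the target's greedy choice.
    """
    tokens = list(prefix)
    while len(tokens) < len(prefix) + steps:
        # 1) Draft k tokens cheaply.
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # 2) Verify drafts against the target, accepting the longest agreeing prefix.
        accepted = 0
        for i in range(k):
            if target_model(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        # 3) On the first disagreement (or if all drafts passed), take one target token.
        tokens.append(target_model(tokens))
    return tokens
```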
Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, John Jumper
Accelerates transformer decoding by generating multiple tokens from each transformer call.
# Alignment, Data & Training (4 Papers)
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
Seminal work on learning from human comparisons rather than reward functions, a precursor to RLHF.
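The training signal from pairwise comparisons is typically written as a Bradley-Terry style objective on a learned reward model: push the reward of the preferred segment above the rejected one. The simplified loss below (with assumed tensor shapes) is illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_preferred: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss over pairs of trajectory segments.
    Both inputs: (batch,) reward-model scores."""
    return -F.logsigmoid(reward_preferred - reward_rejected).mean()

# Toy usage with random reward-model outputs (illustrative only).
r_pref = torch.randn(8, requires_grad=True)
r_rej = torch.randn(8)
loss = preference_loss(r_pref, r_rej)
loss.backward()
```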
Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
Introduces FLAN, showing that instruction tuning on a collection of tasks substantially improves zero-shot performance on unseen tasks.
Emanuel Ben-Baruch, Adam Botach, Igor Kviatkovsky, Manoj Aggarwal, Gérard Medioni
Integrates Knowledge Distillation (KD) to improve performance on pruned datasets.
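A generic form of the knowledge-distillation objective used in this line of work: cross-entropy on hard labels plus a temperature-softened KL term against the teacher's logits. The temperature and weighting below are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: hard-label cross-entropy plus KL between
    temperature-softened student and teacher distributions."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft

loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10), torch.randint(0, 10, (4,)))
```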
# Foundation & Scaling (10 Papers)
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
The seminal paper introducing the Transformer architecture, replacing recurrence with self-attention.
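The mechanism that replaces recurrence is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; a minimal single-head sketch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head.
    q, k, v: (batch, seq, d_k) tensors."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 5, 16)
out = scaled_dot_product_attention(q, k, v)  # (1, 5, 16)
```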
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, J. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, T. Henighan, R. Child, A. Ramesh, Daniel M. Ziegler, Jeff Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, B. Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, I. Sutskever, Dario Amodei
This landmark paper introduces GPT-3, a 175-billion parameter decoder-only model, demonstrating that massive scaling allows models to perform diverse tasks without task-specific fine-tuning.
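Few-shot, in-context learning means conditioning on demonstrations placed in the prompt rather than updating any weights; a toy prompt in the style of the paper's translation examples:

```python
few_shot_prompt = """Translate English to French.

sea otter => loutre de mer
cheese => fromage
plush giraffe => girafe en peluche
mint => """
# The model is expected to continue with "menthe" purely from the pattern
# in the prompt; no gradient update is involved.
```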
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
The original GPT-1 paper which proposed a two-stage framework: unsupervised pre-training on a large corpus followed by supervised fine-tuning for specific tasks.
# Quantization & Model Compression (7 Papers)
Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag Patel, Nojun Kwak
Coordination of quantization and knowledge distillation to maintain performance in low-precision models.
Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
Introduces the first optimizers to use 8-bit statistics while matching 32-bit optimizer performance.
Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer
Direct 8-bit matrix multiplication using mixed-precision decomposition to handle outliers.
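The decomposition splits the matrix multiply: feature dimensions with outlier activations stay in floating point, the rest are quantized to int8 with vector-wise scales, and the two partial products are summed. A NumPy sketch with an illustrative outlier threshold, not the paper's exact procedure:

```python
import numpy as np

def mixed_precision_matmul(x, w, outlier_threshold=6.0):
    """Sketch of outlier-aware int8 matmul.
    x: (n, d) activations, w: (d, m) weights. Feature columns of x whose max
    absolute value exceeds the threshold use the float path; the rest are
    quantized to int8 with per-row (x) and per-column (w) scales."""
    outlier_cols = np.abs(x).max(axis=0) > outlier_threshold
    regular_cols = ~outlier_cols

    # Float path for outlier feature dimensions.
    out = x[:, outlier_cols] @ w[outlier_cols, :]

    # Int8 path: vector-wise scales factor out of the matmul.
    xr, wr = x[:, regular_cols], w[regular_cols, :]
    sx = np.abs(xr).max(axis=1, keepdims=True) / 127 + 1e-8   # (n, 1)
    sw = np.abs(wr).max(axis=0, keepdims=True) / 127 + 1e-8   # (1, m)
    xq = np.round(xr / sx).astype(np.int8)
    wq = np.round(wr / sw).astype(np.int8)
    out += (xq.astype(np.int32) @ wq.astype(np.int32)) * (sx @ sw)
    return out

x = np.random.randn(4, 64); x[:, 3] *= 10   # inject an outlier feature dimension
w = np.random.randn(64, 32)
approx = mixed_precision_matmul(x, w)        # close to x @ w
```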
# Retrieval-Augmented Generation (RAG) (4 Papers)
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
Introduces RAG, combining parametric and non-parametric memory for generation.
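Operationally: embed the query, retrieve top-k passages from an external index, and condition generation on them. The embed, index, and llm components below are placeholders, and the paper's full method additionally marginalizes over retrieved documents during training.

```python
def rag_answer(question, embed, index, llm, k=5):
    """Retrieve-then-generate sketch.
    embed(text) -> vector; index.search(vector, k) -> top-k passages (strings);
    llm(prompt) -> string. All three are placeholders for real components.
    """
    passages = index.search(embed(question), k)
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using the retrieved passages; cite them by number.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```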
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang
A comprehensive review of RAG paradigms, techniques, and evaluation frameworks, highlighting how external knowledge improves LLM accuracy and credibility.
Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue
An up-to-date survey of RAG techniques, focusing on retrievers, retrieval fusion, and industrial-scale deployment challenges.
# Parameter-Efficient Fine-Tuning (PEFT) (12 Papers)
Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang
Introduces P-Tuning, a method for training continuous prompt embeddings to improve NLU performance.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Classic adaptation technique that injects trainable low-rank matrices into Transformer layers.
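The injected update turns a frozen projection W x into W x + (alpha/r) B A x, where only the low-rank matrices A and B are trained; a minimal layer sketch (the rank, scaling, and initialization follow common convention and are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                               # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
y = layer(torch.randn(2, 512))
```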
Nam Hyeon-Woo, Moon Ye-Bin, Tae-Hyun Oh
Re-parameterization using low-rank weights and Hadamard product for efficient federated learning.
# Prompt Engineering & Optimization (6 Papers)
Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi
Introduces generated knowledge prompting, where an LLM generates relevant knowledge to aid its own reasoning on commonsense tasks.
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
Investigates why in-context learning works, finding that the ground truth labels in demonstrations are less important than the format and distribution.
Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré
Introduces ASK ME ANYTHING (AMA), a strategy to aggregate multiple imperfect prompts for high-quality LLM performance.
# Reasoning Strategies (11 Papers)
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
Explores how generating intermediate reasoning steps improves LLM performance on complex tasks.
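In few-shot form, each demonstration contains the intermediate reasoning as well as the answer, so the model imitates the step-by-step pattern; a toy prompt in the style of the paper's arithmetic demonstrations:

```python
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: The cafeteria had 23 apples. They used 20 for lunch and bought 6 more.
How many apples do they have?
A:"""
# The model is expected to produce the intermediate steps
# ("23 - 20 = 3, 3 + 6 = 9") before stating the final answer.
```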
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
Introduces a new decoding strategy for complex reasoning by selecting the most consistent answer across multiple reasoning paths.
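Operationally: sample several reasoning paths at non-zero temperature, extract each final answer, and return the majority answer. Both callables below are placeholders standing in for a sampled model completion and an answer parser.

```python
from collections import Counter

def self_consistent_answer(question, sample_reasoning_path, extract_answer, n=10):
    """Sample n independent chains of thought and majority-vote the answers.
    sample_reasoning_path(question) -> full reasoning text (stochastic);
    extract_answer(text) -> final answer string. Both are placeholders.
    """
    answers = [extract_answer(sample_reasoning_path(question)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```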
Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi
Introduces least-to-most prompting, a strategy that breaks down complex problems into subproblems solved in sequence.
# Agentic Frameworks & Tool Use (17 Papers)
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
Introduces ReAct, a framework where LLMs generate reasoning traces and task-specific actions in an interleaved manner.
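The interleaving takes the form of a Thought/Action/Observation loop: the model emits a thought and an action, a tool returns an observation, and the transcript grows until a finish action. The llm() call, tool registry, and action syntax below are assumptions for illustration, not the paper's prompts.

```python
def react_loop(question, llm, tools, max_steps=8):
    """Minimal ReAct-style loop.
    llm(transcript) -> next "Thought: ...\\nAction: tool[input]" block;
    tools: dict mapping tool name -> callable(str) -> str. Both are placeholders.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                 # model produces thought + action
        transcript += step + "\n"
        action = step.split("Action:")[-1].strip()
        if action.startswith("finish["):
            return action[len("finish["):-1]   # final answer
        name, arg = action.split("[", 1)
        observation = tools[name.strip()](arg[:-1])
        transcript += f"Observation: {observation}\n"
    return transcript  # fall back to the raw transcript if no finish action
```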
Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig
Introduces PAL, a method that uses LLMs to generate programs as intermediate steps, offloading computation to a Python interpreter.
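The division of labor: the model writes a small program that encodes the reasoning, and the Python interpreter, not the model, computes the answer. llm() is a placeholder, and exec() of model output is shown only for illustration; a real system would sandbox it.

```python
def pal_answer(question, llm):
    """Program-aided sketch: ask the model for a Python solution function,
    then run it in the interpreter and return the computed result.
    llm(prompt) -> Python source defining solution(); a placeholder."""
    program = llm(
        "Write a Python function solution() that returns the numeric answer.\n"
        f"Question: {question}\n"
    )
    namespace = {}
    exec(program, namespace)        # offload the computation to the interpreter
    return namespace["solution"]()  # the interpreter, not the LLM, does the math
```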
Zemin Liu, Xingtong Yu, Yuan Fang, Xinming Zhang
Proposes a novel pre-training and prompting framework for graphs to unify tasks and reduce labeling requirements.
Transform Your Enterprise with Research-Driven AI Innovation
Are you interested in AI-Powered Products?
Get In Conversation With Us
With our research-centric approach, we co-create enterprise AI architecture, develop cutting-edge agentic AI patterns, advance LLMOps methodologies, and engineer innovative testing frameworks for next-generation AI products.
Tippman Pl, Chantilly, VA
20152, USA
Oakglade Crescent, Mississauga, ON
L5C 1X4, Canada