OpenAGI - Your Codes Reflect!

Industry Research
Rewriting the AI landscape

Exploring the foundations and frontiers of Large Language Models and Artificial General Intelligence through a curated collection of ground-breaking research.

Research

Featured Research Papers

Key Findings on ML Architecture & Reasoning from Frontier Research.

Sapient Intelligence

Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori

Demonstrates that computational architecture fundamentally constrains reasoning capability—a constraint that parameter scaling alone cannot overcome.

Key Findings

Hierarchical convergence creates effective computational depth of N×T steps, overcoming premature convergence.
27M parameters achieved 40.3% on ARC-AGI-1, surpassing larger models like o3-mini and Claude 3.7.
Architectural depth—not parameter count—enables solving problems requiring polynomial-time computation.
Adding model width provides zero performance gain for reasoning tasks; dimensionality hierarchy emerges during training.
Implication: HRM's hierarchical convergence is implementable for custom reasoning systems that are memory-constrained and want to leverage extended computation at inference time.
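
A minimal sketch of the hierarchical-convergence idea, assuming two generic recurrent cells stand in for the paper's low- and high-level modules (this is illustrative, not the authors' code): N high-level cycles of T low-level steps yield an effective computational depth of N×T.

```python
import torch
import torch.nn as nn

class HierarchicalReasoner(nn.Module):
    """Illustrative only: generic GRU cells stand in for HRM's modules."""
    def __init__(self, dim: int, N: int = 4, T: int = 8):
        super().__init__()
        self.N, self.T = N, T
        self.f_low = nn.GRUCell(dim, dim)    # fast, low-level module
        self.f_high = nn.GRUCell(dim, dim)   # slow, high-level module
        self.readout = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z_low = torch.zeros_like(x)
        z_high = torch.zeros_like(x)
        for _ in range(self.N):              # high-level cycles
            for _ in range(self.T):          # low-level steps toward a local equilibrium
                z_low = self.f_low(x + z_high, z_low)
            # the high-level update restarts the low-level trajectory,
            # which is what counters premature convergence
            z_high = self.f_high(z_low, z_high)
        return self.readout(z_high)
```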

Max Planck Institute for Informatics, Google, Peking University

Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, Jan Eric Lenssen, Liwei Wang, Federico Tombari, Bernt Schiele

Addresses inefficiencies in scaling by treating model parameters as learnable tokens and using cross-attention.

Key Findings

Decouples the parameter-token dimension from feature dimensions, enabling progressive scaling by adding tokens while freezing existing computation.
Scaling from 124M to 1.4B achieved >50% cost reduction; only 15B additional tokens needed for scaling vs 300B from scratch.
Modified softmax provides crucial gradient stability for attention-over-parameters design.
Implication: Tokenformer's parameter reuse directly optimizes model serving costs and enables efficient model family scaling (edge→datacenter).
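
A rough sketch of the attention-over-parameters idea: input tokens attend over learnable key/value parameter tokens, and the model grows by appending new parameter tokens. The class name is an assumption, and the plain softmax here is a simplification of the paper's modified softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamAttention(nn.Module):
    """Illustrative sketch, not the paper's implementation."""
    def __init__(self, dim: int, n_param_tokens: int):
        super().__init__()
        # model parameters are stored as learnable key/value tokens
        self.param_keys = nn.Parameter(torch.randn(n_param_tokens, dim) * 0.02)
        self.param_values = nn.Parameter(torch.randn(n_param_tokens, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); each input token attends over the parameter tokens
        attn = F.softmax(x @ self.param_keys.T / x.shape[-1] ** 0.5, dim=-1)
        return attn @ self.param_values

    @torch.no_grad()
    def grow(self, extra_tokens: int):
        # progressive scaling: append new zero-initialized parameter tokens;
        # existing tokens can stay frozen while training resumes
        dim = self.param_keys.shape[1]
        new = torch.zeros(extra_tokens, dim,
                          device=self.param_keys.device, dtype=self.param_keys.dtype)
        self.param_keys = nn.Parameter(torch.cat([self.param_keys, new]))
        self.param_values = nn.Parameter(torch.cat([self.param_values, new.clone()]))
```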

FAIR at Meta

François Fleuret

Extends standard decoders through a conditional Variational Autoencoder (VAE) framework to learn explicit latent random variables.

Key Findings

Explicitly samples latent variables like 'review sentiment', simplifying the decoder's task.
Requires only ~3.6% overhead for 1.5B models by injecting Z at the midpoint.
Substantial improvements on reasoning tasks (HumanEval+, GSM8K) and knowledge benchmarks (MMLU).
Implication: Suggests agentic systems benefit from explicit decision points (encoded as latent variables) rather than implicit planning through token sequences.
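
A conditional-VAE-style sketch of injecting an explicit latent Z at the decoder's midpoint. The Gaussian, sequence-level latent and the injection-by-addition are simplifications of this assumption-laden example, not the paper's construction.

```python
import torch
import torch.nn as nn

class LatentDecoder(nn.Module):
    """Illustrative: wraps pre-built transformer blocks with a midpoint latent."""
    def __init__(self, layers: nn.ModuleList, dim: int, z_dim: int):
        super().__init__()
        self.layers = layers
        self.mid = len(layers) // 2
        self.to_mu = nn.Linear(dim, z_dim)        # heads for q(Z | x)
        self.to_logvar = nn.Linear(dim, z_dim)
        self.z_proj = nn.Linear(z_dim, dim)

    def forward(self, h: torch.Tensor):
        for layer in self.layers[: self.mid]:     # first half of the stack
            h = layer(h)
        pooled = h.mean(dim=1)                    # sequence summary
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        h = h + self.z_proj(z).unsqueeze(1)       # inject Z at the midpoint
        for layer in self.layers[self.mid :]:     # second half of the stack
            h = layer(h)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        return h, kl   # kl (suitably weighted) is added to the usual next-token loss
```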

Meta FAIR

Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, et al.

Shifts from treating code as static text to learning what code does through observation-action trajectories in execution environments.

Key Findings

Captures Python execution traces and Docker trajectories, enabling 'neural code interpretation'.
Uses 5 trillion tokens of world-modeling data during the mid-training phase.
Achieves competitive results on SWE-bench Verified (65.8%) and Math-500 (96.6%).
Implication: Execution-trace pretraining should inform the design of code-understanding systems—focus on what models learn about execution semantics.
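
A toy illustration (not Meta's pipeline) of turning Python execution into observation-action records: each traced line becomes a (location, local state) pair, the kind of trajectory data execution-aware pretraining consumes.

```python
import sys

def capture_trace(fn, *args):
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append({
                "line": frame.f_lineno,           # "action": which statement runs next
                "locals": dict(frame.f_locals),   # "observation": interpreter state
            })
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, trace

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

value, steps = capture_trace(gcd, 48, 18)
print(value, len(steps))   # 6, plus a line-by-line record of the computation
```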

Meta, Polytechnique Montréal

Mahmoud Assran, Adrien Bardes, David Fan, Quentin Garrido, Yann LeCun, et al.

Combines internet-scale passive observation with minimal interaction data to build world models for understanding, prediction, and planning.

Key Findings

Predicts in learned representation space instead of pixel space, enabling efficient model-predictive control.
Achieved a 44% improvement in action anticipation and state-of-the-art Video QA results among models in the 8B-parameter class.
Zero-shot deployment on Franka robots for manipulation without task-specific training or rewards.
Implication: Representation-space planning shows that learned representations enable vastly more efficient downstream planning than pixel-space approaches.
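
To make "representation-space planning" concrete, here is a hedged sketch of random-shooting model-predictive control over learned latents. encoder() and world_model() are hypothetical stand-ins, and the paper's actual planner is more sophisticated.

```python
import numpy as np

def encoder(obs): raise NotImplementedError       # maps observations to latent states
def world_model(z, action): raise NotImplementedError   # predicts the next latent state

def plan(obs, goal_obs, horizon=5, candidates=256, action_dim=4, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    z, z_goal = encoder(obs), encoder(goal_obs)
    actions = rng.normal(size=(candidates, horizon, action_dim))
    costs = np.zeros(candidates)
    for i in range(candidates):
        z_t = z
        for a in actions[i]:
            z_t = world_model(z_t, a)             # roll out entirely in latent space
        costs[i] = np.linalg.norm(z_t - z_goal)   # distance to the goal representation
    return actions[np.argmin(costs), 0]           # execute the first action of the best plan
```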

Apple

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

Questions whether Large Reasoning Models (LRMs) actually solve complex problems or merely create an appearance of reasoning.

Key Findings

LRMs 'overthink' low-complexity problems but completely collapse on high-complexity ones.
As complexity increases, reasoning effort actually decreases after a threshold, suggesting a fundamental scaling limit.
Models describe algorithms correctly but cannot execute them precisely over many sequential steps.
Implication: Truly complex reasoning requires architectural innovation beyond scaling; models hit hard complexity ceilings regardless of parameters.

MIT

Alex L. Zhang, Tim Kraska, Omar Khattab

Proposes a framework where LLMs programmatically decompose arbitrarily long context through recursive calls using a Python REPL.

Key Findings

RLM(GPT-5-mini) (88%) outperforms base GPT-5 (68%) on the OOLONG benchmark while being cheaper.
Maintains 95% accuracy on retrieval over 10M+ tokens where base models OOM or fail (<5%).
Discovers emergent strategies: Peeking, Grepping, Partition + Map, and Programmatic processing.
Introduces a new inference-time scaling axis orthogonal to CoT reasoning and ReAct actions.
Implication: Contextual understanding requires decomposing how to interact with the world; the recursion framework disproportionately benefits smaller models.
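
A hedged sketch of the "Partition + Map" strategy the paper observes: long context is split, sub-questions are answered recursively, and partial answers are reduced with one final call. llm() is a hypothetical completion function, not the paper's implementation.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("call your model of choice here")

def recursive_answer(question: str, context: str, chunk_size: int = 8000) -> str:
    # base case: the context fits in a single call
    if len(context) <= chunk_size:
        return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    # recursive case: partition the context and map the question over each part
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    partials = [recursive_answer(question, c, chunk_size) for c in chunks]
    summary = "\n".join(f"- {p}" for p in partials)
    # reduce: combine the partial answers with one more call
    return llm(f"Partial answers from different sections:\n{summary}\n\n"
               f"Question: {question}\nCombine these into a final answer:")
```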

Unified Synthesis: Seven Papers Converge on One Truth

The era of monolithic, single-pass computation is ending. Scaling through decomposition, recursion, and hierarchical reasoning is the frontier.

1 Depth/Recursion Beats Width
  • HRM: Recurrent depth exceeds static Transformer depth for reasoning.
  • RLMs: Recursive decomposition scales beyond fixed context windows.
  • Tokenformer: Flexible architecture enables efficient progressive scaling.
2 World Models as Foundation
  • CWM: Execution traces teach code semantics through world dynamics.
  • V-JEPA 2: Video dynamics teach physical world understanding.
  • RLMs: Contextual understanding requires interacting with the world.
3 Latent Structure & Planning
  • Free Transformer: Explicit latents improve reasoning decision points.
  • V-JEPA 2: Representation-space planning enables zero-shot transfer.
  • RLMs: Emergent decomposition strategies discovered autonomously.
4 The Complexity Ceiling
  • Illusion of Thinking: Hard limits on compositional complexity.
  • HRM: Recurrent depth addresses but doesn't solve universal limits.
  • RLMs: Decomposition handles beyond-ceiling problems via recursion.
5 Problem-Centric Design
  • RLMs: Ask "How to understand context?" not "How to decompose task?".
  • CWM: Models understand code via observation, not symbolic manipulation.
  • Delegate decomposition strategy to the model, don't prescribe it.
The Next Frontier

Tomorrow's AI combines HRM-style recurrence, world-model grounding, and RLM-style recursive decomposition.

Hybrid Symbolic-Neural
Complexity-Graded Benchmarks

Practical Implications for ML Infrastructure

Architecture
  • Depth/recursion over width
  • Flexible parameter interaction
  • Latent problem structure
Training
  • Mid-training world modeling
  • Inference-time scaling targets
  • Complexity-graded datasets
Systems
  • Asynchronous inference engines
  • Unbounded prefix caching
  • Recursive depth hyperparameters
Agents
  • Context-decomposer design
  • World models + Recursion
  • Hybrid Symbolic-Neural coupling
Basics

Language Models Research

Fundamental research on Language Models optimization, adaptation, and inference.

# Inference & Decoding Optimization

5 Papers

Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman

In-depth study of factors affecting speculative decoding performance at scale.

Key Findings

Draft model latency is more important than pure language modeling accuracy.
New design space for hardware-efficient draft models (CPU/GPU optimized).
Provides 111% higher throughput than standard draft model designs.
Implication: Shifts focus from draft model 'intelligence' to draft model 'latency-per-token' for better serving.

Yaniv Leviathan, Matan Kalman, Yossi Matias

Accelerates autoregressive inference by using a small draft model to predict future tokens.

Key Findings

Achieves 2-3x speedup without changing the mathematical output distribution.
Leverages parallel verification on the larger target model.
Applicable to off-the-shelf models without retraining.
Implication: The foundation for high-throughput LLM serving systems like vLLM and TensorRT-LLM.

Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, John Jumper

Accelerates transformer decoding by generating multiple tokens from each transformer call.

Key Findings

Latency of parallel scoring of short continuations is comparable to sampling a single token.
Preserves the target model distribution via a modified rejection sampling scheme.
Achieved 2-2.5x decoding speedup without compromising sample quality.
Implication: Speculative sampling enables significant throughput improvements for massive models like Chinchilla-70B.
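
The accept-or-resample rule shared by these speculative decoding papers is easy to state in code. This sketch assumes p_target and p_draft are the two models' token distributions at one position; on rejection, sampling from the residual max(0, p − q) keeps the output distribution exactly that of the target model.

```python
import numpy as np

def accept_or_resample(draft_token, p_target, p_draft, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # accept the drafted token with probability min(1, p/q)
    if rng.random() < min(1.0, p_target[draft_token] / p_draft[draft_token]):
        return draft_token, True
    # rejection: resample from the normalized residual distribution max(0, p - q)
    residual = np.maximum(p_target - p_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(residual), p=residual), False
```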

# Alignment, Data & Training

4 Papers

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

Seminal work on learning from human comparisons rather than reward functions, a precursor to RLHF.

Key Findings

Demonstrates solving complex RL tasks (Atari, robotics) using non-expert human preferences between trajectory segments.
Reduces human oversight requirements to less than 1% of agent interactions.
Enables training of novel behaviors without hand-coded reward functions.
Implication: The foundation for modern RLHF, allowing AI systems to align with human values and complex goals across sophisticated domains.
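
The paper's objective fits a reward model to human choices with a Bradley-Terry model over summed per-step rewards. A short sketch of that loss (the function wrapper is illustrative; the probability model matches the paper's formulation):

```python
import torch

def preference_loss(rewards_1: torch.Tensor, rewards_2: torch.Tensor,
                    human_prefers_1: torch.Tensor) -> torch.Tensor:
    # rewards_*: (batch, T) predicted per-step rewards for each trajectory segment
    logits = rewards_1.sum(dim=-1) - rewards_2.sum(dim=-1)
    # P(segment 1 preferred) = exp(sum r1) / (exp(sum r1) + exp(sum r2)) = sigmoid(logits)
    return torch.nn.functional.binary_cross_entropy_with_logits(
        logits, human_prefers_1.float())
```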

Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

Introduces FLAN, showing that instruction tuning on a collection of tasks substantially improves zero-shot performance on unseen tasks.

Key Findings

Instruction tuning a 137B model on 60+ tasks verbalized via templates significantly boosts zero-shot generalization.
FLAN exceeds zero-shot GPT-3 (175B) on 20 of 25 evaluated tasks.
Surpasses few-shot GPT-3 by large margins on complex benchmarks like ANLI and RTE.
Implication: Establishes instruction tuning as a key technique for eliciting zero-shot capabilities in large language models.

Emanuel Ben-Baruch, Adam Botach, Igor Kviatkovsky, Manoj Aggarwal, Gérard Medioni

Integrates Knowledge Distillation (KD) to improve performance on pruned datasets.

Key Findings

Random pruning with KD is comparable to sophisticated pruning methods.
Superior accuracy on ImageNet with only 50% of the data using KD.
Small capacity teachers can surprisingly outperform larger ones in some pruning regimes.
Implication: Simplifies the data management pipeline for large-scale training through efficient pruning.

# Foundation & Scaling

10 Papers

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

The seminal paper introducing the Transformer architecture, replacing recurrence with self-attention.

Key Findings

Transformer models are more parallelizable and require significantly less training time.
Attention mechanisms allow the model to weight the importance of parts of the input.
Achieved state-of-the-art results on translation tasks almost immediately.
Implication: The foundational architectural breakthrough that enabled modern LLMs.
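
The paper's core operation, scaled dot-product attention, is Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal self-contained version (the masking convention here is an illustrative choice):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # e.g. causal masking
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V
```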

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeff Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

This landmark paper introduces GPT-3, a 175-billion parameter decoder-only model, demonstrating that massive scaling allows models to perform diverse tasks without task-specific fine-tuning.

Key Findings

Scaling laws hold: increasing parameters, data, and compute leads to predictable improvements in performance.
Large language models can perform 'in-context learning,' solving tasks via a few examples provided in the prompt.
GPT-3 achieved state-of-the-art results in many NLP benchmarks under zero-shot, one-shot, and few-shot settings.
The model exhibits emergent abilities in areas like basic arithmetic, translation, and code generation despite not being specifically trained for them.
Implication: The success of GPT-3 shifted the industry focus toward 'prompt engineering' and massive scale, suggesting that general intelligence can be approached through large-scale generative pre-training.

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

The original GPT-1 paper which proposed a two-stage framework: unsupervised pre-training on a large corpus followed by supervised fine-tuning for specific tasks.

Key Findings

Demonstrated that the Transformer Decoder architecture is highly effective for learning linguistic features from unlabeled text.
Introduced a shift away from task-specific architectures toward a unified model that handles multiple tasks through minimal fine-tuning.
Proved that generative pre-training provides a significant 'warm start' that improves performance on downstream tasks like sentiment analysis and Q&A.
Utilized a 12-layer decoder-only transformer, establishing the foundation for all subsequent GPT iterations.
Implication: This paper established the 'pre-train then fine-tune' paradigm, proving that the decoder-only architecture could serve as a powerful backbone for general natural language understanding.

# Quantization & Model Compression

7 Papers

Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag Patel, Nojun Kwak

Coordination of quantization and knowledge distillation to maintain performance in low-precision models.

Key Findings

Proposed a three-phase procedure: Self-studying, Co-studying, and Tutoring.
Outperformed existing state-of-the-art methods by significant margins on standard benchmarks.
Recovered full-precision accuracy at very low quantization levels (e.g., W3A3 on ResNet).
Implication: QKD enables the deployment of powerful models on resource-constrained edge devices without sacrificing accuracy.

Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer

First optimizers to use 8-bit statistics while maintaining 32-bit performance levels.

Key Findings

Block-wise dynamic quantization handles the instability of 8-bit optimizer states.
Reduces memory footprint of optimizer states by 75% without accuracy loss.
Enables training larger models on existing hardware (e.g., 2xA100 instead of 4xA100).
Implication: Critical for institutional and consumer-grade training where VRAM is the primary bottleneck.
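
In practice the switch is a one-line optimizer swap. A sketch assuming the bitsandbytes package and a CUDA device are available (check the library's current docs for the exact API):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()
# drop-in replacement for torch.optim.Adam: optimizer state (the memory this
# line of work targets) is kept in block-wise quantized 8-bit
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 4096, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```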

Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer

Direct 8-bit matrix multiplication using mixed-precision decomposition to handle outliers.

Key Findings

Isolates 'outlier' features into 16-bit while keeping 99.9% of computations in 8-bit.
Enables zero-degradation inference for models up to 175B parameters.
Reduces VRAM requirements by 50% for standard inference tasks.
Implication: Made massive models like OPT-175B accessible on a single server with consumer-grade GPUs.

# Retrieval-Augmented Generation (RAG)

4 Papers

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

Introduces RAG, combining parametric and non-parametric memory for generation.

Key Findings

RAG models combine pre-trained parametric (seq2seq) and non-parametric (dense vector index) memory.
Evaluated two formulations: RAG-Sequence (same passages for whole sequence) and RAG-Token (different passages per token).
Sets state-of-the-art on open-domain QA and generates more factual, diverse language than parametric baselines.
Implication: RAG provides a scalable way to update world knowledge and provide provenance for decisions in large language models.
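
A minimal RAG loop under stated assumptions: embed() and llm() are hypothetical stand-ins for an embedding model and a generator, and using the same retrieved passages for the whole answer loosely mirrors the RAG-Sequence formulation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("your embedding model here")

def llm(prompt: str) -> str:
    raise NotImplementedError("your generator here")

def rag_answer(question: str, passages: list[str], k: int = 3) -> str:
    # non-parametric memory: a dense index over the passages
    index = np.stack([embed(p) for p in passages])
    q = embed(question)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    top = [passages[i] for i in np.argsort(-scores)[:k]]
    # parametric memory: the generator conditions on question + retrieved text
    context = "\n\n".join(top)
    return llm(f"Use the passages to answer.\n\n{context}\n\nQ: {question}\nA:")
```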

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang

A comprehensive review of RAG paradigms, techniques, and evaluation frameworks, highlighting how external knowledge improves LLM accuracy and credibility.

Key Findings

Categorizes RAG into Naive, Advanced, and Modular paradigms.
Examines retrieval, generation, and augmentation techniques in detail.
Identifies challenges like hallucination and outdated knowledge, and how RAG mitigates them.
Implication: Provides a foundational understanding of RAG systems and identifies future research directions for enhancing LLM factuality and reliability.

Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

An up-to-date survey of RAG techniques, focusing on retrievers, retrieval fusion, and industrial-scale deployment challenges.

Key Findings

Provides tutorial-level technical details for implementing representative RAG techniques.
Focuses on retrieval fusion methods to better integrate multiple external sources.
Discusses the evolution from static RAG to systems with and without continuous knowledge updates.
Implication: Serves as a practical guide for engineers and researchers looking to implement and scale RAG systems in production environments.

# Parameter-Efficient Fine-Tuning (PEFT)

12 Papers

Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang

Introduces P-Tuning, a method for training continuous prompt embeddings to improve NLU performance.

Key Findings

Continuous prompt embeddings stabilize training compared to manual discrete prompts.
Significantly improves performance on LAMA and SuperGLUE benchmarks.
Effective for both frozen and fine-tuned models in few-shot and fully-supervised settings.
Implication: P-Tuning simplifies the optimization of large models for specific tasks by focusing on learnable prompt representations.

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

Classic adaptation technique that injects trainable low-rank matrices into Transformer layers.

Key Findings

Reduces trainable parameters by 10,000x compared to full fine-tuning.
Eliminates additional inference latency compared to traditional adapters.
Maintains or exceeds performance of full fine-tuning on large models like GPT-3.
Implication: LoRA is the industry standard for efficient domain adaptation, enabling multi-task deployment on shared hardware.
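
A compact LoRA sketch: the frozen weight W is augmented with a trainable low-rank update scaled by alpha/r. The formulation follows the paper; the wrapper class itself is illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative wrapper: y = W x + (alpha/r) * B A x with W frozen."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# At deployment the update can be merged into W (weight += scale * B @ A),
# which is why LoRA adds no inference latency.
```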

Nam Hyeon-Woo, Moon Ye-Bin, Tae-Hyun Oh

Re-parameterization using low-rank weights and Hadamard product for efficient federated learning.

Key Findings

Achieves 3-10x lower communication costs while maintaining performance.
Not restricted to low-rank constraints, allowing for larger model capacity.
Strong performance in personalized federated learning applications (pFedPara).
Implication: Enables distributed training of large models across devices with limited bandwidth.

# Prompt Engineering & Optimization

6 Papers

Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi

Introduces generated knowledge prompting, where an LLM generates relevant knowledge to aid its own reasoning on commonsense tasks.

Key Findings

Demonstrates that generating and conditioning on knowledge from an LLM improves performance on commonsense reasoning.
Requires no task-specific supervision for knowledge integration or access to structured knowledge bases.
Achieves state-of-the-art results on NumerSense, CommonsenseQA 2.0, and QASC benchmarks.
Implication: Establishes LLMs as flexible, internal sources of world knowledge that can be leveraged to resolve ambiguities and improve factual reasoning during inference.

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer

Investigates why in-context learning works, finding that the ground truth labels in demonstrations are less important than the format and distribution.

Key Findings

Ground truth demonstrations are not required: randomly replacing labels in exemplars barely hurts performance.
The key drivers are the label space, the distribution of input text, and the overall sequence format.
Challenges the notion that models 'learn' the task mechanics from few-shot labels during inference.
Implication: Provides a deeper theoretical understanding of prompting, suggesting that pre-training already contains the task logic and prompts merely 'activate' it.

Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré

Introduces ASK ME ANYTHING (AMA), a strategy to aggregate multiple imperfect prompts for high-quality LLM performance.

Key Findings

Demonstrates that open-ended question-answering (QA) formats outperform restricted labels.
Uses the LLM recursively to transform inputs into effective formats and aggregates noisy votes via weak supervision.
Enables smaller models like GPT-J-6B to match or exceed the few-shot performance of GPT3-175B.
Implication: Reduces the effort in prompt engineering by leveraging aggregation and weak supervision to achieve state-of-the-art results with smaller open-source models.

# Reasoning Strategies

11 Papers

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

Explores how generating intermediate reasoning steps improves LLM performance on complex tasks.

Key Findings

Chain-of-thought (CoT) prompting enables LLMs to decompose complex problems into intermediate steps.
Reasoning abilities emerge naturally in models with sufficient scale (e.g., 100B+ parameters).
Significantly improves performance on arithmetic, commonsense, and symbolic reasoning benchmarks.
Implication: CoT is a fundamental prompting technique that unlocks the latent reasoning capabilities of large-scale language models.

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Introduces a new decoding strategy for complex reasoning by selecting the most consistent answer across multiple reasoning paths.

Key Findings

Samples a diverse set of reasoning paths instead of using naive greedy decoding.
Selects the final answer by marginalizing out the sampled reasoning paths (majority vote).
Significantly boosts CoT performance across arithmetic and commonsense benchmarks (e.g., +17.9% on GSM8K).
Implication: Provides a robust decoding framework that leverages the diversity of reasoning paths to improve accuracy in stochastic LLMs.
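
The decoding strategy reduces to sample-then-vote. A short sketch where sample_cot() and extract_answer() are assumed helpers around your model, not part of the paper's code:

```python
from collections import Counter

def sample_cot(question: str, temperature: float = 0.7) -> str:
    raise NotImplementedError("one sampled chain-of-thought completion")

def extract_answer(completion: str) -> str:
    raise NotImplementedError("parse the final answer from the reasoning path")

def self_consistent_answer(question: str, n_paths: int = 20) -> str:
    answers = [extract_answer(sample_cot(question)) for _ in range(n_paths)]
    # marginalize out the reasoning paths with a simple majority vote
    return Counter(answers).most_common(1)[0][0]
```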

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi

Introduces least-to-most prompting, a strategy that breaks down complex problems into subproblems solved in sequence.

Key Findings

Enables 'easy-to-hard' generalization: solving problems more difficult than shown in prompt exemplars.
Achieved 99%+ accuracy on the SCAN benchmark using only 14 exemplars, surpassing CoT (16%).
Decouples decomposition from subproblem solving, facilitating complex symbolic and mathematical reasoning.
Implication: Provides a powerful technique for compositional generalization, allowing LLMs to tackle multi-step problems that exceed their direct reasoning window.

# Agentic Frameworks & Tool Use

17 Papers

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao

Introduces ReAct, a framework where LLMs generate reasoning traces and task-specific actions in an interleaved manner.

Key Findings

Reasoning traces help the model induce, track, and update action plans while handling exceptions.
Actions allow the model to interface with external sources like Wikipedia to gather real-time information.
Reduces hallucinations and error propagation compared to pure chain-of-thought reasoning.
Implication: Fundamentally changed how LLMs are used as agents, enabling more reliable and interpretable autonomous decision-making.
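
A schematic ReAct-style loop, hedged: llm() and the tools dict (e.g. {"search": wikipedia_search}) are placeholders, and the action-parsing here is simplified compared to the paper's free-form Thought/Action/Observation format.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("returns the next 'Thought: ... Action: tool[input]' step")

def react(question: str, tools: dict, max_steps: int = 8) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                    # model emits reasoning plus an action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        # parse "Action: tool_name[argument]" and execute the tool
        action = step.split("Action:")[-1].strip()
        name, arg = action.split("[", 1)
        observation = tools[name.strip()](arg.rstrip("]"))
        transcript += f"Observation: {observation}\n"   # fed back to the model
    return transcript
```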

Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig

Introduces PAL, a method that uses LLMs to generate programs as intermediate steps, offloading computation to a Python interpreter.

Key Findings

Sidesteps LLM arithmetic and logic mistakes by delegating 'calculation' to an external runtime.
Keeps the LLM focused on problem decomposition rather than raw computation.
Surpasses PaLM-540B with CoT on math benchmarks despite using smaller backbone models.
Implication: Highlights the synergy between neural reasoning and symbolic execution for robust problem solving.
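
A minimal PAL-style sketch: the model writes a program and a Python runtime, not the model, produces the answer. llm() is a placeholder, and executing model-written code should be sandboxed in practice.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("returns Python code ending with `answer = ...`")

def pal_solve(problem: str):
    code = llm(
        "Write Python code that solves the problem step by step "
        f"and stores the result in `answer`.\n\nProblem: {problem}\n"
    )
    namespace: dict = {}
    exec(code, namespace)        # offload the actual computation to the interpreter
    return namespace["answer"]
```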

Zemin Liu, Xingtong Yu, Yuan Fang, Xinming Zhang

Proposes a novel pre-training and prompting framework for graphs to unify tasks and reduce labeling requirements.

Key Findings

Unifies pre-training and downstream graph tasks into a common template.
Employs learnable prompts to locate relevant pre-trained knowledge for specific downstream tasks.
Significantly reduces the need for large task-specific supervision sets in graph representation learning.
Implication: Brings the power of prompting paradigms from NLP to the graph domain, enabling more efficient graph-based AI.

Transform Your Enterprise with Research-Driven AI Innovation

Are you interested in AI-Powered Products?

Get In Conversation With Us

We co-create enterprise AI architecture, develop cutting-edge agentic AI patterns, advance LLMOps methodologies, and engineer innovative testing frameworks for next-generation AI products with our research-centric approach.

Tippman Pl, Chantilly, VA
20152, USA

+1 (571) 294-7595


Oakglade Crescent, Mississauga, ON
L5C 1X4, Canada

+1 (647) 760-2121