Publications
ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory
Matthew Ho, Chen Si, Zhaoxiang Feng, Fangxu Yu, Yichi Yang, Zhijian Liu, Zhiting Hu, Lianhui Qin.
Runner-up, ARC Prize 2025 Paper Awards (Top 8 of 90).
ArcMemo introduces concept-level memory for LLMs: it distills reusable abstractions from reasoning traces, enabling continual learning at test time without weight updates. It supports two memory formats (open-ended situation–suggestion pairs and parameterized program-synthesis routines) and a System-2 reasoning-based retrieval mechanism that outperforms embedding-based retrieval by 15%. On ARC-AGI-1, ArcMemo achieves an official score of 59.33% (a 7.5% relative gain), rising to 70.83% with retries. Concepts transfer across models (DeepSeek R1 +16%, Kimi K2 +8%) and across domains (AIME math +11.6%).
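As a rough illustration of the open-ended memory format, the sketch below stores situation–suggestion pairs and renders them as text that could be prepended to a prompt. The class names, fields, and rendering format are hypothetical and are not taken from the ArcMemo code; retrieval and the parameterized program-synthesis format are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    situation: str   # hypothetical field: when the abstraction applies
    suggestion: str  # hypothetical field: what to try in that situation


@dataclass
class ConceptMemory:
    concepts: list = field(default_factory=list)

    def add(self, situation: str, suggestion: str) -> None:
        """Store a concept, skipping exact duplicates."""
        concept = Concept(situation, suggestion)
        if concept not in self.concepts:  # dataclass equality compares fields
            self.concepts.append(concept)

    def render(self) -> str:
        """Format all stored concepts as a bulleted block of prompt text."""
        return "\n".join(
            f"- When {c.situation}: {c.suggestion}" for c in self.concepts
        )


memory = ConceptMemory()
memory.add("a grid has mirrored halves", "try reflecting one half onto the other")
memory.add("a grid has mirrored halves", "try reflecting one half onto the other")
print(memory.render())
```

Because the memory lives outside the model, adding a concept changes later prompts without any weight update, which is the property the abstract describes.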
SOKRATES: Distilling Symbolic Knowledge into Option-Level Reasoning via Solver-Guided Preference Optimization
Zhaoxiang Feng, David Scott Lewis.
Accepted at AAAI 2026 Bridge Program on Logic & AI: Logical and Symbolic Reasoning in Language Models (LMReasoning).
SOKRATES decomposes each reasoning step into a Thought (a natural-language justification) and an Action (a discrete inference-rule token drawn from 18 first-order logic operations). A symbolic FOL solver verifies each step, and solver-valid and solver-invalid traces are paired as DPO preference data. An option success head, a lightweight MLP that predicts step-level validity, provides interpretable "knowledge" of what the model has learned about logical reasoning.
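The pairing of solver-valid and solver-invalid traces can be sketched as follows. This is a minimal illustration, not the SOKRATES implementation: the `Trace` type, its `valid` flag (assumed to come from the symbolic solver), and the choice to pair every valid trace with every invalid trace for the same prompt are all assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Trace:
    prompt: str
    steps: tuple  # (thought, action) pairs, in order
    valid: bool   # assumed: True if the solver accepted every step


def build_preference_pairs(traces):
    """Pair each solver-valid trace with each invalid one for the same prompt."""
    by_prompt = {}
    for trace in traces:
        by_prompt.setdefault(trace.prompt, []).append(trace)

    pairs = []
    for prompt, group in by_prompt.items():
        chosen = [t for t in group if t.valid]
        rejected = [t for t in group if not t.valid]
        # Prompts with only valid or only invalid traces yield no pairs.
        for good, bad in product(chosen, rejected):
            pairs.append(
                {"prompt": prompt, "chosen": good.steps, "rejected": bad.steps}
            )
    return pairs


traces = [
    Trace("p1", (("from A and A->B", "MODUS_PONENS"),), True),
    Trace("p1", (("from A->B and not B", "MODUS_TOLLENS"),), True),
    Trace("p1", (("from B and A->B", "MODUS_PONENS"),), False),
    Trace("p2", (("from A and B", "AND_INTRO"),), True),
]
pairs = build_preference_pairs(traces)
print(len(pairs))
```

The resulting dictionaries use the prompt/chosen/rejected layout common to DPO training data; the action names in the example are placeholders rather than the paper's 18 operations.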
LabMemo: Concept-Level Memory for Autonomous Scientific Discovery
Zhaoxiang Feng, David Scott Lewis, Enrique Zueco.
Accepted at IEEE IROS 2025 Workshop on Embodied AI and Robotics for Future Scientific Discovery (AIR4S).
LabMemo extends concept-level memory from abstract reasoning to autonomous scientific discovery. A Planner/Selector/Verifier architecture composes multi-step experimental protocols from reusable scientific procedures, with the Verifier gating execution against safety rules, type constraints, and physical limits. Elastic Weight Consolidation prevents catastrophic forgetting across experimental domains. LabMemo achieves a 4.01% parameter identification error from 50 samples and 87% task retention.
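For readers unfamiliar with Elastic Weight Consolidation, the sketch below computes the standard quadratic EWC penalty, (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2, over plain Python lists. It shows the general technique only: the function name, the diagonal Fisher values, and the regularization strength are illustrative assumptions, and the source does not state how LabMemo configures them.

```python
def ewc_penalty(params, anchor_params, fisher_diag, lam=1.0):
    """Standard EWC regularizer added to the loss for a new domain.

    params        -- current parameter values
    anchor_params -- values learned on the previous domain
    fisher_diag   -- diagonal Fisher information (importance per parameter)
    lam           -- regularization strength (assumed value, not from the source)
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )


# Moving an important parameter (high Fisher value) costs more than
# moving an unimportant one by the same amount.
important = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 0.1])
unimportant = ewc_penalty([0.0, 3.0], [0.0, 2.0], [2.0, 0.1])
print(important, unimportant)
```

The penalty is zero when the parameters have not moved from their anchors and grows fastest along directions the previous domain depended on, which is how the method discourages forgetting.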