The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning (arxiv.org)
from yogthos@lemmy.ml to technology@lemmy.ml on 22 Feb 09:34
https://lemmy.ml/post/43530837

This paper is honestly one of the most creative takes on LLM reasoning I’ve seen in a while. The team at ByteDance basically argues that we should view Long Chain-of-Thought as a macromolecular structure with internal forces that hold the logic together. They found that when we try to teach a model to reason by simply distilling keywords from a teacher, it fails because it’s like trying to build a protein by looking at a photo of it rather than understanding the atomic bonds.

Their Molecular Structure of Thought hypothesis breaks reasoning down into three specific bond types that behave like their chemical counterparts. Deep reasoning acts like covalent bonds, forming the rigid primary backbone where each logical step must strictly justify the next. Self-reflection functions like hydrogen bonds, creating folding patterns where the model looks back, sometimes 100 steps, to audit an earlier premise, which keeps it from hallucinating. Finally, self-exploration acts like van der Waals forces: low-commitment bridges that let the model probe different ideas without locking into a rigid path too early.
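To make the taxonomy concrete, here's a toy tagger that labels steps of a reasoning trace with the three bond types. The cue lists and the keyword heuristic are my own placeholders for illustration, not the paper's actual classifier:

```python
# Hypothetical sketch: tag each step of a trace with one of the three
# "bond" types. Cue phrases below are assumptions, not from the paper.
REFLECTION_CUES = ("wait", "hold on", "let me check", "earlier i said")
EXPLORATION_CUES = ("alternatively", "what if", "another approach", "maybe")

def tag_bond_type(step: str) -> str:
    s = step.lower()
    if any(cue in s for cue in REFLECTION_CUES):
        return "hydrogen"       # self-reflection: look back and audit a premise
    if any(cue in s for cue in EXPLORATION_CUES):
        return "van_der_waals"  # self-exploration: low-commitment side branch
    return "covalent"           # default: deep reasoning along the main backbone

trace = [
    "First, factor the polynomial.",
    "Alternatively, we could try substitution.",
    "Wait, the factorization in step 1 dropped a sign.",
]
print([tag_bond_type(s) for s in trace])
# → ['covalent', 'van_der_waals', 'hydrogen']
```

A real implementation would presumably classify behaviors from context rather than surface keywords, which is exactly the point the keyword-ablation experiment below makes.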

They found that most synthetic reasoning data is actually trash because it lacks this distribution of bond types. They also showed that models don't learn the keywords themselves, but the characteristic reasoning behaviors those keywords represent. In one experiment, they replaced keywords like "wait" with arbitrary synonyms or removed them entirely, and the models still learned the reasoning structure just fine. It turns out that building these stable thought molecules is what creates the basis for Long CoT, as opposed to just mimicking a specific vibe or prompt format.
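A toy version of that ablation is easy to sketch: swap the reflection keywords for a nonsense token (or drop them) while leaving the trace structure untouched. The keyword list and replacement token here are mine, purely for illustration:

```python
import re

# Toy keyword ablation: substitute or delete reflection/exploration cues.
# The cue list and the nonsense replacement are assumptions for illustration.
KEYWORDS = r"\b(wait|hmm|alternatively)\b"

def ablate(trace: str, replacement: str = "flumph") -> str:
    return re.sub(KEYWORDS, replacement, trace, flags=re.IGNORECASE)

trace = "Wait, step 2 is wrong. Alternatively, try x = 3."
print(ablate(trace))                  # keywords swapped for a nonsense token
print(ablate(trace, replacement=""))  # keywords removed entirely
```

If models trained on the ablated traces still pick up the reasoning behavior, the signal was never in the tokens themselves.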

They built MOLE-SYN to address this. Instead of just copying teacher outputs, it uses a distribution transfer graph that walks through four behavioral states to synthesize traces with the correct bond profile from the start. Their approach makes reinforcement learning much more stable because the model starts from a balanced skeleton instead of fragmented logic. The paper challenges the whole "more data is better" mindset, arguing that it's the geometry of the information flow that really matters.
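The walk over behavioral states can be pictured as a small Markov chain. The state names and transition weights below are placeholders I made up, not the paper's actual graph, but they show the idea of sampling a behavioral skeleton before filling in content:

```python
import random

# Sketch of a distribution-transfer-style walk over four behavioral states.
# State names and weights are assumptions, not values from the paper.
STATES = ["reason", "reflect", "explore", "conclude"]
TRANSITIONS = {
    "reason":   {"reason": 0.5, "reflect": 0.2, "explore": 0.2, "conclude": 0.1},
    "reflect":  {"reason": 0.7, "reflect": 0.1, "explore": 0.1, "conclude": 0.1},
    "explore":  {"reason": 0.6, "reflect": 0.2, "explore": 0.1, "conclude": 0.1},
    "conclude": {"conclude": 1.0},
}

def sample_skeleton(max_steps: int = 20, seed: int = 0) -> list[str]:
    """Walk the graph until 'conclude' (or max_steps), yielding a skeleton."""
    rng = random.Random(seed)
    state, path = "reason", ["reason"]
    while state != "conclude" and len(path) < max_steps:
        choices = TRANSITIONS[state]
        state = rng.choices(list(choices), weights=list(choices.values()))[0]
        path.append(state)
    return path

print(sample_skeleton())
```

Tuning the transition weights is how you'd shift the resulting traces toward the target bond distribution, rather than hoping a teacher's outputs happen to contain it.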

#technology
