Mind Lab LoRA research: a 0.12% parameter increase boosts AI memory by 1.31 times

Mind Lab AI Research

On June 2, Machine Heart reported that Mindverse's Mind Lab has released a series of research results on LoRA and parameter-efficient fine-tuning (PEFT). The headline δ-mem result: with a parameter increment as low as 0.12%, it improves performance by 1.31× on Memory Agent Bench and 1.20× on LoCoMo, both memory-intensive benchmarks.

δ-mem: Confirmed technical mechanism and benchmark figures

δ-mem is a parallel hybrid linear-attention architecture designed around the characteristics of LoRA. In a standard Transformer, the KV cache is fixed during inference: new entries are appended, but existing ones are never updated, so the cache itself cannot learn. δ-mem adds an “online associative memory state,” an 8×8 matrix that is updated continuously with the delta rule as tokens are read in; during generation, this state applies low-rank corrections to the backbone's attention query and output.
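A minimal sketch of a delta-rule associative memory in plain Python. It uses the textbook delta-rule write familiar from DeltaNet-style linear attention, S ← S + β(v − Sk)k^T, and reads with Sq. Beyond the 8×8 state size, everything here (8-dimensional keys and values, a fixed scalar β, no key normalization or gating) is an assumption, not δ-mem's published parameterization.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

DIM = 8  # the source reports an 8x8 state; 8-dim keys/values are an assumption


def zeros(n: int) -> Matrix:
    """Empty n x n memory state."""
    return [[0.0] * n for _ in range(n)]


def read(state: Matrix, query: Vector) -> Vector:
    """Retrieve from memory: state @ query."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]


def delta_write(state: Matrix, key: Vector, value: Vector, beta: float) -> Matrix:
    """Delta-rule write: S <- S + beta * (v - S k) k^T.

    Only the prediction error (v - S k) is written, so re-storing an
    association the memory already holds leaves the state unchanged.
    """
    error = [v - p for v, p in zip(value, read(state, key))]
    return [[state[i][j] + beta * error[i] * key[j] for j in range(len(key))]
            for i in range(len(state))]


# Usage: with orthonormal keys and beta = 1, each association is stored exactly
# and the second write does not disturb the first.
k1 = [1.0] + [0.0] * (DIM - 1)
k2 = [0.0, 1.0] + [0.0] * (DIM - 2)
state = delta_write(zeros(DIM), k1, [float(i) for i in range(DIM)], beta=1.0)
state = delta_write(state, k2, [1.0] * DIM, beta=1.0)
print(read(state, k1))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

Writing the error rather than the raw value is what separates the delta rule from a plain additive (Hebbian) update: the memory corrects an existing association instead of piling a second copy on top of it.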

According to Mind Lab's official figures:

Parameter increment: as low as 0.12%

Memory Agent Bench improvement: 1.31×

LoCoMo improvement: 1.20×

With explicit historical context removed: still recovers a large amount of relevant information

MinT: Confirmed performance metrics for the infrastructure enabling million-scale LoRA training

MinT is managed infrastructure built specifically for LoRA training and online serving. Its core design keeps the base model deployed long-term in both the training and inference services; each training run exports only a lightweight LoRA adapter (in a rank-1 configuration, as small as about 0.1% of the base model). When a new policy goes live, the full model does not need to be merged or reloaded.
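A minimal sketch of this serving pattern, assuming plain Python lists in place of real tensors: the base weight W stays resident, each adapter is a pair (A, B), and publishing a new one is a dictionary insert, so W is never merged or reloaded. `AdapterServer`, `publish`, and the policy name are hypothetical, not MinT's API, and the usual alpha/r scaling of the LoRA update is omitted. The `lora_fraction` helper shows how small a rank-1 adapter is relative to a single weight matrix; the model-wide share (about 0.1% in the source) depends on which modules are adapted.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class AdapterServer:
    """Hypothetical serving layer: one resident base weight, many swappable adapters."""

    def __init__(self, base_weight: Matrix) -> None:
        self.base = base_weight  # loaded once; never merged with or reloaded for an adapter
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def publish(self, name: str, a: Matrix, b: Matrix) -> None:
        """Make a new adapter servable. a: r x d_in, b: d_out x r."""
        self.adapters[name] = (a, b)

    def forward(self, name: str, x: Vector) -> Vector:
        """y = W x + B (A x); an unknown name falls back to the base model."""
        y = matvec(self.base, x)
        if name in self.adapters:
            a, b = self.adapters[name]
            delta = matvec(b, matvec(a, x))
            y = [yi + di for yi, di in zip(y, delta)]
        return y


def lora_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Parameters a rank-r adapter adds to one d_out x d_in weight, as a share of that weight."""
    return rank * (d_in + d_out) / (d_in * d_out)


server = AdapterServer([[1.0, 0.0], [0.0, 1.0]])
server.publish("policy-v2", a=[[1.0, 1.0]], b=[[1.0], [0.0]])  # hypothetical rank-1 adapter
print(server.forward("policy-v2", [2.0, 3.0]))  # [7.0, 3.0]
print(lora_fraction(4096, 4096, rank=1))        # 0.00048828125, i.e. about 0.05%
```

The 4096×4096 shape is only an illustration; the point is that a rank-r adapter grows with r(d_in + d_out) while the weight it modifies grows with d_in·d_out.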

According to Mind Lab's official figures:

Time from end of training to inference availability: up to 18.3× shorter

On-demand engine loading (via MoE LoRA tensor packaging): 8.5× to 8.7× faster

With the two-stage rollout mechanism: user-visible LoRA load p95 reduced to 0

First-request TTFT p95: 2.3× shorter
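Why a two-stage rollout can bring user-visible load time to zero can be illustrated with a toy simulation, assuming a fixed per-adapter load cost and a few requests per adapter. When traffic is routed as soon as an adapter is published, the first request to each cold adapter absorbs the load and shows up in the p95; when loading and warmup finish before the adapter is exposed, no user request ever waits. `AdapterCache`, the 120 ms cost, and the workload are hypothetical; this shows the effect, not how MinT implements admission control.

```python
import math
from typing import List, Set


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


class AdapterCache:
    """Hypothetical engine cache: loading a cold adapter costs load_ms."""

    def __init__(self, load_ms: float) -> None:
        self.load_ms = load_ms
        self.resident: Set[str] = set()

    def ensure(self, name: str) -> float:
        """Make an adapter resident; return the load time spent."""
        if name in self.resident:
            return 0.0
        self.resident.add(name)
        return self.load_ms


def one_stage(cache: AdapterCache, names: List[str], reqs: int) -> List[float]:
    """Route traffic immediately: the first request to each adapter waits for the load."""
    return [cache.ensure(n) for n in names for _ in range(reqs)]


def two_stage(cache: AdapterCache, names: List[str], reqs: int) -> List[float]:
    """Stage 1: load and warm every adapter before admitting traffic.
    Stage 2: expose them; user requests only ever find warm adapters."""
    for n in names:
        cache.ensure(n)  # cost paid off the user-visible path
    return [cache.ensure(n) for n in names for _ in range(reqs)]


adapters = [f"adapter-{i}" for i in range(20)]
print(p95(one_stage(AdapterCache(load_ms=120.0), adapters, reqs=2)))  # 120.0
print(p95(two_stage(AdapterCache(load_ms=120.0), adapters, reqs=2)))  # 0.0
```

The load work does not disappear; it moves off the request path, which is why the metric is specifically the *user-visible* p95.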

The LoRA scaling-law paper “On the Scaling of PEFT” proposes three scaling axes: scale up (fixing a failure of the router replay mechanism on a 1T-parameter sparse MoE), scale down (OLoRA-tail initialization, which uses minor singular vectors to improve rank-1 stability without adding parameters), and scale out (LoRA as memory: under multi-model voting, accuracy grows logarithmically with the number of models k).
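The scale-out claim combines two pieces that are easy to sketch: merging the answers of k LoRA models by majority vote, and checking whether accuracy follows acc(k) ≈ a + b·ln k. The source gives neither the paper's voting rule nor its fitted coefficients, so plain plurality voting with first-seen tie-breaking and an ordinary least-squares fit are assumptions, and the data in the usage lines is synthetic, not Mind Lab's results.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def majority_vote(answers: Sequence[str]) -> str:
    """Most frequent answer among k models; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    top = max(counts.values())
    return next(a for a in answers if counts[a] == top)


def fit_log_law(ks: Sequence[int], accs: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for acc(k) ~= a + b * ln(k); returns (a, b)."""
    xs = [math.log(k) for k in ks]
    mx = sum(xs) / len(xs)
    my = sum(accs) / len(accs)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, accs))
    b = sxy / sxx
    return my - b * mx, b


print(majority_vote(["B", "A", "A", "C", "B", "A"]))  # A

# Synthetic check: data generated from a = 0.6, b = 0.04 is fit back.
ks = [1, 2, 4, 8, 16]
a, b = fit_log_law(ks, [0.6 + 0.04 * math.log(k) for k in ks])
print(round(a, 6), round(b, 6))  # 0.6 0.04
```

A logarithmic law means each doubling of k adds roughly the same accuracy gain, b·ln 2, so returns per added model shrink as k grows.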

Macaron-A2UI: Confirmed benchmark test results

Macaron-A2UI is built on the MinT platform. On 30B, 235B, and 754B large language model backbones, it is trained first with LoRA-based SFT and then with GRPO reinforcement learning. Beyond text output, the model can generate structured, executable A2UI actions (multi-select boxes, sliders, confirmation cards, and so on).
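GRPO, the reinforcement-learning stage named above, scores each sampled response relative to the other responses sampled for the same prompt instead of using a learned critic. A minimal sketch of that group-relative advantage, assuming population standard deviation plus a small epsilon; the binary reward in the usage line is hypothetical, since the source does not describe Macaron-A2UI's reward design, and the clipped policy-gradient update and KL penalty are omitted.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one prompt's group of sampled responses.

    Each reward is compared with its own group's mean and scaled by the group's
    standard deviation, so no learned value function (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical reward: four sampled responses, scored 1 if the emitted UI action was judged correct.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

When every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.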

According to Mind Lab's official figures, Macaron-A2UI-Venti scores 75.6 on A2UI-Bench, and with only a lightweight schema prompt it surpasses the strongest frontier-model baseline, which receives the full long schema (about 27× longer).

FAQ

How does δ-mem improve memory performance at the low cost of a 0.12% parameter increment?

δ-mem adds an 8×8 online associative memory state matrix that, unlike a traditional static KV cache, keeps changing during inference. The state is updated continuously with delta-rule learning and, during generation, applies low-rank corrections to the backbone Transformer. This lets the model recover relevant information even without explicit historical context, yielding a 1.31× improvement on Memory Agent Bench with only a 0.12% parameter increment.
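One way to picture “low-rank corrections from an 8×8 state” is sketched below, under assumptions the source does not spell out: the query (or attention output) is projected down into the memory's small space, read through the state, projected back up, and added to the original vector. The `down` and `up` projections and the additive form are hypothetical; the point is that the correction's rank is bounded by the state size, so in this sketch the only new weights are the two thin projections.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def memory_correction(h: Vector, down: Matrix, state: Matrix, up: Matrix) -> Vector:
    """Return h + up @ (state @ (down @ h)).

    down:  r x d  projects a d-dim query (or attention output) into memory space
    state: r x r  the online associative memory (r = 8 in δ-mem)
    up:    d x r  maps the memory readout back to d dims
    The added term has rank at most r, however large d is.
    """
    readout = matvec(state, matvec(down, h))
    return [hi + ci for hi, ci in zip(h, matvec(up, readout))]


# Toy sizes: d = 4, r = 2. An empty memory leaves the backbone's vector unchanged.
down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
up = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
q = [3.0, 5.0, 0.0, 0.0]
print(memory_correction(q, down, [[0.0, 0.0], [0.0, 0.0]], up))  # [3.0, 5.0, 0.0, 0.0]
print(memory_correction(q, down, [[1.0, 0.0], [0.0, 0.0]], up))  # [3.0, 5.0, 3.0, 0.0]
```

Because an empty state adds nothing, a design of this shape starts out identical to the frozen backbone and only departs from it as the memory fills.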

How does MinT manage LoRA adapters at million scale without reloading the full model?

MinT keeps the base model persistently deployed across the training and inference services. Each update moves and loads only a lightweight LoRA adapter, typically less than 1% of the size of the base model. MoE LoRA tensor packaging addresses the bottleneck of reading and writing large numbers of small objects. The two-stage rollout mechanism exposes a LoRA adapter to user traffic only after it has finished warming up under admission control, which brings user-visible p95 load latency to 0.
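A minimal sketch of the packaging idea, assuming float32 tensors held as Python lists: instead of storing every expert's LoRA factors as a separate small object, they are concatenated into one buffer with a JSON index of offsets, so a loader fetches one object and slices it. The layout, the tensor names, and both functions are hypothetical (similar in spirit to the safetensors header-plus-buffer format), not MinT's actual packaging format.

```python
import json
import struct
from typing import Dict, List, Tuple


def pack(tensors: Dict[str, List[float]]) -> bytes:
    """Pack many small float32 tensors into one blob.

    Layout: 4-byte little-endian header length, a JSON index mapping each
    name to (byte offset, element count), then all tensor data back to back.
    """
    index: Dict[str, Tuple[int, int]] = {}
    chunks: List[bytes] = []
    offset = 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        index[name] = (offset, len(values))
        chunks.append(raw)
        offset += len(raw)
    header = json.dumps(index).encode("utf-8")
    return struct.pack("<I", len(header)) + header + b"".join(chunks)


def unpack(blob: bytes) -> Dict[str, List[float]]:
    """Read the index once, then slice each tensor out of the shared buffer."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    index = json.loads(blob[4:4 + header_len].decode("utf-8"))
    data_start = 4 + header_len
    return {
        name: list(struct.unpack_from(f"<{count}f", blob, data_start + offset))
        for name, (offset, count) in index.items()
    }


# Hypothetical per-expert LoRA factors: one blob to write and fetch instead of one object each.
adapter = {
    "expert0.lora_A": [1.0, 2.0],
    "expert0.lora_B": [0.5],
    "expert1.lora_A": [-3.0, 4.25, 0.0],
}
blob = pack(adapter)
print(unpack(blob) == adapter)  # True
```

With many experts per layer and two factors per expert, an MoE adapter can otherwise turn into thousands of tiny files or objects, where per-request overhead dominates the actual bytes transferred.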

What is the fundamental difference between Macaron-A2UI and traditional pure-text AI assistants?

Beyond text output, Macaron-A2UI can generate structured, executable A2UI actions during real-time interaction (multi-select boxes, sliders, confirmation cards, and so on). Its goals are to reduce the cognitive load of complex tasks and to keep learning from each user's personal habits.
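Because these actions are meant to be executed rather than read, a client would typically validate each one before rendering it. A minimal sketch of such a check, assuming JSON-encoded actions; the three action types and their field names are hypothetical stand-ins for the controls the source lists, not the actual A2UI schema.

```python
import json
from typing import Any, Dict, Set

# Hypothetical action shapes for illustration only; this is NOT the A2UI specification.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "multi_select": {"prompt", "options"},
    "slider": {"prompt", "min", "max"},
    "confirm_card": {"title", "body"},
}


def parse_action(text: str) -> Dict[str, Any]:
    """Parse a model-emitted action and reject anything a client could not render."""
    action = json.loads(text)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    kind = action.get("type")
    if not isinstance(kind, str) or kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action type: {kind!r}")
    missing = REQUIRED_FIELDS[kind] - set(action)
    if missing:
        raise ValueError(f"{kind} is missing fields: {sorted(missing)}")
    if kind == "slider" and not action["min"] < action["max"]:
        raise ValueError("slider requires min < max")
    return action


slider = parse_action('{"type": "slider", "prompt": "Monthly budget?", "min": 0, "max": 500}')
print(slider["type"], slider["max"])  # slider 500
```

Rejecting malformed actions at parse time keeps a model error from reaching the user as a broken control, and the same check could serve as a validity signal when scoring outputs.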

Disclaimer: The information on this page may come from third-party sources and is for reference only. It does not represent the views or opinions of Gate and does not constitute any financial, investment, or legal advice. Virtual asset trading involves high risk. Please do not rely solely on the information on this page when making decisions. For details, see the Disclaimer.