MiniMax publishes lengthy response on why its model cannot say "Ma Jiaqi"

09 May 2026 18:44
On May 9, MiniMax's official WeChat account published a lengthy response to reports that its M2 series models could not produce the name "Ma Jiaqi", laying out the full investigation and its technical reasoning on the "Jiaqi recognition" issue. MiniMax said it investigated along several lines: tokenizer version alignment, the statistical distribution of embeddings, semantic nearest-neighbor retrieval, few-shot comparison experiments between the pre-trained and post-trained models, frequency statistics over the post-training data, and a ranked scan of how much each vocabulary entry's lm_head weights had changed. The root cause it identified is that "Jiaqi" had been merged into a single standalone token in the tokenizer, but that token appeared very rarely in the post-training data, so the model gradually lost the ability to generate it during post-training. As a fix, MiniMax constructed a synthetic dataset that covers the entire vocabulary. The core idea is to use a simple repetition task to establish a "lower bound guarantee" on how often every token in the vocabulary is generated, so that no token degrades because it is entirely absent from the data. MiniMax also suggests using token coverage as a routine monitoring metric for post-training data quality, which can surface the risk of sparse-token degradation early and keep similar issues from recurring in production.
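The token coverage check and the repetition-based fix described above can be illustrated with a minimal sketch. This is not MiniMax's implementation: the function names `token_coverage` and `repetition_samples`, the `min_count` threshold, the chunk size, and the toy data are all assumptions made for illustration, and the samples are assumed to be already tokenized into integer IDs.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def token_coverage(
    samples: Iterable[Sequence[int]], vocab_size: int, min_count: int = 1
) -> Tuple[float, List[int]]:
    """Return (coverage ratio, token IDs seen fewer than min_count times).

    Hypothetical monitoring metric: the share of the vocabulary that
    appears at least min_count times in the (tokenized) training data.
    """
    counts: Counter = Counter()
    for ids in samples:
        counts.update(ids)
    rare = [t for t in range(vocab_size) if counts[t] < min_count]
    return 1.0 - len(rare) / vocab_size, rare


def repetition_samples(
    token_ids: Iterable[int], chunk_size: int = 4
) -> List[Tuple[List[int], List[int]]]:
    """Build (prompt, target) pairs in which the target copies the prompt.

    Hypothetical stand-in for a "simple repetition task": every token ID
    passed in shows up in exactly one target, so each gets generated.
    """
    ids = list(token_ids)
    chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]
    return [(chunk, list(chunk)) for chunk in chunks]


# Toy post-training data over an assumed vocabulary of 6 token IDs.
VOCAB_SIZE = 6
toy_data = [[0, 1, 2, 2], [1, 3, 3, 0]]

coverage_before, rare_before = token_coverage(toy_data, VOCAB_SIZE)

# Synthetic repetition data spanning the whole vocabulary.
patch = repetition_samples(range(VOCAB_SIZE), chunk_size=4)
patched_data = toy_data + [target for _, target in patch]

coverage_after, rare_after = token_coverage(patched_data, VOCAB_SIZE)
```

In this toy run, token IDs 4 and 5 never appear in the original data, so they are flagged as uncovered; once the repetition targets are mixed in, every ID appears at least once. In practice the threshold and the amount of synthetic data would need tuning, which the source does not specify.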
