Unlocking Ancient Pictographs: A Multi-modal LLM Approach to Dongba Characters Understanding
DOI: https://doi.org/10.54691/11mtjs98

Keywords: Dongba script, pictographic writing system, multimodal LLMs, character recognition

Abstract
The Dongba script, the only living pictographic writing system, poses unique challenges for computational modeling due to its visual complexity and limited resources. Previous work has relied primarily on convolutional neural networks for image-based recognition, but these approaches struggle to generalize to unseen characters and fail to capture contextual information. This study presents the first systematic evaluation of multimodal large language models (LLMs) for Dongba character recognition. We benchmark state-of-the-art pre-trained multimodal LLMs under zero-shot and two-shot prompting, and further develop DB-LLM, a multimodal model fine-tuned specifically for the Dongba script. Experimental results reveal that pre-trained models achieve less than 2% accuracy, indicating limited capacity for direct recognition. In contrast, DB-LLM achieves 78.4% accuracy on the seen test set, a substantial improvement that demonstrates the effectiveness of targeted adaptation. However, the model shows limited ability to generalize to unseen classes, highlighting the need for future research on cross-inventory generalization and robustness. These findings establish a foundation for computational analysis of Dongba and contribute to the broader study of low-resource pictographic writing systems.
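The abstract does not include the authors' code, but the two-shot prompting setup it describes can be sketched as follows. This is a minimal, vendor-neutral illustration, not the paper's actual implementation: the message schema (`role`/`content` dictionaries with base64-encoded inline images) is an assumption modelled on common multimodal chat APIs, and the character labels are placeholders.

```python
import base64


def encode_image(data: bytes) -> str:
    # Base64-encode raw image bytes for inline transmission to a multimodal API.
    return base64.b64encode(data).decode("ascii")


def build_two_shot_messages(examples, query_bytes):
    """Assemble a chat-style message list for two-shot Dongba recognition:
    a system instruction, two labelled example images (the "shots"),
    and the unlabelled query image the model must identify.

    `examples` is a list of two (image_bytes, label) pairs.
    """
    messages = [{
        "role": "system",
        "content": ("You are an expert in the Dongba pictographic script. "
                    "Identify the character shown in each image."),
    }]
    for img_bytes, label in examples:
        # Each shot is a user turn (the image) followed by the gold label
        # presented as an assistant turn, so the model sees the task pattern.
        messages.append({
            "role": "user",
            "content": [{"type": "image", "data": encode_image(img_bytes)}],
        })
        messages.append({"role": "assistant", "content": label})
    # Final user turn: the query image, left for the model to answer.
    messages.append({
        "role": "user",
        "content": [{"type": "image", "data": encode_image(query_bytes)}],
    })
    return messages
```

In the zero-shot condition the same structure is used with an empty `examples` list, so the model receives only the instruction and the query image.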
License
Copyright (c) 2025 Scientific Journal of Intelligent Systems Research

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.