IIT Delhi’s AI4Bharat research centre released BharatGPT 2.0, an open-source large language model that achieves native-level fluency in all 22 languages listed in the Eighth Schedule of the Indian Constitution.
The model, trained on 1.2 trillion tokens across Hindi, Tamil, Telugu, Kannada, Marathi, Bengali, Gujarati, and 15 other languages, outperforms GPT-4 and Gemini on the IndicNLP benchmark suite by margins ranging from 12% to 41% depending on the task.
What Makes It Different
Unlike previous multilingual models that treat Indian languages as secondary, BharatGPT 2.0 was trained from scratch with equal weight given to all supported languages. The team also created a new 500,000-question evaluation set covering legal, medical, agricultural, and cultural domains specific to the Indian context.
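The article does not say how the equal weighting was implemented. One simple way to give every language the same share of training examples is to pick a language uniformly at random and then draw a document from that language's corpus, so a language with little data is seen as often as one with a lot. The Python sketch below illustrates only that idea. The function name, the toy corpora, and the sampling scheme are assumptions for illustration, not the BharatGPT 2.0 training code.

```python
import random
from collections import Counter


def equal_weight_sampler(corpora, num_samples, seed=0):
    """Yield (language, document) pairs with each language equally likely.

    `corpora` maps a language name to a list of documents. This is a
    hypothetical helper, not part of any released BharatGPT tooling.
    """
    rng = random.Random(seed)
    languages = sorted(corpora)  # fixed order keeps runs reproducible
    for _ in range(num_samples):
        lang = rng.choice(languages)           # uniform over languages
        yield lang, rng.choice(corpora[lang])  # uniform within the language


# Toy corpora of very different sizes (placeholder strings, not real data).
toy_corpora = {
    "Hindi": [f"hi-doc-{i}" for i in range(1000)],
    "Tamil": [f"ta-doc-{i}" for i in range(100)],
    "Gujarati": [f"gu-doc-{i}" for i in range(10)],
}

samples = list(equal_weight_sampler(toy_corpora, num_samples=3000))
counts = Counter(lang for lang, _ in samples)
print(counts)
```

Each language receives roughly a third of the samples despite the hundredfold difference in corpus size. The trade-off is that documents from the smallest corpus are repeated far more often than documents from the largest one.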