Sovereign AI: India Bet Big on Its Own LLM — Is It Working?
India spent much of the past year chasing a single, politically charged idea: that the world's most populous nation should not have to rent its intelligence from Silicon Valley or Shenzhen. That idea has a name now — sovereign AI — and in 2026 it stopped being a slogan and became a set of shipped products, billion-rupee subsidies and, inevitably, a very public argument about whether any of it was worth the money. The short version is that India built its own large language models. The longer, more interesting version is that almost nobody can yet agree on what that achievement is actually for.
What Sovereign AI Actually Means
Strip away the flag-waving and sovereign AI is a claim about control. It means the model, the data it learned from, the chips it runs on and the rules that govern it all sit inside a country's own jurisdiction. The argument is partly about security — sensitive citizen data not flowing through foreign servers — and partly about culture. A model trained mostly on English internet text tends to treat Hindi, Tamil or Marathi as afterthoughts, fumbles local context, and quietly imports the assumptions of the places that built it.
For a country with 22 official languages and well over a billion people who mostly do not think in English, that gap is not academic. It shapes who can use a chatbot to fill a government form, query a crop-insurance scheme, or get a medical explanation in a language they actually speak. Sovereign AI, at its most convincing, is an accessibility project dressed in the language of national pride.
The IndiaAI Mission's Big Wager
The scaffolding for all this is the IndiaAI Mission, a roughly ₹10,370 crore programme approved in 2024. Its most visible deliverable has been compute: the government says it has onboarded more than 38,000 subsidised GPUs through a national compute portal, renting high-end chips to startups and universities at a fraction of market rates, with prices cited in the low hundreds of rupees per GPU-hour. The stated ambition is to keep scaling toward a six-figure GPU count.
The mission is also funding model-building directly. Around a dozen organisations — including Sarvam AI, the academic consortium BharatGen, Gnani.ai and others — have been backed to develop indigenous foundation models. BharatGen, a government-funded multimodal effort anchored at IIT Bombay, has been among the largest beneficiaries. The headline pick, though, was Sarvam: in April 2025 the Bengaluru startup was chosen as the first company to build India's flagship sovereign model from scratch, handed access to thousands of NVIDIA H100 GPUs and a subsidy reported at roughly ₹99 crore.
That phrase — from scratch — matters more than it looks, and it sits at the centre of the whole fight.
From Fine-Tuning to Building From Scratch
Sarvam's first widely noticed model, Sarvam-M, was not built from the ground up. It was a 24-billion-parameter system built on top of Mistral Small, a base model from the French lab Mistral, then post-trained heavily on Indian-language and reasoning data. The team reported large gains over the base model — double-digit percentage improvements on Indian-language, math and coding benchmarks, and a striking jump on romanised Indian-language math problems.
Technically that is a respectable result. Symbolically, it was a problem. Critics pointed out that a model standing on a European foundation is hard to call fully sovereign — the intelligence, after all, largely comes from what the base model already learned. If the goal was independence, fine-tuning someone else's brain looked like a compromise.
So at the India AI Impact Summit in February 2026, Sarvam answered with something more ambitious: two models, of around 30 billion and 105 billion parameters, that the company says were trained from scratch on domestic infrastructure. The larger one uses a mixture-of-experts design: it holds 105 billion parameters in total but activates only about a tenth of them for any given token, which keeps running costs down. Sarvam claims a 128,000-token context window for it, support spanning India's official languages, and competitive scores against models like DeepSeek's R1 and Google's Gemini Flash on selected reasoning and agentic benchmarks. Both models were released openly. On paper, it was India's most credible attempt yet at a genuinely homegrown frontier-style system.
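For readers who want to see why a mixture-of-experts model can hold far more parameters than it uses on any one token, here is a minimal sketch of top-k expert routing in Python. It is a generic illustration of the technique, not Sarvam's implementation. The function names, the router scores, the expert count and the parameter split are all hypothetical, chosen only so that the total comes to 105 billion.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and normalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen experts only, shifted by the max for stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(num_experts, k, params_per_expert, shared_params):
    """Share of all parameters that are touched when k experts fire."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active / total


# Hypothetical router scores for one token across four experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)

# Hypothetical sizes in units of 100 million parameters (NOT a published
# configuration): 64 experts of 1.6B each plus 2.6B shared = 105B total.
fraction = active_fraction(num_experts=64, k=4,
                           params_per_expert=16, shared_params=26)

print(routes)
print(f"{fraction:.1%} of parameters active per token")
```

With these made-up numbers, only four of the 64 experts run for each token, so a little under a tenth of the parameters do any work on a given step. That is the general mechanism behind the lower running costs described above.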
The Download Number That Stung
And then reality intervened in the most modern way possible: through download counts. When the earlier Sarvam-M landed on Hugging Face, the numbers were brutal. Reports described only a few dozen downloads in the first couple of days, with the tally still only in the low hundreds shortly afterwards. Commentators gleefully compared it with open models from smaller countries that had pulled hundreds of thousands of downloads.
The pile-on was swift. One prominent investor publicly called the result embarrassing, arguing there simply wasn't a large audience for incremental work like this. Others reached for ghosts of Indian tech past — homegrown apps like Koo and Hike that launched as patriotic alternatives to global platforms and then faded. A recurring, uncomfortable point: many digitally active Indians conduct their online lives in English, which blunts the very advantage an Indic-first model is supposed to have.
It is worth keeping perspective. Raw download counts on a developer platform are a noisy proxy for real-world impact, especially for a model also offered through an API, a hosted playground and a national repository. A quiet launch is not the same as a failed technology. But the episode punctured the triumphalism, and it forced a more honest question into the open.
Sovereignty Versus Scale: The Real Argument
That question is this: should India spend scarce capital and compute pretraining giant models from zero, or should it take the best open models in the world and become world-class at adapting them?
The pragmatists — a camp that has included senior voices from India's IT establishment — make a hard-nosed case. Frontier-scale pretraining costs enormous sums, demands rare research talent and needs vast, high-quality text corpora. For 22 Indian languages, that quality data simply may not exist yet at the scale required. By their logic, the smartest path is to bolt excellent Indian-language capability onto strong open foundations: better tokenizers for Indian scripts, multilingual alignment, sharp evaluation sets. Most of the practical benefit, for a fraction of the cost.
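One reason tokenizers appear on that list can be shown with a small Python sketch. It is an illustration only, not any lab's method, and the helper name and sample words are hypothetical. It measures how many UTF-8 bytes each character takes, on the assumption that a byte-level tokenizer with few learned merges for a script will split text roughly along these bytes.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per Unicode code point."""
    return len(text.encode("utf-8")) / len(text)


english = "hello"
# "namaste" written in Devanagari, spelled out as six code points.
hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"

for label, sample in (("English", english), ("Hindi", hindi)):
    print(label, len(sample), "code points,",
          len(sample.encode("utf-8")), "bytes,",
          utf8_bytes_per_char(sample), "bytes per code point")
```

Devanagari characters occupy three bytes each in UTF-8, against one for basic Latin letters. A tokenizer whose vocabulary was learned mostly from English text therefore tends to break Hindi into more pieces, which raises cost and uses up context window faster. A vocabulary trained on Indian scripts is one way to narrow that gap.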
The sovereignty camp counters that capability you merely borrow is capability you can lose. Licences change, geopolitics shift, and a nation that can only fine-tune is forever one policy decision away from being cut off. Building the full stack — even imperfectly, even expensively — buys strategic insurance and the institutional muscle to do it again better. From this angle, Sarvam's from-scratch 105B is not about winning a benchmark this quarter; it is about proving the country can do the hard thing at all.
Both sides are partly right, which is why the debate refuses to resolve.
Why This Matters Beyond India
This is not a uniquely Indian dilemma. Governments across the Gulf, Southeast Asia, Europe and Latin America are wrestling with versions of the same trade-off, watching a tiny handful of American and Chinese labs accumulate the world's most valuable cognitive infrastructure. India is simply running the experiment in public, at population scale, with a transparency that makes both its wins and its stumbles unusually legible.
The near-term signals to watch are concrete. Does adoption of the homegrown models actually climb once they are wired into government services, regional businesses and voice-first interfaces — the places where Indic capability is a real edge rather than a talking point? Does the compute subsidy translate into a deeper bench of labs, or stay concentrated in a few favoured names? And do the next models close the gap with global frontier systems, or settle into a comfortable second tier?
The honest answer in mid-2026 is that India has bought itself an option, not a victory. It has the chips, the funding, a unicorn-valued national champion and at least one model built the hard way. Whether sovereign AI becomes critical infrastructure or an expensive point of pride will be decided not at summits, but in whether ordinary people quietly start using these systems in their own languages — and keep coming back.
Source: business-standard.com



