Building for Bharat: Innovation at Scale, Rooted in Culture

I’ve spent nearly a decade trying to answer one question: how do you really build for Bharat?
After ~7 years at Practo and another 1.5 years in the medical space with close friends, I’ve realized something fundamental: building for Bharat is not the same as building for the world.

What Makes Bharat Different?

Bharat is a geography where people are deeply value-driven.

  • Every rupee spent is measured against perceived value, not just functional value.
  • Adoption curves are dynamic. What excites users today may feel outdated in six months.
  • Consumer expectations change daily, powered by the data revolution and a smartphone-first economy.

For any company to succeed, innovation must be continuous. In Bharat, if you’re not innovating every week, you’re already behind.

The Temple Tech Challenge

One of our most exciting recent innovations has been Sankalp for multiple temples.
What sounds like a simple feature on paper required solving for scale, culture, and constraints:

  • Revenue split automation: One order may involve multiple temple partners. Our systems needed to automatically decide “what revenue goes where” while keeping transparency intact.
  • Parallel execution: Different temple teams needed to perform pujas almost simultaneously. Coordinating rituals across geographies and time zones wasn’t trivial.
  • Video at scale:
    • Temples are often in areas with 2G/3G connectivity. Video compression had to balance file size and quality.
    • We stitch together 30,000+ videos within 3–4 hours every week.
    • Subtitles aren’t just text — they’re personalized with devotee names and gotras, extracted via speech-to-text + timestamp matching.
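To make the revenue-split problem concrete, here is a minimal sketch of splitting one order across multiple temple partners. The share percentages and the largest-remainder rounding rule are illustrative assumptions, not our production logic:

```python
# Hypothetical sketch of per-partner revenue splitting for a multi-temple
# order. Shares and rounding policy are illustrative, not the real config.

def split_revenue(total_paise: int, shares: dict[str, float]) -> dict[str, int]:
    """Split an order total (in paise) across temple partners.

    Uses largest-remainder rounding so the parts always sum exactly to
    the original total -- no paise are lost or invented, which keeps
    the ledger transparent for every partner.
    """
    raw = {t: total_paise * pct / 100 for t, pct in shares.items()}
    floored = {t: int(v) for t, v in raw.items()}
    leftover = total_paise - sum(floored.values())
    # Hand leftover paise to the partners with the largest remainders.
    for t in sorted(raw, key=lambda t: raw[t] - floored[t], reverse=True)[:leftover]:
        floored[t] += 1
    return floored

split = split_revenue(10_000, {"tirupati": 50.0, "kashi": 30.0, "platform": 20.0})
```

The invariant worth testing is that the split always reconciles to the order total, however the percentages round.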

Deep Dive: Engineering the Pipeline

When you hear “stitching temple videos together”, it sounds trivial. But at Bharat scale, with weak networks, heterogeneous devices, and heavy personalization requirements, it becomes a distributed-systems, media-tech, and AI challenge rolled into one.

1. Partner App (Capture + Offline-First)

  • Constraint: Many temples are in areas with poor 2G/3G coverage.
  • Solution: We designed an app for our temple partners with offline-first support.
    • Videos are recorded locally and compressed on-device using a custom ffmpeg wrapper with heuristics tuned for low-bandwidth scenarios.
    • Compression target: reduce upload size by 60–70% while retaining clarity of key ritual details (faces, fire, flowers).
    • Metadata (booking ID, temple ID, timestamp) is packaged alongside the media in a lightweight JSON manifest.
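The capture step above can be sketched roughly as follows. The specific ffmpeg flags, bitrates, and manifest fields here are my illustrative assumptions for a low-bandwidth target, not the production wrapper:

```python
# Illustrative sketch of the on-device compression + manifest step.
# Flags and field names are assumptions, not the shipped wrapper.
import json

def build_compress_cmd(src: str, dst: str, target_height: int = 480) -> list[str]:
    """Build an ffmpeg command that trades resolution/bitrate for size
    while keeping detail in ritual close-ups (CRF-based H.264)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{target_height}",  # downscale, keep aspect ratio
        "-c:v", "libx264", "-crf", "28",     # quality-targeted compression
        "-preset", "veryfast",               # low CPU cost on partner phones
        "-c:a", "aac", "-b:a", "64k",        # modest audio bitrate
        dst,
    ]

def build_manifest(booking_id: str, temple_id: str, ts: int, media: str) -> str:
    """Package metadata alongside the media as a lightweight JSON manifest."""
    return json.dumps({"booking_id": booking_id, "temple_id": temple_id,
                       "timestamp": ts, "media": media})
```

CRF-based encoding is a natural fit here because it targets perceptual quality rather than a fixed bitrate, which is how you hold clarity while shrinking files.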

2. Upload & Ingestion Layer

  • Once connectivity is detected, videos sync to our servers.
  • We use a chunked upload mechanism to resume interrupted transfers (critical in flaky network zones).
  • Ingestion servers run on a queue-based architecture (Kafka + S3) to decouple partner upload speed from downstream processing throughput.
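The resumable upload idea boils down to: ask the server how many bytes it already has, seek past them, and stream fixed-size chunks from there. The in-memory “server” dict below is a stand-in for a real endpoint (production systems typically lean on S3 multipart upload or the tus protocol for the same behavior):

```python
# Minimal sketch of chunked, resumable upload logic. The dict-based
# "server" is a hypothetical stand-in for a real upload endpoint.
import io

CHUNK = 256 * 1024  # small chunks survive flaky 2G/3G links better than one PUT

def upload_resumable(data: bytes, server: dict) -> int:
    """Resume from the last byte the server acknowledges, then stream
    fixed-size chunks until done. Returns bytes sent this session."""
    offset = server.get("received", 0)  # ask the server how much it has
    buf = io.BytesIO(data)
    buf.seek(offset)
    sent = 0
    while chunk := buf.read(CHUNK):
        server.setdefault("chunks", []).append(chunk)
        server["received"] = server.get("received", 0) + len(chunk)
        sent += len(chunk)
    return sent
```

A retry after an interruption then costs only the un-acknowledged tail, not the whole video.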

3. Video Pre-Processing

  • Each video is checked for corruption, duplicate frames, and silent segments.
  • Standardized into a baseline format (MP4/H.264) for uniform downstream handling.
  • Thumbnails and low-res proxies are generated for quick previews and retries.
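A rough sketch of that validation gate is below. The thresholds (duplicate-frame ratio, silence fraction) are illustrative assumptions; in production the metadata would come from ffprobe/ffmpeg analysis rather than a pre-built dict:

```python
# Hedged sketch of the pre-processing gate. Thresholds are illustrative;
# real inputs would be derived from ffprobe/ffmpeg output.

def validate_clip(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the clip passes."""
    problems = []
    if meta.get("corrupt_frames", 0) > 0:
        problems.append("corrupt frames")
    if meta.get("duplicate_frame_ratio", 0.0) > 0.5:
        problems.append("mostly duplicate frames")
    if meta.get("silent_seconds", 0.0) > 0.8 * meta.get("duration", 1.0):
        problems.append("mostly silent audio")
    return problems

def proxy_cmd(src: str) -> list[str]:
    """ffmpeg command for a low-res preview proxy (previews and retries)."""
    return ["ffmpeg", "-y", "-i", src, "-vf", "scale=-2:240",
            "-c:v", "libx264", "-crf", "32", src + ".proxy.mp4"]
```

Catching bad clips here is cheap; catching them after stitching means re-rendering a whole batch.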

4. AI Transcription & Entity Extraction

  • Audio is extracted from the video.
  • A transcription job (Google Speech-to-Text + Gemini 2.5 fine-tuned layer) runs asynchronously.
  • Custom NER (Named Entity Recognition) pipeline:
    • Detects devotee name, gotra, and puja type.
    • Confidence threshold of 95%; entities below it fall back to manual tagging.
  • Transcription output is mapped back into time-aligned WebVTT cues to support subtitles.
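The confidence gate and the WebVTT mapping can be sketched like this. The entity format and cue helper are illustrative assumptions; only the 95% threshold comes from the pipeline described above:

```python
# Sketch of the confidence gate plus time-aligned WebVTT rendering.
# Entity/cue shapes are assumptions; the 95% threshold mirrors the text.

def route_entities(entities: list[dict], threshold: float = 0.95):
    """Split extracted entities into auto-approved vs. manual-review."""
    auto = [e for e in entities if e["confidence"] >= threshold]
    manual = [e for e in entities if e["confidence"] < threshold]
    return auto, manual

def to_vtt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) cues as a WebVTT subtitle file."""
    def ts(t: float) -> str:
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02}:{int(m):02}:{s:06.3f}"
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines += [f"{ts(start)} --> {ts(end)}", text, ""]
    return "\n".join(lines)
```

Keeping subtitles as WebVTT (rather than burning them in immediately) means a manual correction only re-renders text, not video.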

5. Stitching & Personalization

  • Using FFmpeg’s concat demuxer, multiple temple videos are merged into a single timeline.
  • Between transitions, we auto-insert:
    • Frames describing the ritual step (e.g., “Archana at Tirupati”, “Abhishekam at Kashi”).
    • Background score snippets, volume leveled.
  • Subtitles are overlaid, personalized with devotee metadata.
  • Final output: A single 720p/1080p video optimized for mobile playback.
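The concat-demuxer approach above amounts to writing a plain file list and handing it to one ffmpeg invocation. The sketch below shows the shape of that; paths and the `subtitles` burn-in filter are my assumptions, and the interstitial title cards would simply be pre-rendered clips added to the same list:

```python
# Illustrative sketch of stitching via FFmpeg's concat demuxer, with
# personalized subtitles burned in. Paths/flags are assumptions.

def concat_list(clips: list[str]) -> str:
    """Concat-demuxer input: one `file '...'` line per clip, in order."""
    return "\n".join(f"file '{c}'" for c in clips) + "\n"

def stitch_cmd(list_file: str, vtt_file: str, out: str) -> list[str]:
    """One re-encode pass: concatenate clips and overlay the subtitles."""
    return [
        "ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file,
        "-vf", f"subtitles={vtt_file}",      # burn in personalized cues
        "-c:v", "libx264", "-crf", "23",
        "-c:a", "aac",
        out,
    ]
```

The concat demuxer avoids decoding clips twice when inputs share a codec, which matters when you repeat this tens of thousands of times a week.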

6. Scale & Performance

  • Every week, we process 30,000+ videos in ~3–4 hours.
  • Core optimizations:
    • Parallelization via Kubernetes Jobs, each handling a batch of videos.
    • Spot instances + autoscaling for cost efficiency.
    • GPU nodes reserved for the transcription-heavy workloads.
  • To maintain throughput, we designed an internal orchestrator that dynamically routes jobs to CPU- or GPU-optimized queues.
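For scale context, 30,000+ videos in ~3.5 hours is a sustained rate of roughly 2–3 videos per second across the fleet. The routing idea can be sketched as below; the queue names, the backlog threshold, and the spillover rule are illustrative assumptions, not the internal orchestrator itself:

```python
# Hypothetical sketch of routing jobs to CPU- vs GPU-optimized queues.
# Queue names, threshold, and spillover policy are assumptions.
from collections import deque

class JobRouter:
    def __init__(self, gpu_backlog_limit: int = 100):
        self.queues = {"cpu": deque(), "gpu": deque()}
        self.gpu_backlog_limit = gpu_backlog_limit

    def route(self, job: dict) -> str:
        """Enqueue one job on the right queue; returns the queue name."""
        wants_gpu = job.get("stage") == "transcription"  # GPU-heavy stage
        # Spill GPU work onto CPU workers once the GPU queue saturates,
        # trading per-job latency for overall batch throughput.
        if wants_gpu and len(self.queues["gpu"]) >= self.gpu_backlog_limit:
            wants_gpu = False
        name = "gpu" if wants_gpu else "cpu"
        self.queues[name].append(job)
        return name
```

The spillover rule is what keeps the weekly batch finishing inside its window even when reserved GPU nodes are the bottleneck.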

7. Distribution

  • Final stitched videos are delivered via CDN-backed storage (CloudFront).
  • Users are notified via app push + WhatsApp fallback.
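The push-then-WhatsApp fallback is just an ordered channel chain. A minimal sketch, with the sender callables standing in for real push and WhatsApp provider SDKs:

```python
# Minimal sketch of the delivery-notification fallback. The sender
# callables are hypothetical stand-ins for real provider SDKs.

def notify(user: str, msg: str, send_push, send_whatsapp) -> str:
    """Try app push first; fall back to WhatsApp on failure.
    Returns which channel actually delivered the message."""
    try:
        send_push(user, msg)
        return "push"
    except Exception:
        send_whatsapp(user, msg)
        return "whatsapp"
```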

Why This Pipeline Matters

This system solves problems that most consumer tech never faces:

  • Building for flaky networks (offline-first, resumable uploads).
  • Personalizing at scale (subtitles with cultural metadata).
  • Media-heavy workloads at Bharat cost structures (FFmpeg optimizations, spot-instance autoscaling).
  • Human + divine context (where correctness isn’t just technical, but spiritual).

It’s the kind of engineering that blends distributed systems, video tech, and AI pipelines, all while staying rooted in Bharat’s realities.

Sandeep Pandey

VP of Engineering at AppsForBharat, whose career spans building category-defining products and platforms at scale. From shaping Pepperfry’s early tech and leading Practo’s doctor app and data infrastructure to managing AWS’s multi-billion-dollar inventory systems, he brings deep zero-to-one and scale-building experience, including as a founder at OneCare.