Back to Projects
InsightPilot
BetaPythonFastAPINext.js 14+11 more

InsightPilot

Enterprise intelligence system that transforms raw data into executive-grade strategy. The AI analyst that delivers instant answers in plain English. No dashboards, no SQL, just insights.

Timeline

6 weeks

Role

Full Stack AI Engineer

Team

Solo

Status
Beta

Technology Stack

Python
FastAPI
Next.js 14
TypeScript
PostgreSQL
DuckDB
Groq API
Llama 3.3
Plotly.js
Google Sheets API
SQLAlchemy
Tailwind CSS
Redis
Docker

Key Challenges

  • Natural Language to SQL Conversion
  • Multi-Database Architecture (PostgreSQL + DuckDB)
  • Real-time Google Sheets Integration
  • AI-Powered Chart Type Selection
  • Secure Report Sharing System
  • Production-Grade Rate Limiting

Key Learnings

  • LLM Prompt Engineering for SQL Generation
  • FastAPI Async Architecture
  • DuckDB Performance Optimization
  • OAuth2 Integration with Google APIs
  • Next.js 14 App Router Patterns
  • Plotly Advanced Visualizations
  • JWT Authentication & Security

Summary

"Stop Guessing. Start Commanding." InsightPilot is an enterprise intelligence system that transforms raw data into executive-grade strategy with AI-powered natural language queries. No dashboards. No SQL. Just answers. With 50+ beta users and 500+ queries processed, it delivers instant insights from 10+ data sources. Ask questions like "What were my top 10 customers last quarter?" and get charts, narratives, and shareable reports in seconds.


Features

  • Natural Language Core - Query entire data warehouse like talking to your smartest analyst
  • Real-Time Synthesis - Correlates and synthesizes millions of rows into strategic narratives
  • Sovereign Security - SOC2 Type II compliant, private VPCs for enterprise, data never trains public models
  • Smart Visualizations - AI auto-selects optimal chart type (bar, line, pie, scatter)
  • Data Anchoring - Every AI claim is cited with source rows, SQL queries, and confidence intervals
  • Shareable Reports - Generate public URLs with time-based expiration for team collaboration
  • Multi-Source Integration - CSV upload (50MB) + Google Sheets live sync + 10+ data sources

Architecture with Real-Life Usecase

The Executive Intelligence Pipeline

When an executive asks: "What were my top 10 customers last quarter?"

  1. Natural Language Understanding (instant) → Context-aware parsing with business logic
  2. Data Warehouse Query (1 sec) → Multi-source correlation and aggregation
  3. Real-Time Synthesis (2 sec) → AI analyzes patterns, trends, and anomalies
  4. Smart Visualization (instant) → Auto-selects chart type for maximum clarity
  5. Data Anchoring (instant) → Citations with source rows and confidence intervals
  6. Strategic Narrative (2 sec) → Executive summary with actionable insights
  7. Shareable Report (instant) → Generate collaborative URL with access controls

Total: 5 seconds vs. 2-3 days (traditional analyst workflow)

Real-World Outcome

Traditional BI Approach:

  • Submit request to data team → 2-day backlog
  • Analyst builds SQL query → 1 hour
  • Create dashboard → 2 hours
  • Review and iterate → 1 day
  • Total: 3+ days

InsightPilot Approach:

Executive asks: "Which products drove Q4 revenue growth?"

Results in 5 seconds:

  1. Smart visualization showing revenue by product with trend analysis
  2. Data-anchored insight: "Widget C drove 42% of growth with $3.2M in Q4, up 156% from Q3" [Source: rows 1,234-1,567]
  3. Real-time synthesis of market patterns and causal factors
  4. Strategic narrative with confidence intervals
  5. Shareable report URL for board presentation

Result: Executive-grade intelligence vs. 3-day analyst backlog


Tech Stack

  • Frontend: Next.js 14 (App Router), TypeScript, Tailwind CSS, Plotly.js
  • Backend: FastAPI, Python 3.11+, SQLAlchemy (async)
  • LLM: Groq API (Llama 3.3-70B) - Fast inference (300-500 tokens/sec)
  • Databases: PostgreSQL (Neon Cloud), DuckDB (analytics), Redis (caching)
  • Integrations: Google Sheets API (OAuth2), JWT authentication

Key Technical Achievements

  • Context-Aware Intelligence: Natural language understanding that grasps business context, not just keywords
  • Multi-Source Correlation: Real-time synthesis across 10+ data sources with causal analysis
  • Data Anchoring System: Click any AI claim to reveal source rows, SQL queries, and confidence intervals
  • Sovereign Security: SOC2 Type II compliance with private VPC deployment for enterprise customers
  • Smart Chart Selection: AI analyzes query intent and data structure to pick optimal visualization
  • Production Scale: Serving 50+ beta users with 500+ queries processed at sub-5-second response time

Architecture Highlights

Natural Language Processing:

User Query → Sanitization → Schema Context Injection → 
Groq LLM Prompt → SQL Validation → DuckDB Execution → 
Plotly Chart Config → AI Narrative → Shareable Report

Security Layers:

  • Rate limiting: 2 queries/day (free), 50/day (pro)
  • SQL injection prevention via parameterized queries
  • File upload validation and size limits
  • JWT access/refresh token rotation
  • Project-based access control

Real-World Impact Example

Executive Team at Growing Startup

Before InsightPilot:

  • Weekly data requests submitted to 2-person analytics team
  • 3-5 day turnaround for custom analyses
  • Board meetings delayed waiting for reports
  • Critical decisions made on incomplete information

After InsightPilot:

  • Self-service executive intelligence in 5 seconds
  • Real-time answers during strategy sessions: "Show customer retention by cohort"
  • Board reports generated instantly with data-anchored claims
  • Strategic pivots executed same-day based on synthesized insights

Measurable Impact:

  • Time Saved: 15 hours/week per executive (no analyst dependency)
  • Decision Speed: Same-minute vs. 3-5 day lag
  • Cost Reduction: $29/mo vs. $50k/year analyst salary
  • Beta Traction: 50+ users, 500+ queries, 10+ data sources integrated

Development Journey

Technical Challenges Overcome

  1. Natural Language Ambiguity

    • Problem: "Show sales" could mean revenue, units, or growth rate
    • Solution: Context-aware NLP with business domain understanding and clarification prompts
  2. Data Anchoring Reliability

    • Problem: AI hallucinations in traditional systems erode trust
    • Solution: Citation system linking every claim to source rows, SQL queries, and confidence scores
  3. Multi-Source Correlation

    • Problem: Synthesizing insights across disparate data sources (CRM, analytics, financial)
    • Solution: Real-time data correlation engine with causal analysis capabilities
  4. SOC2 Compliance at Scale

    • Problem: Balancing speed with enterprise-grade security requirements
    • Solution: Private VPC architecture with data sovereignty guarantees and annual security audits
  5. Executive-Grade Narratives

    • Problem: Technical data dumps don't translate to strategic insights
    • Solution: LLM fine-tuning for business context with strategic framing and actionable recommendations

Performance Benchmarks

  • Query Response Time

    • Target: < 5 sec
    • Achieved: 2–3 sec
  • Multi-Source Correlation

    • Target: < 10 sec
    • Achieved: 5–8 sec
  • Chart Rendering

    • Target: < 1 sec
    • Achieved: 0.3 sec
  • Beta User Adoption

    • Target: 20 users
    • Achieved: 50+ users
  • Queries Processed

    • Target: 100 queries
    • Achieved: 500+ queries
  • Data Sources Supported

    • Target: 5 sources
    • Achieved: 10+ sources

Resources & Links

For Employers & Collaborators

What This Project Demonstrates:

  • Enterprise AI Engineering: SOC2-compliant system serving 50+ beta users with real-time intelligence
  • Context-Aware NLP: Transforms plain English into strategic insights with data anchoring
  • Production Architecture: Multi-source correlation engine processing 500+ queries at scale
  • Security-First Development: Private VPC deployment with sovereign data guarantees
  • Product-Market Fit: Beta traction with 10+ data sources and growing user base
  • Full-Stack Ownership: From architecture to deployment, security to UX optimization

Open to Discuss:

  • Technical implementation of data anchoring and citation systems
  • SOC2 compliance strategies for AI systems
  • Multi-source correlation and real-time synthesis architecture
  • Scaling from beta (50 users) to production (1,000+ users)
  • Monetization strategy (free → $29/mo pro tier launching Q1 2026)

Contact: Available for technical interviews or architecture deep-dives


Project Status

Current: Beta (v1.0) - Live at insightpilot.thevanshgarg.com
Traction: 50+ beta users • 500+ queries processed • 10+ data sources
Next Milestones:

  • Pro tier launch with 50+ queries per day($29/mo)
  • Advanced model integration for deeper analysis.
  • Team collaboration features and workspaces.
  • Enterprise tier with private VPC and custom integrations.