Sanjay Sharma | Case Studies

Real-Time Entertainment Intelligence

HTML / CSS / JS, TMDB API, Chart.js, Data Visualisation, Responsive Design

Overview

A cinematic, single-page analytics dashboard that delivers deep intelligence on any movie or TV show in real time. Users search for a title and instantly see box office financials, revenue trend modelling, financial breakdown charts, genre radar profiles, audience score rings, cast & crew grids, season browsers, network info, and popularity metrics — all wrapped in a gold-on-dark responsive UI built entirely in vanilla HTML, CSS, and JavaScript.

Problem Statement

Entertainment data exists across dozens of scattered sources — box office trackers, review aggregators, streaming databases — making it hard for a user to get a holistic picture of any title in one place. The challenge was to unify all of this into a single zero-backend dashboard that feels as polished as a native app, loads instantly, and works on any device.

Key Features

Dual Mode: Seamless switch between Movies and TV Shows with context-aware UI — profit bars for movies, season grids and network info for TV
Live Autocomplete Search: Real-time suggestions via TMDB search API with full keyboard navigation (↑ ↓ Enter Escape) and ⌘K shortcut
Revenue Trend Modelling: Cumulative and period-based revenue charts built using a theatrical decay curve — since TMDB does not provide weekly breakdowns, the distribution is modelled mathematically
Financial Breakdown: Animated bar chart comparing budget, worldwide box office, and net profit/loss with colour-coded profit/loss states
Genre Radar Chart: Six-axis radar mapping Action, Adventure, Drama, Thriller, Comedy, and Romance scores derived from a hand-crafted genre scoring matrix
Audience Score Ring: Animated SVG stroke-dashoffset ring that fills dynamically, colour-coded green/gold/red by rating tier
Cast & Crew Grid: Director/Creator highlighted in gold, full cast with profile photos, character names, and roles
Season Browser: TV-only grid showing every season with poster, episode count, and air year
Production & Networks Panel: Company/network logos with country of origin
Responsive Design: Four breakpoints (desktop → tablet → mobile → small mobile) with a sticky search bar and back-to-home link on mobile
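
The Audience Score Ring arithmetic can be sketched as follows. This is a minimal illustration in Python (the dashboard itself is vanilla JavaScript), assuming a 0 to 10 score scale. The function names and the tier thresholds (7.0 and 5.0) are hypothetical, since the text names only the green/gold/red tiers.

```python
import math


def ring_dashoffset(score, radius, max_score=10.0):
    """Return the stroke-dashoffset that leaves score/max_score of the ring filled.

    Assumes stroke-dasharray is set to the full circumference of the circle.
    """
    circumference = 2 * math.pi * radius
    fraction = min(max(score / max_score, 0.0), 1.0)  # clamp to [0, 1]
    return circumference * (1.0 - fraction)


def ring_colour(score, good=7.0, fair=5.0):
    """Map a score to a colour tier; the thresholds are assumed, not from the source."""
    if score >= good:
        return "green"
    if score >= fair:
        return "gold"
    return "red"
```

An offset of 0 draws the full ring, and an offset equal to the circumference draws nothing, so animating the offset towards the returned value produces the fill effect.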

Technical Architecture

Zero Backend: Pure static frontend — no server, no build step, no framework. HTML + CSS + vanilla JS only
TMDB API v3: Two parallel fetch calls on search selection — /movie/{id} + /movie/{id}/credits (or TV equivalents) — resolved with Promise.all for speed
Chart.js 4.4: Six chart instances (cumulative line, period bar, financial bar, popularity bar, genre radar, episodes bar) with shared tooltip and scale defaults
chartjs-adapter-date-fns: Enables time-axis charts with type: "time" for the revenue trend panels
CSS Custom Properties: Full design token system (10 colour variables, layout variables) enabling consistent theming across all components
Animations: CSS @keyframes fadeUp with staggered animation-delay per card, SVG stroke transitions for the score ring, and width transitions for the profit bar

Design Decisions

Gold-on-dark palette: Chosen to evoke the premium feel of cinematic brands (Oscars, IMAX, Netflix dark mode) — gold (#c9a84c) as accent against a near-black (#07080d) background
Bebas Neue + DM Sans: Display font for stats/headings, humanist sans for body — a pairing that balances editorial weight with readability
Noise overlay: Subtle SVG fractalNoise texture at 0.4 opacity adds grain depth without a noticeable performance cost
Mobile poster fix: Sidebar poster uses aspect-ratio: 2/3 + object-position: top center so the poster keeps its native proportions and any crop is anchored at the top, keeping faces in view
Modelled revenue: Rather than hiding the chart for unavailable data, a theatrically-plausible decay curve is rendered with a clear "Modelled distribution" label — keeping the dashboard always visually complete

Challenges & Solutions

No weekly box office data: TMDB provides only a lifetime revenue total. Solved by modelling the distribution with an exponential decay function over a 120-day theatrical window plus a 12% home-video tail
Chart instance memory leaks: Each new search destroys all six previous chart instances via a destroyChart() wrapper before re-creating them
Mobile image cropping: Fixed by replacing fixed-height poster containers with aspect-ratio: 2/3 + max-height caps per breakpoint, ensuring faces are always visible
Three simultaneous search inputs: Desktop header, mobile sticky bar, and landing page all sync on selection — clicking any suggestion updates all three inputs simultaneously
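
The modelled revenue distribution can be sketched as follows, in Python for brevity (the dashboard itself runs in JavaScript). The 120-day window and 12% tail come from the description above. The weekly period length, the decay rate, the even spread of the tail over eight periods, and the function name are assumptions made only for illustration.

```python
import math
from itertools import accumulate


def model_revenue(total, window_days=120, tail_share=0.12,
                  period_days=7, decay_rate=0.05, tail_periods=8):
    """Split a lifetime revenue total into per-period and cumulative series.

    decay_rate, period_days and tail_periods are assumed values, not from the source.
    """
    n_periods = math.ceil(window_days / period_days)
    # Exponential decay weight for the start day of each theatrical period.
    weights = [math.exp(-decay_rate * i * period_days) for i in range(n_periods)]
    weight_sum = sum(weights)
    theatrical_total = total * (1.0 - tail_share)
    theatrical = [theatrical_total * w / weight_sum for w in weights]
    # Home-video tail: spread evenly over the periods after the theatrical window.
    tail = [total * tail_share / tail_periods] * tail_periods
    per_period = theatrical + tail
    cumulative = list(accumulate(per_period))
    return per_period, cumulative
```

Normalising the weights guarantees that the modelled periods always sum back to the lifetime total reported by TMDB, whatever decay rate is chosen.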

Tech Stack

HTML5, CSS3 (Custom Properties, Grid, Flexbox, Animations), Vanilla JavaScript (ES2020+), TMDB API v3, Chart.js 4.4, chartjs-adapter-date-fns, Bebas Neue + DM Sans (Google Fonts)

🏠 Home

AI Web Vulnerability Scanner

Python, Streamlit, Groq Cloud AI, Random Forest ML, BeautifulSoup, Scikit-learn

Overview

A full-stack, AI-powered web vulnerability scanner built with Streamlit and powered by a dual-intelligence engine: Groq Cloud AI (LLaMA 3.3 70B) for contextual threat analysis and a trained Random Forest ML model for pattern-based detection. Users authenticate via a login system, enter any URL or domain, and the scanner automatically discovers subdomains, crawls pages, and surfaces vulnerabilities across SQL Injection, XSS, CSRF, insecure headers, and security misconfigurations — all exported as JSON, Markdown, or CSV reports.

Problem Statement

Manual penetration testing is slow, expensive, and hard to scale. Rule-based scanners miss context-dependent vulnerabilities and generate noisy false positives. This project combines a trained ML model with a Groq-hosted LLM to create a scanner that understands page context, validates findings with a second-pass review, and produces executive-grade reports, all runnable with a single streamlit run app_ai.py command.

Three-Layer Detection Pipeline

Layer 1 — Random Forest ML Model: A trained scikit-learn Random Forest classifier predicts SQL Injection, XSS, or Misconfiguration with per-class probability scores. Only predictions above 60% confidence are surfaced as findings
Layer 2 — Traditional Pattern Detection: Regex and DOM heuristics scan page HTML for known dangerous patterns — eval(), innerHTML=, document.write(), GET-method forms, POST forms without CSRF tokens, and sensitive keywords in HTML comments
Layer 3 — Groq AI Analysis (LLaMA 3.3 70B): The GroqOrchestrator sends HTML snippets, form structures, and script counts to Groq's API for deep contextual analysis, returning structured JSON vulnerability objects with CWE IDs, proof of concept, and remediation guidance
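
A minimal sketch of the Layer 2 pattern checks, assuming only the Python standard library (the project lists BeautifulSoup, which would serve the same purpose). The regexes, the keyword list, the finding labels, and the names MarkupAuditor and scan_html are illustrative assumptions, not the project's actual code.

```python
import re
from html.parser import HTMLParser

# Script patterns named in the text; the exact regexes are assumptions.
SCRIPT_PATTERNS = {
    "eval() call": re.compile(r"\beval\s*\("),
    "innerHTML assignment": re.compile(r"\.innerHTML\s*=(?!=)"),
    "document.write() call": re.compile(r"\bdocument\.write\s*\("),
}
# Hypothetical keyword list for comment scanning.
SENSITIVE_WORDS = ("password", "secret", "api_key")


class MarkupAuditor(HTMLParser):
    """Collects form and comment findings while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.findings = []
        self._in_post_form = False
        self._has_token = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            # A form with no method attribute defaults to GET.
            method = (attributes.get("method") or "get").lower()
            if method == "get":
                self.findings.append("GET-method form")
            self._in_post_form = method == "post"
            self._has_token = False
        elif tag == "input" and self._in_post_form:
            name = (attributes.get("name") or "").lower()
            if "csrf" in name or "token" in name:
                self._has_token = True

    def handle_endtag(self, tag):
        if tag == "form" and self._in_post_form:
            if not self._has_token:
                self.findings.append("POST form without CSRF token")
            self._in_post_form = False

    def handle_comment(self, data):
        if any(word in data.lower() for word in SENSITIVE_WORDS):
            self.findings.append("Sensitive keyword in HTML comment")


def scan_html(html):
    """Return a list of finding labels for one page of HTML."""
    findings = [label for label, pattern in SCRIPT_PATTERNS.items()
                if pattern.search(html)]
    auditor = MarkupAuditor()
    auditor.feed(html)
    auditor.close()
    return findings + auditor.findings
```

Heuristics like these are cheap but noisy, which is presumably why the pipeline pairs them with the ML and LLM layers.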

Groq AI Orchestrator

Domain Recognition: Uses LLaMA to intelligently parse any user input — full URLs, bare domains, subdomains, paths — and extract a clean domain with confidence score
Page Content Analysis: Sends truncated HTML + form/script metadata to Groq, receiving a JSON array of vulnerabilities with risk scores calculated by severity × exploitability × business impact × confidence
Vulnerability Validation: Each detected finding is sent back to Groq for a second-pass review, refining severity and filtering false positives
Executive Summary: Generates a 200–250 word business-readable summary with security posture rating, key findings, and immediate action items
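
The risk score formula from Page Content Analysis (severity × exploitability × business impact × confidence) can be illustrated as follows. This sketch assumes each factor is normalised to the range 0 to 1. The Finding class and rank_findings helper are hypothetical names, and the scale the project actually uses is not stated.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One vulnerability finding; every factor is assumed to lie in [0, 1]."""
    name: str
    severity: float
    exploitability: float
    business_impact: float
    confidence: float

    @property
    def risk_score(self):
        # Product of the four factors, as described in the text.
        return (self.severity * self.exploitability
                * self.business_impact * self.confidence)


def rank_findings(findings):
    """Return findings ordered from highest to lowest risk score."""
    return sorted(findings, key=lambda f: f.risk_score, reverse=True)
```

Because the factors multiply, a low value in any one of them (for example, low confidence) pulls the whole score down, which suits a pipeline that tries to suppress false positives.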

Subdomain Discovery & Smart Crawler

crt.sh Certificate Transparency Logs + HackerTarget API + 20-word common subdomain wordlist
30-thread ThreadPoolExecutor for parallel DNS resolution and HTTP reachability verification
Smart Crawl mode prioritises security-relevant pages: login, admin, upload, search, payment, api, profile
Configurable scan depth (Quick / Standard / Deep / Comprehensive) with a slider for 5–30 pages per domain
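
The parallel DNS resolution step might look roughly like the sketch below. The 30-worker pool matches the description above. The function names and the injectable resolver parameter are assumptions added so the logic can be exercised without network access, and the HTTP reachability check is omitted.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve(host):
    """Return the IPv4 address for host, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def discover_live_subdomains(domain, words, resolver=resolve, max_workers=30):
    """Resolve word.domain for each wordlist entry in parallel.

    Returns a dict of hostname -> address for the names that resolved.
    """
    candidates = [f"{word}.{domain}" for word in words]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        addresses = pool.map(resolver, candidates)  # preserves input order
        return {host: ip for host, ip in zip(candidates, addresses) if ip}
```

DNS lookups are I/O-bound, so threads give a near-linear speed-up here despite the GIL.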

Vulnerability Coverage

SQL Injection (CWE-89), XSS (CWE-79), Missing CSRF Protection (CWE-352)
Sensitive Data Exposure (CWE-200) via HTML comment scanning
Insecure Protocol (CWE-319) and five missing security headers (CWE-16)
ML-detected Misconfiguration based on CSP absence and script/form count thresholds
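
A hedged sketch of the header and protocol checks follows. The source does not name the five headers, so the list below is an assumption based on commonly recommended ones, and both function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumed set of five headers; the project's actual list is not given.
EXPECTED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return the expected headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


def uses_insecure_protocol(url):
    """True when the URL is served over plain HTTP (CWE-319)."""
    return urlparse(url).scheme.lower() == "http"
```

With the Requests library, `response.headers` can be passed directly as `response_headers`.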

Tech Stack

Python, Streamlit, Groq Cloud API (LLaMA 3.3 70B), Scikit-learn (Random Forest), Joblib, Pandas, NumPy, BeautifulSoup4, Requests, python-dotenv, lxml, concurrent.futures, socket


Cyber Threat Detection Suite

Python, Machine Learning, Cybersecurity, Flask

Overview

A machine learning-powered system designed to detect malicious network activity in real time, providing proactive cybersecurity through traffic pattern analysis and threat classification.

Problem Statement

Modern networks face threats including malware, DoS attacks, and unauthorized access. Traditional rule-based tools struggle with unknown threats. This project applies ML to recognize attack patterns in network traffic.

Modelling Approach

Random Forest, SVM, and other ensemble learners classify traffic as benign or malicious. Models are evaluated using accuracy, precision, recall, and F1-score.
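
For reference, the four evaluation metrics can be computed from the confusion counts as in the sketch below. It uses plain Python rather than scikit-learn's metrics module, and the label strings and function name are assumptions.

```python
def binary_metrics(y_true, y_pred, positive="malicious"):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

In intrusion detection, recall on the malicious class usually matters more than raw accuracy, because benign traffic tends to dominate the data.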

Key Features

Network traffic preprocessing and feature preparation
Real-time inference through Flask API
Confidence scores for predictions
Attack category visualisation

Impact

Enhances traditional security tools by detecting complex attack patterns, supporting faster incident response and reducing false positives. Achieved 94% detection accuracy during the RedKross internship deployment.


Spotify Real-Time Recommendation System

Python, Flask, Spotify API, Recommendation Engine

Overview

A real-time music recommendation system built with Python and the Spotify Web API. It helps users discover new music tailored to their preferences by analysing their listening history and track audio features.

Problem Statement

With millions of songs available on streaming platforms, users often struggle to find tracks that match their tastes. This project creates a system that recommends songs dynamically based on listening patterns and audio characteristics.

System Architecture

User Authentication: Secure login via Spotify OAuth 2.0
Data Retrieval: Collects recently played tracks and audio features
Feature Processing: Normalizes song vectors for similarity comparison
Similarity Calculation: Uses cosine similarity for matching
Recommendations: Real-time suggestions through web interface
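
The feature processing and similarity steps above can be sketched as follows. The sketch assumes min-max normalisation and a listener profile formed by averaging recently played tracks. Both choices, along with the function names, are illustrative assumptions, and plain Python stands in for the Pandas/NumPy stack the project lists.

```python
import math


def min_max_normalise(rows):
    """Scale each feature column to [0, 1] so no single feature dominates."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid divide-by-zero
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in rows]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(recent, candidates, top_n=5):
    """Rank candidate tracks (name -> feature vector) against recent listening."""
    names = list(candidates)
    normalised = min_max_normalise(recent + [candidates[n] for n in names])
    recent_rows = normalised[:len(recent)]
    candidate_rows = normalised[len(recent):]
    # Listener profile: mean of the normalised recently played vectors.
    profile = [sum(col) / len(recent_rows) for col in zip(*recent_rows)]
    scored = sorted(zip(names, candidate_rows),
                    key=lambda pair: cosine_similarity(profile, pair[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_n]]
```

Normalising before comparison matters because raw features such as tempo (in BPM) would otherwise swamp features that already sit between 0 and 1.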

Key Insights

Real-time analysis enables highly personalized recommendations
Audio feature-based similarity outperforms playlist matching
API optimization significantly improves responsiveness

Tech Stack

Python, Flask, Pandas, NumPy, Spotify Web API


Car Price Prediction

Python, Flask, Machine Learning, Regression

Overview

An ML-powered car price prediction system that uses regression models to estimate vehicle values from mileage, manufacture year, brand, and other features.

Problem Statement

Predicting fair market value for used cars is challenging. Buyers and sellers rely on rough estimates. This model learns from historical data to provide accurate predictions.

Data & Features

Data cleaning handled missing values and categorical encoding. Feature engineering created variables like vehicle age, enhancing the model's pattern recognition.

Modelling

Regression models were trained on historical car data and evaluated with MAE and R² score to check how well they generalise to unseen data.
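
The two evaluation metrics are defined as in the sketch below, written in plain Python for clarity. In practice scikit-learn's mean_absolute_error and r2_score would normally be used. The function names here simply mirror those, and the sample prices in the usage note are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        # All targets identical: a perfect fit scores 1.0, otherwise 0.0.
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot
```

For example, with actual prices [10, 20, 30] and predictions [12, 18, 30], the MAE is 4/3 and R² is 0.96. MAE reads directly in price units, while R² reports the share of price variance the model explains.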

Key Features

Data preprocessing and transformation pipeline
Real-time prediction via Flask API
Cloud deployment ready
Serialised model for quick inference


Movie Recommender System

Python, Flask, Recommendation, Machine Learning

Overview

A Python-based recommendation system providing personalized movie suggestions using content similarity algorithms and collaborative filtering.

Problem Statement

Users struggle to find relevant movies among thousands of titles. Static charts don't cater to individual tastes. This system analyses metadata and similarity patterns for personalisation.

How It Works

Precomputed similarity matrices enable fast recommendations. Cosine similarity between movie feature vectors (genres, ratings) drives the logic.
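
A small sketch of the precompute-then-lookup idea, assuming each movie is a numeric feature vector (for example one-hot genres plus a scaled rating). The feature layout and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_similarity_matrix(vectors):
    """Compute every pairwise similarity once, ahead of request time."""
    return [[cosine(a, b) for b in vectors] for a in vectors]


def recommend(title, titles, matrix, k=3):
    """Return the k titles most similar to title, using the precomputed matrix."""
    i = titles.index(title)
    others = (j for j in range(len(titles)) if j != i)
    ranked = sorted(others, key=lambda j: matrix[i][j], reverse=True)
    return [titles[j] for j in ranked[:k]]
```

Building the matrix costs O(n²) similarity computations, but it happens offline, so each request only needs a row lookup and a sort.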

Features

Interactive web interface
Fast real-time recommendations
Personalised suggestions
Lightweight Flask deployment

Impact

Enhances content discovery and user engagement by applying ML techniques for personalized recommendations in large catalogs.


IPL Data Analysis Dashboard

Python, Pandas, Matplotlib, Seaborn

Overview

An in-depth exploratory analysis of IPL cricket (2008–2019), uncovering patterns in team performance, player contributions, and match outcomes using historical data.

Problem Statement

IPL generates massive datasets, but raw statistics don't explain performance trends. This project structures data to answer questions about consistency, toss influence, and venue effects.

Analysis Approach

Team Performance: Win percentages and dominance patterns
Toss Impact: Correlation with match outcomes
Player Analysis: Strike rates and consistency metrics
Venue Insights: Stadium influence on scoring
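
The team performance and toss impact calculations can be sketched as follows. The project uses Pandas. This version uses plain Python dictionaries so it runs without dependencies, and the field names (team1, team2, toss_winner, winner) and function names are assumptions about the dataset layout.

```python
from collections import Counter


def win_percentages(matches):
    """Return each team's win percentage over matches that had a result."""
    played, won = Counter(), Counter()
    for match in matches:
        if not match.get("winner"):
            continue  # skip abandoned or no-result matches
        played[match["team1"]] += 1
        played[match["team2"]] += 1
        won[match["winner"]] += 1
    return {team: 100.0 * won[team] / played[team] for team in played}


def toss_win_rate(matches):
    """Fraction of decided matches won by the team that also won the toss."""
    decided = [m for m in matches if m.get("winner")]
    if not decided:
        return 0.0
    same = sum(m["toss_winner"] == m["winner"] for m in decided)
    return same / len(decided)
```

A toss win rate close to 0.5 would be consistent with the toss having little influence on results.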

Key Insights

A small group of teams showed long-term dominance
Toss had limited impact on outcomes
Individual performance influenced success more than conditions
Venue characteristics affected scoring trends
