Complete overview of the ElectroFind system — data pipeline, AI model, architecture, and API integrations.
ElectroFind is a final-year B.E. Artificial Intelligence project that solves the electronics price comparison problem using a combination of offline AI-trained models and real-time web data aggregation. The system has two core layers:
Offline AI Layer
Real-Time Layer
The offline data pipeline was built to collect, clean, and transform a large dataset of electronics products for model training.
Web Scraping
Automated collection of 50,000+ electronics product listings from Amazon and Flipkart spanning 10 categories including smartphones, laptops, and audio.
Raw Storage
Data stored in structured CSV format with fields: title, price, rating, review_count, category, platform, ASIN/product_id, image_url.
Data Cleaning
Remove duplicates, handle null values, normalize price formats (₹ to $), strip HTML from titles, standardize category labels.
Feature Engineering
Extract brand names, storage variants, color attributes, and model numbers from title strings using regex and NLP rules.
Vectorization
Apply TF-IDF vectorization to cleaned product titles, generating sparse feature matrices for similarity computation.
Export
Clean dataset exported as processed CSV and pickle files for use in model training and the similarity engine.
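The cleaning and feature engineering steps above might look roughly like the following. This is a minimal sketch under stated assumptions, not the project's actual code: the helper bodies, the brand and color vocabularies, and the `inr_per_usd` argument are all hypothetical, and the rate used in the example is a placeholder rather than a real exchange rate.

```python
import re
from html import unescape

# Hypothetical vocabularies; a real pipeline would use much larger lists.
KNOWN_BRANDS = ("apple", "samsung", "sony")
KNOWN_COLORS = ("black", "white", "blue", "grey", "silver")

PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")
STORAGE_RE = re.compile(r"\b(\d+)\s?(gb|tb)\b", re.IGNORECASE)


def strip_html(title):
    """Drop tags, decode entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", title)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()


def normalize_price(raw, inr_per_usd):
    """Parse a price string to a USD float; rupee prices are converted
    with the caller-supplied rate. Returns None if no number is found."""
    match = PRICE_RE.search(raw)
    if match is None:
        return None
    amount = float(match.group().replace(",", ""))
    if raw.lstrip().startswith(("₹", "Rs", "INR")):
        amount /= inr_per_usd
    return round(amount, 2)


def extract_features(title):
    """Pull brand, storage variant, and color out of a product title."""
    lowered = title.lower()
    words = re.findall(r"[a-z0-9]+", lowered)
    storage = STORAGE_RE.search(lowered)
    return {
        "brand": next((b for b in KNOWN_BRANDS if b in words), None),
        "storage": f"{storage.group(1)}{storage.group(2).upper()}" if storage else None,
        "color": next((c for c in KNOWN_COLORS if c in words), None),
    }


title = strip_html("<b>Samsung Galaxy S24</b> (256 GB, Grey)")
features = extract_features(title)
price_usd = normalize_price("₹1,29,900", inr_per_usd=100.0)  # placeholder rate
```

Because this sketch takes the rate as an explicit argument, using it with `DataFrame.apply` would require binding the rate first, for example with `functools.partial`.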
```python
# Simplified data pipeline (Python)
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Load raw scraped data
df = pd.read_csv('raw_electronics.csv')

# Clean titles: lowercase, keep only letters, digits and spaces
df['clean_title'] = (
    df['title'].str.lower()
    .str.replace(r'[^a-z0-9 ]', '', regex=True)
    .str.strip()
)

# Remove exact duplicates of the cleaned title within each platform
df = df.drop_duplicates(subset=['clean_title', 'platform'])

# Normalize prices; normalize_price is a project helper (not shown here)
# that converts a raw price value to USD
df['price_usd'] = df['price'].apply(normalize_price)

# TF-IDF vectorization over unigrams and bigrams
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
tfidf_matrix = vectorizer.fit_transform(df['clean_title'])
```

The AI model serves two purposes: cross-platform product matching (deduplication) and validation of the Buy Score signal weights.
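To make the matching idea concrete, here is a dependency-free sketch of TF-IDF weighting followed by cosine similarity on three toy titles. It is an illustration only and is not the project's implementation, which uses scikit-learn. The tokenizer, the function names, and the sample titles are assumptions; the smoothed IDF formula and L2 normalization mirror what I understand scikit-learn's defaults to be, but only unigrams are used here.

```python
import math
from collections import Counter


def tokenize(title):
    # Lowercase and split on whitespace (titles are assumed pre-cleaned).
    return title.lower().split()


def fit_idf(corpus_tokens):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    n_docs = len(corpus_tokens)
    doc_freq = Counter(t for tokens in corpus_tokens for t in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Term count times IDF, scaled to unit length; unseen terms are ignored.
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def find_matches(query, titles, threshold=0.85):
    corpus = [tokenize(t) for t in titles]
    idf = fit_idf(corpus)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(i, cosine(query_vec, tfidf_vector(doc, idf))) for i, doc in enumerate(corpus)]
    return [(i, s) for i, s in scored if s >= threshold]


titles = [
    "apple iphone 15 128gb black",
    "apple iphone 15 128gb blue",
    "sony wh1000xm5 wireless headphones",
]
matches = find_matches("Apple iPhone 15 128GB Black", titles)
```

In this toy corpus only the first title clears the 0.85 threshold: the blue variant shares four of five terms but still scores lower, and the headphones share none.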
Cosine Similarity Matching
Buy Score Validation
```python
# Cosine similarity matching
import math

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def find_matches(query_title, product_matrix, vectorizer, threshold=0.85):
    # Project the query into the fitted TF-IDF space
    query_vec = vectorizer.transform([query_title])
    similarities = cosine_similarity(query_vec, product_matrix).flatten()
    # Indices of products at or above the similarity threshold
    matches = np.where(similarities >= threshold)[0]
    return matches, similarities[matches]

# Buy Score calculation
def buy_score(price, rating, reviews, min_p, max_p):
    r_score = (rating / 5.0) * 40  # star rating, up to 40 pts
    # Review volume on a log scale, capped at 30 pts (reached near 100k reviews)
    rv_score = min(math.log10(reviews + 1) / math.log10(100_000) * 30, 30)
    # Cheapest product gets 30 pts; neutral 15 pts when all prices are equal
    p_score = (1 - (price - min_p) / (max_p - min_p)) * 30 if max_p > min_p else 15
    return round(r_score + rv_score + p_score, 1)
```

API Layer (Next.js)
Data Sources
Frontend (Next.js 14)
```ts
// Simplified API route flow
// fetchAllPlatforms, fetchSerperShopping and calculateScores are project helpers (imports omitted)
import { NextRequest, NextResponse } from "next/server";

export async function GET(req: NextRequest) {
  const q = req.nextUrl.searchParams.get("q");

  // 1. Fetch all platforms in parallel (each has own 40s timeout)
  const [oxyResult, serperResult] = await Promise.all([
    fetchAllPlatforms(q),   // Amazon + Flipkart + Best Buy + eBay
    fetchSerperShopping(q), // Google Shopping (with images)
  ]);

  // 2. Merge, score and sort
  const all = [...oxyResult.products, ...serperResult.products];
  // Assumption: each fetcher also returns an `errors` array alongside `products`
  const errors = [...oxyResult.errors, ...serperResult.errors];
  const scored = calculateScores(all).sort((a, b) => b.score - a.score);

  return NextResponse.json({ products: scored, errors, total: scored.length });
}
```

Every product receives a Buy Score (0–100) calculated from three signals validated against our training dataset.
| Signal | Weight | Formula |
|---|---|---|
| Star Rating | 40 pts | (rating / 5.0) × 40 |
| Review Volume | 30 pts | min(log10(reviews + 1) / log10(100,000) × 30, 30) |
| Price Value | 30 pts | (1 − (price − minP) / (maxP − minP)) × 30, or 15 when maxP = minP |
| Buy Score | Label |
|---|---|
| 72–100 | Excellent Buy |
| 45–71 | Good Value |
| 0–44 | Consider Options |
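A worked example may help show how the three signals add up. The product values below are invented for illustration. `score_band` is a hypothetical helper that assumes the band boundaries are inclusive lower bounds (72 and 45), since the published ranges do not say how a fractional score such as 71.5 is labeled.

```python
import math


def buy_score(price, rating, reviews, min_p, max_p):
    r_score = (rating / 5.0) * 40
    rv_score = min(math.log10(reviews + 1) / math.log10(100_000) * 30, 30)
    p_score = (1 - (price - min_p) / (max_p - min_p)) * 30 if max_p > min_p else 15
    return round(r_score + rv_score + p_score, 1)


def score_band(score):
    # Assumed thresholds: inclusive lower bounds of the published bands.
    if score >= 72:
        return "Excellent Buy"
    if score >= 45:
        return "Good Value"
    return "Consider Options"


# Hypothetical product: 4.5 stars, 999 reviews, priced at 500
# in a result set whose prices range from 400 to 800.
#   rating:  (4.5 / 5.0) * 40            = 36.0
#   reviews: log10(1000) / log10(1e5) * 30 = 18.0
#   price:   (1 - 100 / 400) * 30        = 22.5
score = buy_score(price=500, rating=4.5, reviews=999, min_p=400, max_p=800)
label = score_band(score)  # 76.5 falls in the "Excellent Buy" band
```

Note that the price term is relative to the current result set, so the same product can score differently depending on which other listings a search returns.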
Oxylabs Realtime Scraper
Serper.dev Shopping API
User Input
User enters query + category on home page or search page. Enter key or button click triggers navigation to /search?q=...&category=...
URL Navigation
Next.js router pushes to /search. SearchSection reads query from URL params via useSearchParams() and auto-triggers search on mount.
Parallel API Fetch
GET /api/search fires Promise.all([fetchAllPlatforms, fetchSerperShopping]). Each platform has its own 40s AbortController.
Score & Sort
calculateScores() applies Buy Score formula across all products. Results sorted descending by score before returning JSON.
Client Render
SearchSection receives products, sets activePlatforms from returned results. User can filter by platform, sort, and toggle filter panel.
Product Card
Each ProductCard shows rank badge, platform badge, title, price, rating, Buy Score pill with color coding, and View Deal link.
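The parallel fetch and score-and-sort steps in the flow above can be illustrated with a small Python analogue. The project's route is written in TypeScript with `Promise.all` and `AbortController`, so this is a sketch of the same pattern using `asyncio`, not a port of the real code. `fake_fetch`, `with_timeout`, `search_all`, and the sample products are hypothetical, and the demo uses a 0.2 s timeout in place of the 40 s one.

```python
import asyncio


async def fake_fetch(products, delay):
    # Stand-in for a platform fetcher; sleeps to simulate network latency.
    await asyncio.sleep(delay)
    return products


async def with_timeout(name, coro, seconds):
    # Give each source its own deadline so one slow platform cannot
    # block or fail the whole search.
    try:
        products = await asyncio.wait_for(coro, timeout=seconds)
        return {"source": name, "products": products, "error": None}
    except asyncio.TimeoutError:
        return {"source": name, "products": [], "error": "timeout"}
    except Exception as exc:
        return {"source": name, "products": [], "error": str(exc)}


async def search_all(sources, timeout_s):
    # Run every source concurrently and wait for all of them to settle.
    results = await asyncio.gather(
        *(with_timeout(name, coro, timeout_s) for name, coro in sources.items())
    )
    products = [p for r in results for p in r["products"]]
    errors = [f'{r["source"]}: {r["error"]}' for r in results if r["error"]]
    # Highest score first, matching the descending sort described above.
    products.sort(key=lambda p: p["score"], reverse=True)
    return products, errors


sources = {
    "fast": fake_fetch([{"title": "A", "score": 61.0}, {"title": "B", "score": 80.5}], 0.01),
    "slow": fake_fetch([{"title": "C", "score": 99.0}], 1.0),
}
products, errors = asyncio.run(search_all(sources, timeout_s=0.2))
```

Here the slow source times out, so its products are dropped and the timeout is reported in `errors` while results from the fast source are still returned in score order.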
Ready to try it?
Search and compare electronics across 5 platforms in real-time.