First of all, OP himself replied to my tweet!
Crazy cool.
Introduction
Tanmay Bhat is one of India's most popular content creators, known for his reaction videos and gaming streams. A unique aspect of his content is how he assigns memorable nicknames to his guests and fellow creators, always ending with "OP" (typically meaning "overpowered" in gaming culture). This project aims to catalog and analyze these nicknames using AI and speech recognition.
The Idea
While watching Tanmay's reaction videos, a pattern emerges: each guest receives a unique nickname ending with "er OP" or "ar OP", such as "Rider OP" or "Prisoner OP". So I thought of creating a comprehensive database of all these nicknames.
Naturally, my first instinct was to ask Perplexity.
Since it showed no results, I decided to take matters into my own hands.
"Fine, I'll do it myself."
Approach
I wanted to programmatically figure this out instead of manually watching all the videos and noting down the nicknames.
Getting Transcriptions
If we could transcribe what was spoken in the videos, extracting the nicknames from the text would be straightforward.
- Video Metadata Fetching:
  - Initially used the YouTube Data API v3, which presented challenges in handling channel and playlist IDs, and the documentation was unnecessarily complex.
  - Transitioned to yt-dlp, which significantly simplified video metadata and audio extraction but required ~8–9 hours to process all the videos.
- Audio Processing:
  - Initially trimmed only the first 30 seconds of audio, but realized nicknames were often introduced later in the video. Increased the duration to 2 minutes for broader coverage.
  - Used ffmpeg to trim audio for transcription.
  - Built a pipeline to avoid redundant audio downloads, optimizing resource usage.
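The download-and-trim step above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the `audio` output directory, and the file naming scheme are assumptions, and it expects `yt-dlp` and `ffmpeg` to be installed and on the PATH. Skipping a step when its output file already exists is one simple way to avoid redundant downloads.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");
const path = require("path");

// Arguments for yt-dlp: extract audio only and save it as <outDir>/<videoId>.mp3.
function ytDlpAudioArgs(videoId, outDir) {
  return [
    "--extract-audio",
    "--audio-format", "mp3",
    "--output", path.join(outDir, "%(id)s.%(ext)s"),
    `https://www.youtube.com/watch?v=${videoId}`,
  ];
}

// Arguments for ffmpeg: keep only the first `seconds` of the input, without re-encoding.
function ffmpegTrimArgs(inputPath, outputPath, seconds = 120) {
  return ["-y", "-i", inputPath, "-t", String(seconds), "-c", "copy", outputPath];
}

// Download and trim one video's audio, skipping any step whose output already exists.
// (Hypothetical helper; the directory layout is an assumption.)
function prepareAudio(videoId, outDir = "audio") {
  const fullPath = path.join(outDir, `${videoId}.mp3`);
  const trimmedPath = path.join(outDir, `${videoId}.trimmed.mp3`);
  fs.mkdirSync(outDir, { recursive: true });
  if (!fs.existsSync(fullPath)) {
    execFileSync("yt-dlp", ytDlpAudioArgs(videoId, outDir), { stdio: "inherit" });
  }
  if (!fs.existsSync(trimmedPath)) {
    execFileSync("ffmpeg", ffmpegTrimArgs(fullPath, trimmedPath), { stdio: "inherit" });
  }
  return trimmedPath;
}
```

Calling `prepareAudio("<video id>")` twice for the same video would run the external tools only the first time. The 120-second default mirrors the 2-minute window described above.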
Transcription Services
To transcribe audio into text, I tested Deepgram, Whisper, and Google STT APIs:
- Google STT API:
  - Produced highly inaccurate results on multi-language audio, likely due to difficulties with Indian accents and mixed-language content.
- OpenAI Whisper API:
  - Handled multilingual transcriptions well when set to translate to English but struggled with Hinglish-specific nuances.
  - Experimented with open-source transliteration APIs to convert Hindi text to Hinglish, but results were unsatisfactory.
- Deepgram API:
  - Provided high accuracy for English but failed to process Hindi or Hinglish effectively, even with the language parameter set to "multi."
- Fallback Mechanism:
  - Both Whisper and Deepgram results were stored for comparison to maximize transcription accuracy.
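Storing both results side by side might look something like the sketch below. The two transcriber functions are hypothetical wrappers around the real Whisper and Deepgram API calls (which are not shown here), and the record shape is an assumption. `Promise.allSettled` is used so that a failure in one service does not discard the other's transcript.

```javascript
// Combine the outcomes of both services into one record per video.
// `whisper` and `deepgram` are Promise.allSettled-style result objects.
function buildTranscriptRecord(videoId, whisper, deepgram) {
  const textOf = (result) => (result.status === "fulfilled" ? result.value : null);
  return { videoId, whisper: textOf(whisper), deepgram: textOf(deepgram) };
}

// `transcribers.whisper` and `transcribers.deepgram` are hypothetical async
// functions (audioPath -> transcript string) supplied by the caller.
async function transcribeBoth(videoId, audioPath, transcribers) {
  const [whisper, deepgram] = await Promise.allSettled([
    transcribers.whisper(audioPath),
    transcribers.deepgram(audioPath),
  ]);
  return buildTranscriptRecord(videoId, whisper, deepgram);
}
```

A transcript that failed is stored as `null`, so later steps can fall back to whichever service produced text for that video.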
Nickname Extraction
- Regex Logic:
  - Extracted nicknames matching patterns like <Something ending with er> OP or <Something ending with ar> OP.
- GPT-4 Processing:
  - Utilized a refined prompt to infer and standardize nicknames when transcriptions contained errors or ambiguities.
  - Addressed issues like incorrect nicknames ("Prznar OP") by standardizing them ("Prisoner OP").
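As an illustration of the regex step, here is a minimal sketch. The exact pattern used in the project is not shown in this post, so the regular expression and the function name below are assumptions.

```javascript
// Matches a word ending in "er" or "ar", followed by whitespace and "OP" (case-insensitive).
const NICKNAME_PATTERN = /\b([a-z]+[ae]r)\s+op\b/gi;

// Return the unique nicknames found in a transcript, normalized to "Word OP".
function extractNicknames(transcript) {
  const found = new Set();
  for (const match of transcript.matchAll(NICKNAME_PATTERN)) {
    const word = match[1];
    found.add(word[0].toUpperCase() + word.slice(1).toLowerCase() + " OP");
  }
  return [...found];
}

console.log(extractNicknames("Welcome Rider OP, and here is prisoner op. Rider OP again!"));
// [ 'Rider OP', 'Prisoner OP' ]
```

A pattern like this only catches nicknames that were transcribed cleanly. Mangled forms such as "Prznar OP" slip past it, which is where the GPT-4 step comes in.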
Check the entire prompt sent for processing:
const prompt = `
You are an expert at extracting nicknames from YouTube video transcripts, especially skilled at identifying badly transcribed nicknames that end with "OP".
### Critical Understanding:
1. Due to speech-to-text errors, nicknames often get badly mangled:
- "Hustler OP" might appear as "has the rope", "hustle rope", "hazlr op"
- "Writer OP" might appear as "write rope", "riter op"
- "Gamer OP" might appear as "game rope", "gamma op", "gaymer op"
2. Look for these specific patterns:
- Words near introduction phrases like "let me introduce", "we got", "with us"
- Words ending in sounds like "rope", "hope", "op", "rop", "ob", "opie"
- Continuous sequences that might be split incorrectly (e.g., "has the rope" → "Hustler OP", "Bank Aurobi" → "Banker OP")
3. Common Transcription Problems:
- Words get split incorrectly: "hus tler" instead of "hustler"
- "OP" gets transcribed as: "rope", "hope", "ob", "op", "rop", "opi", "Aurobi"
- Sounds get merged: "wriderop" instead of "Writer OP"
- Extra words inserted: "has the" instead of "hustler"
### Analysis Steps:
1. First, find obvious nicknames (ending with OP/rope/hope)
2. Then, look for split or mangled versions:
- Look for sequences of 2-3 words that could be a nickname
- Check if any word combinations could form a valid nickname
- Pay special attention to words near introduction phrases
3. For each potential nickname:
- Try combining adjacent words
- Remove common filler words ("the", "a", "has")
- Check if the combined form makes a valid "-er/-ar" nickname
### Example Corrections:
"has the rope" → "Hustler OP"
"game write rope" → "Gamer OP", "Writer OP"
"stream rope killer hope" → "Streamer OP", "Killer OP"
### Transcription to Analyze:
${transcription}
Return ONLY a JSON array of the corrected nicknames. Each nickname should:
- End with " OP"
- Have proper capitalization
- Be a real word ending in -er or -ar
Example: ["Gamer OP", "Writer OP", "Hustler OP"]
`;
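Since the prompt asks for a JSON array only, the reply still has to be parsed and checked before it goes into the database. A minimal sketch of that step is below. The function name and the validation rule are assumptions, and the API call that sends the prompt is left out. It tolerates extra text or code fences around the array, in case the model adds them.

```javascript
// Pull the JSON array out of the model's reply and keep only entries that
// look like "<Word ending in er/ar> OP". Returns [] if nothing usable is found.
function parseNicknameResponse(responseText) {
  const start = responseText.indexOf("[");
  const end = responseText.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed;
  try {
    parsed = JSON.parse(responseText.slice(start, end + 1));
  } catch {
    return []; // The reply did not contain valid JSON.
  }
  if (!Array.isArray(parsed)) return [];

  // Shape check only: capitalized word ending in "er" or "ar", then " OP".
  const valid = /^[A-Z][a-z]*[ae]r OP$/;
  return parsed.filter((item) => typeof item === "string" && valid.test(item));
}
```

This check only validates the shape of each nickname. It cannot tell whether the word is a real one, so a manual pass over the results is still useful.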
Challenges and Solutions
Mixed-Language Transcriptions
- Problem: Transcription services often ignored or misprocessed Hindi/Hinglish content.
- Solution: Combined Deepgram (for English) and Whisper (for multilingual) to cover both scenarios. Results from both were saved for further analysis.
Inconsistent Nickname Standardization
- Problem: Transcriptions contained errors (e.g., "prisn ropie" instead of "Prisoner OP", "Show Hero OP" instead of "Shauhar OP").
- Solution: Wrote a detailed GPT prompt with examples and validation to infer meaningful, standardized nicknames. Did a little manual quality checking to discard incorrect names and matched up a few similar names.
Implementation
Workflow
- Video Metadata:
  - Fetched video details using yt-dlp.
- Audio Processing:
  - Downloaded and trimmed 2-minute audio segments with ffmpeg.
- Transcription:
  - Sent audio to both Deepgram and Whisper APIs.
- Nickname Extraction:
  - Applied Regex for direct extraction.
  - Used GPT for advanced inference and standardization.
Key Learnings
- STT models need more training on data spoken in both Hindi and English (Hinglish), as they currently struggle with mixed-language content.
- GPT models require specific, detailed prompts for optimal results; simple instructions are often insufficient.
- Claude and GPT-4 still have limitations when improving frontend code; they sometimes introduce new bugs even when making minor changes.
- Both Claude and GPT-4 demonstrate strong visual comprehension abilities. When shown application screenshots, they can accurately identify and suggest fixes for specific visual issues.