<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Vijay Gokarn, Author at Vijay Gokarn</title>
	<atom:link href="https://vijay-gokarn.com/author/vijaygokarn130/feed/" rel="self" type="application/rss+xml" />
	<link>https://vijay-gokarn.com/author/vijaygokarn130/</link>
	<description>&#34;Ignite Curiosity. Fuel the Future.&#34;</description>
	<lastBuildDate>Sun, 19 Apr 2026 03:33:59 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://i0.wp.com/vijay-gokarn.com/wp-content/uploads/2023/09/cropped-ideogram.jpeg?fit=32%2C32&#038;ssl=1</url>
	<title>Vijay Gokarn, Author at Vijay Gokarn</title>
	<link>https://vijay-gokarn.com/author/vijaygokarn130/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">230943525</site>	<item>
		<title>From Amazon Reviews to Numbers: A Hands-On Tour of One-Hot, Bag of Words, and TF-IDF</title>
		<link>https://vijay-gokarn.com/from-amazon-reviews-to-numbers-a-hands-on-tour-of-one-hot-bag-of-words-and-tf-idf/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=from-amazon-reviews-to-numbers-a-hands-on-tour-of-one-hot-bag-of-words-and-tf-idf</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Sat, 11 Apr 2026 15:02:33 +0000</pubDate>
				<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[data-analysis]]></category>
		<category><![CDATA[pandas]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=263</guid>

					<description><![CDATA[<p>How I took 128 real Amazon product reviews and turned them into features a machine-learning model can actually chew on &#8212; and what I learned about where these classical techniques still shine. [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/from-amazon-reviews-to-numbers-a-hands-on-tour-of-one-hot-bag-of-words-and-tf-idf/">From Amazon Reviews to Numbers: A Hands-On Tour of One-Hot, Bag of Words, and TF-IDF</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
  .vg-blog-wrap {
    --ink: #0e0e0e;
    --paper: #f7f4ef;
    --paper-dark: #ede9e1;
    --teal: #0f6e56;
    --teal-light: #1d9e75;
    --teal-muted: #e1f5ee;
    --amber: #ba7517;
    --amber-light: #fac775;
    --amber-muted: #faeeda;
    --charcoal: #2c2c2a;
    --muted: #888780;
    --border: rgba(14,14,14,0.12);
    --border-strong: rgba(14,14,14,0.25);
    font-family: 'DM Sans', sans-serif;
    font-weight: 300;
    color: var(--ink);
    background: var(--paper);
    line-height: 1.75;
    font-size: 16px;
    overflow-x: hidden;
  }
  .vg-blog-wrap *, .vg-blog-wrap *::before, .vg-blog-wrap *::after {
    box-sizing: border-box; margin: 0; padding: 0;
  }

  /* ── HERO ── */
  .vg-post-hero {
    background: var(--ink);
    padding: 5rem 4rem 4rem;
    position: relative;
    overflow: hidden;
  }
  .vg-post-hero::after {
    content: '';
    position: absolute;
    bottom: 0; right: 0;
    width: 40%;
    height: 100%;
    background: rgba(15,110,86,0.12);
    clip-path: polygon(20% 0%, 100% 0%, 100% 100%, 0% 100%);
  }
  .vg-post-hero-inner { position: relative; z-index: 1; max-width: 860px; }
  .vg-post-eyebrow {
    font-size: 0.7rem;
    letter-spacing: 0.22em;
    text-transform: uppercase;
    color: var(--teal-light);
    font-weight: 500;
    margin-bottom: 1.25rem;
    display: flex;
    align-items: center;
    gap: 0.75rem;
  }
  .vg-post-eyebrow::before {
    content: '';
    display: inline-block;
    width: 1.5rem; height: 1px;
    background: var(--teal-light);
  }
  .vg-post-title {
    font-family: 'Cormorant Garamond', serif;
    font-size: clamp(2.2rem, 5vw, 3.8rem);
    font-weight: 300;
    line-height: 1.1;
    color: var(--paper);
    letter-spacing: -0.02em;
    margin-bottom: 1.5rem;
    max-width: 22ch;
  }
  .vg-post-title em { font-style: italic; color: var(--amber-light); }
  .vg-post-meta {
    display: flex;
    gap: 2rem;
    flex-wrap: wrap;
  }
  .vg-meta-item {
    font-size: 0.72rem;
    letter-spacing: 0.1em;
    text-transform: uppercase;
    color: rgba(247,244,239,0.4);
  }
  .vg-meta-item span { color: rgba(247,244,239,0.75); margin-left: 0.4rem; }

  /* ── INTRO BAND ── */
  .vg-intro-band {
    background: var(--teal-muted);
    padding: 2.5rem 4rem;
    border-left: 4px solid var(--teal);
  }
  .vg-intro-band p {
    font-size: 1.05rem;
    line-height: 1.85;
    color: var(--charcoal);
    font-weight: 300;
    max-width: 80ch;
  }
  .vg-intro-band strong { color: var(--teal); font-weight: 500; }

  /* ── BODY LAYOUT ── */
  .vg-post-body {
    max-width: 860px;
    margin: 0 auto;
    padding: 4rem 4rem;
  }

  /* ── SECTION HEADERS ── */
  .vg-step {
    margin-bottom: 3.5rem;
  }
  .vg-step-label {
    font-size: 0.65rem;
    letter-spacing: 0.22em;
    text-transform: uppercase;
    color: var(--teal);
    font-weight: 500;
    margin-bottom: 0.5rem;
    display: flex;
    align-items: center;
    gap: 0.6rem;
  }
  .vg-step-label::before {
    content: '';
    display: inline-block;
    width: 1.25rem; height: 1px;
    background: var(--teal);
  }
  .vg-step h2 {
    font-family: 'Cormorant Garamond', serif;
    font-size: clamp(1.5rem, 3vw, 2.1rem);
    font-weight: 300;
    line-height: 1.2;
    color: var(--ink);
    margin-bottom: 1.25rem;
  }
  .vg-step h2 em { font-style: italic; color: var(--teal); }
  .vg-step p {
    font-size: 0.94rem;
    line-height: 1.9;
    color: var(--charcoal);
    font-weight: 300;
    margin-bottom: 1rem;
  }
  .vg-step p strong { color: var(--ink); font-weight: 500; }

  /* ── CALLOUT / TIP BOXES ── */
  .vg-callout {
    background: var(--paper-dark);
    border-left: 3px solid var(--amber);
    padding: 1.25rem 1.5rem;
    margin: 1.5rem 0;
    font-size: 0.88rem;
    line-height: 1.8;
    color: var(--charcoal);
  }
  .vg-callout strong { color: var(--amber); font-weight: 500; }
  .vg-callout code {
    font-family: 'DM Mono', monospace;
    font-size: 0.82rem;
    background: rgba(14,14,14,0.06);
    padding: 0.1rem 0.4rem;
    color: var(--ink);
  }

  /* ── TECHNIQUE CARDS ── */
  .vg-technique-grid {
    display: grid;
    grid-template-columns: repeat(3, 1fr);
    gap: 1.25rem;
    margin: 2rem 0;
  }
  .vg-technique-card {
    background: var(--paper);
    border: 0.5px solid var(--border-strong);
    padding: 1.5rem;
    position: relative;
  }
  .vg-technique-card::before {
    content: '';
    position: absolute;
    top: 0; left: 0;
    width: 100%; height: 3px;
  }
  .vg-technique-card.ohe::before { background: var(--muted); }
  .vg-technique-card.bow::before { background: var(--amber); }
  .vg-technique-card.tfidf::before { background: var(--teal); }
  .vg-technique-card h3 {
    font-family: 'Cormorant Garamond', serif;
    font-size: 1.2rem;
    font-weight: 400;
    color: var(--ink);
    margin-bottom: 0.4rem;
  }
  .vg-technique-card .vg-abbr {
    font-family: 'DM Mono', monospace;
    font-size: 0.68rem;
    color: var(--muted);
    letter-spacing: 0.1em;
    margin-bottom: 0.75rem;
    display: block;
  }
  .vg-technique-card p {
    font-size: 0.82rem;
    line-height: 1.7;
    color: var(--charcoal);
    margin-bottom: 0.75rem !important;
  }
  .vg-technique-card .vg-weakness {
    font-size: 0.75rem;
    color: var(--muted);
    border-top: 0.5px solid var(--border);
    padding-top: 0.6rem;
    margin-top: 0.5rem;
    font-style: italic;
  }

  /* ── FORMULA BLOCK ── */
  .vg-formula {
    background: var(--ink);
    padding: 1.5rem 2rem;
    margin: 1.5rem 0;
    font-family: 'DM Mono', monospace;
    font-size: 0.9rem;
    color: var(--amber-light);
    letter-spacing: 0.04em;
    overflow-x: auto;
    white-space: nowrap;
  }
  .vg-formula .vg-formula-label {
    font-family: 'DM Sans', sans-serif;
    font-size: 0.65rem;
    letter-spacing: 0.18em;
    text-transform: uppercase;
    color: rgba(247,244,239,0.3);
    margin-bottom: 0.5rem;
    display: block;
    white-space: normal;
  }

  /* ── STAT ROW ── */
  .vg-stat-row {
    display: grid;
    grid-template-columns: repeat(3, 1fr);
    gap: 1rem;
    margin: 2rem 0;
  }
  .vg-stat-box {
    background: var(--paper-dark);
    border: 0.5px solid var(--border);
    padding: 1.25rem;
    text-align: center;
  }
  .vg-stat-box .vg-stat-n {
    font-family: 'Cormorant Garamond', serif;
    font-size: 2.2rem;
    font-weight: 300;
    line-height: 1;
    color: var(--teal);
    letter-spacing: -0.02em;
  }
  .vg-stat-box .vg-stat-l {
    font-size: 0.68rem;
    letter-spacing: 0.12em;
    text-transform: uppercase;
    color: var(--muted);
    margin-top: 0.35rem;
  }

  /* ── COMPARISON TABLE ── */
  .vg-table-wrap { overflow-x: auto; margin: 1.5rem 0; }
  .vg-table {
    width: 100%;
    border-collapse: collapse;
    font-size: 0.83rem;
  }
  .vg-table th {
    background: var(--ink);
    color: var(--paper);
    font-family: 'DM Sans', sans-serif;
    font-weight: 400;
    font-size: 0.68rem;
    letter-spacing: 0.14em;
    text-transform: uppercase;
    padding: 0.75rem 1rem;
    text-align: left;
  }
  .vg-table td {
    padding: 0.7rem 1rem;
    border-bottom: 0.5px solid var(--border);
    color: var(--charcoal);
    vertical-align: top;
    line-height: 1.55;
  }
  .vg-table tr:nth-child(even) td { background: var(--paper-dark); }
  .vg-table .vg-chip {
    display: inline-block;
    font-size: 0.65rem;
    letter-spacing: 0.08em;
    padding: 0.2rem 0.55rem;
    border-radius: 2px;
    font-weight: 400;
  }
  .vg-chip-green { background: var(--teal-muted); color: var(--teal); }
  .vg-chip-amber { background: var(--amber-muted); color: var(--amber); }
  .vg-chip-gray  { background: var(--paper-dark); color: var(--muted); border: 0.5px solid var(--border); }

  /* ── DIVIDER ── */
  .vg-divider {
    border: none;
    border-top: 0.5px solid var(--border);
    margin: 3rem 0;
  }

  /* ── KEY TAKEAWAYS ── */
  .vg-takeaways-section {
    background: var(--ink);
    padding: 4rem;
  }
  .vg-takeaways-section .vg-section-eyebrow {
    font-size: 0.68rem;
    letter-spacing: 0.22em;
    text-transform: uppercase;
    color: var(--amber-light);
    font-weight: 500;
    margin-bottom: 0.5rem;
    display: flex;
    align-items: center;
    gap: 0.6rem;
  }
  .vg-takeaways-section .vg-section-eyebrow::before {
    content: '';
    display: inline-block;
    width: 1.25rem; height: 1px;
    background: var(--amber-light);
  }
  .vg-takeaways-section h2 {
    font-family: 'Cormorant Garamond', serif;
    font-size: clamp(1.6rem, 3vw, 2.4rem);
    font-weight: 300;
    color: var(--paper);
    margin-bottom: 2.5rem;
  }
  .vg-takeaways-section h2 em { font-style: italic; color: var(--amber-light); }
  .vg-takeaways-grid {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1.25rem;
  }
  .vg-takeaway-card {
    border: 0.5px solid rgba(247,244,239,0.12);
    padding: 1.5rem;
    position: relative;
  }
  .vg-takeaway-card::before {
    content: attr(data-num);
    font-family: 'Cormorant Garamond', serif;
    font-size: 3rem;
    font-weight: 300;
    color: rgba(250,199,117,0.15);
    position: absolute;
    top: 0.5rem; right: 1rem;
    line-height: 1;
  }
  .vg-takeaway-card h4 {
    font-family: 'Cormorant Garamond', serif;
    font-size: 1.1rem;
    font-weight: 400;
    color: var(--amber-light);
    margin-bottom: 0.5rem;
  }
  .vg-takeaway-card p {
    font-size: 0.82rem;
    line-height: 1.75;
    color: rgba(247,244,239,0.65);
    font-weight: 300;
  }

  /* ── INTERVIEW CHEAT SHEET ── */
  .vg-interview-section {
    background: var(--teal-muted);
    padding: 4rem;
  }
  .vg-interview-section .vg-section-eyebrow {
    font-size: 0.68rem;
    letter-spacing: 0.22em;
    text-transform: uppercase;
    color: var(--teal);
    font-weight: 500;
    margin-bottom: 0.5rem;
    display: flex;
    align-items: center;
    gap: 0.6rem;
  }
  .vg-interview-section .vg-section-eyebrow::before {
    content: '';
    display: inline-block;
    width: 1.25rem; height: 1px;
    background: var(--teal);
  }
  .vg-interview-section h2 {
    font-family: 'Cormorant Garamond', serif;
    font-size: clamp(1.6rem, 3vw, 2.4rem);
    font-weight: 300;
    color: var(--ink);
    margin-bottom: 2.5rem;
  }
  .vg-interview-section h2 em { font-style: italic; color: var(--teal); }
  .vg-qa-list { display: flex; flex-direction: column; gap: 0; }
  .vg-qa-item {
    border-top: 0.5px solid rgba(14,14,14,0.12);
    padding: 1.5rem 0;
    display: grid;
    grid-template-columns: 1fr 1.4fr;
    gap: 2rem;
    align-items: start;
  }
  .vg-qa-item:last-child { border-bottom: 0.5px solid rgba(14,14,14,0.12); }
  .vg-qa-q {
    font-family: 'Cormorant Garamond', serif;
    font-size: 1.05rem;
    font-weight: 400;
    color: var(--ink);
    line-height: 1.4;
  }
  .vg-qa-q .vg-q-badge {
    font-family: 'DM Mono', monospace;
    font-size: 0.6rem;
    letter-spacing: 0.1em;
    text-transform: uppercase;
    background: var(--teal);
    color: var(--paper);
    padding: 0.15rem 0.5rem;
    margin-bottom: 0.5rem;
    display: inline-block;
  }
  .vg-qa-a {
    font-size: 0.85rem;
    line-height: 1.75;
    color: var(--charcoal);
    font-weight: 300;
  }
  .vg-qa-a strong { color: var(--teal); font-weight: 500; }
  .vg-qa-a code {
    font-family: 'DM Mono', monospace;
    font-size: 0.78rem;
    background: rgba(14,14,14,0.07);
    padding: 0.1rem 0.35rem;
    color: var(--ink);
  }

  /* ── MEMORY PILLS ── */
  .vg-memory-row {
    display: flex;
    flex-wrap: wrap;
    gap: 0.6rem;
    margin-top: 0.75rem;
  }
  .vg-memory-pill {
    font-size: 0.7rem;
    letter-spacing: 0.06em;
    padding: 0.3rem 0.85rem;
    background: var(--paper);
    border: 0.5px solid var(--border-strong);
    color: var(--charcoal);
    font-weight: 400;
  }
  .vg-memory-pill.teal { border-color: var(--teal); color: var(--teal); background: var(--teal-muted); }
  .vg-memory-pill.amber { border-color: var(--amber); color: var(--amber); background: var(--amber-muted); }

  /* ── FOOTER CTA ── */
  .vg-post-footer {
    background: var(--paper-dark);
    padding: 3rem 4rem;
    display: flex;
    justify-content: space-between;
    align-items: center;
    flex-wrap: wrap;
    gap: 1.5rem;
    border-top: 0.5px solid var(--border);
  }
  .vg-post-footer p {
    font-size: 0.85rem;
    color: var(--muted);
    font-weight: 300;
  }
  .vg-post-footer p strong { color: var(--ink); font-weight: 400; }
  .vg-source-link {
    display: inline-block;
    padding: 0.65rem 1.75rem;
    background: var(--ink);
    color: var(--paper);
    font-size: 0.72rem;
    letter-spacing: 0.12em;
    text-transform: uppercase;
    text-decoration: none;
    font-weight: 400;
    transition: background 0.2s;
  }
  .vg-source-link:hover { background: var(--teal); }

  /* ── SCROLL REVEAL ── */
  .vg-reveal {
    opacity: 0;
    transform: translateY(20px);
    transition: opacity 0.55s ease, transform 0.55s ease;
  }
  .vg-reveal.vg-visible { opacity: 1; transform: translateY(0); }
  .vg-d1 { transition-delay: 0.1s; }
  .vg-d2 { transition-delay: 0.2s; }
  .vg-d3 { transition-delay: 0.3s; }
</style>

<div class="vg-blog-wrap">

  <!-- HERO -->
  <div class="vg-post-hero">
    <div class="vg-post-hero-inner">
      <p class="vg-post-eyebrow">NLP · Machine Learning · Text Feature Engineering</p>
      <h1 class="vg-post-title">From Amazon Reviews to Numbers: A Hands-On Tour of <em>One-Hot, Bag of Words, and TF-IDF</em></h1>
      <div class="vg-post-meta">
        <p class="vg-meta-item">Corpus<span>128 real reviews</span></p>
        <p class="vg-meta-item">Techniques<span>OHE · BoW · TF-IDF</span></p>
        <p class="vg-meta-item">Stack<span>Python · sklearn · BeautifulSoup</span></p>
        <p class="vg-meta-item">Source<span>GitHub ↗</span></p>
      </div>
    </div>
  </div>

  <!-- INTRO BAND -->
  <div class="vg-intro-band">
    <p>How I took <strong>128 real Amazon product reviews</strong> and turned them into features a machine-learning model can actually chew on — and what I learned about where these classical techniques still shine in 2026.</p>
  </div>

  <!-- BODY -->
  <div class="vg-post-body">

    <!-- WHY CLASSICAL -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Context</p>
      <h2>Why bother with &#8220;classical&#8221; text features <em>at all?</em></h2>
      <p>If you have been anywhere near an LLM in the last two years, you have probably heard that &#8220;embeddings solved text.&#8221; They did — for a lot of problems. But if you are building a spam filter with 100k labelled examples, a BM25-powered search box, a cold-start classifier for a brand-new product line, or a compliance-audited system where a human needs to understand why the model fired — then Bag of Words and TF-IDF are still in the toolbox.</p>
      <p>They are <strong>fast, deterministic, interpretable,</strong> and an honest baseline you should always beat before reaching for a neural model.</p>
    </div>

    <hr class="vg-divider">

    <!-- DATA -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Step 1</p>
      <h2>Get real data — <em>not toy sentences</em></h2>
      <p>Every blog post on TF-IDF uses the same three cooked-up sentences about cats and dogs. I wanted the messiness of real user-generated content, so I wrote a BeautifulSoup scraper and pointed it at ~20 popular ASINs &#8212; Echo Dots, AirPods Pro, Kindles, an Apple Watch, a Ninja blender, a PS5 controller, a Nespresso machine, and so on.</p>
      <div class="vg-stat-row">
        <div class="vg-stat-box vg-reveal vg-d1">
          <div class="vg-stat-n">128</div>
          <div class="vg-stat-l">Real Reviews</div>
        </div>
        <div class="vg-stat-box vg-reveal vg-d2">
          <div class="vg-stat-n">14</div>
          <div class="vg-stat-l">Products</div>
        </div>
        <div class="vg-stat-box vg-reveal vg-d3">
          <div class="vg-stat-n">3,461</div>
          <div class="vg-stat-l">Unique Tokens</div>
        </div>
      </div>
      <div class="vg-callout">
        <strong>Scraper gotchas:</strong> Set a real <code>User-Agent</code> header or Amazon returns a stripped page. Anchor on <code>[data-hook="review-body"]</code> inside <code>celwidget</code> blocks — not the <code>div[data-hook="review"]</code> wrapper on the dedicated reviews page. A few reviews came back in Spanish and Arabic — a lovely reminder that real data never matches the shape your slides promised.
      </div>
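      <p>The selector logic from the callout, as a minimal sketch. It runs on an inline HTML fragment rather than a live page, and the fetch itself is left as a comment; the fragment, function name, and headers are illustrative, and Amazon's real markup changes often:</p>

```python
from bs4 import BeautifulSoup

# In the real scraper, the page comes from something like:
#   requests.get(url, headers={"User-Agent": "Mozilla/5.0 ..."}).text
# Without a real User-Agent, Amazon serves a stripped page.

def extract_reviews(html: str) -> list[str]:
    """Pull review bodies out of a page fragment via the data-hook attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select('[data-hook="review-body"]')]

# Inline fragment standing in for a fetched product page
sample = """
<div class="celwidget">
  <span data-hook="review-body">Great sound, battery lasts all day.</span>
  <span data-hook="review-title">Five stars</span>
</div>
"""
print(extract_reviews(sample))  # ['Great sound, battery lasts all day.']
```

      <p>Note that the title span is skipped automatically: only elements carrying <code>data-hook="review-body"</code> match the selector.</p>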
    </div>

    <hr class="vg-divider">

    <!-- CLEANING -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Step 2</p>
      <h2>Clean the text — <em>the boring part that matters most</em></h2>
      <p>A review like &#8220;I LOVE it!!! Sound is 🔥. Read more&#8221; is not something a counting-based model can work with. Each cleaning step kills a specific kind of noise:</p>
      <div class="vg-table-wrap">
        <table class="vg-table">
          <thead><tr><th>Step</th><th>What it kills</th><th>Why it matters</th></tr></thead>
          <tbody>
            <tr><td>Lowercase</td><td>LOVE vs love</td><td>Avoids vocabulary duplicates</td></tr>
            <tr><td>Drop &#8220;Read more&#8221;</td><td>Amazon truncation marker</td><td>Otherwise becomes one of the most frequent tokens</td></tr>
            <tr><td>Strip punctuation / digits</td><td>!!!, $199</td><td>They rarely help classical models</td></tr>
            <tr><td>Tokenize</td><td>—</td><td>Gives you units to count</td></tr>
            <tr><td>Remove stopwords</td><td>the, and, is</td><td>Appear in every document → no signal</td></tr>
            <tr><td>Lemmatize</td><td>speakers → speaker</td><td>Tightens the vocabulary</td></tr>
          </tbody>
        </table>
      </div>
      <p>After processing: <strong>11,138 tokens</strong> spanning a <strong>3,461-word vocabulary</strong>. Top words were exactly the product-review clichés you would expect — use, one, like, great, noise, sound, quality — a perfect sanity check.</p>
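      <p>The table above compresses into a few lines. This sketch uses a tiny inline stopword list instead of a full NLTK one, and skips lemmatization to stay dependency-free; the real pipeline does both:</p>

```python
import re

# Tiny illustrative stopword list; use a full list (e.g. NLTK's) in practice
STOPWORDS = {"the", "and", "is", "it", "a", "i", "to", "of", "in"}

def clean(review: str) -> list[str]:
    text = review.lower()                      # LOVE -> love
    text = text.replace("read more", "")       # Amazon truncation marker
    text = re.sub(r"[^a-z\s]", " ", text)      # strip punctuation, digits, emoji
    tokens = text.split()                      # whitespace tokenize
    return [t for t in tokens if t not in STOPWORDS]

print(clean("I LOVE it!!! Sound is 🔥. Read more"))  # ['love', 'sound']
```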
    </div>

    <hr class="vg-divider">

    <!-- THREE ENCODINGS -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Step 3</p>
      <h2>Three ways to turn text into <em>numbers</em></h2>
      <div class="vg-technique-grid">
        <div class="vg-technique-card ohe vg-reveal vg-d1">
          <h3>One-Hot Encoding</h3>
          <span class="vg-abbr">OHE · Binary presence</span>
          <p>For each review, build a binary vector over the whole vocabulary: 1 if the word appears, 0 otherwise. Simplest thing that works, easiest to explain to a non-technical stakeholder.</p>
          <p class="vg-weakness">⚠ Throws away frequency — &#8220;amazing&#8221; once and ten times look identical.</p>
        </div>
        <div class="vg-technique-card bow vg-reveal vg-d2">
          <h3>Bag of Words</h3>
          <span class="vg-abbr">BoW · CountVectorizer</span>
          <p>Same vector shape, but store actual counts. A review that hammers on &#8220;sound&#8221; three times ranks differently from one that drops the word once. Frequency-aware.</p>
          <p class="vg-weakness">⚠ Still order-blind — &#8220;not good, very bad&#8221; ≈ &#8220;good, not very bad&#8221;.</p>
        </div>
        <div class="vg-technique-card tfidf vg-reveal vg-d3">
          <h3>TF-IDF</h3>
          <span class="vg-abbr">TfidfVectorizer · The trick</span>
          <p>Take the BoW count and divide by how common the word is across the whole corpus. Generic words like &#8220;good&#8221; get pushed toward zero. Rare, distinctive words like &#8220;cancellation&#8221; stay loud.</p>
          <p class="vg-weakness">✓ Best signal for downstream classifiers.</p>
        </div>
      </div>
      <div class="vg-formula">
        <span class="vg-formula-label">TF-IDF Formula</span>
        tfidf(t, d) = tf(t, d) · log( N / (1 + df(t)) )
      </div>
      <p>In my corpus, the highest-IDF words were exactly the long-tail product features that appeared in just one review. The lowest-IDF words were the generic review vocabulary. That is the <strong>whole story of TF-IDF in one experiment.</strong></p>
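      <p>All three encoders are one line each in sklearn. The corpus below is a three-review toy stand-in for the 128 real ones, and note that <code>TfidfVectorizer</code> smooths the IDF slightly relative to the plain formula above:</p>

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "great sound great battery",
    "sound quality poor",
    "great product",
]

ohe = CountVectorizer(binary=True)    # One-Hot: presence/absence
bow = CountVectorizer()               # Bag of Words: raw counts
tfidf = TfidfVectorizer()             # TF-IDF: IDF-weighted counts

X_ohe = ohe.fit_transform(docs)
X_bow = bow.fit_transform(docs)
X_tfidf = tfidf.fit_transform(docs)   # same shape, real-valued weights

g = bow.vocabulary_["great"]          # "great" appears twice in doc 0
print(X_ohe[0, g], X_bow[0, g])       # OHE caps it at 1, BoW keeps the 2
```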
    </div>

    <hr class="vg-divider">

    <!-- AHA MOMENT -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Step 4</p>
      <h2>The &#8220;aha&#8221; moment — <em>one review, three lenses</em></h2>
      <p>Encode the same review three times and print the top-weighted tokens:</p>
      <div class="vg-callout">
        <strong>OHE</strong> just lists every unique word in the review. No ranking.<br><br>
        <strong>BoW</strong> surfaces the most repeated words — almost always filler like <code>one</code>, <code>like</code>, <code>use</code>.<br><br>
        <strong>TF-IDF</strong> surfaces the words <em>this</em> review says that few others do. That is exactly what a downstream classifier wants to see.<br><br>
        Once you have seen this side-by-side even once, you stop reaching for plain BoW unless you have a very specific reason. (Naive Bayes is one — its underlying math prefers raw counts.)
      </div>
    </div>

    <hr class="vg-divider">

    <!-- SPARSITY -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Step 5</p>
      <h2>Sparsity — <em>the thing nobody warns you about</em></h2>
      <p>Every one of my three matrices came out <strong>~98.15% zero.</strong> That is normal — reviews are short, vocabularies are long, and most words do not appear in most documents. Two huge practical implications:</p>
      <div class="vg-callout">
        <strong>Never store these dense.</strong> A 1-million-document × 200k-vocab corpus is a 200-billion-cell matrix. It must live in CSR or equivalent compressed form.<br><br>
        <strong>Classical pipelines do not scale forever.</strong> Once you are in the tens-of-millions-of-documents range, even sparse storage becomes painful — which is one reason industry moved to dense embedding pipelines for web-scale retrieval.
      </div>
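      <p>Both points are easy to verify: sklearn's vectorizers already return a compressed sparse (CSR) matrix, and the memory arithmetic is one line each. The toy corpus here is far milder than the real one's ~98%:</p>

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["great sound", "poor battery life", "great battery", "sound quality okay"]
X = TfidfVectorizer().fit_transform(docs)

assert sp.issparse(X)                          # CSR, not a dense ndarray
sparsity = 1 - X.nnz / (X.shape[0] * X.shape[1])
print(f"{sparsity:.0%} zeros")

# dense float64 storage vs what CSR actually keeps
dense_bytes = X.shape[0] * X.shape[1] * 8
csr_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
print(dense_bytes, csr_bytes)
```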
    </div>

    <hr class="vg-divider">

    <!-- CLASSIFIER -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Step 6</p>
      <h2>A mini sentiment classifier — <em>and a class imbalance lesson</em></h2>
      <p>Labels: 4&#8211;5 stars = positive, 1&#8211;2 stars = negative, 3-star reviews dropped. Two models per feature set: Logistic Regression with <code>class_weight="balanced"</code> and Multinomial Naive Bayes.</p>
      <p>Headline accuracy looks great &#8212; <strong>~97% on the test split.</strong> But the test split has 31 positives and 1 negative. The interesting metric is recall on the negative class, and with only five one-star reviews in the whole corpus, no model is going to learn that cleanly. Amazon surfaces highly rated reviews first, so any pipeline that scrapes top-of-page reviews inherits the same lopsided distribution.</p>
      <div class="vg-callout">
        <strong>TF-IDF</strong> gives Logistic Regression a small, consistent edge by silencing filler words.<br><br>
        <strong>Naive Bayes</strong> prefers raw BoW counts — rescaling with IDF can actually hurt it.<br><br>
        <strong>Never trust a single accuracy number on imbalanced data.</strong> Always print per-class precision/recall.
      </div>
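      <p>The pairing in the callout, as a sketch on a toy labelled set (six made-up reviews; the real run used the scraped corpus with a proper train/test split):</p>

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "love it great sound quality",
    "terrible battery died in a week",
    "excellent value love the design",
    "awful broke immediately waste of money",
    "great product love it",
    "terrible quality waste of money",
]
y = [1, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

lr = make_pipeline(TfidfVectorizer(), LogisticRegression(class_weight="balanced"))
nb = make_pipeline(CountVectorizer(), MultinomialNB())   # NB prefers raw counts
lr.fit(docs, y)
nb.fit(docs, y)

# per-class precision/recall, never just a single accuracy number
print(classification_report(y, lr.predict(docs), zero_division=0))
```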
    </div>

    <hr class="vg-divider">

    <!-- WHERE IT BREAKS -->
    <div class="vg-step vg-reveal">
      <p class="vg-step-label">Step 7</p>
      <h2>Where these techniques break — <em>and where they still win</em></h2>
      <div class="vg-table-wrap">
        <table class="vg-table">
          <thead><tr><th>Scenario</th><th>BoW / TF-IDF</th><th>Embeddings</th></tr></thead>
          <tbody>
            <tr>
              <td>Semantic similarity<br><em style="font-size:0.78rem;color:var(--muted)">&#8220;audio excellent&#8221; vs &#8220;sound great&#8221;</em></td>
              <td><span class="vg-chip vg-chip-gray">Zero shared tokens → fails</span></td>
              <td><span class="vg-chip vg-chip-green">Maps synonyms close ✓</span></td>
            </tr>
            <tr>
              <td>Negation<br><em style="font-size:0.78rem;color:var(--muted)">&#8220;battery lasts&#8221; vs &#8220;battery dies&#8221;</em></td>
              <td><span class="vg-chip vg-chip-gray">Near-identical vectors → fails</span></td>
              <td><span class="vg-chip vg-chip-green">Directional context ✓</span></td>
            </tr>
            <tr>
              <td>Interpretability</td>
              <td><span class="vg-chip vg-chip-green">Each feature is a word ✓</span></td>
              <td><span class="vg-chip vg-chip-amber">1024-dim black box</span></td>
            </tr>
            <tr>
              <td>Training speed</td>
              <td><span class="vg-chip vg-chip-green">Millions of docs, minutes, laptop ✓</span></td>
              <td><span class="vg-chip vg-chip-amber">GPU required at scale</span></td>
            </tr>
            <tr>
              <td>Exact keyword / ID retrieval</td>
              <td><span class="vg-chip vg-chip-green">BM25 still wins ✓</span></td>
              <td><span class="vg-chip vg-chip-amber">Can miss rare tokens</span></td>
            </tr>
            <tr>
              <td>Cold start (zero labels)</td>
              <td><span class="vg-chip vg-chip-green">Cosine sim on day one ✓</span></td>
              <td><span class="vg-chip vg-chip-amber">Needs fine-tuning data</span></td>
            </tr>
          </tbody>
        </table>
      </div>
    </div>

  </div><!-- /vg-post-body -->

  <!-- KEY TAKEAWAYS -->
  <div class="vg-takeaways-section">
    <p class="vg-section-eyebrow">Summary</p>
    <h2>Key <em>takeaways</em></h2>
    <div class="vg-takeaways-grid">
      <div class="vg-takeaway-card vg-reveal" data-num="01">
        <h4>Preprocessing is 80% of the game</h4>
        <p>Before you touch any encoder, understand exactly what &#8220;a token&#8221; means in your corpus. Lowercase, stopwords, lemmatization — each step has a specific purpose.</p>
      </div>
      <div class="vg-takeaway-card vg-reveal vg-d1" data-num="02">
        <h4>Always inspect a single document&#8217;s top features</h4>
        <p>It is the fastest way to develop intuition about what your encoding is actually rewarding. Print OHE vs BoW vs TF-IDF side-by-side at least once.</p>
      </div>
      <div class="vg-takeaway-card vg-reveal vg-d2" data-num="03">
        <h4>Watch sparsity and class imbalance</h4>
        <p>Both will bite you long before modelling choices do. Use CSR storage. Never trust a single accuracy number on skewed data — always check per-class recall.</p>
      </div>
      <div class="vg-takeaway-card vg-reveal vg-d3" data-num="04">
        <h4>Know why you would pick the classical tool</h4>
        <p>If your answer is only &#8220;because it is in every tutorial&#8221;, reach for an embedding model. If your answer is &#8220;interpretability and speed&#8221; — BoW/TF-IDF are still excellent choices.</p>
      </div>
    </div>
  </div>

  <!-- INTERVIEW CHEAT SHEET -->
  <div class="vg-interview-section">
    <p class="vg-section-eyebrow">Interview Prep</p>
    <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
    <div class="vg-qa-list">

      <div class="vg-qa-item vg-reveal">
        <div class="vg-qa-q">
          <span class="vg-q-badge">Define</span><br>
          What is One-Hot Encoding in NLP?
        </div>
        <div class="vg-qa-a">
          <strong>Binary presence vector</strong> over the vocabulary. 1 if the word appears in the document, 0 otherwise. No frequency, no order. Size = vocabulary length.
          <div class="vg-memory-row">
            <span class="vg-memory-pill">Binary: 0 or 1</span>
            <span class="vg-memory-pill">Ignores frequency</span>
            <span class="vg-memory-pill amber">Simplest encoder</span>
          </div>
        </div>
      </div>

      <div class="vg-qa-item vg-reveal vg-d1">
        <div class="vg-qa-q">
          <span class="vg-q-badge">Define</span><br>
          What is Bag of Words?
        </div>
        <div class="vg-qa-a">
          <strong>Word count vector</strong> over the vocabulary. Stores how many times each word appears. Frequency-aware but order-blind — treats a document as an unordered bag of tokens.
          <div class="vg-memory-row">
            <span class="vg-memory-pill">Counts, not binary</span>
            <span class="vg-memory-pill">Order-blind</span>
            <span class="vg-memory-pill amber">CountVectorizer in sklearn</span>
          </div>
        </div>
      </div>

      <div class="vg-qa-item vg-reveal vg-d2">
        <div class="vg-qa-q">
          <span class="vg-q-badge">Define</span><br>
          What is TF-IDF and why does it outperform BoW?
        </div>
        <div class="vg-qa-a">
          <strong>Term Frequency × Inverse Document Frequency.</strong> Scales BoW counts down for words that appear in many documents. Words like &#8220;good&#8221; that are everywhere get suppressed; rare words that are distinctive get amplified. Formula: <code>tf(t,d) · log(N / (1 + df(t)))</code>
          <div class="vg-memory-row">
            <span class="vg-memory-pill teal">Rewards rarity</span>
            <span class="vg-memory-pill teal">Penalises ubiquity</span>
            <span class="vg-memory-pill">TfidfVectorizer</span>
          </div>
        </div>
      </div>
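The quoted formula, implemented directly on a toy corpus (names and documents are illustrative):

```python
import math
from collections import Counter

# Toy corpus: "good" appears in every review, "rare" in only one.
docs = [d.split() for d in
        ["good battery", "good screen", "good price rare"]]
N = len(docs)

# df(t): how many documents contain term t
df = {t: sum(t in d for d in docs)
      for t in {w for d in docs for w in d}}

def tfidf(term, doc):
    # tf(t,d) * log(N / (1 + df(t)))
    return Counter(doc)[term] * math.log(N / (1 + df[term]))

print(tfidf("good", docs[0]))  # negative: ubiquitous term is suppressed
print(tfidf("rare", docs[2]))  # positive: rare term is amplified
```

This textbook variant can go to zero or below for terms present in every document; scikit-learn's `TfidfVectorizer` instead uses `log((1+N)/(1+df)) + 1` (its default `smooth_idf=True`), which keeps every weight positive.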

      <div class="vg-qa-item vg-reveal">
        <div class="vg-qa-q">
          <span class="vg-q-badge">Compare</span><br>
          When would you use BoW over TF-IDF?
        </div>
        <div class="vg-qa-a">
          Use raw BoW counts with <strong>Naive Bayes</strong> — its probability estimates are count-based; IDF rescaling can hurt it. Otherwise, TF-IDF almost always gives a better signal for classifiers.
          <div class="vg-memory-row">
            <span class="vg-memory-pill amber">Naive Bayes → BoW</span>
            <span class="vg-memory-pill teal">Logistic Regression → TF-IDF</span>
          </div>
        </div>
      </div>

      <div class="vg-qa-item vg-reveal vg-d1">
        <div class="vg-qa-q">
          <span class="vg-q-badge">Gotcha</span><br>
          What is sparsity and why does it matter?
        </div>
        <div class="vg-qa-a">
          A BoW/TF-IDF matrix is typically <strong>95–99% zeros</strong> because documents are short and vocabularies are large. Always store in <strong>sparse format (CSR)</strong> — a dense matrix of 1M docs × 200k vocab = 200B cells, which won&#8217;t fit in RAM.
          <div class="vg-memory-row">
            <span class="vg-memory-pill">98% zeros = normal</span>
            <span class="vg-memory-pill amber">Always use CSR format</span>
          </div>
        </div>
      </div>
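The sparsity claim is easy to check even on a tiny corpus (plain-Python sketch; a real pipeline would never build the dense matrix at all):

```python
# Four tiny "reviews"; real corpora are far sparser than this.
docs = ["great battery life", "poor battery", "great price", "fast shipping"]
vocab = sorted({w for d in docs for w in d.split()})

# Dense count matrix, built here only to measure sparsity
matrix = [[d.split().count(w) for w in vocab] for d in docs]

cells = len(matrix) * len(vocab)
zeros = sum(v == 0 for row in matrix for v in row)
print(f"{zeros}/{cells} cells are zero ({zeros / cells:.0%})")
```

In production, `scipy.sparse.csr_matrix` (what `CountVectorizer` and `TfidfVectorizer` return) stores only the non-zero entries.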

      <div class="vg-qa-item vg-reveal vg-d2">
        <div class="vg-qa-q">
          <span class="vg-q-badge">Weakness</span><br>
          What can&#8217;t BoW/TF-IDF do that embeddings can?
        </div>
        <div class="vg-qa-a">
          They are <strong>lexical, not semantic.</strong> &#8220;Audio is excellent&#8221; and &#8220;sound is great&#8221; share no content tokens (once stop words are removed) → zero similarity. &#8220;Battery lasts&#8221; and &#8220;battery dies&#8221; share most tokens → high similarity. Embeddings fix both by mapping meaning, not just words.
          <div class="vg-memory-row">
            <span class="vg-memory-pill">No synonyms</span>
            <span class="vg-memory-pill">No negation</span>
            <span class="vg-memory-pill teal">Use embeddings for semantics</span>
          </div>
        </div>
      </div>
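Both failure modes are reproducible with cosine similarity over raw counts (plain-Python sketch; the stop word "is" is stripped by hand, since it would otherwise count as a shared token):

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two token-count vectors (dicts)
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b))

bow = lambda text: Counter(text.lower().split())

# Synonyms with no shared content tokens: similarity is exactly 0
print(round(cosine(bow("audio excellent"), bow("sound great")), 2))

# Opposite meanings sharing a token: similarity is high (~0.5)
print(round(cosine(bow("battery lasts"), bow("battery dies")), 2))
```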

      <div class="vg-qa-item vg-reveal">
        <div class="vg-qa-q">
          <span class="vg-q-badge">Use Case</span><br>
          When do classical methods still win in 2026?
        </div>
        <div class="vg-qa-a">
          <strong>4 scenarios where BoW/TF-IDF beat neural alternatives:</strong> (1) exact-match / keyword search — BM25 still outperforms embeddings for identifier queries; (2) interpretability requirements; (3) training speed at millions of documents on a laptop; (4) cold-start with zero labelled data.
          <div class="vg-memory-row">
            <span class="vg-memory-pill teal">BM25 search</span>
            <span class="vg-memory-pill teal">Interpretability</span>
            <span class="vg-memory-pill teal">Cold start</span>
            <span class="vg-memory-pill teal">Speed</span>
          </div>
        </div>
      </div>

    </div>
  </div>

  <!-- FOOTER CTA -->
  <div class="vg-post-footer">
    <p>Full pipeline — scraper to classifier — in the <strong>GenAI Mastery Series</strong> source repo.</p>
    <a href="https://github.com/vijaygokarn130/ml-classic-concepts" class="vg-source-link" target="_blank">View Source on GitHub ↗</a>
  </div>

</div><!-- /vg-blog-wrap -->

<script>
(function(){
  var obs = new IntersectionObserver(function(entries){
    entries.forEach(function(e){ if(e.isIntersecting) e.target.classList.add('vg-visible'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>
<p>The post <a href="https://vijay-gokarn.com/from-amazon-reviews-to-numbers-a-hands-on-tour-of-one-hot-bag-of-words-and-tf-idf/">From Amazon Reviews to Numbers: A Hands-On Tour of One-Hot, Bag of Words, and TF-IDF</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">263</post-id>	</item>
		<item>
		<title>The GenAI Landscape: From Zero to Transformer — GenAI Mastery Series, Chapter 02</title>
		<link>https://vijay-gokarn.com/the-genai-landscape-from-zero-to-transformer-series-name-genai-mastery-series-chapter-02/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-genai-landscape-from-zero-to-transformer-series-name-genai-mastery-series-chapter-02</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 14:48:39 +0000</pubDate>
				<category><![CDATA[generative-ai]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=236</guid>

					<description><![CDATA[<p>GenAI Mastery Series · Chapter 02 · March 28, 2026. Coding Assistants, the AI/ML Roadmap, and How Machines Learn to Understand Language. Read: ~14 min · Topics: NLP · Embeddings · Transformers · Tools. Three Pillars · AI Coding Assistants · AI/ML Family Tree · RNN → Transformer · Encoding &#038; Embeddings · GenAI Tool Stack · Career Paths · Build Path [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/the-genai-landscape-from-zero-to-transformer-series-name-genai-mastery-series-chapter-02/">The GenAI Landscape: From Zero to Transformer — GenAI Mastery Series, Chapter 02</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg2 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  font-family: 'DM Sans', sans-serif;
  font-weight: 300; color: var(--ink);
  background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg2 *, .vg2 *::before, .vg2 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg2-hero {
  background: var(--ink); padding: 5rem 4rem 4rem; position: relative; overflow: hidden;
}
.vg2-hero::before {
  content: '02'; font-family: 'Cormorant Garamond', serif; font-size: 20rem;
  font-weight: 300; color: rgba(255,255,255,0.03); position: absolute;
  right: -2rem; bottom: -4rem; line-height: 1; letter-spacing: -0.05em; pointer-events: none;
}
.vg2-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg2-eyebrow {
  font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal-light); font-weight: 500; margin-bottom: 1.25rem;
  display: flex; align-items: center; gap: 0.75rem;
}
.vg2-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--teal-light); }
.vg2-hero h1 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem);
  font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em;
  margin-bottom: 1.5rem; max-width: 28ch;
}
.vg2-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg2-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; margin-bottom: 2rem; }
.vg2-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg2-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }
.vg2-toc {
  display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 2rem;
  border-top: 0.5px solid rgba(247,244,239,0.1); padding-top: 1.5rem;
}
.vg2-toc-item {
  font-size: 0.68rem; letter-spacing: 0.06em; padding: 0.3rem 0.85rem;
  border: 0.5px solid rgba(247,244,239,0.15); color: rgba(247,244,239,0.5); font-weight: 300;
}

/* INTRO BAND */
.vg2-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg2-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg2-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg2-body { max-width: 900px; margin: 0 auto; padding: 4rem; }

/* SECTION */
.vg2-section { margin-bottom: 3.5rem; }
.vg2-section-label {
  font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal); font-weight: 500; margin-bottom: 0.5rem;
  display: flex; align-items: center; gap: 0.6rem;
}
.vg2-section-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg2-section h2 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(1.5rem, 3vw, 2.1rem);
  font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1.25rem;
}
.vg2-section h2 em { font-style: italic; color: var(--teal); }
.vg2-section p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg2-section p strong { color: var(--ink); font-weight: 500; }
.vg2-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }

/* CALLOUT */
.vg2-callout {
  background: var(--paper-dark); border-left: 3px solid var(--amber);
  padding: 1.25rem 1.5rem; margin: 1.5rem 0;
  font-size: 0.87rem; line-height: 1.8; color: var(--charcoal);
}
.vg2-callout strong { color: var(--amber); font-weight: 500; }
.vg2-callout.teal { border-color: var(--teal); }
.vg2-callout.teal strong { color: var(--teal); }
.vg2-callout code { font-family: 'DM Mono', monospace; font-size: 0.8rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.4rem; }

/* TABLES */
.vg2-table-wrap { overflow-x: auto; margin: 1.5rem 0; }
.vg2-table { width: 100%; border-collapse: collapse; font-size: 0.83rem; }
.vg2-table th {
  background: var(--ink); color: var(--paper); font-family: 'DM Sans', sans-serif;
  font-weight: 400; font-size: 0.65rem; letter-spacing: 0.14em;
  text-transform: uppercase; padding: 0.75rem 1rem; text-align: left;
}
.vg2-table td { padding: 0.75rem 1rem; border-bottom: 0.5px solid var(--border); color: var(--charcoal); vertical-align: top; line-height: 1.55; }
.vg2-table tr:nth-child(even) td { background: var(--paper-dark); }
.vg2-table td strong { color: var(--ink); font-weight: 500; }
.vg2-table td em { color: var(--muted); font-style: italic; font-size: 0.8rem; }

/* PILLAR CARDS */
.vg2-pillar-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1px; background: var(--border); border: 0.5px solid var(--border); margin: 1.5rem 0; }
.vg2-pillar {
  background: var(--paper); padding: 1.75rem 1.5rem; position: relative;
}
.vg2-pillar::before { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 4px; }
.vg2-pillar:nth-child(1)::before { background: var(--muted); }
.vg2-pillar:nth-child(2)::before { background: var(--amber); }
.vg2-pillar:nth-child(3)::before { background: var(--teal); }
.vg2-pillar-num { font-family: 'Cormorant Garamond', serif; font-size: 2.5rem; font-weight: 300; color: var(--border-strong); line-height: 1; margin-bottom: 0.5rem; letter-spacing: -0.03em; }
.vg2-pillar h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.15rem; font-weight: 400; color: var(--ink); margin-bottom: 0.4rem; }
.vg2-pillar .vg2-pillar-sub { font-size: 0.7rem; letter-spacing: 0.08em; color: var(--muted); margin-bottom: 0.75rem; text-transform: uppercase; }
.vg2-pillar p { font-size: 0.8rem; line-height: 1.65; color: var(--charcoal); font-weight: 300; }

/* TOOL CARDS */
.vg2-tool-grid { display: grid; grid-template-columns: repeat(2, 1fr); gap: 1rem; margin: 1.5rem 0; }
.vg2-tool-card { background: var(--paper); border: 0.5px solid var(--border-strong); padding: 1.25rem 1.5rem; display: flex; gap: 1rem; align-items: flex-start; }
.vg2-tool-num { font-family: 'DM Mono', monospace; font-size: 0.7rem; color: var(--teal); min-width: 1.5rem; margin-top: 0.15rem; }
.vg2-tool-body h4 { font-family: 'Cormorant Garamond', serif; font-size: 1.1rem; font-weight: 400; color: var(--ink); margin-bottom: 0.3rem; }
.vg2-tool-body p { font-size: 0.79rem; line-height: 1.65; color: var(--charcoal); font-weight: 300; }

/* EVOLUTION TIMELINE */
.vg2-timeline { display: flex; flex-direction: column; gap: 0; margin: 1.5rem 0; }
.vg2-evo-item { display: grid; grid-template-columns: 110px 1fr; gap: 1.5rem; padding: 1.5rem 0; border-top: 0.5px solid var(--border); align-items: start; }
.vg2-evo-item:last-child { border-bottom: 0.5px solid var(--border); }
.vg2-evo-era { font-family: 'DM Mono', monospace; font-size: 0.7rem; color: var(--muted); letter-spacing: 0.05em; line-height: 1.5; }
.vg2-evo-label { font-size: 0.6rem; letter-spacing: 0.14em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.35rem; }
.vg2-evo-body h4 { font-family: 'Cormorant Garamond', serif; font-size: 1.15rem; font-weight: 400; color: var(--ink); margin-bottom: 0.35rem; }
.vg2-evo-body p { font-size: 0.82rem; line-height: 1.75; color: var(--charcoal); font-weight: 300; }
.vg2-evo-body p strong { color: var(--ink); font-weight: 500; }

/* ENCODING VS EMBEDDING */
.vg2-compare-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1.25rem; margin: 1.5rem 0; }
.vg2-compare-card { border: 0.5px solid var(--border-strong); padding: 1.75rem; position: relative; }
.vg2-compare-card.encoding::before { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 4px; background: var(--muted); }
.vg2-compare-card.embedding::before { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 4px; background: var(--teal); }
.vg2-compare-card h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.25rem; font-weight: 400; color: var(--ink); margin-bottom: 0.3rem; }
.vg2-compare-card .vg2-compare-sub { font-size: 0.65rem; letter-spacing: 0.12em; text-transform: uppercase; color: var(--muted); margin-bottom: 1rem; display: block; }
.vg2-compare-card ul { list-style: none; display: flex; flex-direction: column; gap: 0.6rem; }
.vg2-compare-card ul li { font-size: 0.82rem; line-height: 1.6; color: var(--charcoal); font-weight: 300; padding-left: 1rem; position: relative; }
.vg2-compare-card ul li::before { content: '—'; position: absolute; left: 0; color: var(--border-strong); }
.vg2-compare-card.embedding ul li::before { color: var(--teal); }

/* FORMULA */
.vg2-formula { background: var(--ink); padding: 1.5rem 2rem; margin: 1.5rem 0; overflow-x: auto; }
.vg2-formula-label { font-family: 'DM Sans', sans-serif; font-size: 0.62rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.3); margin-bottom: 0.5rem; display: block; }
.vg2-formula code { font-family: 'DM Mono', monospace; font-size: 0.88rem; color: var(--amber-light); white-space: nowrap; }

/* CAREER PATH */
.vg2-career-section { background: var(--charcoal); padding: 4rem; }
.vg2-career-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg2-career-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg2-career-section h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2.5rem; }
.vg2-career-section h2 em { font-style: italic; color: var(--amber-light); }
.vg2-ladder { display: flex; align-items: center; gap: 0; flex-wrap: wrap; margin-bottom: 2.5rem; }
.vg2-ladder-step { background: rgba(247,244,239,0.06); border: 0.5px solid rgba(247,244,239,0.12); padding: 0.75rem 1.1rem; }
.vg2-ladder-step span { font-size: 0.72rem; letter-spacing: 0.06em; color: rgba(247,244,239,0.6); display: block; }
.vg2-ladder-step strong { font-size: 0.82rem; color: var(--paper); font-weight: 400; }
.vg2-ladder-arrow { font-size: 0.7rem; color: rgba(247,244,239,0.2); padding: 0 0.4rem; }
.vg2-roles-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1rem; }
.vg2-role-card { border: 0.5px solid rgba(247,244,239,0.1); padding: 1.25rem; }
.vg2-role-card h4 { font-family: 'Cormorant Garamond', serif; font-size: 1rem; font-weight: 400; color: var(--amber-light); margin-bottom: 0.4rem; }
.vg2-role-card p { font-size: 0.78rem; line-height: 1.65; color: rgba(247,244,239,0.55); font-weight: 300; }

/* BUILD PATH */
.vg2-build-section { background: var(--teal-muted); padding: 4rem; }
.vg2-build-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg2-build-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg2-build-section h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2rem; }
.vg2-build-section h2 em { font-style: italic; color: var(--teal); }
.vg2-stages { display: grid; grid-template-columns: repeat(4, 1fr); gap: 1px; background: rgba(14,14,14,0.1); border: 0.5px solid rgba(14,14,14,0.1); margin-bottom: 2rem; }
.vg2-stage { background: var(--paper); padding: 1.5rem 1.25rem; }
.vg2-stage-num { font-family: 'Cormorant Garamond', serif; font-size: 2rem; font-weight: 300; color: var(--teal-light); line-height: 1; margin-bottom: 0.5rem; opacity: 0.5; }
.vg2-stage h4 { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--ink); margin-bottom: 0.4rem; }
.vg2-stage p { font-size: 0.78rem; line-height: 1.65; color: var(--charcoal); font-weight: 300; }
.vg2-poc-grid { display: grid; grid-template-columns: repeat(5, 1fr); gap: 1px; background: rgba(14,14,14,0.1); border: 0.5px solid rgba(14,14,14,0.1); }
.vg2-poc-phase { background: var(--paper); padding: 1.25rem 1rem; }
.vg2-poc-phase .vg2-phase-tag { font-family: 'DM Mono', monospace; font-size: 0.62rem; letter-spacing: 0.1em; text-transform: uppercase; color: var(--teal); margin-bottom: 0.5rem; display: block; }
.vg2-poc-phase h5 { font-family: 'Cormorant Garamond', serif; font-size: 1rem; font-weight: 400; color: var(--ink); margin-bottom: 0.3rem; }
.vg2-poc-phase p { font-size: 0.75rem; line-height: 1.6; color: var(--charcoal); font-weight: 300; }

/* INTERVIEW / CHEAT SHEET */
.vg2-interview-section { background: var(--ink); padding: 4rem; }
.vg2-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg2-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg2-interview-section h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2.5rem; }
.vg2-interview-section h2 em { font-style: italic; color: var(--amber-light); }
.vg2-qa-list { display: flex; flex-direction: column; gap: 0; }
.vg2-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(247,244,239,0.1); align-items: start; }
.vg2-qa-item:last-child { border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg2-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--paper); line-height: 1.4; }
.vg2-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg2-qa-a { font-size: 0.83rem; line-height: 1.8; color: rgba(247,244,239,0.65); font-weight: 300; }
.vg2-qa-a strong { color: var(--amber-light); font-weight: 400; }
.vg2-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(247,244,239,0.08); padding: 0.1rem 0.35rem; color: var(--paper); }
.vg2-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg2-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid rgba(247,244,239,0.15); color: rgba(247,244,239,0.5); font-weight: 300; }
.vg2-pill.t { border-color: var(--teal-light); color: var(--teal-light); }
.vg2-pill.a { border-color: var(--amber-light); color: var(--amber-light); }

/* CHECKLIST */
.vg2-checklist-section { background: var(--paper-dark); padding: 4rem; }
.vg2-checklist-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg2-checklist-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg2-checklist-section h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.2rem); font-weight: 300; color: var(--ink); margin-bottom: 2rem; }
.vg2-checklist-section h2 em { font-style: italic; color: var(--teal); }
.vg2-checklist { display: flex; flex-direction: column; gap: 0; }
.vg2-check-item { display: flex; gap: 1.25rem; align-items: flex-start; padding: 1.25rem 0; border-top: 0.5px solid var(--border); }
.vg2-check-item:last-child { border-bottom: 0.5px solid var(--border); }
.vg2-check-box { width: 20px; height: 20px; border: 1.5px solid var(--teal); flex-shrink: 0; margin-top: 0.15rem; }
.vg2-check-body h4 { font-size: 0.88rem; font-weight: 500; color: var(--ink); margin-bottom: 0.2rem; }
.vg2-check-body p { font-size: 0.8rem; line-height: 1.65; color: var(--muted); font-weight: 300; }

/* FOOTER */
.vg2-footer { background: var(--ink); padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; border-top: 0.5px solid rgba(247,244,239,0.08); }
.vg2-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg2-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg2-next-btn { display: inline-block; padding: 0.65rem 1.75rem; background: var(--teal); color: var(--paper); font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 400; }

/* REVEAL */
.vg2-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg2-reveal.vg2-vis { opacity: 1; transform: translateY(0); }
.vg2-d1 { transition-delay: 0.1s; } .vg2-d2 { transition-delay: 0.2s; } .vg2-d3 { transition-delay: 0.3s; }
</style>

<div class="vg2">

<!-- HERO -->
<div class="vg2-hero">
  <div class="vg2-hero-inner">
    <p class="vg2-eyebrow">GenAI Mastery Series · Chapter 02 · March 28, 2026</p>
    <h1>Coding Assistants, the AI/ML Roadmap, and How Machines Learn to <em>Understand Language</em></h1>
    <div class="vg2-meta-row">
      <p class="vg2-meta">Read<span>~14 min</span></p>
      <p class="vg2-meta">Session<span>March 28, 2026</span></p>
      <p class="vg2-meta">Topics<span>NLP · Embeddings · Transformers · Tools</span></p>
    </div>
    <div class="vg2-toc">
      <span class="vg2-toc-item">Three Pillars</span>
      <span class="vg2-toc-item">AI Coding Assistants</span>
      <span class="vg2-toc-item">AI/ML Family Tree</span>
      <span class="vg2-toc-item">RNN → Transformer</span>
      <span class="vg2-toc-item">Encoding &#038; Embeddings</span>
      <span class="vg2-toc-item">GenAI Tool Stack</span>
      <span class="vg2-toc-item">Career Paths</span>
      <span class="vg2-toc-item">Build Path</span>
      <span class="vg2-toc-item">Interview Prep</span>
    </div>
  </div>
</div>

<!-- INTRO BAND -->
<div class="vg2-intro">
  <p>If you&#8217;ve ever wondered what it actually takes to go from <strong>&#8220;I know some Python&#8221;</strong> to <strong>&#8220;I build AI-powered applications for a living&#8221;</strong> — this chapter maps out the entire journey. From the complete AI/ML family tree to the fundamental concept that makes all of modern NLP possible: teaching machines to understand the meaning of words.</p>
</div>

<!-- BODY -->
<div class="vg2-body">

  <!-- THREE PILLARS -->
  <div class="vg2-section vg2-reveal">
    <p class="vg2-section-label">Foundation</p>
    <h2>The three <em>pillars</em> of this course</h2>
    <p>Before diving into any specific technology, understand the structure. This course is built on three pillars, each supporting the next. Think of it as a building: Python is the foundation, ML/DL is the structure, and GenAI is the penthouse. <strong>You can&#8217;t skip floors.</strong></p>
    <div class="vg2-pillar-grid">
      <div class="vg2-pillar vg2-reveal vg2-d1">
        <div class="vg2-pillar-num">01</div>
        <h3>Python App Dev</h3>
        <p class="vg2-pillar-sub">The Foundation</p>
        <p>Building real applications, Git, VS Code, practical coding. You need hands that can build things before you can build AI things.</p>
      </div>
      <div class="vg2-pillar vg2-reveal vg2-d2">
        <div class="vg2-pillar-num">02</div>
        <h3>ML / DL / NLP / CV</h3>
        <p class="vg2-pillar-sub">The Structure</p>
        <p>Classical ML, deep learning, NLP, and computer vision theory. The brain — the conceptual foundation everything else sits on.</p>
      </div>
      <div class="vg2-pillar vg2-reveal vg2-d3">
        <div class="vg2-pillar-num">03</div>
        <h3>Generative AI</h3>
        <p class="vg2-pillar-sub">The Destination</p>
        <p>Transformers, LLMs, RAG, fine-tuning, agents, LLMOps. Where the industry is heading and where the jobs are.</p>
      </div>
    </div>
    <div class="vg2-callout teal">
      <strong>Practical Takeaway:</strong> You don&#8217;t need to master classical ML before touching GenAI, but you do need to be comfortable with Python and understand the basics of how models learn. Run all three tracks in parallel — build all three muscles simultaneously.
    </div>
  </div>

  <hr class="vg2-divider">

  <!-- CODING ASSISTANTS -->
  <div class="vg2-section vg2-reveal">
    <p class="vg2-section-label">Tooling</p>
    <h2>AI coding assistants — <em>your new pair programmer</em></h2>
    <p>In 2026, writing code without an AI assistant is like writing a document without spell-check. The industry has standardized around a few key tools.</p>
    <div class="vg2-tool-grid">
      <div class="vg2-tool-card vg2-reveal vg2-d1">
        <span class="vg2-tool-num">01</span>
        <div class="vg2-tool-body">
          <h4>GitHub Copilot</h4>
          <p>Most widely adopted. Built into VS Code and PyCharm. Free tier includes GPT-4.1, GPT-4o, GPT-4.5. Paid tier ($10/mo) unlocks Opus-6.5 and GPT-5.3 for complex reasoning and multi-file tasks.</p>
        </div>
      </div>
      <div class="vg2-tool-card vg2-reveal vg2-d2">
        <span class="vg2-tool-num">02</span>
        <div class="vg2-tool-body">
          <h4>Claude Code</h4>
          <p>Anthropic&#8217;s coding assistant integrated directly with VS Code. Strong performance on code understanding and generation, especially for complex reasoning tasks.</p>
        </div>
      </div>
      <div class="vg2-tool-card vg2-reveal vg2-d1">
        <span class="vg2-tool-num">03</span>
        <div class="vg2-tool-body">
          <h4>OpenAI Codex</h4>
          <p>OpenAI&#8217;s dedicated code generation engine. Less of a daily-driver IDE plugin; powers many code-generation features across the ecosystem.</p>
        </div>
      </div>
      <div class="vg2-tool-card vg2-reveal vg2-d2">
        <span class="vg2-tool-num">04</span>
        <div class="vg2-tool-body">
          <h4>Cursor / Anysphere</h4>
          <p>AI-native code editors that rethink the entire IDE experience rather than adding AI as a plugin. Worth experimenting with as you advance.</p>
        </div>
      </div>
    </div>
    <div class="vg2-callout">
      <strong>Recommended Setup:</strong> Start with VS Code + GitHub Copilot free tier — covers 90% of what you&#8217;ll need. Experiment with Claude Code for strong reasoning. Upgrade Copilot only when free models aren&#8217;t keeping up.
    </div>
  </div>

  <hr class="vg2-divider">

  <!-- AI/ML FAMILY TREE -->
  <div class="vg2-section vg2-reveal">
    <p class="vg2-section-label">Big Picture</p>
    <h2>The complete <em>AI/ML family tree</em></h2>
    <p>At the highest level, AI splits into three research branches — each with its own tools, techniques, and career paths.</p>
    <div class="vg2-table-wrap">
      <table class="vg2-table">
        <thead><tr><th>Branch</th><th>Core Libraries</th><th>Specializations</th><th>Best For</th></tr></thead>
        <tbody>
          <tr>
            <td><strong>Machine Learning</strong></td>
            <td>Pandas, NumPy, Scikit-learn</td>
            <td>Decision Trees, SVMs, Ensemble Methods, EDA, Feature Engineering</td>
            <td>Structured tabular data, classical classification/regression</td>
          </tr>
          <tr>
            <td><strong>Deep Learning</strong></td>
            <td>PyTorch, TensorFlow, Keras</td>
            <td>CNNs (Vision), RNNs (Sequences), GANs (Synthesis), DRL (Agents)</td>
            <td>Images, text, audio, generative models</td>
          </tr>
          <tr>
            <td><strong>Reinforcement Learning</strong></td>
            <td>Stable Baselines, Ray RLlib</td>
            <td>Q-Learning, PPO, RLHF (LLM fine-tuning)</td>
            <td>Games, robotics, LLM alignment</td>
          </tr>
        </tbody>
      </table>
    </div>
  </div>

  <hr class="vg2-divider">

  <!-- RNN TO TRANSFORMER -->
  <div class="vg2-section vg2-reveal">
    <p class="vg2-section-label">NLP History</p>
    <h2>From RNNs to Transformers — <em>the five-step revolution</em></h2>
    <p>This is the story that matters most for understanding GenAI. A story of limitations breeding innovation. Understanding this progression is <strong>non-negotiable for anyone working in GenAI</strong> — it explains why modern architectures are designed the way they are.</p>
    <div class="vg2-timeline">
      <div class="vg2-evo-item vg2-reveal">
        <div class="vg2-evo-era">~1990s–2014</div>
        <div class="vg2-evo-body">
          <p class="vg2-evo-label">Step 01</p>
          <h4>RNNs — Sequential Processing</h4>
          <p>Processed text one word at a time, passing a hidden state forward. Could handle sequences but <strong>struggled badly with long-range dependencies</strong> — by the end of a long paragraph, the model had largely forgotten the beginning.</p>
        </div>
      </div>
      <div class="vg2-evo-item vg2-reveal vg2-d1">
        <div class="vg2-evo-era">1997–2014</div>
        <div class="vg2-evo-body">
          <p class="vg2-evo-label">Step 02</p>
          <h4>LSTM &#038; GRU — Memory Gates</h4>
          <p>Added memory gates that could selectively remember and forget. Mitigated the <strong>vanishing gradient problem</strong>, but processing was still painfully sequential — you couldn&#8217;t parallelize training effectively.</p>
        </div>
      </div>
      <div class="vg2-evo-item vg2-reveal vg2-d2">
        <div class="vg2-evo-era">~2014–2016</div>
        <div class="vg2-evo-body">
          <p class="vg2-evo-label">Step 03</p>
          <h4>Encoder-Decoder — The Context Vector</h4>
          <p>Compress the entire input into a <strong>fixed-size numerical representation</strong> (the context vector), then decode that into output. This is what made machine translation actually work.</p>
        </div>
      </div>
      <div class="vg2-evo-item vg2-reveal">
        <div class="vg2-evo-era">2017</div>
        <div class="vg2-evo-body">
          <p class="vg2-evo-label">Step 04 — The Breakthrough</p>
          <h4>Transformers — &#8220;Attention Is All You Need&#8221;</h4>
          <p>Removed the sequential bottleneck entirely. Instead of reading one word at a time, transformers <strong>process all words simultaneously</strong> using self-attention — every word in a sentence directly attends to every other word.</p>
        </div>
      </div>
      <div class="vg2-evo-item vg2-reveal vg2-d1">
        <div class="vg2-evo-era">2020–Today</div>
        <div class="vg2-evo-body">
          <p class="vg2-evo-label">Step 05 — Where We Are</p>
          <h4>LLMs, SLMs &#038; Multimodal LLMs</h4>
          <p>Scale the transformer to <strong>billions of parameters</strong>, train on internet-scale data, and you get GPT-4, Claude, Llama, and their peers. SLMs run on-device; multimodal LLMs understand text, images, audio, and more.</p>
        </div>
      </div>
    </div>
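    <p>The self-attention step in the 2017 breakthrough can be sketched in a few lines. This is a toy scaled dot-product attention in NumPy, illustrative only: real transformers add multiple heads, masking, and learned projections trained at scale.</p>

```python
# Minimal scaled dot-product self-attention sketch (illustrative, not a
# production implementation): every token attends to every other token,
# and all tokens are processed in parallel.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token vectors; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every word scores every other word
    weights = softmax(scores, axis=-1)        # attention weights, each row sums to 1
    return weights @ V                        # weighted mix of value vectors

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                   # 5 tokens, no sequential loop needed
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 8): one contextualized vector per token
```

    <p>Contrast this with an RNN, which would need five sequential steps to produce the same five output vectors; here one matrix multiply handles all pairs at once.</p>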
  </div>

  <hr class="vg2-divider">

  <!-- ENCODING & EMBEDDINGS -->
  <div class="vg2-section vg2-reveal">
    <p class="vg2-section-label">Core Concept</p>
    <h2>Encoding, embeddings &#038; tokenization — <em>making machines read</em></h2>
    <p>This is arguably the single most important concept in all of NLP. Computers understand numbers. Humans understand words. Encoding and embedding are the bridge — and how well you build that bridge determines how well your AI understands language.</p>

    <div class="vg2-callout teal">
      <strong>The Pipeline — when you type into an LLM:</strong><br><br>
      <strong>1. Tokenize</strong> — Break sentence into pieces. &#8220;unbelievable&#8221; → [&#8220;un&#8221;, &#8220;believ&#8221;, &#8220;able&#8221;] (BPE / WordPiece / SentencePiece)<br><br>
      <strong>2. Encode</strong> — Map each token to a numerical ID from a vocabulary table. &#8220;cat&#8221; = 4523. Arbitrary — carries no meaning.<br><br>
      <strong>3. Embed</strong> — Map each ID to a <em>dense learned vector</em>. Now &#8220;cat&#8221; is [0.23, -0.51, 0.87, …] — a point in high-dimensional space where similar concepts cluster together.
    </div>
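    <p>The three pipeline steps above can be sketched end to end. Everything here is a toy: the vocabulary, the IDs, and the embedding values are made up for illustration, whereas real systems learn subword vocabularies (BPE/WordPiece) and train the embedding matrix with hundreds of dimensions.</p>

```python
# Toy tokenize -> encode -> embed pipeline. Vocabulary and embeddings are
# invented for illustration; production tokenizers split into subwords and
# the embedding matrix is learned, typically with 768+ dimensions.
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "[UNK]": 5}

def tokenize(text):
    return text.lower().split()               # step 1: break text into pieces

def encode(tokens):
    # step 2: arbitrary integer IDs from a lookup table (no meaning yet)
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]

rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(len(vocab), 8))  # normally *learned*, not random

tokens = tokenize("the cat sat on the mat")
ids = encode(tokens)
vectors = embedding_matrix[ids]               # step 3: one dense vector per token
print(ids, vectors.shape)                     # [0, 1, 2, 3, 0, 4] (6, 8)
```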

    <div class="vg2-compare-grid">
      <div class="vg2-compare-card encoding vg2-reveal vg2-d1">
        <h3>Encoding</h3>
        <span class="vg2-compare-sub">Arbitrary integer mapping</span>
        <ul>
          <li>Assigns an arbitrary integer ID to each token</li>
          <li>&#8220;king&#8221; = 42, &#8220;queen&#8221; = 7891 — look completely unrelated</li>
          <li>Single integer output</li>
          <li>Static lookup table — not trained</li>
          <li>Analogy: giving every student a random ID badge number</li>
          <li>Does not capture meaning</li>
        </ul>
      </div>
      <div class="vg2-compare-card embedding vg2-reveal vg2-d2">
        <h3>Embedding</h3>
        <span class="vg2-compare-sub">Learned dense vector</span>
        <ul>
          <li>Assigns a meaningful vector trained by a neural network</li>
          <li>&#8220;king&#8221; and &#8220;queen&#8221; end up near each other in vector space</li>
          <li>Outputs a dense vector of 768 to 4096+ dimensions</li>
          <li>Trained — learned from data</li>
          <li>Analogy: placing students on a campus map by major, interests, and friend group</li>
          <li>Captures semantic meaning ✓</li>
        </ul>
      </div>
    </div>

    <p>Once you have good embeddings, entirely new capabilities emerge. <strong>Semantic search</strong> becomes possible — instead of matching keywords, you match meaning. A search for &#8220;I&#8217;m hungry and want something cheesy&#8221; can return results about pizza even if the word &#8220;pizza&#8221; never appears in the query.</p>
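    <p>In miniature, semantic search is just ranking documents by vector similarity. The 3-d vectors below are hand-picked toys standing in for real embeddings, which a trained model would produce:</p>

```python
# Semantic search in miniature: rank documents by cosine similarity of
# embedding vectors. The 3-d vectors are hand-picked toys; a real system
# would get them from a trained embedding model.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = {
    "pizza night specials":     np.array([0.9, 0.8, 0.1]),
    "bicycle repair guide":     np.array([0.1, 0.0, 0.9]),
    "cheesy pasta bake recipe": np.array([0.8, 0.9, 0.0]),
}
query_vec = np.array([0.85, 0.75, 0.05])  # pretend embedding of "hungry, want cheesy"

ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
print(ranked)  # food documents rank first; "bicycle repair guide" ranks last
```

    <p>Note the query shares no keywords with the winning documents; the match happens entirely in vector space.</p>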
  </div>

  <hr class="vg2-divider">

  <!-- TOOL STACK -->
  <div class="vg2-section vg2-reveal">
    <p class="vg2-section-label">Ecosystem</p>
    <h2>The GenAI tool stack — <em>10 frameworks you&#8217;ll need</em></h2>
    <p>The modern GenAI engineer&#8217;s toolkit, in the order you&#8217;ll typically encounter them.</p>
    <div class="vg2-table-wrap">
      <table class="vg2-table">
        <thead><tr><th>#</th><th>Tool</th><th>What It Does</th><th>When to Add It</th></tr></thead>
        <tbody>
          <tr><td>01</td><td><strong>PyTorch</strong></td><td>The dominant deep learning framework. Most LLM research and production code runs on it.</td><td>Day one</td></tr>
          <tr><td>02</td><td><strong>Hugging Face</strong></td><td>Model hub and library ecosystem — tokenizers, transformers, datasets. Think &#8220;npm for ML&#8221;.</td><td>Day one</td></tr>
          <tr><td>03</td><td><strong>Unsloth</strong></td><td>Optimized fine-tuning library. Makes training LLMs dramatically faster and cheaper.</td><td>When fine-tuning</td></tr>
          <tr><td>04</td><td><strong>LangChain</strong></td><td>Framework for building LLM apps with chains, agents, memory, and tool integration.</td><td>When building apps</td></tr>
          <tr><td>05</td><td><strong>LlamaIndex</strong></td><td>Specialized for RAG pipelines — connects your private data to LLMs.</td><td>When building RAG</td></tr>
          <tr><td>06</td><td><strong>LangGraph</strong></td><td>Builds stateful, multi-step agent workflows as directed graphs.</td><td>When building agents</td></tr>
          <tr><td>07</td><td><strong>VDB / Cloud</strong></td><td>Vector databases (Pinecone, Weaviate, pgvector) and cloud infrastructure.</td><td>When scaling</td></tr>
          <tr><td>08</td><td><strong>OpenAI SDK</strong></td><td>Standard API pattern for LLM interaction — most providers mirror this interface.</td><td>Day one</td></tr>
          <tr><td>09</td><td><strong>Guardrails</strong></td><td>Safety and validation layer ensuring LLM outputs meet business rules and constraints.</td><td>Before production</td></tr>
          <tr><td>10</td><td><strong>MCP</strong></td><td>Model Context Protocol — standardized way to connect LLMs to external tools and data.</td><td>When connecting tools</td></tr>
        </tbody>
      </table>
    </div>
    <div class="vg2-callout">
      <strong>Pro Tip:</strong> Start with PyTorch + Hugging Face for understanding models, add LangChain when you start building apps, layer in the rest as your projects demand them.
    </div>
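    <p>Row 08's point about the OpenAI SDK being the standard pattern is easiest to see in the request shape itself. A minimal sketch, with placeholder model name and prompts; the actual network call (commented out) needs an API key:</p>

```python
# Sketch of the chat-completions request shape used by the OpenAI SDK and
# mirrored by most compatible providers. Model name and prompts are
# placeholder values, not recommendations.
def build_chat_request(system_prompt, user_prompt, model="gpt-4o-mini"):
    """Assemble the standard messages payload most LLM APIs accept."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

req = build_chat_request("You are a concise tutor.", "Explain self-attention.")
print(req["messages"][1]["role"])  # user

# With the real SDK (requires an API key in your environment):
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(**req)
#   print(resp.choices[0].message.content)
```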
  </div>

</div><!-- /vg2-body -->

<!-- CAREER PATHS -->
<div class="vg2-career-section">
  <p class="vg2-career-eyebrow">Career</p>
  <h2>Where this knowledge <em>takes you</em></h2>
  <div class="vg2-ladder">
    <div class="vg2-ladder-step"><span>Start</span><strong>Data Analyst / BA</strong></div>
    <span class="vg2-ladder-arrow">→</span>
    <div class="vg2-ladder-step"><span>&nbsp;</span><strong>Data Engineer</strong></div>
    <span class="vg2-ladder-arrow">→</span>
    <div class="vg2-ladder-step"><span>&nbsp;</span><strong>Data Scientist</strong></div>
    <span class="vg2-ladder-arrow">→</span>
    <div class="vg2-ladder-step"><span>&nbsp;</span><strong>MLE / MLOps</strong></div>
    <span class="vg2-ladder-arrow">→</span>
    <div class="vg2-ladder-step"><span>Senior</span><strong>DL Engineer</strong></div>
  </div>
  <div class="vg2-roles-grid">
    <div class="vg2-role-card vg2-reveal vg2-d1"><h4>AI Architect</h4><p>Designs end-to-end AI systems and makes technology choices across the stack.</p></div>
    <div class="vg2-role-card vg2-reveal vg2-d2"><h4>AI Product Manager</h4><p>Bridges business strategy and AI capabilities. No-code path into the space.</p></div>
    <div class="vg2-role-card vg2-reveal vg2-d3"><h4>AI Engineer</h4><p>Builds and integrates AI features into products. The generalist role.</p></div>
    <div class="vg2-role-card vg2-reveal"><h4>GenAI Engineer ★</h4><p>Specializes in LLM-powered applications. Strongest demand right now.</p></div>
    <div class="vg2-role-card vg2-reveal vg2-d1"><h4>Agentic AI Engineer ★</h4><p>Builds autonomous multi-step agent systems. The frontier role.</p></div>
    <div class="vg2-role-card vg2-reveal vg2-d2"><h4>Techno-Functional</h4><p>Combines deep domain expertise with AI skills. High leverage in enterprise.</p></div>
  </div>
</div>

<!-- BUILD PATH -->
<div class="vg2-build-section">
  <p class="vg2-build-eyebrow">Build Path</p>
  <h2>From learning to <em>shipping</em></h2>
  <div class="vg2-stages">
    <div class="vg2-stage vg2-reveal"><div class="vg2-stage-num">1</div><h4>Theory → Base</h4><p>Encodings, embeddings, transformers, LLMs, SLMs, multimodal. Your conceptual foundation.</p></div>
    <div class="vg2-stage vg2-reveal vg2-d1"><div class="vg2-stage-num">2</div><h4>Interview Ready</h4><p>Explain concepts clearly, discuss trade-offs. If you can teach it, you understand it.</p></div>
    <div class="vg2-stage vg2-reveal vg2-d2"><div class="vg2-stage-num">3</div><h4>Applied Skills</h4><p>Fine-tuning, RAG, agentic AI, LLMOps, vector DBs, cloud deployment, MCP integrations.</p></div>
    <div class="vg2-stage vg2-reveal vg2-d3"><div class="vg2-stage-num">4</div><h4>The Build Cycle</h4><p>POC → MVP → Full Dev → Deployment → Scalable App. AI coding assistants compress every stage.</p></div>
  </div>
  <div class="vg2-poc-grid">
    <div class="vg2-poc-phase vg2-reveal"><span class="vg2-phase-tag">Phase 1</span><h5>POC</h5><p>Does this idea even work? Quick, dirty validation.</p></div>
    <div class="vg2-poc-phase vg2-reveal vg2-d1"><span class="vg2-phase-tag">Phase 2</span><h5>MVP</h5><p>Smallest version that delivers real value.</p></div>
    <div class="vg2-poc-phase vg2-reveal vg2-d2"><span class="vg2-phase-tag">Phase 3</span><h5>Full Dev</h5><p>Production-quality code, tests, documentation.</p></div>
    <div class="vg2-poc-phase vg2-reveal vg2-d3"><span class="vg2-phase-tag">Phase 4</span><h5>Deployment</h5><p>CI/CD, monitoring, scaling infrastructure.</p></div>
    <div class="vg2-poc-phase vg2-reveal"><span class="vg2-phase-tag">Phase 5</span><h5>Scalable App</h5><p>Real traffic, cost optimization, feedback iteration.</p></div>
  </div>
</div>

<!-- INTERVIEW PREP -->
<div class="vg2-interview-section">
  <p class="vg2-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg2-qa-list">

    <div class="vg2-qa-item vg2-reveal">
      <div class="vg2-qa-q"><span class="vg2-q-badge">Define</span><br>What is tokenization?</div>
      <div class="vg2-qa-a"><strong>Breaking text into units</strong> a model can process. Tokens may be whole words, subwords, or characters. &#8220;unbelievable&#8221; → [&#8220;un&#8221;, &#8220;believ&#8221;, &#8220;able&#8221;]. Methods: BPE, WordPiece, SentencePiece.
        <div class="vg2-pills"><span class="vg2-pill t">BPE</span><span class="vg2-pill t">WordPiece</span><span class="vg2-pill">Subword units</span></div>
      </div>
    </div>

    <div class="vg2-qa-item vg2-reveal vg2-d1">
      <div class="vg2-qa-q"><span class="vg2-q-badge">Define</span><br>Encoding vs Embedding — what&#8217;s the difference?</div>
      <div class="vg2-qa-a"><strong>Encoding</strong> maps tokens to arbitrary integers (lookup table, no meaning). <strong>Embedding</strong> maps tokens to dense learned vectors where similar concepts cluster. Encoding is a student ID; embedding is placing that student on a map by personality and interests.
        <div class="vg2-pills"><span class="vg2-pill a">Encoding = integer</span><span class="vg2-pill t">Embedding = learned vector</span><span class="vg2-pill t">768–4096 dims</span></div>
      </div>
    </div>

    <div class="vg2-qa-item vg2-reveal">
      <div class="vg2-qa-q"><span class="vg2-q-badge">Explain</span><br>Why did Transformers replace RNNs?</div>
      <div class="vg2-qa-a">RNNs are <strong>sequential</strong> — they process one token at a time and forget long-range context. Transformers use <strong>self-attention</strong> to process all tokens simultaneously, letting every word attend directly to every other. This removes the sequential bottleneck and enables parallelization.
        <div class="vg2-pills"><span class="vg2-pill t">Self-attention</span><span class="vg2-pill t">Parallel processing</span><span class="vg2-pill">No vanishing gradient</span></div>
      </div>
    </div>

    <div class="vg2-qa-item vg2-reveal vg2-d1">
      <div class="vg2-qa-q"><span class="vg2-q-badge">Compare</span><br>Keyword search vs semantic search</div>
      <div class="vg2-qa-a">Keyword search matches <strong>exact tokens</strong>. Semantic search matches <strong>meaning using vector similarity</strong>. Query &#8220;I&#8217;m hungry and want something cheesy&#8221; can retrieve pizza results even if &#8220;pizza&#8221; doesn&#8217;t appear. Most modern systems combine both (hybrid search).
        <div class="vg2-pills"><span class="vg2-pill">Keyword = exact match</span><span class="vg2-pill t">Semantic = vector similarity</span><span class="vg2-pill a">Hybrid = best of both</span></div>
      </div>
    </div>

    <div class="vg2-qa-item vg2-reveal">
      <div class="vg2-qa-q"><span class="vg2-q-badge">Explain</span><br>What is RLHF and why does it matter?</div>
      <div class="vg2-qa-a"><strong>Reinforcement Learning from Human Feedback</strong> — humans rate model outputs, those ratings become reward signals, and the model is fine-tuned to maximize human preference. This is how raw language models become aligned, helpful assistants. It&#8217;s the key step between a pretrained LLM and ChatGPT/Claude.
        <div class="vg2-pills"><span class="vg2-pill t">Human ratings → reward</span><span class="vg2-pill t">RL fine-tuning</span><span class="vg2-pill a">Alignment technique</span></div>
      </div>
    </div>

    <div class="vg2-qa-item vg2-reveal vg2-d1">
      <div class="vg2-qa-q"><span class="vg2-q-badge">Define</span><br>What is RAG?</div>
      <div class="vg2-qa-a"><strong>Retrieval-Augmented Generation</strong> — instead of relying only on training data, the model retrieves relevant documents from an external knowledge base at inference time and uses them as context. Powered by embeddings and vector databases. Keeps LLMs accurate on private or recent data without retraining.
        <div class="vg2-pills"><span class="vg2-pill t">Retrieve → Embed → Generate</span><span class="vg2-pill">pgvector / Pinecone</span><span class="vg2-pill a">LlamaIndex</span></div>
      </div>
    </div>
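    <p>The retrieve-then-generate loop in that definition fits in a few lines. A bare-bones sketch with toy 2-d embeddings; a real pipeline would use an embedding model plus a vector database such as pgvector or Pinecone:</p>

```python
# Bare-bones RAG loop: embed the query, retrieve the nearest document,
# stuff it into the prompt. Embeddings here are toy 2-d vectors chosen
# by hand for illustration.
import numpy as np

kb = {  # tiny "knowledge base" with pretend embeddings
    "Refunds are processed within 5 business days.": np.array([0.9, 0.1]),
    "The office closes at 6 pm on Fridays.":         np.array([0.1, 0.9]),
}

def retrieve(query_vec, k=1):
    """Return the k documents most similar to the query vector."""
    scored = sorted(kb, key=lambda doc: float(np.dot(query_vec, kb[doc])),
                    reverse=True)
    return scored[:k]

query_vec = np.array([0.8, 0.2])   # pretend embedding of "how long do refunds take?"
context = retrieve(query_vec)[0]
prompt = f"Answer using only this context:\n{context}\n\nQ: How long do refunds take?"
print(context)  # the refund document is retrieved, not the office-hours one
```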

    <div class="vg2-qa-item vg2-reveal">
      <div class="vg2-qa-q"><span class="vg2-q-badge">Name</span><br>The 5-step NLP evolution in order</div>
      <div class="vg2-qa-a"><strong>RNN → LSTM/GRU → Encoder-Decoder → Transformer → LLM.</strong> Each step solved the prior step&#8217;s core limitation: long-range forgetting, vanishing gradients, fixed context vectors, sequential bottleneck, scale.
        <div class="vg2-pills"><span class="vg2-pill">RNN</span><span class="vg2-pill">LSTM/GRU</span><span class="vg2-pill">Enc-Dec</span><span class="vg2-pill t">Transformer</span><span class="vg2-pill a">LLM</span></div>
      </div>
    </div>

  </div>
</div>

<!-- PRE-FLIGHT CHECKLIST -->
<div class="vg2-checklist-section">
  <p class="vg2-checklist-eyebrow">Action Items</p>
  <h2>Pre-flight <em>checklist</em></h2>
  <div class="vg2-checklist">
    <div class="vg2-check-item vg2-reveal"><div class="vg2-check-box"></div><div class="vg2-check-body"><h4>Dashboard access</h4><p>Log into the course platform and verify you can access all session materials.</p></div></div>
    <div class="vg2-check-item vg2-reveal vg2-d1"><div class="vg2-check-box"></div><div class="vg2-check-body"><h4>Shared resources bookmarked</h4><p>Google Sheet, GitHub repo, or Notion workspace from the session.</p></div></div>
    <div class="vg2-check-item vg2-reveal vg2-d2"><div class="vg2-check-box"></div><div class="vg2-check-body"><h4>Python installed and verified</h4><p>Run <code style="font-family:'DM Mono',monospace;font-size:0.8rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">python --version</code> in your terminal. Any version 3.10 or later is fine.</p></div></div>
    <div class="vg2-check-item vg2-reveal"><div class="vg2-check-box"></div><div class="vg2-check-body"><h4>VS Code + GitHub Copilot configured</h4><p>Install, authenticate, and test with a quick code completion. Or use Claude Code if you prefer.</p></div></div>
    <div class="vg2-check-item vg2-reveal vg2-d1"><div class="vg2-check-box"></div><div class="vg2-check-body"><h4>Baseline ML/DL/NLP familiarity</h4><p>Or a concrete plan to learn alongside. You don&#8217;t need to be an expert — you need a foundation to build on.</p></div></div>
  </div>
</div>

<!-- FOOTER -->
<div class="vg2-footer">
  <p><strong>Next:</strong> Module 1 deep-dive — encoding and embedding with Python. You&#8217;ll write code that turns sentences into vectors and build a basic semantic search system.</p>
  <a href="https://vijay-gokarn.com" class="vg2-next-btn">Back to Blog ↗</a>
</div>

</div><!-- /vg2 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg2-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg2-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>
<p>The post <a href="https://vijay-gokarn.com/the-genai-landscape-from-zero-to-transformer-series-name-genai-mastery-series-chapter-02/">The GenAI Landscape: From Zero to Transformer (GenAI Mastery Series, Chapter 02)</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">236</post-id>	</item>
		<item>
		<title>Creating AI Storytelling Agents Using Flowise: A Step-by-Step Guide</title>
		<link>https://vijay-gokarn.com/creating-ai-storytelling-agents-using-flowise-a-step-by-step-guide/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=creating-ai-storytelling-agents-using-flowise-a-step-by-step-guide</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Fri, 24 Jan 2025 16:51:01 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=227</guid>

					<description><![CDATA[<p>GenAI Mastery Series · Agentic AI · Flowise Walkthrough Building an AI Storytelling Agent with Flowise — No Code Required StackFlowise · OpenAI GPT-4 · Supervisor/Worker Nodes DeploymentLocal · Cloud-ready OutputIBM the Robot&#8217;s Marshmallow Party Concepts Covered AI Agents Flowise Workflows Supervisor / Worker Pattern ChatOpenAI Node No-Code Orchestration Prompt Engineering In today&#8217;s AI landscape, [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/creating-ai-storytelling-agents-using-flowise-a-step-by-step-guide/">Creating AI Storytelling Agents Using Flowise: A Step-by-Step Guide</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg4 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg4 *, .vg4 *::before, .vg4 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg4-hero { background: var(--ink); padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg4-hero::before {
  content: '⬡'; font-family: 'Cormorant Garamond', serif; font-size: 24rem;
  color: rgba(255,255,255,0.025); position: absolute;
  right: -2rem; bottom: -5rem; line-height: 1; pointer-events: none;
}
.vg4-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg4-eyebrow {
  font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal-light); font-weight: 500; margin-bottom: 1.25rem;
  display: flex; align-items: center; gap: 0.75rem;
}
.vg4-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--teal-light); }
.vg4-hero h1 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem);
  font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em;
  margin-bottom: 1.5rem; max-width: 26ch;
}
.vg4-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg4-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg4-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg4-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* PILL BAND */
.vg4-pill-band {
  background: var(--teal); padding: 1.25rem 4rem;
  display: flex; gap: 0.75rem; flex-wrap: wrap; align-items: center;
}
.vg4-pill-band-label { font-size: 0.65rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.5); font-weight: 400; margin-right: 0.5rem; }
.vg4-band-pill { font-size: 0.7rem; letter-spacing: 0.06em; padding: 0.3rem 0.9rem; background: rgba(247,244,239,0.12); color: var(--paper); border: 0.5px solid rgba(247,244,239,0.2); }

/* INTRO */
.vg4-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg4-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg4-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg4-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg4-section { margin-bottom: 3.5rem; }
.vg4-section-label {
  font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal); font-weight: 500; margin-bottom: 0.5rem;
  display: flex; align-items: center; gap: 0.6rem;
}
.vg4-section-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg4-section h2 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(1.5rem, 3vw, 2.1rem);
  font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1.25rem;
}
.vg4-section h2 em { font-style: italic; color: var(--teal); }
.vg4-section p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg4-section p strong { color: var(--ink); font-weight: 500; }
.vg4-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }

/* CALLOUT */
.vg4-callout {
  background: var(--paper-dark); border-left: 3px solid var(--amber);
  padding: 1.25rem 1.5rem; margin: 1.5rem 0;
  font-size: 0.87rem; line-height: 1.8; color: var(--charcoal);
}
.vg4-callout strong { color: var(--amber); font-weight: 500; }
.vg4-callout.teal { border-color: var(--teal); }
.vg4-callout.teal strong { color: var(--teal); }
.vg4-callout code { font-family: 'DM Mono', monospace; font-size: 0.8rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.4rem; }

/* CODE BLOCK */
.vg4-code { background: var(--ink); padding: 1.5rem 2rem; margin: 1.5rem 0; overflow-x: auto; }
.vg4-code-label { font-family: 'DM Sans', sans-serif; font-size: 0.62rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.3); margin-bottom: 0.6rem; display: block; }
.vg4-code code { font-family: 'DM Mono', monospace; font-size: 0.85rem; color: var(--amber-light); white-space: pre; display: block; line-height: 1.8; }
.vg4-code code .vg4-comment { color: rgba(247,244,239,0.3); }
.vg4-code code .vg4-cmd { color: var(--teal-light); }

/* NODE CARDS — WORKFLOW STEPS */
.vg4-nodes-section { background: var(--paper-dark); padding: 4rem; }
.vg4-nodes-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg4-nodes-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg4-nodes-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg4-nodes-section > h2 em { font-style: italic; color: var(--teal); }
.vg4-node-flow { display: flex; flex-direction: column; gap: 0; }
.vg4-node-item { display: grid; grid-template-columns: 80px 1fr; gap: 2rem; padding: 2rem 0; border-top: 0.5px solid var(--border); align-items: start; }
.vg4-node-item:last-child { border-bottom: 0.5px solid var(--border); }
.vg4-node-left { display: flex; flex-direction: column; align-items: center; gap: 0.5rem; }
.vg4-node-num {
  width: 44px; height: 44px; border-radius: 50%;
  display: flex; align-items: center; justify-content: center;
  font-family: 'Cormorant Garamond', serif; font-size: 1.2rem; font-weight: 300;
  background: var(--ink); color: var(--paper); flex-shrink: 0;
}
.vg4-node-num.teal { background: var(--teal); }
.vg4-node-num.amber { background: var(--amber); color: var(--paper); }
.vg4-node-connector { width: 1px; flex: 1; min-height: 20px; background: var(--border); }
.vg4-node-right {}
.vg4-node-type { font-size: 0.6rem; letter-spacing: 0.16em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.35rem; }
.vg4-node-right h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.35rem; font-weight: 400; color: var(--ink); margin-bottom: 0.6rem; }
.vg4-node-right p { font-size: 0.85rem; line-height: 1.8; color: var(--charcoal); font-weight: 300; margin-bottom: 0.75rem; }
.vg4-node-right p strong { color: var(--ink); font-weight: 500; }
.vg4-config-list { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.5rem; }
.vg4-config-chip { font-family: 'DM Mono', monospace; font-size: 0.7rem; padding: 0.25rem 0.75rem; background: var(--paper); border: 0.5px solid var(--border-strong); color: var(--charcoal); letter-spacing: 0.04em; }
.vg4-config-chip.teal { border-color: var(--teal); color: var(--teal); background: var(--teal-muted); }
.vg4-prompt-box { background: var(--ink); padding: 1.1rem 1.25rem; margin-top: 0.75rem; font-family: 'DM Mono', monospace; font-size: 0.78rem; color: var(--amber-light); line-height: 1.7; }
.vg4-prompt-box .vg4-prompt-label { font-family: 'DM Sans', sans-serif; font-size: 0.58rem; letter-spacing: 0.16em; text-transform: uppercase; color: rgba(247,244,239,0.3); margin-bottom: 0.4rem; display: block; }

/* STORY OUTPUT */
.vg4-story-section { background: var(--ink); padding: 4rem; }
.vg4-story-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg4-story-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg4-story-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2rem; }
.vg4-story-section > h2 em { font-style: italic; color: var(--amber-light); }
.vg4-story-card { border: 0.5px solid rgba(247,244,239,0.12); padding: 2.5rem; position: relative; }
.vg4-story-card::before { content: '"'; font-family: 'Cormorant Garamond', serif; font-size: 8rem; font-weight: 300; color: rgba(250,199,117,0.1); position: absolute; top: -1rem; left: 1.5rem; line-height: 1; pointer-events: none; }
.vg4-story-title-row { margin-bottom: 1.5rem; padding-bottom: 1.5rem; border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg4-story-title-label { font-size: 0.62rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.3); margin-bottom: 0.4rem; }
.vg4-story-title { font-family: 'Cormorant Garamond', serif; font-size: 1.6rem; font-weight: 300; color: var(--amber-light); line-height: 1.2; }
.vg4-story-body { font-size: 0.9rem; line-height: 1.95; color: rgba(247,244,239,0.7); font-weight: 300; font-style: italic; }
.vg4-story-body p + p { margin-top: 1rem; }
.vg4-story-meta { display: flex; gap: 1.5rem; margin-top: 1.5rem; padding-top: 1.25rem; border-top: 0.5px solid rgba(247,244,239,0.1); flex-wrap: wrap; }
.vg4-story-meta-item { font-size: 0.68rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.3); }
.vg4-story-meta-item span { color: var(--amber-light); margin-left: 0.35rem; }

/* INTERVIEW */
.vg4-interview-section { background: var(--teal-muted); padding: 4rem; }
.vg4-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg4-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg4-interview-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg4-interview-section > h2 em { font-style: italic; color: var(--teal); }
.vg4-qa-list { display: flex; flex-direction: column; }
.vg4-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(14,14,14,0.1); align-items: start; }
.vg4-qa-item:last-child { border-bottom: 0.5px solid rgba(14,14,14,0.1); }
.vg4-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--ink); line-height: 1.4; }
.vg4-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg4-qa-a { font-size: 0.83rem; line-height: 1.8; color: var(--charcoal); font-weight: 300; }
.vg4-qa-a strong { color: var(--teal); font-weight: 500; }
.vg4-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.35rem; color: var(--ink); }
.vg4-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg4-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid var(--border-strong); color: var(--charcoal); }
.vg4-pill.t { border-color: var(--teal); color: var(--teal); background: var(--teal-muted); }
.vg4-pill.a { border-color: var(--amber); color: var(--amber); background: var(--amber-muted); }

/* FOOTER */
.vg4-footer { background: var(--ink); padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; }
.vg4-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg4-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg4-back-btn { display: inline-block; padding: 0.65rem 1.75rem; background: var(--teal); color: var(--paper); font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 400; }

/* REVEAL */
.vg4-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg4-reveal.vg4-vis { opacity: 1; transform: translateY(0); }
.vg4-d1 { transition-delay: 0.1s; } .vg4-d2 { transition-delay: 0.2s; } .vg4-d3 { transition-delay: 0.3s; }
</style>

<div class="vg4">

<!-- HERO -->
<div class="vg4-hero">
  <div class="vg4-hero-inner">
    <p class="vg4-eyebrow">GenAI Mastery Series · Agentic AI · Flowise Walkthrough</p>
    <h1>Building an AI Storytelling Agent with Flowise — <em>No Code Required</em></h1>
    <div class="vg4-meta-row">
      <p class="vg4-meta">Stack<span>Flowise · OpenAI GPT-4 · Supervisor/Worker Nodes</span></p>
      <p class="vg4-meta">Deployment<span>Local · Cloud-ready</span></p>
      <p class="vg4-meta">Output<span>IBM the Robot&#8217;s Marshmallow Party</span></p>
    </div>
  </div>
</div>

<!-- PILL BAND -->
<div class="vg4-pill-band">
  <span class="vg4-pill-band-label">Concepts Covered</span>
  <span class="vg4-band-pill">AI Agents</span>
  <span class="vg4-band-pill">Flowise Workflows</span>
  <span class="vg4-band-pill">Supervisor / Worker Pattern</span>
  <span class="vg4-band-pill">ChatOpenAI Node</span>
  <span class="vg4-band-pill">No-Code Orchestration</span>
  <span class="vg4-band-pill">Prompt Engineering</span>
</div>

<!-- INTRO -->
<div class="vg4-intro">
  <p>In today&#8217;s AI landscape, agents are becoming powerful tools to automate complex tasks — from chatbots to interactive storytelling. <strong>Flowise</strong> is a no-code AI workflow builder that makes it easy to design, deploy, and manage AI agents for a wide range of applications. This walkthrough builds a fully functional storytelling agent, locally deployable and cloud-ready.</p>
</div>

<!-- BODY -->
<div class="vg4-body">

  <!-- WHAT ARE AI AGENTS -->
  <div class="vg4-section vg4-reveal">
    <p class="vg4-section-label">Concepts</p>
    <h2>What are AI agents <em>in Flowise?</em></h2>
    <p>AI agents in Flowise are intelligent modules that can handle tasks <strong>autonomously</strong> by combining logic, AI models, and external tools. They process inputs, make decisions, and generate tailored outputs — without manual intervention at each step.</p>
    <p>In this project we use the <strong>Supervisor and Worker node pattern</strong> with OpenAI Chat. The supervisor coordinates the overall workflow; worker nodes each own a specific sub-task — here, storytelling and title assignment.</p>
    <div class="vg4-callout teal">
      <strong>Why multi-agent?</strong> Splitting responsibilities between nodes keeps each prompt focused and small. A dedicated Storyteller node generates better stories than one giant prompt trying to write a story, title it, and format it all at once. This mirrors how real engineering teams work — one job per role.
    </div>
  </div>

  <hr class="vg4-divider">

  <!-- SETUP -->
  <div class="vg4-section vg4-reveal">
    <p class="vg4-section-label">Step 1</p>
    <h2>Setting up <em>Flowise</em></h2>
    <p>Flowise runs as a local Node.js server you access through a browser-based canvas. Two commands are all you need to get started.</p>
    <div class="vg4-code">
      <span class="vg4-code-label">Terminal — Install &#038; Run</span>
      <code><span class="vg4-cmd">npm install -g</span> flowise
<span class="vg4-cmd">npx flowise</span> start</code>
    </div>
    <p>Once running, open your browser, log in, and click <strong>&#8220;New Workflow&#8221;</strong> to open the interactive canvas. You&#8217;ll drag, drop, and wire nodes together visually — no boilerplate code.</p>
    <div class="vg4-callout">
      <strong>Local vs Cloud:</strong> The setup above runs entirely on your machine. For cloud deployment, Flowise supports Railway, Render, and self-hosted Docker. The workflow JSON is portable — build locally, deploy anywhere.
    </div>
  </div>

</div><!-- /vg4-body -->

<!-- NODE WALKTHROUGH -->
<div class="vg4-nodes-section">
  <p class="vg4-nodes-eyebrow">Step 2 — Workflow Design</p>
  <h2>Building the agent — <em>node by node</em></h2>
  <div class="vg4-node-flow">

    <div class="vg4-node-item vg4-reveal">
      <div class="vg4-node-left">
        <div class="vg4-node-num teal">1</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Chat Model Node</p>
        <h3>ChatOpenAI — The Brain</h3>
        <p>Drag a <strong>ChatOpenAI Node</strong> onto the canvas and connect it as the model backend for all worker nodes. Configure GPT-4 with elevated temperature for imaginative outputs.</p>
        <div class="vg4-config-list">
          <span class="vg4-config-chip teal">Model: GPT-4</span>
          <span class="vg4-config-chip teal">Temperature: 0.9</span>
          <span class="vg4-config-chip">Max Tokens: 400–500</span>
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal vg4-d1">
      <div class="vg4-node-left">
        <div class="vg4-node-num teal">2</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Worker Node · Role: Storyteller</p>
        <h3>Storytelling Agent</h3>
        <p>Add a <strong>Worker Node</strong> and connect it to the ChatOpenAI node. Set its role as the Storyteller. This node owns the core creative generation task — it receives the theme prompt and writes the full story.</p>
        <div class="vg4-prompt-box">
          <span class="vg4-prompt-label">Worker Prompt</span>
          You are a storyteller. Write a fun and engaging story for kids aged 5–8. The main character is a robot named IBM. Make it funny, magical, and include a twist. Limit the story to 400 words.
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal vg4-d2">
      <div class="vg4-node-left">
        <div class="vg4-node-num amber">3</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Worker Node · Role: Title Assigner</p>
        <h3>Title Assigner Agent</h3>
        <p>Add a second <strong>Worker Node</strong> downstream of the Storyteller. This node&#8217;s sole job is to extract a short, engaging title from the generated story — a focused single-responsibility task.</p>
        <div class="vg4-prompt-box">
          <span class="vg4-prompt-label">Worker Prompt</span>
          Extract the title of the story you just created. Keep it short and engaging.
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal">
      <div class="vg4-node-left">
        <div class="vg4-node-num">4</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Format Prompt Values Node</p>
        <h3>Output Formatter</h3>
        <p>Use the <strong>Format Prompt Values Node</strong> to combine the story and title from the two worker nodes into a clean, structured output ready for display.</p>
        <div class="vg4-config-list">
          <span class="vg4-config-chip">Title: {&#8216;{&#8216;}Title Extracted{&#8216;}&#8217;}</span>
          <span class="vg4-config-chip">Story: {&#8216;{&#8216;}Generated Story{&#8216;}&#8217;}</span>
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal vg4-d1">
      <div class="vg4-node-left">
        <div class="vg4-node-num teal">5</div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Chat Output Node</p>
        <h3>Chat Output — Delivery</h3>
        <p>Connect the formatted output to the <strong>Chat Output Node</strong>. This is the interface layer — the final assembled story and title are surfaced here for users to read, copy, or embed.</p>
        <div class="vg4-config-list">
          <span class="vg4-config-chip teal">Displays story + title</span>
          <span class="vg4-config-chip">Embeddable chat widget</span>
        </div>
      </div>
    </div>

  </div>
</div>
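<!-- CODE EQUIVALENT -->
<div class="vg4-body">
  <div class="vg4-section">
    <p>For intuition, here is what the five-node canvas does under the hood. This is a hypothetical Python sketch, not Flowise&#8217;s internal code: the <code>llm</code> callable stands in for the ChatOpenAI node, and the two prompt helpers play the Storyteller and Title Assigner workers.</p>

```python
def storyteller_prompt(theme):
    # Worker 1: the Storyteller's focused prompt (wording adapted from the node above)
    return (f"You are a storyteller. Write a fun and engaging story for kids aged 5-8 "
            f"about {theme}. Make it funny, magical, and include a twist. "
            f"Limit the story to 400 words.")

def title_prompt(story):
    # Worker 2: the Title Assigner sees only the finished story
    return "Extract a short, engaging title for this story:\n\n" + story

def run_pipeline(llm, theme):
    """Supervisor logic: delegate to each worker in sequence, then merge outputs."""
    story = llm(storyteller_prompt(theme))   # Storyteller worker
    title = llm(title_prompt(story))         # Title Assigner worker
    return {"title": title, "story": story}  # what the Format Prompt Values node assembles
```

    <p>Calling <code>run_pipeline</code> with any model-backed callable reproduces the supervisor&#8217;s sequencing: the story is generated first, then the title is derived from that story rather than from the original theme.</p>
  </div>
</div>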

<!-- BODY CONTINUED — RUNNING -->
<div class="vg4-body">
  <div class="vg4-section vg4-reveal">
    <p class="vg4-section-label">Step 3</p>
    <h2>Running the <em>agent</em></h2>
    <p>With all nodes configured and wired together, save your workflow — name it something like <strong>&#8220;AI Storytelling Agent&#8221;</strong> — and hit Run. Enter a theme prompt or use the default storytelling instructions, and the agent pipeline fires automatically: ChatOpenAI powers the Storyteller worker, its output flows to the Title Assigner, both outputs merge in the Formatter, and the Chat Output displays the result.</p>
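    <p>Once the flow is saved, Flowise also serves it over HTTP: each chatflow gets a prediction endpoint at <code>/api/v1/prediction/&lt;chatflow-id&gt;</code>. A minimal standard-library sketch — the chatflow ID is a placeholder for your own flow, and port 3000 assumes Flowise&#8217;s default local port:</p>

```python
import json
from urllib import request

def prediction_url(host, chatflow_id):
    # Flowise exposes each saved chatflow at /api/v1/prediction/<chatflow-id>
    return f"{host.rstrip('/')}/api/v1/prediction/{chatflow_id}"

def ask_storyteller(question, chatflow_id, host="http://localhost:3000"):
    """Send a theme prompt to the deployed storytelling flow and return its reply."""
    req = request.Request(
        prediction_url(host, chatflow_id),
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())

# ask_storyteller("A robot named IBM throws a party", "<your-chatflow-id>")
```

    <p>This is the same call the embeddable chat widget makes behind the scenes, which is why the workflow is usable from any app, not just the Flowise UI.</p>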
    <div class="vg4-callout teal">
      <strong>Tip:</strong> Experiment with different <code>temperature</code> values. At 0.7 the stories are coherent but predictable. At 0.95 you get genuinely surprising plot twists — which for kids&#8217; stories is exactly what you want.
    </div>
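    <p>What temperature actually does fits in a few lines: the model&#8217;s logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution toward the top token and values above 1 flatten it. A toy illustration (not the OpenAI implementation):</p>

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by temperature before normalizing: <1 sharpens, >1 flattens
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.2]                     # toy next-token scores
cold = softmax(logits, temperature=0.2)      # near-deterministic: top token dominates
warm = softmax(logits, temperature=0.9)      # creative: real probability mass on alternatives
```

    <p>At temperature 0.2 the top token takes almost all the probability mass; at 0.9 the runners-up stay genuinely in play — which is where the surprising plot twists come from.</p>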
  </div>
</div>

<!-- STORY OUTPUT -->
<div class="vg4-story-section">
  <p class="vg4-story-eyebrow">Example Output</p>
  <h2>IBM&#8217;s <em>Marshmallow Party</em></h2>
  <div class="vg4-story-card vg4-reveal">
    <div class="vg4-story-title-row">
      <p class="vg4-story-title-label">Generated Title</p>
      <p class="vg4-story-title">IBM the Robot and the Marshmallow Party</p>
    </div>
    <div class="vg4-story-body">
      <p>Once upon a time, in a land of giggles and sparkles, there lived a silly little robot named IBM. Now, IBM wasn&#8217;t your everyday robot who danced or painted; oh no! He was known as a &#8220;computer,&#8221; which is a magic box that helps people do all sorts of amazing things!</p>
      <p>One sunny day, IBM decided to throw a party for all his robot friends. He said, &#8220;Let&#8217;s make it the best party ever! I&#8217;ll invite my buddy, Printer Pete, and my bestie, Codey the Coder!&#8221; But IBM accidentally ordered 1,000 bags of rainbow-colored marshmallows instead of snacks — and chaos ensued!</p>
      <p>The party turned into a marshmallow-filled adventure with pillow fights, marshmallow towers, and lots of laughs. In the end, IBM and his friends agreed: whether it&#8217;s chips or marshmallows, any party is fun when friends are around!</p>
    </div>
    <div class="vg4-story-meta">
      <p class="vg4-story-meta-item">Model<span>GPT-4</span></p>
      <p class="vg4-story-meta-item">Temperature<span>0.9</span></p>
      <p class="vg4-story-meta-item">Word Count<span>~150 words</span></p>
      <p class="vg4-story-meta-item">Nodes Used<span>5</span></p>
    </div>
  </div>
</div>

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg4-interview-section">
  <p class="vg4-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg4-qa-list">

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Define</span><br>What is an AI agent?</div>
      <div class="vg4-qa-a"><strong>An autonomous module</strong> that combines a language model, logic, and optionally external tools to complete a task without step-by-step human instruction. It receives a goal, plans sub-steps, and executes them independently.
        <div class="vg4-pills"><span class="vg4-pill t">Autonomous</span><span class="vg4-pill t">Goal-driven</span><span class="vg4-pill">Tool-using</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal vg4-d1">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Explain</span><br>What is the Supervisor / Worker pattern?</div>
      <div class="vg4-qa-a">A <strong>Supervisor node</strong> coordinates the overall workflow and delegates tasks to <strong>Worker nodes</strong>, each of which handles one focused sub-task. This mirrors microservices architecture — single responsibility per agent, composable into larger pipelines.
        <div class="vg4-pills"><span class="vg4-pill t">Supervisor = orchestrator</span><span class="vg4-pill t">Worker = specialist</span><span class="vg4-pill a">Single responsibility</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Define</span><br>What does temperature control in an LLM?</div>
      <div class="vg4-qa-a"><strong>Randomness in token sampling.</strong> Low temperature (0.1–0.4) = deterministic, factual, conservative outputs. High temperature (0.8–1.0) = creative, surprising, occasionally incoherent. For storytelling, 0.9 hits the sweet spot of imaginative without losing coherence.
        <div class="vg4-pills"><span class="vg4-pill">Low = deterministic</span><span class="vg4-pill a">High = creative</span><span class="vg4-pill t">0.9 for stories</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal vg4-d1">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Compare</span><br>No-code (Flowise) vs code-first (LangChain) — when to use which?</div>
      <div class="vg4-qa-a">Use <strong>Flowise</strong> for rapid prototyping, demos, non-developer stakeholders, or when the workflow is straightforward and visual. Use <strong>LangChain / LangGraph in code</strong> when you need version control, CI/CD, complex branching, custom tool integrations, or production-grade observability.
        <div class="vg4-pills"><span class="vg4-pill t">Flowise = prototype fast</span><span class="vg4-pill a">LangChain = production</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Explain</span><br>Why split story generation and title extraction into separate nodes?</div>
      <div class="vg4-qa-a"><strong>Focused prompts outperform omnibus prompts.</strong> A prompt that must write a story, extract a title, and format output all at once tends to trade off quality across tasks. Separate nodes give each sub-task its own context window, model parameters, and success criteria — and makes each step independently testable and replaceable.
        <div class="vg4-pills"><span class="vg4-pill t">One node, one job</span><span class="vg4-pill">Better quality</span><span class="vg4-pill a">Independently testable</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal vg4-d1">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Use Case</span><br>What other use cases suit a Flowise multi-agent setup?</div>
      <div class="vg4-qa-a"><strong>Any pipeline with distinct sequential sub-tasks:</strong> customer support (intent classification → knowledge retrieval → response drafting), content pipelines (research → outline → write → SEO optimize), data workflows (extract → validate → transform → summarize).
        <div class="vg4-pills"><span class="vg4-pill t">Customer support</span><span class="vg4-pill t">Content pipelines</span><span class="vg4-pill t">Data workflows</span><span class="vg4-pill">Code review agents</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Gotcha</span><br>What are the main limits of no-code agent builders?</div>
      <div class="vg4-qa-a">Three key limitations: <strong>(1) Observability</strong> — debugging visual workflows is harder than reading stack traces. <strong>(2) Version control</strong> — workflow JSON doesn&#8217;t diff cleanly in Git. <strong>(3) Custom logic</strong> — complex conditional branching, stateful memory, and custom tool integrations are much easier in code-first frameworks.
        <div class="vg4-pills"><span class="vg4-pill a">Hard to debug</span><span class="vg4-pill a">No clean Git diff</span><span class="vg4-pill a">Limited branching</span></div>
      </div>
    </div>

  </div>
</div>

<!-- FOOTER -->
<div class="vg4-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <a href="https://vijay-gokarn.com" class="vg4-back-btn">Back to Blog ↗</a>
</div>

</div><!-- /vg4 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg4-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg4-reveal').forEach(function(el){ obs.observe(el); });
})();
</script><link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg4 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg4 *, .vg4 *::before, .vg4 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg4-hero { background: var(--ink); padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg4-hero::before {
  content: '⬡'; font-family: 'Cormorant Garamond', serif; font-size: 24rem;
  color: rgba(255,255,255,0.025); position: absolute;
  right: -2rem; bottom: -5rem; line-height: 1; pointer-events: none;
}
.vg4-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg4-eyebrow {
  font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal-light); font-weight: 500; margin-bottom: 1.25rem;
  display: flex; align-items: center; gap: 0.75rem;
}
.vg4-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--teal-light); }
.vg4-hero h1 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem);
  font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em;
  margin-bottom: 1.5rem; max-width: 26ch;
}
.vg4-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg4-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg4-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg4-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* PILL BAND */
.vg4-pill-band {
  background: var(--teal); padding: 1.25rem 4rem;
  display: flex; gap: 0.75rem; flex-wrap: wrap; align-items: center;
}
.vg4-pill-band-label { font-size: 0.65rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.5); font-weight: 400; margin-right: 0.5rem; }
.vg4-band-pill { font-size: 0.7rem; letter-spacing: 0.06em; padding: 0.3rem 0.9rem; background: rgba(247,244,239,0.12); color: var(--paper); border: 0.5px solid rgba(247,244,239,0.2); }

/* INTRO */
.vg4-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg4-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg4-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg4-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg4-section { margin-bottom: 3.5rem; }
.vg4-section-label {
  font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal); font-weight: 500; margin-bottom: 0.5rem;
  display: flex; align-items: center; gap: 0.6rem;
}
.vg4-section-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg4-section h2 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(1.5rem, 3vw, 2.1rem);
  font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1.25rem;
}
.vg4-section h2 em { font-style: italic; color: var(--teal); }
.vg4-section p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg4-section p strong { color: var(--ink); font-weight: 500; }
.vg4-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }

/* CALLOUT */
.vg4-callout {
  background: var(--paper-dark); border-left: 3px solid var(--amber);
  padding: 1.25rem 1.5rem; margin: 1.5rem 0;
  font-size: 0.87rem; line-height: 1.8; color: var(--charcoal);
}
.vg4-callout strong { color: var(--amber); font-weight: 500; }
.vg4-callout.teal { border-color: var(--teal); }
.vg4-callout.teal strong { color: var(--teal); }
.vg4-callout code { font-family: 'DM Mono', monospace; font-size: 0.8rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.4rem; }

/* CODE BLOCK */
.vg4-code { background: var(--ink); padding: 1.5rem 2rem; margin: 1.5rem 0; overflow-x: auto; }
.vg4-code-label { font-family: 'DM Sans', sans-serif; font-size: 0.62rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.3); margin-bottom: 0.6rem; display: block; }
.vg4-code code { font-family: 'DM Mono', monospace; font-size: 0.85rem; color: var(--amber-light); white-space: pre; display: block; line-height: 1.8; }
.vg4-code code .vg4-comment { color: rgba(247,244,239,0.3); }
.vg4-code code .vg4-cmd { color: var(--teal-light); }

/* NODE CARDS — WORKFLOW STEPS */
.vg4-nodes-section { background: var(--paper-dark); padding: 4rem; }
.vg4-nodes-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg4-nodes-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg4-nodes-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg4-nodes-section > h2 em { font-style: italic; color: var(--teal); }
.vg4-node-flow { display: flex; flex-direction: column; gap: 0; }
.vg4-node-item { display: grid; grid-template-columns: 80px 1fr; gap: 2rem; padding: 2rem 0; border-top: 0.5px solid var(--border); align-items: start; }
.vg4-node-item:last-child { border-bottom: 0.5px solid var(--border); }
.vg4-node-left { display: flex; flex-direction: column; align-items: center; gap: 0.5rem; }
.vg4-node-num {
  width: 44px; height: 44px; border-radius: 50%;
  display: flex; align-items: center; justify-content: center;
  font-family: 'Cormorant Garamond', serif; font-size: 1.2rem; font-weight: 300;
  background: var(--ink); color: var(--paper); flex-shrink: 0;
}
.vg4-node-num.teal { background: var(--teal); }
.vg4-node-num.amber { background: var(--amber); color: var(--paper); }
.vg4-node-connector { width: 1px; flex: 1; min-height: 20px; background: var(--border); }
.vg4-node-right {}
.vg4-node-type { font-size: 0.6rem; letter-spacing: 0.16em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.35rem; }
.vg4-node-right h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.35rem; font-weight: 400; color: var(--ink); margin-bottom: 0.6rem; }
.vg4-node-right p { font-size: 0.85rem; line-height: 1.8; color: var(--charcoal); font-weight: 300; margin-bottom: 0.75rem; }
.vg4-node-right p strong { color: var(--ink); font-weight: 500; }
.vg4-config-list { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.5rem; }
.vg4-config-chip { font-family: 'DM Mono', monospace; font-size: 0.7rem; padding: 0.25rem 0.75rem; background: var(--paper); border: 0.5px solid var(--border-strong); color: var(--charcoal); letter-spacing: 0.04em; }
.vg4-config-chip.teal { border-color: var(--teal); color: var(--teal); background: var(--teal-muted); }
.vg4-prompt-box { background: var(--ink); padding: 1.1rem 1.25rem; margin-top: 0.75rem; font-family: 'DM Mono', monospace; font-size: 0.78rem; color: var(--amber-light); line-height: 1.7; }
.vg4-prompt-box .vg4-prompt-label { font-family: 'DM Sans', sans-serif; font-size: 0.58rem; letter-spacing: 0.16em; text-transform: uppercase; color: rgba(247,244,239,0.3); margin-bottom: 0.4rem; display: block; }

/* STORY OUTPUT */
.vg4-story-section { background: var(--ink); padding: 4rem; }
.vg4-story-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg4-story-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg4-story-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2rem; }
.vg4-story-section > h2 em { font-style: italic; color: var(--amber-light); }
.vg4-story-card { border: 0.5px solid rgba(247,244,239,0.12); padding: 2.5rem; position: relative; }
.vg4-story-card::before { content: '"'; font-family: 'Cormorant Garamond', serif; font-size: 8rem; font-weight: 300; color: rgba(250,199,117,0.1); position: absolute; top: -1rem; left: 1.5rem; line-height: 1; pointer-events: none; }
.vg4-story-title-row { margin-bottom: 1.5rem; padding-bottom: 1.5rem; border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg4-story-title-label { font-size: 0.62rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.3); margin-bottom: 0.4rem; }
.vg4-story-title { font-family: 'Cormorant Garamond', serif; font-size: 1.6rem; font-weight: 300; color: var(--amber-light); line-height: 1.2; }
.vg4-story-body { font-size: 0.9rem; line-height: 1.95; color: rgba(247,244,239,0.7); font-weight: 300; font-style: italic; }
.vg4-story-body p + p { margin-top: 1rem; }
.vg4-story-meta { display: flex; gap: 1.5rem; margin-top: 1.5rem; padding-top: 1.25rem; border-top: 0.5px solid rgba(247,244,239,0.1); flex-wrap: wrap; }
.vg4-story-meta-item { font-size: 0.68rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.3); }
.vg4-story-meta-item span { color: var(--amber-light); margin-left: 0.35rem; }

/* INTERVIEW */
.vg4-interview-section { background: var(--teal-muted); padding: 4rem; }
.vg4-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg4-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg4-interview-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg4-interview-section > h2 em { font-style: italic; color: var(--teal); }
.vg4-qa-list { display: flex; flex-direction: column; }
.vg4-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(14,14,14,0.1); align-items: start; }
.vg4-qa-item:last-child { border-bottom: 0.5px solid rgba(14,14,14,0.1); }
.vg4-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--ink); line-height: 1.4; }
.vg4-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg4-qa-a { font-size: 0.83rem; line-height: 1.8; color: var(--charcoal); font-weight: 300; }
.vg4-qa-a strong { color: var(--teal); font-weight: 500; }
.vg4-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.35rem; color: var(--ink); }
.vg4-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg4-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid var(--border-strong); color: var(--charcoal); }
.vg4-pill.t { border-color: var(--teal); color: var(--teal); background: var(--teal-muted); }
.vg4-pill.a { border-color: var(--amber); color: var(--amber); background: var(--amber-muted); }

/* FOOTER */
.vg4-footer { background: var(--ink); padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; }
.vg4-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg4-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg4-back-btn { display: inline-block; padding: 0.65rem 1.75rem; background: var(--teal); color: var(--paper); font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 400; }

/* REVEAL */
.vg4-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg4-reveal.vg4-vis { opacity: 1; transform: translateY(0); }
.vg4-d1 { transition-delay: 0.1s; } .vg4-d2 { transition-delay: 0.2s; } .vg4-d3 { transition-delay: 0.3s; }
</style>

<div class="vg4">

<!-- HERO -->
<div class="vg4-hero">
  <div class="vg4-hero-inner">
    <p class="vg4-eyebrow">GenAI Mastery Series · Agentic AI · Flowise Walkthrough</p>
    <h1>Building an AI Storytelling Agent with Flowise — <em>No Code Required</em></h1>
    <div class="vg4-meta-row">
      <p class="vg4-meta">Stack<span>Flowise · OpenAI GPT-4 · Supervisor/Worker Nodes</span></p>
      <p class="vg4-meta">Deployment<span>Local · Cloud-ready</span></p>
      <p class="vg4-meta">Output<span>IBM the Robot&#8217;s Marshmallow Party</span></p>
    </div>
  </div>
</div>

<!-- PILL BAND -->
<div class="vg4-pill-band">
  <span class="vg4-pill-band-label">Concepts Covered</span>
  <span class="vg4-band-pill">AI Agents</span>
  <span class="vg4-band-pill">Flowise Workflows</span>
  <span class="vg4-band-pill">Supervisor / Worker Pattern</span>
  <span class="vg4-band-pill">ChatOpenAI Node</span>
  <span class="vg4-band-pill">No-Code Orchestration</span>
  <span class="vg4-band-pill">Prompt Engineering</span>
</div>

<!-- INTRO -->
<div class="vg4-intro">
  <p>In today&#8217;s AI landscape, agents are becoming powerful tools to automate complex tasks — from chatbots to interactive storytelling. <strong>Flowise</strong> is a no-code AI workflow builder that makes it easy to design, deploy, and manage AI agents for a wide range of applications. This walkthrough builds a fully functional storytelling agent, locally deployable and cloud-ready.</p>
</div>

<!-- BODY -->
<div class="vg4-body">

  <!-- WHAT ARE AI AGENTS -->
  <div class="vg4-section vg4-reveal">
    <p class="vg4-section-label">Concepts</p>
    <h2>What are AI agents <em>in Flowise?</em></h2>
    <p>AI agents in Flowise are intelligent modules that can handle tasks <strong>autonomously</strong> by combining logic, AI models, and external tools. They process inputs, make decisions, and generate tailored outputs — without manual intervention at each step.</p>
    <p>In this project we use the <strong>Supervisor and Worker node pattern</strong> with OpenAI Chat. The supervisor coordinates the overall workflow; worker nodes each own a specific sub-task — here, storytelling and title assignment.</p>
    <div class="vg4-callout teal">
      <strong>Why multi-agent?</strong> Splitting responsibilities between nodes keeps each prompt focused and small. A dedicated Storyteller node generates better stories than one giant prompt trying to write a story, title it, and format it all at once. This mirrors how real engineering teams work — one job per role.
    </div>
  </div>

  <hr class="vg4-divider">

  <!-- SETUP -->
  <div class="vg4-section vg4-reveal">
    <p class="vg4-section-label">Step 1</p>
    <h2>Setting up <em>Flowise</em></h2>
    <p>Flowise runs as a local Node.js server you access through a browser-based canvas. Two commands are all you need to get started.</p>
    <div class="vg4-code">
      <span class="vg4-code-label">Terminal — Install &#038; Run</span>
      <code><span class="vg4-cmd">npm install -g</span> flowise
<span class="vg4-cmd">npx flowise</span> start</code>
    </div>
    <p>Once running, open your browser, log in, and click <strong>&#8220;New Workflow&#8221;</strong> to open the interactive canvas. You&#8217;ll drag, drop, and wire nodes together visually — no boilerplate code.</p>
    <div class="vg4-callout">
      <strong>Local vs Cloud:</strong> The setup above runs entirely on your machine. For cloud deployment, Flowise supports Railway, Render, and self-hosted Docker. The workflow JSON is portable — build locally, deploy anywhere.
    </div>
  </div>

</div><!-- /vg4-body -->

<!-- NODE WALKTHROUGH -->
<div class="vg4-nodes-section">
  <p class="vg4-nodes-eyebrow">Step 2 — Workflow Design</p>
  <h2>Building the agent — <em>node by node</em></h2>
  <div class="vg4-node-flow">

    <div class="vg4-node-item vg4-reveal">
      <div class="vg4-node-left">
        <div class="vg4-node-num teal">1</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Chat Model Node</p>
        <h3>ChatOpenAI — The Brain</h3>
        <p>Drag a <strong>ChatOpenAI Node</strong> onto the canvas and connect it as the model backend for all worker nodes. Configure GPT-4 with elevated temperature for imaginative outputs.</p>
        <div class="vg4-config-list">
          <span class="vg4-config-chip teal">Model: GPT-4</span>
          <span class="vg4-config-chip teal">Temperature: 0.9</span>
          <span class="vg4-config-chip">Max Tokens: 400–500</span>
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal vg4-d1">
      <div class="vg4-node-left">
        <div class="vg4-node-num teal">2</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Worker Node · Role: Storyteller</p>
        <h3>Storytelling Agent</h3>
        <p>Add a <strong>Worker Node</strong> and connect it to the ChatOpenAI node. Set its role as the Storyteller. This node owns the core creative generation task — it receives the theme prompt and writes the full story.</p>
        <div class="vg4-prompt-box">
          <span class="vg4-prompt-label">Worker Prompt</span>
          You are a storyteller. Write a fun and engaging story for kids aged 5–8. The main character is a robot named IBM. Make it funny, magical, and include a twist. Limit the story to 400 words.
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal vg4-d2">
      <div class="vg4-node-left">
        <div class="vg4-node-num amber">3</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Worker Node · Role: Title Assigner</p>
        <h3>Title Assigner Agent</h3>
        <p>Add a second <strong>Worker Node</strong> downstream of the Storyteller. This node&#8217;s sole job is to extract a short, engaging title from the generated story — a focused single-responsibility task.</p>
        <div class="vg4-prompt-box">
          <span class="vg4-prompt-label">Worker Prompt</span>
          Extract the title of the story you just created. Keep it short and engaging.
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal">
      <div class="vg4-node-left">
        <div class="vg4-node-num">4</div>
        <div class="vg4-node-connector"></div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Format Prompt Values Node</p>
        <h3>Output Formatter</h3>
        <p>Use the <strong>Format Prompt Values Node</strong> to combine the story and title from the two worker nodes into a clean, structured output ready for display.</p>
        <div class="vg4-config-list">
          <span class="vg4-config-chip">Title: {Title Extracted}</span>
          <span class="vg4-config-chip">Story: {Generated Story}</span>
        </div>
      </div>
    </div>

    <div class="vg4-node-item vg4-reveal vg4-d1">
      <div class="vg4-node-left">
        <div class="vg4-node-num teal">5</div>
      </div>
      <div class="vg4-node-right">
        <p class="vg4-node-type">Chat Output Node</p>
        <h3>Chat Output — Delivery</h3>
        <p>Connect the formatted output to the <strong>Chat Output Node</strong>. This is the interface layer — the final assembled story and title are surfaced here for users to read, copy, or embed.</p>
        <div class="vg4-config-list">
          <span class="vg4-config-chip teal">Displays story + title</span>
          <span class="vg4-config-chip">Embeddable chat widget</span>
        </div>
      </div>
    </div>

  </div>
</div>

<!-- BODY CONTINUED — RUNNING -->
<div class="vg4-body">
  <div class="vg4-section vg4-reveal">
    <p class="vg4-section-label">Step 3</p>
    <h2>Running the <em>agent</em></h2>
    <p>With all nodes configured and wired together, save your workflow — name it something like <strong>&#8220;AI Storytelling Agent&#8221;</strong> — and hit Run. Enter a theme prompt or use the default storytelling instructions, and the agent pipeline fires automatically: ChatOpenAI powers the Storyteller worker, its output flows to the Title Assigner, both outputs merge in the Formatter, and the Chat Output displays the result.</p>
    <div class="vg4-callout teal">
      <strong>Tip:</strong> Experiment with different <code>temperature</code> values. At 0.7 the stories are coherent but predictable. At 0.95 you get genuinely surprising plot twists — which for kids&#8217; stories is exactly what you want.
    </div>
  </div>
</div>
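A saved workflow is also reachable over REST, which is how you would embed the agent in another app. The sketch below only builds the request rather than sending it; the path follows Flowise's documented prediction endpoint, while the base URL and `CHATFLOW_ID` are placeholders you would copy from your own Flowise instance:

```python
# Build (but do not send) a request to a saved Flowise workflow.
# CHATFLOW_ID is a hypothetical placeholder from the Flowise UI.
import json

BASE_URL = "http://localhost:3000"   # default local Flowise port
CHATFLOW_ID = "your-chatflow-id"     # placeholder — copy from your workflow

def build_prediction_request(question: str):
    """Return (url, body) for a Flowise prediction call."""
    url = f"{BASE_URL}/api/v1/prediction/{CHATFLOW_ID}"
    body = json.dumps({"question": question}).encode("utf-8")
    return url, body

url, body = build_prediction_request("Tell a story about a robot named IBM")
# To actually send it, POST the body to url with Content-Type: application/json.
```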

<!-- STORY OUTPUT -->
<div class="vg4-story-section">
  <p class="vg4-story-eyebrow">Example Output</p>
  <h2>IBM&#8217;s <em>Marshmallow Party</em></h2>
  <div class="vg4-story-card vg4-reveal">
    <div class="vg4-story-title-row">
      <p class="vg4-story-title-label">Generated Title</p>
      <p class="vg4-story-title">IBM the Robot and the Marshmallow Party</p>
    </div>
    <div class="vg4-story-body">
      <p>Once upon a time, in a land of giggles and sparkles, there lived a silly little robot named IBM. Now, IBM wasn&#8217;t your everyday robot who danced or painted; oh no! He was known as a &#8220;computer,&#8221; which is a magic box that helps people do all sorts of amazing things!</p>
      <p>One sunny day, IBM decided to throw a party for all his robot friends. He said, &#8220;Let&#8217;s make it the best party ever! I&#8217;ll invite my buddy, Printer Pete, and my bestie, Codey the Coder!&#8221; But IBM accidentally ordered 1,000 bags of rainbow-colored marshmallows instead of snacks — and chaos ensued!</p>
      <p>The party turned into a marshmallow-filled adventure with pillow fights, marshmallow towers, and lots of laughs. In the end, IBM and his friends agreed: whether it&#8217;s chips or marshmallows, any party is fun when friends are around!</p>
    </div>
    <div class="vg4-story-meta">
      <p class="vg4-story-meta-item">Model<span>GPT-4</span></p>
      <p class="vg4-story-meta-item">Temperature<span>0.9</span></p>
      <p class="vg4-story-meta-item">Word Count<span>~150 words</span></p>
      <p class="vg4-story-meta-item">Nodes Used<span>5</span></p>
    </div>
  </div>
</div>

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg4-interview-section">
  <p class="vg4-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg4-qa-list">

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Define</span><br>What is an AI agent?</div>
      <div class="vg4-qa-a"><strong>An autonomous module</strong> that combines a language model, logic, and optionally external tools to complete a task without step-by-step human instruction. It receives a goal, plans sub-steps, and executes them independently.
        <div class="vg4-pills"><span class="vg4-pill t">Autonomous</span><span class="vg4-pill t">Goal-driven</span><span class="vg4-pill">Tool-using</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal vg4-d1">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Explain</span><br>What is the Supervisor / Worker pattern?</div>
      <div class="vg4-qa-a">A <strong>Supervisor node</strong> coordinates the overall workflow and delegates tasks to <strong>Worker nodes</strong>, each of which handles one focused sub-task. This mirrors microservices architecture — single responsibility per agent, composable into larger pipelines.
        <div class="vg4-pills"><span class="vg4-pill t">Supervisor = orchestrator</span><span class="vg4-pill t">Worker = specialist</span><span class="vg4-pill a">Single responsibility</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Define</span><br>What does temperature control in an LLM?</div>
      <div class="vg4-qa-a"><strong>Randomness in token sampling.</strong> Low temperature (0.1–0.4) = deterministic, factual, conservative outputs. High temperature (0.8–1.0) = creative, surprising, occasionally incoherent. For storytelling, 0.9 hits the sweet spot of imaginative without losing coherence.
        <div class="vg4-pills"><span class="vg4-pill">Low = deterministic</span><span class="vg4-pill a">High = creative</span><span class="vg4-pill t">0.9 for stories</span></div>
      </div>
    </div>
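The mechanism behind that answer is simple to demonstrate: logits are divided by the temperature before the softmax, so low temperature sharpens the distribution and high temperature flattens it. A small self-contained sketch with toy logits:

```python
# Temperature scaling: divide logits by T before softmax.
# Low T -> near-deterministic; high T -> flatter, more "creative" sampling.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                     # toy scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)   # top token dominates
hot  = softmax_with_temperature(logits, 0.95)  # probability mass spreads out
```

At T=0.2 the top token takes nearly all the probability mass; at T=0.95 the lower-ranked tokens stay in play, which is where the surprising plot twists come from.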

    <div class="vg4-qa-item vg4-reveal vg4-d1">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Compare</span><br>No-code (Flowise) vs code-first (LangChain) — when to use which?</div>
      <div class="vg4-qa-a">Use <strong>Flowise</strong> for rapid prototyping, demos, non-developer stakeholders, or when the workflow is straightforward and visual. Use <strong>LangChain / LangGraph in code</strong> when you need version control, CI/CD, complex branching, custom tool integrations, or production-grade observability.
        <div class="vg4-pills"><span class="vg4-pill t">Flowise = prototype fast</span><span class="vg4-pill a">LangChain = production</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Explain</span><br>Why split story generation and title extraction into separate nodes?</div>
      <div class="vg4-qa-a"><strong>Focused prompts outperform omnibus prompts.</strong> A prompt that must write a story, extract a title, and format output all at once tends to trade off quality across tasks. Separate nodes give each sub-task its own context window, model parameters, and success criteria — and makes each step independently testable and replaceable.
        <div class="vg4-pills"><span class="vg4-pill t">One node, one job</span><span class="vg4-pill">Better quality</span><span class="vg4-pill a">Independently testable</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal vg4-d1">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Use Case</span><br>What other use cases suit a Flowise multi-agent setup?</div>
      <div class="vg4-qa-a"><strong>Any pipeline with distinct sequential sub-tasks:</strong> customer support (intent classification → knowledge retrieval → response drafting), content pipelines (research → outline → write → SEO optimize), data workflows (extract → validate → transform → summarize).
        <div class="vg4-pills"><span class="vg4-pill t">Customer support</span><span class="vg4-pill t">Content pipelines</span><span class="vg4-pill t">Data workflows</span><span class="vg4-pill">Code review agents</span></div>
      </div>
    </div>

    <div class="vg4-qa-item vg4-reveal">
      <div class="vg4-qa-q"><span class="vg4-q-badge">Gotcha</span><br>What are the main limits of no-code agent builders?</div>
      <div class="vg4-qa-a">Three key limitations: <strong>(1) Observability</strong> — debugging visual workflows is harder than reading stack traces. <strong>(2) Version control</strong> — workflow JSON doesn&#8217;t diff cleanly in Git. <strong>(3) Custom logic</strong> — complex conditional branching, stateful memory, and custom tool integrations are much easier in code-first frameworks.
        <div class="vg4-pills"><span class="vg4-pill a">Hard to debug</span><span class="vg4-pill a">No clean Git diff</span><span class="vg4-pill a">Limited branching</span></div>
      </div>
    </div>

  </div>
</div>

<!-- FOOTER -->
<div class="vg4-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <a href="https://vijay-gokarn.com" class="vg4-back-btn">Back to Blog ↗</a>
</div>

</div><!-- /vg4 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg4-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg4-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>
<p>The post <a href="https://vijay-gokarn.com/creating-ai-storytelling-agents-using-flowise-a-step-by-step-guide/">Creating AI Storytelling Agents Using Flowise: A Step-by-Step Guide</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">227</post-id>	</item>
		<item>
		<title>Long Context LLM Comparison</title>
		<link>https://vijay-gokarn.com/long-context-llm-comparison/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=long-context-llm-comparison</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Mon, 19 Aug 2024 21:31:01 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=192</guid>

					<description><![CDATA[<p>GenAI Mastery Series · Long Context LLMs · Deep Dive. Long Context LLMs — How They Work, How They Compare, and When to Use Which. Models covered: GPT-4 · Claude 2 · Mistral · PaLM 2 · LLaMA 2. Focus: Context Length · Architecture · Use Cases [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/long-context-llm-comparison/">Long Context LLM Comparison</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg3 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg3 *, .vg3 *::before, .vg3 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg3-hero { background: var(--ink); padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg3-hero::before {
  content: '∞'; font-family: 'Cormorant Garamond', serif; font-size: 22rem;
  font-weight: 300; color: rgba(255,255,255,0.025); position: absolute;
  right: 2rem; top: 50%; transform: translateY(-50%); line-height: 1; pointer-events: none;
}
.vg3-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg3-eyebrow {
  font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal-light); font-weight: 500; margin-bottom: 1.25rem;
  display: flex; align-items: center; gap: 0.75rem;
}
.vg3-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--teal-light); }
.vg3-hero h1 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem);
  font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em;
  margin-bottom: 1.5rem; max-width: 26ch;
}
.vg3-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg3-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg3-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg3-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* STAT BAND */
.vg3-stat-band {
  background: var(--teal); display: grid; grid-template-columns: repeat(4, 1fr);
  gap: 1px; background-color: var(--teal);
}
.vg3-stat-item { background: var(--teal); padding: 1.75rem 2rem; }
.vg3-stat-n {
  font-family: 'Cormorant Garamond', serif; font-size: 2.4rem; font-weight: 300;
  line-height: 1; color: var(--paper); letter-spacing: -0.03em; margin-bottom: 0.3rem;
}
.vg3-stat-l { font-size: 0.68rem; letter-spacing: 0.12em; text-transform: uppercase; color: rgba(247,244,239,0.55); }

/* INTRO */
.vg3-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg3-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg3-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg3-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg3-section { margin-bottom: 3.5rem; }
.vg3-section-label {
  font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal); font-weight: 500; margin-bottom: 0.5rem;
  display: flex; align-items: center; gap: 0.6rem;
}
.vg3-section-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg3-section h2 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(1.5rem, 3vw, 2.1rem);
  font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1.25rem;
}
.vg3-section h2 em { font-style: italic; color: var(--teal); }
.vg3-section p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg3-section p strong { color: var(--ink); font-weight: 500; }
.vg3-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }

/* CALLOUT */
.vg3-callout {
  background: var(--paper-dark); border-left: 3px solid var(--amber);
  padding: 1.25rem 1.5rem; margin: 1.5rem 0;
  font-size: 0.87rem; line-height: 1.8; color: var(--charcoal);
}
.vg3-callout strong { color: var(--amber); font-weight: 500; }
.vg3-callout.teal { border-color: var(--teal); }
.vg3-callout.teal strong { color: var(--teal); }

/* HOW IT WORKS - 4 PILLARS */
.vg3-how-grid { display: grid; grid-template-columns: repeat(2, 1fr); gap: 1.25rem; margin: 1.5rem 0; }
.vg3-how-card { background: var(--paper); border: 0.5px solid var(--border-strong); padding: 1.75rem; position: relative; }
.vg3-how-card::before { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 4px; background: var(--teal); }
.vg3-how-card .vg3-how-num { font-family: 'Cormorant Garamond', serif; font-size: 2.2rem; font-weight: 300; color: var(--border-strong); line-height: 1; margin-bottom: 0.5rem; letter-spacing: -0.03em; }
.vg3-how-card h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.2rem; font-weight: 400; color: var(--ink); margin-bottom: 0.5rem; }
.vg3-how-card p { font-size: 0.83rem; line-height: 1.75; color: var(--charcoal); font-weight: 300; }

/* MODEL CARDS */
.vg3-models-section { background: var(--paper-dark); padding: 4rem; }
.vg3-models-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg3-models-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg3-models-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg3-models-section > h2 em { font-style: italic; color: var(--teal); }
.vg3-model-list { display: flex; flex-direction: column; gap: 1.25rem; }
.vg3-model-card { background: var(--paper); border: 0.5px solid var(--border); display: grid; grid-template-columns: 220px 1fr; }
.vg3-model-left { padding: 1.75rem; border-right: 0.5px solid var(--border); position: relative; overflow: hidden; }
.vg3-model-left::before { content: ''; position: absolute; top: 0; left: 0; width: 4px; height: 100%; }
.vg3-model-card:nth-child(1) .vg3-model-left::before { background: #10a37f; }
.vg3-model-card:nth-child(2) .vg3-model-left::before { background: #7c3aed; }
.vg3-model-card:nth-child(3) .vg3-model-left::before { background: #d97706; }
.vg3-model-card:nth-child(4) .vg3-model-left::before { background: #1a73e8; }
.vg3-model-card:nth-child(5) .vg3-model-left::before { background: #0668e1; }
.vg3-model-provider { font-size: 0.62rem; letter-spacing: 0.16em; text-transform: uppercase; color: var(--muted); margin-bottom: 0.4rem; font-weight: 400; }
.vg3-model-name { font-family: 'Cormorant Garamond', serif; font-size: 1.5rem; font-weight: 400; color: var(--ink); line-height: 1.1; margin-bottom: 1rem; }
.vg3-ctx-badge {
  display: inline-block; font-family: 'DM Mono', monospace; font-size: 0.75rem;
  font-weight: 400; color: var(--teal); background: var(--teal-muted);
  padding: 0.3rem 0.75rem; letter-spacing: 0.04em; margin-bottom: 0.5rem;
}
.vg3-arch-label { font-size: 0.68rem; color: var(--muted); font-weight: 300; letter-spacing: 0.04em; }
.vg3-model-right { padding: 1.75rem; }
.vg3-model-right-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1.25rem; }
.vg3-model-col-head { font-size: 0.62rem; letter-spacing: 0.16em; text-transform: uppercase; color: var(--muted); font-weight: 500; margin-bottom: 0.6rem; }
.vg3-model-col-head.strength { color: var(--teal); }
.vg3-model-col-head.challenge { color: var(--amber); }
.vg3-model-list-inner { list-style: none; display: flex; flex-direction: column; gap: 0.45rem; }
.vg3-model-list-inner li { font-size: 0.8rem; line-height: 1.6; color: var(--charcoal); font-weight: 300; padding-left: 1rem; position: relative; }
.vg3-model-list-inner li.s::before { content: '+'; position: absolute; left: 0; color: var(--teal); font-weight: 500; }
.vg3-model-list-inner li.c::before { content: '⚠'; position: absolute; left: 0; color: var(--amber); font-size: 0.65rem; top: 0.15rem; }
.vg3-use-cases { margin-top: 1.25rem; border-top: 0.5px solid var(--border); padding-top: 1rem; }
.vg3-use-case-label { font-size: 0.6rem; letter-spacing: 0.16em; text-transform: uppercase; color: var(--muted); margin-bottom: 0.5rem; }
.vg3-use-case-pills { display: flex; flex-wrap: wrap; gap: 0.4rem; }
.vg3-use-pill { font-size: 0.68rem; padding: 0.2rem 0.65rem; background: var(--paper-dark); border: 0.5px solid var(--border); color: var(--charcoal); letter-spacing: 0.04em; }

/* COMPARISON TABLE */
.vg3-table-wrap { overflow-x: auto; margin: 1.5rem 0; }
.vg3-table { width: 100%; border-collapse: collapse; font-size: 0.83rem; }
.vg3-table th { background: var(--ink); color: var(--paper); font-family: 'DM Sans', sans-serif; font-weight: 400; font-size: 0.65rem; letter-spacing: 0.14em; text-transform: uppercase; padding: 0.75rem 1rem; text-align: left; }
.vg3-table td { padding: 0.75rem 1rem; border-bottom: 0.5px solid var(--border); color: var(--charcoal); vertical-align: top; line-height: 1.55; }
.vg3-table tr:nth-child(even) td { background: var(--paper-dark); }
.vg3-table td strong { color: var(--ink); font-weight: 500; }
.vg3-chip { display: inline-block; font-size: 0.65rem; letter-spacing: 0.06em; padding: 0.2rem 0.55rem; border-radius: 2px; font-weight: 400; }
.vg3-chip-green { background: var(--teal-muted); color: var(--teal); }
.vg3-chip-amber { background: var(--amber-muted); color: var(--amber); }
.vg3-chip-gray  { background: var(--paper-dark); color: var(--muted); border: 0.5px solid var(--border); }

/* CONSIDERATIONS */
.vg3-consider-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1.25rem; margin: 1.5rem 0; }
.vg3-consider-card { padding: 1.75rem; border: 0.5px solid var(--border-strong); background: var(--paper); position: relative; }
.vg3-consider-card::before { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 3px; }
.vg3-consider-card:nth-child(1)::before { background: var(--teal); }
.vg3-consider-card:nth-child(2)::before { background: var(--amber); }
.vg3-consider-card:nth-child(3)::before { background: var(--muted); }
.vg3-consider-card h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.15rem; font-weight: 400; color: var(--ink); margin-bottom: 0.5rem; }
.vg3-consider-card p { font-size: 0.82rem; line-height: 1.7; color: var(--charcoal); font-weight: 300; }

/* INTERVIEW */
.vg3-interview-section { background: var(--ink); padding: 4rem; }
.vg3-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg3-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg3-interview-section h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2.5rem; }
.vg3-interview-section h2 em { font-style: italic; color: var(--amber-light); }
.vg3-qa-list { display: flex; flex-direction: column; }
.vg3-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(247,244,239,0.1); align-items: start; }
.vg3-qa-item:last-child { border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg3-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--paper); line-height: 1.4; }
.vg3-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg3-qa-a { font-size: 0.83rem; line-height: 1.8; color: rgba(247,244,239,0.65); font-weight: 300; }
.vg3-qa-a strong { color: var(--amber-light); font-weight: 400; }
.vg3-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(247,244,239,0.08); padding: 0.1rem 0.35rem; color: var(--paper); }
.vg3-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg3-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid rgba(247,244,239,0.15); color: rgba(247,244,239,0.5); }
.vg3-pill.t { border-color: var(--teal-light); color: var(--teal-light); }
.vg3-pill.a { border-color: var(--amber-light); color: var(--amber-light); }

/* FOOTER */
.vg3-footer { background: var(--charcoal); padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; }
.vg3-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg3-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg3-back-btn { display: inline-block; padding: 0.65rem 1.75rem; background: var(--teal); color: var(--paper); font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 400; }

/* REVEAL */
.vg3-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg3-reveal.vg3-vis { opacity: 1; transform: translateY(0); }
.vg3-d1 { transition-delay: 0.1s; } .vg3-d2 { transition-delay: 0.2s; } .vg3-d3 { transition-delay: 0.3s; }
</style>

<div class="vg3">

<!-- HERO -->
<div class="vg3-hero">
  <div class="vg3-hero-inner">
    <p class="vg3-eyebrow">GenAI Mastery Series · Long Context LLMs · Deep Dive</p>
    <h1>Long Context LLMs — How They Work, How They Compare, and <em>When to Use Which</em></h1>
    <div class="vg3-meta-row">
      <p class="vg3-meta">Models Covered<span>GPT-4 · Claude 2 · Mistral · PaLM 2 · LLaMA 2</span></p>
      <p class="vg3-meta">Focus<span>Context Length · Architecture · Use Cases</span></p>
    </div>
  </div>
</div>

<!-- STAT BAND -->
<div class="vg3-stat-band">
  <div class="vg3-stat-item vg3-reveal"><div class="vg3-stat-n">128k</div><div class="vg3-stat-l">GPT-4 max tokens</div></div>
  <div class="vg3-stat-item vg3-reveal vg3-d1"><div class="vg3-stat-n">100k</div><div class="vg3-stat-l">Claude 2 max tokens</div></div>
  <div class="vg3-stat-item vg3-reveal vg3-d2"><div class="vg3-stat-n">32k</div><div class="vg3-stat-l">PaLM 2 max tokens</div></div>
  <div class="vg3-stat-item vg3-reveal vg3-d3"><div class="vg3-stat-n">4k</div><div class="vg3-stat-l">LLaMA 2 base tokens</div></div>
</div>

<!-- INTRO BAND -->
<div class="vg3-intro">
  <p>A <strong>long context LLM</strong> can process and remember extended pieces of text or conversation history — maintaining continuity and coherence over longer interactions. This makes them particularly powerful for tasks that require understanding context across documents, extended dialogues, or complex multi-step reasoning.</p>
</div>

<!-- BODY -->
<div class="vg3-body">

  <!-- HOW IT WORKS -->
  <div class="vg3-section vg3-reveal">
    <p class="vg3-section-label">Fundamentals</p>
    <h2>How long context LLMs <em>actually work</em></h2>
    <p>Four core capabilities define what makes a model &#8220;long context&#8221; — and why it matters for real-world applications.</p>
    <div class="vg3-how-grid">
      <div class="vg3-how-card vg3-reveal vg3-d1">
        <div class="vg3-how-num">01</div>
        <h3>Extended Memory</h3>
        <p>These models hold a larger amount of text in working memory, allowing them to refer back to earlier parts of a conversation or document. Critical for maintaining context in complex, multi-turn discussions.</p>
      </div>
      <div class="vg3-how-card vg3-reveal vg3-d2">
        <div class="vg3-how-num">02</div>
        <h3>Context Awareness</h3>
        <p>The model uses extended context to provide more accurate and relevant responses, understanding nuances and how the conversation shifts over time — not just the last few exchanges.</p>
      </div>
      <div class="vg3-how-card vg3-reveal vg3-d1">
        <div class="vg3-how-num">03</div>
        <h3>Coherence</h3>
        <p>Long context LLMs strive to maintain logical coherence across many interactions, avoiding the contradictions and misunderstandings that arise in shorter-context models when earlier context is lost.</p>
      </div>
      <div class="vg3-how-card vg3-reveal vg3-d2">
        <div class="vg3-how-num">04</div>
        <h3>Broad Applications</h3>
        <p>Customer support, storytelling, technical support, legal document review, code review across large codebases — any scenario where understanding and maintaining context over time is critical.</p>
      </div>
    </div>
  </div>

  <hr class="vg3-divider">

  <!-- KEY CONSIDERATIONS -->
  <div class="vg3-section vg3-reveal">
    <p class="vg3-section-label">What Matters</p>
    <h2>Three factors that <em>define performance</em></h2>
    <div class="vg3-consider-grid">
      <div class="vg3-consider-card vg3-reveal vg3-d1">
        <h3>Context Length</h3>
        <p>Longer context allows models to maintain coherence across larger chunks of text. But more tokens in context means more computational resources — there is always a trade-off between window size and speed.</p>
      </div>
      <div class="vg3-consider-card vg3-reveal vg3-d2">
        <h3>Efficiency</h3>
        <p>Processing long contexts without a significant performance drop is crucial, especially for real-time applications. Architecture innovations like sliding window attention and sparse transformers directly address this.</p>
      </div>
      <div class="vg3-consider-card vg3-reveal vg3-d3">
        <h3>Use Case Fit</h3>
        <p>Each model has specific strengths. Whether you need creative writing, technical documentation, ethical guardrails, multimodal capabilities, or open-source flexibility — the right model depends on the task.</p>
      </div>
    </div>
  </div>
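The sliding window attention mentioned above can be pictured as a banded causal mask: each position attends only to itself and a few preceding positions. A minimal sketch with toy parameters (illustrative only, not any specific model's implementation):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window mask: query position i may attend only to
    key positions in [i - window + 1, i], so per-layer attention cost
    grows with `window` rather than with seq_len."""
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=6, window=3)
# Row 5 attends to positions 3, 4, 5 only; full causal attention
# would let it attend to all six positions.
```

Stacking several such layers still lets information propagate beyond the window, which is how sliding-window models retain long-range context at reduced cost.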

</div>

<!-- MODEL DEEP DIVES -->
<div class="vg3-models-section">
  <p class="vg3-models-eyebrow">Model Comparison</p>
  <h2>Five leading long context <em>LLMs compared</em></h2>
  <div class="vg3-model-list">

    <!-- GPT-4 -->
    <div class="vg3-model-card vg3-reveal">
      <div class="vg3-model-left">
        <p class="vg3-model-provider">OpenAI</p>
        <p class="vg3-model-name">GPT-4</p>
        <span class="vg3-ctx-badge">128k tokens</span>
        <p class="vg3-arch-label">Transformer · Proprietary</p>
      </div>
      <div class="vg3-model-right">
        <div class="vg3-model-right-grid">
          <div>
            <p class="vg3-model-col-head strength">Strengths</p>
            <ul class="vg3-model-list-inner">
              <li class="s">Excellent at complex, coherent long-form text</li>
              <li class="s">Strong context retention across long conversations</li>
              <li class="s">Widely applicable — writing, coding, research</li>
              <li class="s">Largest ecosystem and third-party integrations</li>
            </ul>
          </div>
          <div>
            <p class="vg3-model-col-head challenge">Challenges</p>
            <ul class="vg3-model-list-inner">
              <li class="c">Computationally intensive</li>
              <li class="c">Potential latency on very long inputs</li>
              <li class="c">Proprietary — no fine-tuning access</li>
            </ul>
          </div>
        </div>
        <div class="vg3-use-cases">
          <p class="vg3-use-case-label">Best Use Cases</p>
          <div class="vg3-use-case-pills">
            <span class="vg3-use-pill">Writing Assistants</span>
            <span class="vg3-use-pill">Dialogue Systems</span>
            <span class="vg3-use-pill">Long Doc Summarization</span>
            <span class="vg3-use-pill">Complex Automation</span>
          </div>
        </div>
      </div>
    </div>

    <!-- Claude 2 -->
    <div class="vg3-model-card vg3-reveal vg3-d1">
      <div class="vg3-model-left">
        <p class="vg3-model-provider">Anthropic</p>
        <p class="vg3-model-name">Claude 2</p>
        <span class="vg3-ctx-badge">100k tokens</span>
        <p class="vg3-arch-label">Transformer · Safety-optimized</p>
      </div>
      <div class="vg3-model-right">
        <div class="vg3-model-right-grid">
          <div>
            <p class="vg3-model-col-head strength">Strengths</p>
            <ul class="vg3-model-list-inner">
              <li class="s">Designed for ethical use and AI alignment</li>
              <li class="s">Coherent context over extended discussions</li>
              <li class="s">Strong on sensitive, high-stakes interactions</li>
              <li class="s">Excellent at processing entire documents at once</li>
            </ul>
          </div>
          <div>
            <p class="vg3-model-col-head challenge">Challenges</p>
            <ul class="vg3-model-list-inner">
              <li class="c">Less widely tested than GPT-4 at time of release</li>
              <li class="c">Can be more conservative on edge cases</li>
            </ul>
          </div>
        </div>
        <div class="vg3-use-cases">
          <p class="vg3-use-case-label">Best Use Cases</p>
          <div class="vg3-use-case-pills">
            <span class="vg3-use-pill">Conversational AI</span>
            <span class="vg3-use-pill">Content Moderation</span>
            <span class="vg3-use-pill">Legal / Compliance</span>
            <span class="vg3-use-pill">Summarization</span>
          </div>
        </div>
      </div>
    </div>

    <!-- Mistral -->
    <div class="vg3-model-card vg3-reveal">
      <div class="vg3-model-left">
        <p class="vg3-model-provider">Mistral AI</p>
        <p class="vg3-model-name">Mistral</p>
        <span class="vg3-ctx-badge">Extended (varies)</span>
        <p class="vg3-arch-label">Transformer · Efficient architecture</p>
      </div>
      <div class="vg3-model-right">
        <div class="vg3-model-right-grid">
          <div>
            <p class="vg3-model-col-head strength">Strengths</p>
            <ul class="vg3-model-list-inner">
              <li class="s">Efficient long context with reduced compute overhead</li>
              <li class="s">Strong long-form content generation</li>
              <li class="s">Sliding window attention — better memory use</li>
              <li class="s">Open weights available for self-hosting</li>
            </ul>
          </div>
          <div>
            <p class="vg3-model-col-head challenge">Challenges</p>
            <ul class="vg3-model-list-inner">
              <li class="c">Newer entrant — still gathering real-world benchmarks</li>
              <li class="c">Context length varies by variant</li>
            </ul>
          </div>
        </div>
        <div class="vg3-use-cases">
          <p class="vg3-use-case-label">Best Use Cases</p>
          <div class="vg3-use-case-pills">
            <span class="vg3-use-pill">Narrative Generation</span>
            <span class="vg3-use-pill">Technical Docs</span>
            <span class="vg3-use-pill">Research Synthesis</span>
            <span class="vg3-use-pill">Self-hosted Apps</span>
          </div>
        </div>
      </div>
    </div>

    <!-- PaLM 2 -->
    <div class="vg3-model-card vg3-reveal vg3-d1">
      <div class="vg3-model-left">
        <p class="vg3-model-provider">Google</p>
        <p class="vg3-model-name">PaLM 2</p>
        <span class="vg3-ctx-badge">~32k tokens</span>
        <p class="vg3-arch-label">Pathways Architecture · Multimodal</p>
      </div>
      <div class="vg3-model-right">
        <div class="vg3-model-right-grid">
          <div>
            <p class="vg3-model-col-head strength">Strengths</p>
            <ul class="vg3-model-list-inner">
              <li class="s">Strong multilingual and multimodal performance</li>
              <li class="s">Deep integration with Google Search and Knowledge Graph</li>
              <li class="s">Excellent at translation and cross-lingual tasks</li>
              <li class="s">Contextually rich long-form generation</li>
            </ul>
          </div>
          <div>
            <p class="vg3-model-col-head challenge">Challenges</p>
            <ul class="vg3-model-list-inner">
              <li class="c">Smaller context window than GPT-4 / Claude 2</li>
              <li class="c">Balancing multimodal vs long-context performance</li>
            </ul>
          </div>
        </div>
        <div class="vg3-use-cases">
          <p class="vg3-use-case-label">Best Use Cases</p>
          <div class="vg3-use-case-pills">
            <span class="vg3-use-pill">Multilingual Tasks</span>
            <span class="vg3-use-pill">Translation</span>
            <span class="vg3-use-pill">Multimodal Apps</span>
            <span class="vg3-use-pill">Research Tools</span>
          </div>
        </div>
      </div>
    </div>

    <!-- LLaMA 2 -->
    <div class="vg3-model-card vg3-reveal">
      <div class="vg3-model-left">
        <p class="vg3-model-provider">Meta</p>
        <p class="vg3-model-name">LLaMA 2</p>
        <span class="vg3-ctx-badge">4k tokens (base)</span>
        <p class="vg3-arch-label">Transformer · Open-source</p>
      </div>
      <div class="vg3-model-right">
        <div class="vg3-model-right-grid">
          <div>
            <p class="vg3-model-col-head strength">Strengths</p>
            <ul class="vg3-model-list-inner">
              <li class="s">Fully open-source and customizable</li>
              <li class="s">Efficient, runs on modest hardware</li>
              <li class="s">Strong research and academic community</li>
              <li class="s">Extensible — context length expandable via fine-tuning</li>
            </ul>
          </div>
          <div>
            <p class="vg3-model-col-head challenge">Challenges</p>
            <ul class="vg3-model-list-inner">
              <li class="c">Limited base context vs proprietary models</li>
              <li class="c">Requires significant setup for production use</li>
            </ul>
          </div>
        </div>
        <div class="vg3-use-cases">
          <p class="vg3-use-case-label">Best Use Cases</p>
          <div class="vg3-use-case-pills">
            <span class="vg3-use-pill">Research</span>
            <span class="vg3-use-pill">Open-source Projects</span>
            <span class="vg3-use-pill">Academic Work</span>
            <span class="vg3-use-pill">Custom Fine-tuning</span>
          </div>
        </div>
      </div>
    </div>

  </div>
</div>

<!-- COMPARISON TABLE -->
<div class="vg3-body">
  <div class="vg3-section vg3-reveal">
    <p class="vg3-section-label">At a Glance</p>
    <h2>Side-by-side <em>quick reference</em></h2>
    <div class="vg3-table-wrap">
      <table class="vg3-table">
        <thead><tr><th>Model</th><th>Provider</th><th>Max Context</th><th>Open Source</th><th>Key Edge</th><th>Main Constraint</th></tr></thead>
        <tbody>
          <tr>
            <td><strong>GPT-4</strong></td>
            <td>OpenAI</td>
            <td><span class="vg3-chip vg3-chip-green">128k tokens</span></td>
            <td><span class="vg3-chip vg3-chip-gray">No</span></td>
            <td>Best overall coherence, ecosystem</td>
            <td>Compute cost, latency</td>
          </tr>
          <tr>
            <td><strong>Claude 2</strong></td>
            <td>Anthropic</td>
            <td><span class="vg3-chip vg3-chip-green">100k tokens</span></td>
            <td><span class="vg3-chip vg3-chip-gray">No</span></td>
            <td>Safety, alignment, ethical use</td>
            <td>Less benchmark data vs GPT-4</td>
          </tr>
          <tr>
            <td><strong>Mistral</strong></td>
            <td>Mistral AI</td>
            <td><span class="vg3-chip vg3-chip-amber">Varies</span></td>
            <td><span class="vg3-chip vg3-chip-green">Yes (weights)</span></td>
            <td>Efficient compute, self-hostable</td>
            <td>Newer — fewer benchmarks</td>
          </tr>
          <tr>
            <td><strong>PaLM 2</strong></td>
            <td>Google</td>
            <td><span class="vg3-chip vg3-chip-amber">~32k tokens</span></td>
            <td><span class="vg3-chip vg3-chip-gray">No</span></td>
            <td>Multilingual, multimodal, Search integration</td>
            <td>Smaller context window</td>
          </tr>
          <tr>
            <td><strong>LLaMA 2</strong></td>
            <td>Meta</td>
            <td><span class="vg3-chip vg3-chip-amber">4k base</span></td>
            <td><span class="vg3-chip vg3-chip-green">Yes (fully open)</span></td>
            <td>Customizable, runs on consumer hardware</td>
            <td>Shortest base context</td>
          </tr>
        </tbody>
      </table>
    </div>
    <div class="vg3-callout">
      <strong>Bottom Line:</strong> GPT-4 leads for raw context management. Claude 2 wins where safety and ethical handling matter. Mistral and LLaMA 2 are the open-source options for teams that need full control. PaLM 2 is the pick for multilingual and multimodal workloads.
    </div>
  </div>
</div>

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg3-interview-section">
  <p class="vg3-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg3-qa-list">

    <div class="vg3-qa-item vg3-reveal">
      <div class="vg3-qa-q"><span class="vg3-q-badge">Define</span><br>What is a long context LLM?</div>
      <div class="vg3-qa-a"><strong>A model with a large token window</strong> — the amount of text it can hold in memory and reason over at once. Longer windows allow maintaining coherence over extended documents or multi-turn conversations without losing earlier context.
        <div class="vg3-pills"><span class="vg3-pill t">Token window = memory</span><span class="vg3-pill">Longer = more coherent</span><span class="vg3-pill a">Tradeoff: compute cost</span></div>
      </div>
    </div>

    <div class="vg3-qa-item vg3-reveal vg3-d1">
      <div class="vg3-qa-q"><span class="vg3-q-badge">Explain</span><br>What is a &#8220;token&#8221; and why does window size matter?</div>
      <div class="vg3-qa-a">A <strong>token</strong> is roughly ¾ of a word (~4 characters). 128k tokens ≈ ~100,000 words ≈ a full novel. Window size determines how much of a document or conversation the model can &#8220;see&#8221; at once. Once context overflows the window, earlier information is lost.
        <div class="vg3-pills"><span class="vg3-pill t">~4 chars per token</span><span class="vg3-pill">128k ≈ 100k words</span><span class="vg3-pill a">Overflow = forgetting</span></div>
      </div>
    </div>
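The ~4 characters per token rule of thumb above can be turned into a quick estimator for checking whether text plausibly fits a window. A rough sketch (the true count depends on the model's tokenizer, so treat this as a ballpark only):

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count using the ~4 characters/token heuristic."""
    return max(1, round(len(text) / 4))

def fits_in_window(text: str, window_tokens: int = 128_000) -> bool:
    """True if the text plausibly fits inside the given context window."""
    return estimate_tokens(text) <= window_tokens
```

For anything near the limit, count with the model's actual tokenizer instead of the heuristic.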

    <div class="vg3-qa-item vg3-reveal">
      <div class="vg3-qa-q"><span class="vg3-q-badge">Compare</span><br>GPT-4 vs Claude 2 — when would you pick each?</div>
      <div class="vg3-qa-a">Pick <strong>GPT-4</strong> for breadth, ecosystem integrations, and the widest context window (128k). Pick <strong>Claude 2</strong> when safety, ethical handling, or processing very large documents in one shot matters (100k tokens, strong alignment focus).
        <div class="vg3-pills"><span class="vg3-pill t">GPT-4 = breadth + ecosystem</span><span class="vg3-pill a">Claude 2 = safety + alignment</span></div>
      </div>
    </div>

    <div class="vg3-qa-item vg3-reveal vg3-d1">
      <div class="vg3-qa-q"><span class="vg3-q-badge">Gotcha</span><br>Why doesn&#8217;t bigger context always mean better results?</div>
      <div class="vg3-qa-a">The <strong>&#8220;lost in the middle&#8221; problem</strong> — models tend to attend best to the beginning and end of a long context, with degraded recall in the middle. More tokens also means <strong>quadratic compute cost</strong> in standard attention, increasing latency significantly.
        <div class="vg3-pills"><span class="vg3-pill a">Lost in the middle</span><span class="vg3-pill a">Quadratic attention cost</span><span class="vg3-pill">Latency tradeoff</span></div>
      </div>
    </div>
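The quadratic cost is easy to see with back-of-the-envelope arithmetic: standard attention builds an n × n score matrix, so growing the window 32× (4k → 128k tokens) multiplies that work by roughly 32² = 1024. A toy cost model that ignores constant factors and optimized kernels:

```python
def attention_score_flops(n_tokens: int, head_dim: int = 128) -> int:
    """Multiply-adds for the QK^T scores plus the scores @ V product:
    both are n x n x d, hence the quadratic growth in n."""
    return 2 * n_tokens * n_tokens * head_dim

ratio = attention_score_flops(128_000) / attention_score_flops(4_000)
# A 32x longer window costs ~1024x more score-matrix work.
```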

    <div class="vg3-qa-item vg3-reveal">
      <div class="vg3-qa-q"><span class="vg3-q-badge">Use Case</span><br>When would you use LLaMA 2 over a proprietary model?</div>
      <div class="vg3-qa-a">When you need <strong>data privacy</strong> (no external API calls), <strong>full customization</strong> (fine-tune on your own data), <strong>cost control</strong> (no per-token pricing), or you&#8217;re in a <strong>regulated industry</strong> that prohibits sending data to third-party vendors.
        <div class="vg3-pills"><span class="vg3-pill t">Data privacy</span><span class="vg3-pill t">Fine-tuning control</span><span class="vg3-pill t">No API cost</span><span class="vg3-pill a">Regulated industries</span></div>
      </div>
    </div>

    <div class="vg3-qa-item vg3-reveal vg3-d1">
      <div class="vg3-qa-q"><span class="vg3-q-badge">Define</span><br>What is RAG and how does it relate to context length?</div>
      <div class="vg3-qa-a"><strong>Retrieval-Augmented Generation</strong> — instead of stuffing an entire knowledge base into the context window, you retrieve only the relevant chunks and inject them. RAG is often a better alternative to brute-force long context: cheaper, faster, and avoids the &#8220;lost in the middle&#8221; problem.
        <div class="vg3-pills"><span class="vg3-pill t">Retrieve → Inject → Generate</span><span class="vg3-pill">Alternative to long context</span><span class="vg3-pill a">Cheaper at scale</span></div>
      </div>
    </div>

    <div class="vg3-qa-item vg3-reveal">
      <div class="vg3-qa-q"><span class="vg3-q-badge">Name</span><br>Three applications where long context LLMs are essential</div>
      <div class="vg3-qa-a"><strong>1. Legal / contract review</strong> — entire agreements must be held in context simultaneously. <strong>2. Codebase analysis</strong> — understanding how functions across many files interact. <strong>3. Medical record summarization</strong> — patient history spanning hundreds of pages must be synthesized in one pass.
        <div class="vg3-pills"><span class="vg3-pill t">Legal review</span><span class="vg3-pill t">Code analysis</span><span class="vg3-pill t">Medical records</span><span class="vg3-pill">Long doc summarization</span></div>
      </div>
    </div>

  </div>
</div>

<!-- FOOTER -->
<div class="vg3-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <a href="https://vijay-gokarn.com" class="vg3-back-btn">Back to Blog ↗</a>
</div>

</div><!-- /vg3 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg3-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg3-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>
<p>The post <a href="https://vijay-gokarn.com/long-context-llm-comparison/">Long Context LLM Comparison</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">192</post-id>	</item>
		<item>
		<title>Long Context LLMs vs RAG</title>
		<link>https://vijay-gokarn.com/llms-large-language-context-vs-rag/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=llms-large-language-context-vs-rag</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Mon, 19 Aug 2024 11:47:57 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[long-context-llm]]></category>
		<category><![CDATA[RAG]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=168</guid>

					<description><![CDATA[<p>Retrieval-Augmented Generation. It is a method that combines the strengths of retrieval-based models and generative models to improve the performance and accuracy of AI systems, particularly in natural language processing tasks. How RAG Works: Applications: RAG models help in grounding the limitations of generative AI and removing the hallucinations from the responses. Why We Needed [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/llms-large-language-context-vs-rag/">Long Context LLMs vs RAG</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img decoding="async" width="1365" height="768" loading="lazy" src="https://i0.wp.com/vijay-gokarn.com/wp-content/uploads/2024/08/image-2.png?fit=1024%2C576&amp;ssl=1" alt="" class="wp-image-270" srcset="https://i0.wp.com/vijay-gokarn.com/wp-content/uploads/2024/08/image-2.png?w=1365&amp;ssl=1 1365w, https://i0.wp.com/vijay-gokarn.com/wp-content/uploads/2024/08/image-2.png?resize=300%2C169&amp;ssl=1 300w, https://i0.wp.com/vijay-gokarn.com/wp-content/uploads/2024/08/image-2.png?resize=1024%2C576&amp;ssl=1 1024w, https://i0.wp.com/vijay-gokarn.com/wp-content/uploads/2024/08/image-2.png?resize=768%2C432&amp;ssl=1 768w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></figure>



<p class=""><strong>RAG</strong> stands for <strong>Retrieval-Augmented Generation</strong>, a method that combines the strengths of retrieval-based and generative models to improve the performance and accuracy of AI systems, particularly in natural language processing tasks.</p>



<h3 class="wp-block-heading">How RAG Works:</h3>



<ol class="wp-block-list">
<li class=""><strong>Retrieval Step</strong>: The system first retrieves relevant documents or pieces of information from a large corpus based on the input query. This retrieval process helps to bring in contextually relevant information that the generative model might need to generate a more accurate response.</li>



<li class=""><strong>Generation Step</strong>: After retrieving the relevant information, the generative model (often a large language model like GPT) uses this information as a basis to generate a coherent and contextually appropriate response.</li>
</ol>
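These two steps can be sketched end to end with a toy retriever (word overlap stands in for the embedding-based vector search used in real systems; in practice the assembled `prompt` would be sent to an LLM rather than printed or returned):

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retrieval step (toy version): rank documents by word overlap
    with the query and keep the top k."""
    q_words = set(query.lower().split())
    return sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Generation step input: inject the retrieved context ahead of
    the question so the model grounds its answer in it."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "rag retrieves relevant documents before the model generates an answer",
    "sliding window attention reduces memory use in transformers",
    "paris is the capital of france",
]
prompt = build_prompt("what does rag retrieve before it generates", corpus)
```

Note that the irrelevant document about Paris never reaches the prompt, which is exactly how RAG keeps the generative model focused.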






<h3 class="wp-block-heading">Applications:</h3>



<ul class="wp-block-list">
<li class=""><strong>Question Answering</strong>: RAG models can be used to answer questions by retrieving relevant text from a knowledge base and then generating an answer based on that information.</li>



<li class=""><strong>Chatbots</strong>: In conversational AI, RAG models help to provide more accurate and context-aware responses by pulling in relevant information before generating a reply.</li>



<li class=""><strong>Content Creation</strong>: For generating content such as articles, reports, or summaries, RAG models can retrieve relevant data and then generate content that integrates this information effectively.</li>
</ul>



<p class="">RAG models help ground generative AI in retrieved facts, working around its limitations and reducing hallucinations in the responses.</p>



<h3 class="wp-block-heading">Why We Needed RAG</h3>



<p class="">Before the advent of long context LLMs, traditional language models had severe limitations in processing and understanding large amounts of text. This constraint hindered their ability to perform tasks like:</p>



<ul class="wp-block-list">
<li class="">Summarizing lengthy documents</li>



<li class="">Answering complex questions requiring extensive knowledge</li>



<li class="">Generating text based on large datasets</li>
</ul>






<p class="">RAG emerged as a solution to this problem. By retrieving relevant information from external knowledge bases, RAG could effectively expand the model&#8217;s access to information, improving its performance on these tasks.</p>



<h3 class="wp-block-heading">Long Context LLMs: The New Kid on the Block</h3>



<p class="">With the development of long context LLMs, the landscape has changed significantly. These models can now process and understand much larger amounts of text directly, reducing the reliance on external knowledge sources.</p>



<h3 class="wp-block-heading">Long Context LLMs</h3>






<ul class="wp-block-list">
<li class=""><strong>Core concept:</strong> Directly process and understand a larger amount of text within a single input.</li>



<li class=""><strong>Strengths:</strong>
<ul class="wp-block-list">
<li class="">Can capture complex relationships within the text.</li>



<li class="">Potentially better at understanding nuances and context.</li>
</ul>
</li>



<li class=""><strong>Weaknesses:</strong>
<ul class="wp-block-list">
<li class="">Limited by the maximum context window size.</li>



<li class="">Can be computationally expensive for very long inputs.</li>
</ul>
</li>
</ul>






<p class=""><strong>This has led to a debate about whether long context LLMs will render RAG obsolete.</strong> Here is a <a href="https://vijay-gokarn.com/long-context-llm-comparison/">comparison of long context LLMs from the major players</a>.</p>






<h3 class="wp-block-heading">The Reality: A Complex Interplay</h3>



<p class="">While long context LLMs are impressive, they are not a panacea. Here&#8217;s why:</p>



<ul class="wp-block-list">
<li class=""><strong>Computational Costs:</strong> Processing extremely long contexts is computationally expensive and time-consuming.</li>



<li class=""><strong>Attention Limitations:</strong> Attention mechanisms, essential for long context models, can still struggle with capturing complex relationships within massive amounts of text.</li>



<li class=""><strong>Information Overload:</strong> Feeding an LLM with an overwhelming amount of information can lead to dilution of focus and potential hallucinations.</li>
</ul>






<p class=""><strong>Therefore, RAG is not entirely obsolete.</strong> It still offers several advantages:</p>



<ul class="wp-block-list">
<li class=""><strong>Efficiency:</strong> RAG can be more efficient in retrieving and processing specific information.</li>



<li class=""><strong>Scalability:</strong> RAG can handle virtually unlimited amounts of data.</li>



<li class=""><strong>Focus:</strong> By providing the LLM with targeted information, RAG can improve accuracy and reduce hallucinations.</li>
</ul>






<p class=""><strong>In conclusion, the relationship between long context LLMs and RAG is complex and evolving.</strong> The optimal approach often involves a hybrid strategy, combining the strengths of both technologies. The specific choice depends on the task, the available resources, and the desired level of performance.</p>
<p>The post <a href="https://vijay-gokarn.com/llms-large-language-context-vs-rag/">Long Context LLMs vs RAG</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">168</post-id>	</item>
		<item>
		<title>AWS SageMaker JumpStart and AWS Bedrock: Choosing the Right AI Tool for Your Needs</title>
		<link>https://vijay-gokarn.com/aws-sagemaker-jumpstart-and-aws-bedrock-choosing-the-right-ai-tool-for-your-needs/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=aws-sagemaker-jumpstart-and-aws-bedrock-choosing-the-right-ai-tool-for-your-needs</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Thu, 01 Aug 2024 11:19:23 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=160</guid>

					<description><![CDATA[<p>AWS · Cloud AI · Service Comparison SageMaker JumpStart vs Amazon Bedrock — Choosing the Right AWS AI Tool ServicesSageMaker JumpStart · Amazon Bedrock FocusUse Cases · Architecture · Decision Guide StackAWS Cloud AI SageMaker JumpStart The Swiss Army Knife of Machine Learning vs Amazon Bedrock The Generative AI Powerhouse AWS offers two powerful tools [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/aws-sagemaker-jumpstart-and-aws-bedrock-choosing-the-right-ai-tool-for-your-needs/">AWS SageMaker JumpStart and AWS Bedrock: Choosing the Right AI Tool for Your Needs</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg5 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  --aws-orange: #ff9900; --aws-blue: #232f3e;
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg5 *, .vg5 *::before, .vg5 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg5-hero { background: var(--aws-blue); padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg5-hero::before {
  content: 'AWS'; font-family: 'Cormorant Garamond', serif; font-size: 16rem;
  font-weight: 300; color: rgba(255,255,255,0.03); position: absolute;
  right: -1rem; bottom: -3rem; line-height: 1; pointer-events: none; letter-spacing: -0.05em;
}
.vg5-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg5-eyebrow {
  font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--amber-light); font-weight: 500; margin-bottom: 1.25rem;
  display: flex; align-items: center; gap: 0.75rem;
}
.vg5-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--amber-light); }
.vg5-hero h1 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem);
  font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em;
  margin-bottom: 1.5rem; max-width: 26ch;
}
.vg5-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg5-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg5-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg5-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* VS BAND */
.vg5-vs-band {
  background: var(--aws-orange); display: grid; grid-template-columns: 1fr 60px 1fr;
  align-items: center;
}
.vg5-vs-left { padding: 1.5rem 2.5rem; background: rgba(0,0,0,0.1); }
.vg5-vs-right { padding: 1.5rem 2.5rem; background: rgba(0,0,0,0.2); }
.vg5-vs-mid { text-align: center; padding: 1.5rem 0; }
.vg5-vs-mid span { font-family: 'Cormorant Garamond', serif; font-size: 1.6rem; font-weight: 300; color: var(--aws-blue); }
.vg5-vs-name { font-size: 0.65rem; letter-spacing: 0.16em; text-transform: uppercase; color: rgba(35,47,62,0.65); margin-bottom: 0.25rem; }
.vg5-vs-title { font-family: 'Cormorant Garamond', serif; font-size: 1.2rem; font-weight: 400; color: var(--aws-blue); line-height: 1.2; }

/* INTRO */
.vg5-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg5-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg5-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg5-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg5-section { margin-bottom: 3.5rem; }
.vg5-section-label {
  font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal); font-weight: 500; margin-bottom: 0.5rem;
  display: flex; align-items: center; gap: 0.6rem;
}
.vg5-section-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg5-section h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.5rem, 3vw, 2.1rem); font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1.25rem; }
.vg5-section h2 em { font-style: italic; color: var(--teal); }
.vg5-section p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg5-section p strong { color: var(--ink); font-weight: 500; }
.vg5-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }

/* CALLOUT */
.vg5-callout { background: var(--paper-dark); border-left: 3px solid var(--amber); padding: 1.25rem 1.5rem; margin: 1.5rem 0; font-size: 0.87rem; line-height: 1.8; color: var(--charcoal); }
.vg5-callout strong { color: var(--amber); font-weight: 500; }
.vg5-callout.teal { border-color: var(--teal); }
.vg5-callout.teal strong { color: var(--teal); }

/* SERVICE DEEP DIVE CARDS */
.vg5-service-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; margin: 2rem 0; }
.vg5-service-card { border: 0.5px solid var(--border-strong); padding: 0; overflow: hidden; }
.vg5-service-header { padding: 1.5rem; }
.vg5-service-card.jumpstart .vg5-service-header { background: var(--aws-blue); }
.vg5-service-card.bedrock .vg5-service-header { background: var(--ink); }
.vg5-service-nickname { font-size: 0.62rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.4); margin-bottom: 0.3rem; }
.vg5-service-name { font-family: 'Cormorant Garamond', serif; font-size: 1.5rem; font-weight: 300; color: var(--paper); line-height: 1.2; margin-bottom: 0.75rem; }
.vg5-service-card.jumpstart .vg5-service-name em { color: var(--amber-light); font-style: italic; }
.vg5-service-card.bedrock .vg5-service-name em { color: var(--teal-light); font-style: italic; }
.vg5-service-tagline { font-size: 0.78rem; color: rgba(247,244,239,0.6); font-weight: 300; line-height: 1.5; }
.vg5-service-body { padding: 1.5rem; background: var(--paper); }
.vg5-attr-list { display: flex; flex-direction: column; gap: 0; }
.vg5-attr-item { padding: 0.85rem 0; border-bottom: 0.5px solid var(--border); display: grid; grid-template-columns: 90px 1fr; gap: 0.75rem; align-items: start; }
.vg5-attr-item:last-child { border-bottom: none; }
.vg5-attr-key { font-size: 0.62rem; letter-spacing: 0.14em; text-transform: uppercase; color: var(--muted); font-weight: 400; padding-top: 0.1rem; }
.vg5-attr-val { font-size: 0.82rem; line-height: 1.65; color: var(--charcoal); font-weight: 300; }
.vg5-attr-val strong { color: var(--ink); font-weight: 500; }

/* TABLE */
.vg5-table-wrap { overflow-x: auto; margin: 1.5rem 0; }
.vg5-table { width: 100%; border-collapse: collapse; font-size: 0.83rem; }
.vg5-table th { background: var(--aws-blue); color: var(--paper); font-family: 'DM Sans', sans-serif; font-weight: 400; font-size: 0.65rem; letter-spacing: 0.14em; text-transform: uppercase; padding: 0.75rem 1rem; text-align: left; }
.vg5-table td { padding: 0.75rem 1rem; border-bottom: 0.5px solid var(--border); color: var(--charcoal); vertical-align: top; line-height: 1.55; }
.vg5-table tr:nth-child(even) td { background: var(--paper-dark); }
.vg5-table td strong { color: var(--ink); font-weight: 500; }
.vg5-chip { display: inline-block; font-size: 0.65rem; letter-spacing: 0.06em; padding: 0.2rem 0.55rem; font-weight: 400; }
.vg5-chip-green { background: var(--teal-muted); color: var(--teal); }
.vg5-chip-amber { background: var(--amber-muted); color: var(--amber); }
.vg5-chip-gray  { background: var(--paper-dark); color: var(--muted); border: 0.5px solid var(--border); }

/* USE CASES SECTION */
.vg5-usecases-section { background: var(--paper-dark); padding: 4rem; }
.vg5-usecases-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg5-usecases-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg5-usecases-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg5-usecases-section > h2 em { font-style: italic; color: var(--teal); }
.vg5-uc-columns { display: grid; grid-template-columns: 1fr 1fr; gap: 2rem; }
.vg5-uc-col-header { font-size: 0.65rem; letter-spacing: 0.18em; text-transform: uppercase; font-weight: 500; padding: 0.6rem 1rem; margin-bottom: 1rem; }
.vg5-uc-col-header.js { background: var(--aws-blue); color: var(--amber-light); }
.vg5-uc-col-header.br { background: var(--ink); color: var(--teal-light); }
.vg5-uc-list { display: flex; flex-direction: column; gap: 1rem; }
.vg5-uc-card { background: var(--paper); border: 0.5px solid var(--border); padding: 1.25rem 1.5rem; position: relative; }
.vg5-uc-card::before { content: ''; position: absolute; top: 0; left: 0; width: 3px; height: 100%; }
.vg5-uc-card.js::before { background: var(--amber); }
.vg5-uc-card.br::before { background: var(--teal); }
.vg5-uc-industry { font-size: 0.6rem; letter-spacing: 0.16em; text-transform: uppercase; color: var(--muted); margin-bottom: 0.35rem; }
.vg5-uc-card h4 { font-family: 'Cormorant Garamond', serif; font-size: 1.1rem; font-weight: 400; color: var(--ink); margin-bottom: 0.5rem; }
.vg5-uc-scenario { font-size: 0.78rem; line-height: 1.65; color: var(--muted); font-style: italic; margin-bottom: 0.6rem; font-weight: 300; }
.vg5-uc-solution { font-size: 0.82rem; line-height: 1.7; color: var(--charcoal); font-weight: 300; }
.vg5-uc-solution strong { color: var(--ink); font-weight: 500; }

/* INTERVIEW */
.vg5-interview-section { background: var(--ink); padding: 4rem; }
.vg5-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg5-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg5-interview-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2.5rem; }
.vg5-interview-section > h2 em { font-style: italic; color: var(--amber-light); }
.vg5-qa-list { display: flex; flex-direction: column; }
.vg5-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(247,244,239,0.1); align-items: start; }
.vg5-qa-item:last-child { border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg5-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--paper); line-height: 1.4; }
.vg5-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg5-qa-a { font-size: 0.83rem; line-height: 1.8; color: rgba(247,244,239,0.65); font-weight: 300; }
.vg5-qa-a strong { color: var(--amber-light); font-weight: 400; }
.vg5-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(247,244,239,0.08); padding: 0.1rem 0.35rem; color: var(--paper); }
.vg5-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg5-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid rgba(247,244,239,0.15); color: rgba(247,244,239,0.5); }
.vg5-pill.t { border-color: var(--teal-light); color: var(--teal-light); }
.vg5-pill.a { border-color: var(--amber-light); color: var(--amber-light); }

/* FOOTER */
.vg5-footer { background: var(--aws-blue); padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; }
.vg5-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg5-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg5-back-btn { display: inline-block; padding: 0.65rem 1.75rem; background: var(--aws-orange); color: var(--aws-blue); font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 500; }

/* REVEAL */
.vg5-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg5-reveal.vg5-vis { opacity: 1; transform: translateY(0); }
.vg5-d1 { transition-delay: 0.1s; } .vg5-d2 { transition-delay: 0.2s; } .vg5-d3 { transition-delay: 0.3s; }
</style>

<div class="vg5">

<!-- HERO -->
<div class="vg5-hero">
  <div class="vg5-hero-inner">
    <p class="vg5-eyebrow">AWS · Cloud AI · Service Comparison</p>
    <h1>SageMaker JumpStart vs Amazon Bedrock — <em>Choosing the Right AWS AI Tool</em></h1>
    <div class="vg5-meta-row">
      <p class="vg5-meta">Services<span>SageMaker JumpStart · Amazon Bedrock</span></p>
      <p class="vg5-meta">Focus<span>Use Cases · Architecture · Decision Guide</span></p>
      <p class="vg5-meta">Stack<span>AWS Cloud AI</span></p>
    </div>
  </div>
</div>

<!-- VS BAND -->
<div class="vg5-vs-band">
  <div class="vg5-vs-left">
    <p class="vg5-vs-name">SageMaker JumpStart</p>
    <p class="vg5-vs-title">The Swiss Army Knife of Machine Learning</p>
  </div>
  <div class="vg5-vs-mid"><span>vs</span></div>
  <div class="vg5-vs-right">
    <p class="vg5-vs-name">Amazon Bedrock</p>
    <p class="vg5-vs-title">The Generative AI Powerhouse</p>
  </div>
</div>

<!-- INTRO -->
<div class="vg5-intro">
  <p>AWS offers two powerful tools for businesses looking to leverage AI capabilities. While both services simplify AI adoption, they cater to fundamentally different needs. <strong>SageMaker JumpStart</strong> gives you control, customization, and the full ML lifecycle. <strong>Amazon Bedrock</strong> gives you immediate access to state-of-the-art foundation models with minimal setup. Knowing which to reach for is a core cloud AI skill.</p>
</div>

<!-- BODY -->
<div class="vg5-body">

  <!-- SERVICE CARDS -->
  <div class="vg5-section vg5-reveal">
    <p class="vg5-section-label">Service Deep Dive</p>
    <h2>What each service <em>actually does</em></h2>
    <div class="vg5-service-grid">

      <div class="vg5-service-card jumpstart vg5-reveal vg5-d1">
        <div class="vg5-service-header">
          <p class="vg5-service-nickname">Amazon SageMaker</p>
          <p class="vg5-service-name"><em>JumpStart</em></p>
          <p class="vg5-service-tagline">Pre-built ML solutions + full customization control within the SageMaker ecosystem</p>
        </div>
        <div class="vg5-service-body">
          <div class="vg5-attr-list">
            <div class="vg5-attr-item"><span class="vg5-attr-key">Scope</span><span class="vg5-attr-val">Part of the broader SageMaker ecosystem — notebooks, training, tuning, and deployment in one platform</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Models</span><span class="vg5-attr-val">Wide range of pre-built ML solutions: image classification, object detection, text analysis, and more</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Customization</span><span class="vg5-attr-val"><strong>Fine-tuning, transfer learning, and model retraining</strong> — you own the model weights</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Deployment</span><span class="vg5-attr-val">Seamless SageMaker integration; supports both batch and real-time inference endpoints</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">User Profile</span><span class="vg5-attr-val">Data scientists and ML engineers who want to customize models and build tailored solutions</span></div>
          </div>
        </div>
      </div>

      <div class="vg5-service-card bedrock vg5-reveal vg5-d2">
        <div class="vg5-service-header">
          <p class="vg5-service-nickname">Amazon</p>
          <p class="vg5-service-name"><em>Bedrock</em></p>
          <p class="vg5-service-tagline">Fully managed foundation model API — access top GenAI models without infrastructure</p>
        </div>
        <div class="vg5-service-body">
          <div class="vg5-attr-list">
            <div class="vg5-attr-item"><span class="vg5-attr-key">Scope</span><span class="vg5-attr-val">Standalone service dedicated to generative AI — no ML infrastructure to manage</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Models</span><span class="vg5-attr-val">Curated foundation models from Amazon, <strong>Anthropic (Claude)</strong>, AI21 Labs, Cohere, Meta, and Stability AI</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Customization</span><span class="vg5-attr-val">Fine-tuning available on select models; RAG via Knowledge Bases; no direct model weight access</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Deployment</span><span class="vg5-attr-val">API-first. Minimal setup to start generating text, images, or embeddings — serverless by default</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">User Profile</span><span class="vg5-attr-val">Developers and product teams integrating GenAI features fast — rapid prototyping and deployment</span></div>
          </div>
        </div>
      </div>

    </div>
  </div>
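The JumpStart workflow above can be sketched in a few lines. The AWS calls are shown as comments because they need an account, a SageMaker execution role, and the `sagemaker` SDK; the `model_id` and instance type are illustrative. The small runnable helper shows the last step, turning an endpoint's class-probability output into a category label:

```python
# JumpStart deployment sketch (requires AWS; identifiers are illustrative):
# from sagemaker.jumpstart.model import JumpStartModel
# model = JumpStartModel(model_id="tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4")
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
# probs = predictor.predict(image_bytes)  # vector of class probabilities

def top_category(probs, labels):
    """Map a classifier's probability vector to its most likely label."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Hypothetical output for an e-commerce taxonomy:
label, p = top_category([0.1, 0.7, 0.2], ["Electronics", "Clothing", "Home Appliances"])
```

Because you deployed the endpoint yourself, you also own its lifecycle: remember to delete it when done, since real-time endpoints bill while running.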

  <hr class="vg5-divider">

  <!-- COMPARISON TABLE -->
  <div class="vg5-section vg5-reveal">
    <p class="vg5-section-label">At a Glance</p>
    <h2>Side-by-side <em>quick reference</em></h2>
    <div class="vg5-table-wrap">
      <table class="vg5-table">
        <thead><tr><th>Factor</th><th>SageMaker JumpStart</th><th>Amazon Bedrock</th></tr></thead>
        <tbody>
          <tr>
            <td><strong>Primary Focus</strong></td>
            <td>Classical ML + customizable models</td>
            <td>Generative AI via foundation models</td>
          </tr>
          <tr>
            <td><strong>Setup Complexity</strong></td>
            <td><span class="vg5-chip vg5-chip-amber">Medium — SageMaker config needed</span></td>
            <td><span class="vg5-chip vg5-chip-green">Low — API call to start</span></td>
          </tr>
          <tr>
            <td><strong>Model Ownership</strong></td>
            <td><span class="vg5-chip vg5-chip-green">Full — fine-tune and own weights</span></td>
            <td><span class="vg5-chip vg5-chip-gray">No — managed by providers</span></td>
          </tr>
          <tr>
            <td><strong>Customization Depth</strong></td>
            <td><span class="vg5-chip vg5-chip-green">Deep — transfer learning, retraining</span></td>
            <td><span class="vg5-chip vg5-chip-amber">Limited — fine-tuning + RAG</span></td>
          </tr>
          <tr>
            <td><strong>Data Privacy</strong></td>
            <td><span class="vg5-chip vg5-chip-green">Full control in your VPC</span></td>
            <td><span class="vg5-chip vg5-chip-green">Data not used for model training</span></td>
          </tr>
          <tr>
            <td><strong>Inference Mode</strong></td>
            <td>Batch + real-time endpoints</td>
            <td>Serverless API (on-demand)</td>
          </tr>
          <tr>
            <td><strong>Model Variety</strong></td>
            <td>Vision, NLP, tabular, forecasting</td>
            <td>Text, image, embedding, multimodal</td>
          </tr>
          <tr>
            <td><strong>Best For</strong></td>
            <td>Tailored ML, regulated industries, data science teams</td>
            <td>Rapid GenAI features, chatbots, content generation</td>
          </tr>
        </tbody>
      </table>
    </div>
    <div class="vg5-callout teal">
      <strong>Decision rule of thumb:</strong> If you&#8217;re asking &#8220;how do I add AI to my app fast?&#8221; — reach for Bedrock. If you&#8217;re asking &#8220;how do I build a custom model on my proprietary data?&#8221; — reach for SageMaker JumpStart.
    </div>
  </div>
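The "Low — API call to start" claim for Bedrock is concrete. A minimal sketch, assuming AWS credentials are configured; the region and model ID are illustrative. The request body follows the Anthropic Messages format that Claude models on Bedrock expect, and building it needs nothing beyond the standard library:

```python
import json

def build_claude_request(prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for a Bedrock InvokeModel call to a Claude model
    (Anthropic Messages API format)."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_claude_request("Classify this review as positive, negative, or neutral: ...")

# The actual call (requires AWS credentials; region/model ID illustrative):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

No endpoint to provision, no instance type to pick: the serverless row in the table above is exactly this, one client and one call.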

</div><!-- /vg5-body -->

<!-- USE CASES -->
<div class="vg5-usecases-section">
  <p class="vg5-usecases-eyebrow">Real-World Examples</p>
  <h2>Use cases — <em>side by side</em></h2>
  <div class="vg5-uc-columns">

    <div>
      <p class="vg5-uc-col-header js">SageMaker JumpStart Examples</p>
      <div class="vg5-uc-list">
        <div class="vg5-uc-card js vg5-reveal">
          <p class="vg5-uc-industry">E-Commerce · Computer Vision</p>
          <h4>Product Image Classification</h4>
          <p class="vg5-uc-scenario">An e-commerce company needs to automatically categorize product images uploaded by sellers.</p>
          <p class="vg5-uc-solution">Deploy a <strong>pre-trained image classification model</strong> via JumpStart. Products are routed into categories — Electronics, Clothing, Home Appliances — with minimal setup. Model can be fine-tuned on proprietary category taxonomy.</p>
        </div>
        <div class="vg5-uc-card js vg5-reveal vg5-d1">
          <p class="vg5-uc-industry">Retail · NLP</p>
          <h4>Customer Sentiment Analysis</h4>
          <p class="vg5-uc-scenario">A company wants to analyze customer reviews at scale to understand satisfaction trends.</p>
          <p class="vg5-uc-solution">Deploy a <strong>pre-trained sentiment analysis model</strong> that classifies reviews as positive, negative, or neutral. Integrates into customer feedback pipelines — no model training required, fine-tuning available if needed.</p>
        </div>
        <div class="vg5-uc-card js vg5-reveal vg5-d2">
          <p class="vg5-uc-industry">Financial Services · Fraud Detection</p>
          <h4>Real-Time Transaction Fraud Detection</h4>
          <p class="vg5-uc-scenario">A financial institution needs to flag fraudulent transactions in real time.</p>
          <p class="vg5-uc-solution">Use a <strong>fraud detection solution template</strong> from JumpStart. Deploys a ready-to-use model that analyzes transaction patterns and flags suspicious activities for investigation — with a real-time inference endpoint.</p>
        </div>
      </div>
    </div>

    <div>
      <p class="vg5-uc-col-header br">Amazon Bedrock Examples</p>
      <div class="vg5-uc-list">
        <div class="vg5-uc-card br vg5-reveal">
          <p class="vg5-uc-industry">Healthcare · Text Summarization</p>
          <h4>Clinical Note Summarization</h4>
          <p class="vg5-uc-scenario">A healthcare provider wants to cut the time clinicians spend drafting visit summaries and discharge notes.</p>
          <p class="vg5-uc-solution">Call a <strong>foundation model such as Claude via the Bedrock API</strong> to draft summaries from encounter notes for clinician review. No ML infrastructure to run, and Bedrock does not use customer data to train the underlying models.</p>
        </div>
        <div class="vg5-uc-card br vg5-reveal vg5-d1">
          <p class="vg5-uc-industry">Retail · Content Generation</p>
          <h4>Product Descriptions at Scale</h4>
          <p class="vg5-uc-scenario">A retail chain needs on-brand product descriptions for thousands of SKUs across its catalog.</p>
          <p class="vg5-uc-solution">Generate copy with a <strong>text-generation foundation model</strong>, prompting with each product&#8217;s attributes and brand guidelines. Serverless and pay-per-token, so the feature ships in days instead of becoming a months-long ML project.</p>
        </div>
        <div class="vg5-uc-card br vg5-reveal vg5-d2">
          <p class="vg5-uc-industry">Manufacturing · RAG Assistant</p>
          <h4>Technician Assistant over Equipment Manuals</h4>
          <p class="vg5-uc-scenario">A manufacturer wants line technicians to get instant answers from thousands of pages of maintenance manuals.</p>
          <p class="vg5-uc-solution">Point <strong>Bedrock Knowledge Bases</strong> at the manuals stored in S3 and expose a chat assistant. Retrieval-augmented generation grounds each answer in the source documents, with no vector-store plumbing for the team to build.</p>
        </div>
      </div>
    </div>

  </div>
</div>

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg5-interview-section">
  <p class="vg5-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg5-qa-list">

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Define</span><br>What is Amazon SageMaker JumpStart?</div>
      <div class="vg5-qa-a"><strong>A curated library of pre-built ML solutions and models</strong> within the SageMaker ecosystem. Lets you deploy, fine-tune, and retrain models for tasks like image classification, NLP, and fraud detection — with full control over the ML lifecycle.
        <div class="vg5-pills"><span class="vg5-pill t">Pre-built models</span><span class="vg5-pill t">Fine-tunable</span><span class="vg5-pill">SageMaker ecosystem</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal vg5-d1">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Define</span><br>What is Amazon Bedrock?</div>
      <div class="vg5-qa-a"><strong>A fully managed API service</strong> that provides access to foundation models from multiple providers (Anthropic, AI21 Labs, Cohere, Stability AI, Amazon). No ML infrastructure to manage — you call an API and get generative AI capabilities immediately.
        <div class="vg5-pills"><span class="vg5-pill t">Managed FM API</span><span class="vg5-pill t">Multi-provider</span><span class="vg5-pill a">Serverless</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Compare</span><br>JumpStart vs Bedrock — when do you pick each?</div>
      <div class="vg5-qa-a">Pick <strong>JumpStart</strong> when you need to customize a model on your own data, require full model weight ownership, or are building classical ML pipelines (vision, tabular, NLP). Pick <strong>Bedrock</strong> when you need GenAI features fast, want managed infrastructure, or are integrating LLMs into an application.
        <div class="vg5-pills"><span class="vg5-pill t">JumpStart = custom ML</span><span class="vg5-pill a">Bedrock = GenAI fast</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal vg5-d1">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Explain</span><br>What is a Foundation Model (FM)?</div>
      <div class="vg5-qa-a">A <strong>large model pre-trained on broad data</strong> that can be adapted for many downstream tasks. Foundation models (Claude, Llama, Titan) are trained once at massive scale and then fine-tuned or prompted for specific use cases. Bedrock provides managed API access to a curated set of these FMs.
        <div class="vg5-pills"><span class="vg5-pill t">Pre-trained at scale</span><span class="vg5-pill">Prompt or fine-tune</span><span class="vg5-pill a">Claude, Llama, Titan</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Gotcha</span><br>Can you use RAG with Bedrock? How?</div>
      <div class="vg5-qa-a">Yes — <strong>Bedrock Knowledge Bases</strong> lets you connect S3 data sources to an FM. Documents are chunked, embedded, and stored in a vector store (OpenSearch or Aurora). At inference, relevant chunks are retrieved and injected as context. This is Bedrock&#8217;s native managed RAG pipeline.
        <div class="vg5-pills"><span class="vg5-pill t">Bedrock Knowledge Bases</span><span class="vg5-pill t">S3 → Embed → Retrieve</span><span class="vg5-pill">Native RAG</span></div>
      </div>
    </div>
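The "retrieved and injected as context" step above is worth seeing concretely. A minimal sketch: the runnable helper does the injection by hand, and the commented call shows Bedrock's managed equivalent via the `bedrock-agent-runtime` client (the knowledge base ID and model ARN are placeholders, not real identifiers):

```python
def inject_context(question: str, chunks: list[str]) -> str:
    """Manual version of the RAG injection step: prepend retrieved
    chunks to the user's question as numbered context for the FM."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (f"Use the following context to answer.\n\n{context}"
            f"\n\nQuestion: {question}")

prompt = inject_context(
    "What is the warranty period?",
    ["Warranty: 24 months from purchase.",
     "Returns accepted within 30 days."])

# Managed equivalent with Knowledge Bases (requires AWS; IDs are placeholders):
# import boto3
# rt = boto3.client("bedrock-agent-runtime")
# resp = rt.retrieve_and_generate(
#     input={"text": "What is the warranty period?"},
#     retrieveAndGenerateConfiguration={
#         "type": "KNOWLEDGE_BASE",
#         "knowledgeBaseConfiguration": {
#             "knowledgeBaseId": "YOUR-KB-ID",
#             "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
#         },
#     },
# )
```

With Knowledge Bases, the chunking, embedding, vector storage, and this injection step are all handled by the service; you supply the S3 data source and the question.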

    <div class="vg5-qa-item vg5-reveal vg5-d1">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Use Case</span><br>Which service suits a regulated industry (healthcare, finance)?</div>
      <div class="vg5-qa-a"><strong>SageMaker JumpStart</strong> for full data control inside your own VPC — no data leaves your environment. <strong>Bedrock</strong> is also enterprise-safe (data not used for model training, VPC endpoints available), but JumpStart gives deeper control for compliance-heavy workloads requiring model auditability.
        <div class="vg5-pills"><span class="vg5-pill t">JumpStart = full VPC control</span><span class="vg5-pill a">Bedrock = enterprise-safe API</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Name</span><br>Three Bedrock model providers and what they&#8217;re known for</div>
      <div class="vg5-qa-a"><strong>Anthropic (Claude)</strong> — safety-focused, long context, strong reasoning. <strong>AI21 Labs (Jurassic)</strong> — instruction-following, enterprise text generation. <strong>Stability AI</strong> — image generation (Stable Diffusion). Amazon&#8217;s own <strong>Titan</strong> models cover embeddings and text generation natively.
        <div class="vg5-pills"><span class="vg5-pill t">Anthropic = safety + reasoning</span><span class="vg5-pill t">Stability = images</span><span class="vg5-pill a">Titan = embeddings</span></div>
      </div>
    </div>

  </div>
</div>

<!-- FOOTER -->
<div class="vg5-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <a href="https://vijay-gokarn.com" class="vg5-back-btn">Back to Blog ↗</a>
</div>

</div><!-- /vg5 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg5-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg5-reveal').forEach(function(el){ obs.observe(el); });
})();
</script><link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg5 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  --aws-orange: #ff9900; --aws-blue: #232f3e;
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg5 *, .vg5 *::before, .vg5 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg5-hero { background: var(--aws-blue); padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg5-hero::before {
  content: 'AWS'; font-family: 'Cormorant Garamond', serif; font-size: 16rem;
  font-weight: 300; color: rgba(255,255,255,0.03); position: absolute;
  right: -1rem; bottom: -3rem; line-height: 1; pointer-events: none; letter-spacing: -0.05em;
}
.vg5-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg5-eyebrow {
  font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--amber-light); font-weight: 500; margin-bottom: 1.25rem;
  display: flex; align-items: center; gap: 0.75rem;
}
.vg5-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--amber-light); }
.vg5-hero h1 {
  font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem);
  font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em;
  margin-bottom: 1.5rem; max-width: 26ch;
}
.vg5-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg5-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg5-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg5-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* VS BAND */
.vg5-vs-band {
  background: var(--aws-orange); display: grid; grid-template-columns: 1fr 60px 1fr;
  align-items: center;
}
.vg5-vs-left { padding: 1.5rem 2.5rem; background: rgba(0,0,0,0.1); }
.vg5-vs-right { padding: 1.5rem 2.5rem; background: rgba(0,0,0,0.2); }
.vg5-vs-mid { text-align: center; padding: 1.5rem 0; }
.vg5-vs-mid span { font-family: 'Cormorant Garamond', serif; font-size: 1.6rem; font-weight: 300; color: var(--aws-blue); }
.vg5-vs-name { font-size: 0.65rem; letter-spacing: 0.16em; text-transform: uppercase; color: rgba(35,47,62,0.65); margin-bottom: 0.25rem; }
.vg5-vs-title { font-family: 'Cormorant Garamond', serif; font-size: 1.2rem; font-weight: 400; color: var(--aws-blue); line-height: 1.2; }

/* INTRO */
.vg5-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg5-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg5-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg5-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg5-section { margin-bottom: 3.5rem; }
.vg5-section-label {
  font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase;
  color: var(--teal); font-weight: 500; margin-bottom: 0.5rem;
  display: flex; align-items: center; gap: 0.6rem;
}
.vg5-section-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg5-section h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.5rem, 3vw, 2.1rem); font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1.25rem; }
.vg5-section h2 em { font-style: italic; color: var(--teal); }
.vg5-section p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg5-section p strong { color: var(--ink); font-weight: 500; }
.vg5-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }

/* CALLOUT */
.vg5-callout { background: var(--paper-dark); border-left: 3px solid var(--amber); padding: 1.25rem 1.5rem; margin: 1.5rem 0; font-size: 0.87rem; line-height: 1.8; color: var(--charcoal); }
.vg5-callout strong { color: var(--amber); font-weight: 500; }
.vg5-callout.teal { border-color: var(--teal); }
.vg5-callout.teal strong { color: var(--teal); }

/* SERVICE DEEP DIVE CARDS */
.vg5-service-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; margin: 2rem 0; }
.vg5-service-card { border: 0.5px solid var(--border-strong); padding: 0; overflow: hidden; }
.vg5-service-header { padding: 1.5rem; }
.vg5-service-card.jumpstart .vg5-service-header { background: var(--aws-blue); }
.vg5-service-card.bedrock .vg5-service-header { background: var(--ink); }
.vg5-service-nickname { font-size: 0.62rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(247,244,239,0.4); margin-bottom: 0.3rem; }
.vg5-service-name { font-family: 'Cormorant Garamond', serif; font-size: 1.5rem; font-weight: 300; color: var(--paper); line-height: 1.2; margin-bottom: 0.75rem; }
.vg5-service-card.jumpstart .vg5-service-name em { color: var(--amber-light); font-style: italic; }
.vg5-service-card.bedrock .vg5-service-name em { color: var(--teal-light); font-style: italic; }
.vg5-service-tagline { font-size: 0.78rem; color: rgba(247,244,239,0.6); font-weight: 300; line-height: 1.5; }
.vg5-service-body { padding: 1.5rem; background: var(--paper); }
.vg5-attr-list { display: flex; flex-direction: column; gap: 0; }
.vg5-attr-item { padding: 0.85rem 0; border-bottom: 0.5px solid var(--border); display: grid; grid-template-columns: 90px 1fr; gap: 0.75rem; align-items: start; }
.vg5-attr-item:last-child { border-bottom: none; }
.vg5-attr-key { font-size: 0.62rem; letter-spacing: 0.14em; text-transform: uppercase; color: var(--muted); font-weight: 400; padding-top: 0.1rem; }
.vg5-attr-val { font-size: 0.82rem; line-height: 1.65; color: var(--charcoal); font-weight: 300; }
.vg5-attr-val strong { color: var(--ink); font-weight: 500; }

/* TABLE */
.vg5-table-wrap { overflow-x: auto; margin: 1.5rem 0; }
.vg5-table { width: 100%; border-collapse: collapse; font-size: 0.83rem; }
.vg5-table th { background: var(--aws-blue); color: var(--paper); font-family: 'DM Sans', sans-serif; font-weight: 400; font-size: 0.65rem; letter-spacing: 0.14em; text-transform: uppercase; padding: 0.75rem 1rem; text-align: left; }
.vg5-table td { padding: 0.75rem 1rem; border-bottom: 0.5px solid var(--border); color: var(--charcoal); vertical-align: top; line-height: 1.55; }
.vg5-table tr:nth-child(even) td { background: var(--paper-dark); }
.vg5-table td strong { color: var(--ink); font-weight: 500; }
.vg5-chip { display: inline-block; font-size: 0.65rem; letter-spacing: 0.06em; padding: 0.2rem 0.55rem; font-weight: 400; }
.vg5-chip-green { background: var(--teal-muted); color: var(--teal); }
.vg5-chip-amber { background: var(--amber-muted); color: var(--amber); }
.vg5-chip-gray  { background: var(--paper-dark); color: var(--muted); border: 0.5px solid var(--border); }

/* USE CASES SECTION */
.vg5-usecases-section { background: var(--paper-dark); padding: 4rem; }
.vg5-usecases-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg5-usecases-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg5-usecases-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg5-usecases-section > h2 em { font-style: italic; color: var(--teal); }
.vg5-uc-columns { display: grid; grid-template-columns: 1fr 1fr; gap: 2rem; }
.vg5-uc-col-header { font-size: 0.65rem; letter-spacing: 0.18em; text-transform: uppercase; font-weight: 500; padding: 0.6rem 1rem; margin-bottom: 1rem; }
.vg5-uc-col-header.js { background: var(--aws-blue); color: var(--amber-light); }
.vg5-uc-col-header.br { background: var(--ink); color: var(--teal-light); }
.vg5-uc-list { display: flex; flex-direction: column; gap: 1rem; }
.vg5-uc-card { background: var(--paper); border: 0.5px solid var(--border); padding: 1.25rem 1.5rem; position: relative; }
.vg5-uc-card::before { content: ''; position: absolute; top: 0; left: 0; width: 3px; height: 100%; }
.vg5-uc-card.js::before { background: var(--amber); }
.vg5-uc-card.br::before { background: var(--teal); }
.vg5-uc-industry { font-size: 0.6rem; letter-spacing: 0.16em; text-transform: uppercase; color: var(--muted); margin-bottom: 0.35rem; }
.vg5-uc-card h4 { font-family: 'Cormorant Garamond', serif; font-size: 1.1rem; font-weight: 400; color: var(--ink); margin-bottom: 0.5rem; }
.vg5-uc-scenario { font-size: 0.78rem; line-height: 1.65; color: var(--muted); font-style: italic; margin-bottom: 0.6rem; font-weight: 300; }
.vg5-uc-solution { font-size: 0.82rem; line-height: 1.7; color: var(--charcoal); font-weight: 300; }
.vg5-uc-solution strong { color: var(--ink); font-weight: 500; }

/* INTERVIEW */
.vg5-interview-section { background: var(--ink); padding: 4rem; }
.vg5-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg5-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg5-interview-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2.5rem; }
.vg5-interview-section > h2 em { font-style: italic; color: var(--amber-light); }
.vg5-qa-list { display: flex; flex-direction: column; }
.vg5-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(247,244,239,0.1); align-items: start; }
.vg5-qa-item:last-child { border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg5-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--paper); line-height: 1.4; }
.vg5-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg5-qa-a { font-size: 0.83rem; line-height: 1.8; color: rgba(247,244,239,0.65); font-weight: 300; }
.vg5-qa-a strong { color: var(--amber-light); font-weight: 400; }
.vg5-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(247,244,239,0.08); padding: 0.1rem 0.35rem; color: var(--paper); }
.vg5-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg5-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid rgba(247,244,239,0.15); color: rgba(247,244,239,0.5); }
.vg5-pill.t { border-color: var(--teal-light); color: var(--teal-light); }
.vg5-pill.a { border-color: var(--amber-light); color: var(--amber-light); }

/* FOOTER */
.vg5-footer { background: var(--aws-blue); padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; }
.vg5-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg5-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg5-back-btn { display: inline-block; padding: 0.65rem 1.75rem; background: var(--aws-orange); color: var(--aws-blue); font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 500; }

/* REVEAL */
.vg5-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg5-reveal.vg5-vis { opacity: 1; transform: translateY(0); }
.vg5-d1 { transition-delay: 0.1s; } .vg5-d2 { transition-delay: 0.2s; } .vg5-d3 { transition-delay: 0.3s; }
</style>

<div class="vg5">

<!-- HERO -->
<div class="vg5-hero">
  <div class="vg5-hero-inner">
    <p class="vg5-eyebrow">AWS · Cloud AI · Service Comparison</p>
    <h1>SageMaker JumpStart vs Amazon Bedrock — <em>Choosing the Right AWS AI Tool</em></h1>
    <div class="vg5-meta-row">
      <p class="vg5-meta">Services<span>SageMaker JumpStart · Amazon Bedrock</span></p>
      <p class="vg5-meta">Focus<span>Use Cases · Architecture · Decision Guide</span></p>
      <p class="vg5-meta">Stack<span>AWS Cloud AI</span></p>
    </div>
  </div>
</div>

<!-- VS BAND -->
<div class="vg5-vs-band">
  <div class="vg5-vs-left">
    <p class="vg5-vs-name">SageMaker JumpStart</p>
    <p class="vg5-vs-title">The Swiss Army Knife of Machine Learning</p>
  </div>
  <div class="vg5-vs-mid"><span>vs</span></div>
  <div class="vg5-vs-right">
    <p class="vg5-vs-name">Amazon Bedrock</p>
    <p class="vg5-vs-title">The Generative AI Powerhouse</p>
  </div>
</div>

<!-- INTRO -->
<div class="vg5-intro">
  <p>AWS offers two powerful tools for businesses looking to leverage AI capabilities. While both services simplify AI adoption, they cater to fundamentally different needs. <strong>SageMaker JumpStart</strong> gives you control, customization, and the full ML lifecycle. <strong>Amazon Bedrock</strong> gives you immediate access to state-of-the-art foundation models with minimal setup. Knowing which to reach for is a core cloud AI skill.</p>
</div>

<!-- BODY -->
<div class="vg5-body">

  <!-- SERVICE CARDS -->
  <div class="vg5-section vg5-reveal">
    <p class="vg5-section-label">Service Deep Dive</p>
    <h2>What each service <em>actually does</em></h2>
    <div class="vg5-service-grid">

      <div class="vg5-service-card jumpstart vg5-reveal vg5-d1">
        <div class="vg5-service-header">
          <p class="vg5-service-nickname">Amazon SageMaker</p>
          <p class="vg5-service-name"><em>JumpStart</em></p>
          <p class="vg5-service-tagline">Pre-built ML solutions + full customization control within the SageMaker ecosystem</p>
        </div>
        <div class="vg5-service-body">
          <div class="vg5-attr-list">
            <div class="vg5-attr-item"><span class="vg5-attr-key">Scope</span><span class="vg5-attr-val">Part of the broader SageMaker ecosystem — notebooks, training, tuning, and deployment in one platform</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Models</span><span class="vg5-attr-val">Wide range of pre-built ML solutions: image classification, object detection, text analysis, and more</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Customization</span><span class="vg5-attr-val"><strong>Fine-tuning, transfer learning, and model retraining</strong> — you own the model weights</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Deployment</span><span class="vg5-attr-val">Seamless SageMaker integration; supports both batch and real-time inference endpoints</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">User Profile</span><span class="vg5-attr-val">Data scientists and ML engineers who want to customize models and build tailored solutions</span></div>
          </div>
        </div>
      </div>

      <div class="vg5-service-card bedrock vg5-reveal vg5-d2">
        <div class="vg5-service-header">
          <p class="vg5-service-nickname">Amazon</p>
          <p class="vg5-service-name"><em>Bedrock</em></p>
          <p class="vg5-service-tagline">Fully managed foundation model API — access top GenAI models without infrastructure</p>
        </div>
        <div class="vg5-service-body">
          <div class="vg5-attr-list">
            <div class="vg5-attr-item"><span class="vg5-attr-key">Scope</span><span class="vg5-attr-val">Standalone service dedicated to generative AI — no ML infrastructure to manage</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Models</span><span class="vg5-attr-val">Curated foundation models from Amazon, <strong>Anthropic (Claude)</strong>, AI21 Labs, Cohere, Meta, and Stability AI</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Customization</span><span class="vg5-attr-val">Fine-tuning available on select models; RAG via Knowledge Bases; no direct model weight access</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">Deployment</span><span class="vg5-attr-val">API-first. Minimal setup to start generating text, images, or embeddings — serverless by default</span></div>
            <div class="vg5-attr-item"><span class="vg5-attr-key">User Profile</span><span class="vg5-attr-val">Developers and product teams integrating GenAI features fast — rapid prototyping and deployment</span></div>
          </div>
        </div>
      </div>

    </div>
  </div>

  <hr class="vg5-divider">

  <!-- COMPARISON TABLE -->
  <div class="vg5-section vg5-reveal">
    <p class="vg5-section-label">At a Glance</p>
    <h2>Side-by-side <em>quick reference</em></h2>
    <div class="vg5-table-wrap">
      <table class="vg5-table">
        <thead><tr><th>Factor</th><th>SageMaker JumpStart</th><th>Amazon Bedrock</th></tr></thead>
        <tbody>
          <tr>
            <td><strong>Primary Focus</strong></td>
            <td>Classical ML + customizable models</td>
            <td>Generative AI via foundation models</td>
          </tr>
          <tr>
            <td><strong>Setup Complexity</strong></td>
            <td><span class="vg5-chip vg5-chip-amber">Medium — SageMaker config needed</span></td>
            <td><span class="vg5-chip vg5-chip-green">Low — API call to start</span></td>
          </tr>
          <tr>
            <td><strong>Model Ownership</strong></td>
            <td><span class="vg5-chip vg5-chip-green">Full — fine-tune and own weights</span></td>
            <td><span class="vg5-chip vg5-chip-gray">No — managed by providers</span></td>
          </tr>
          <tr>
            <td><strong>Customization Depth</strong></td>
            <td><span class="vg5-chip vg5-chip-green">Deep — transfer learning, retraining</span></td>
            <td><span class="vg5-chip vg5-chip-amber">Limited — fine-tuning + RAG</span></td>
          </tr>
          <tr>
            <td><strong>Data Privacy</strong></td>
            <td><span class="vg5-chip vg5-chip-green">Full control in your VPC</span></td>
            <td><span class="vg5-chip vg5-chip-green">Data not used for model training</span></td>
          </tr>
          <tr>
            <td><strong>Inference Mode</strong></td>
            <td>Batch + real-time endpoints</td>
            <td>Serverless API (on-demand)</td>
          </tr>
          <tr>
            <td><strong>Model Variety</strong></td>
            <td>Vision, NLP, tabular, forecasting</td>
            <td>Text, image, embedding, multimodal</td>
          </tr>
          <tr>
            <td><strong>Best For</strong></td>
            <td>Tailored ML, regulated industries, data science teams</td>
            <td>Rapid GenAI features, chatbots, content generation</td>
          </tr>
        </tbody>
      </table>
    </div>
    <div class="vg5-callout teal">
      <strong>Decision rule of thumb:</strong> If you&#8217;re asking &#8220;how do I add AI to my app fast?&#8221; — reach for Bedrock. If you&#8217;re asking &#8220;how do I build a custom model on my proprietary data?&#8221; — reach for SageMaker JumpStart.
    </div>
  </div>
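The &#8220;Low — API call to start&#8221; row above is literal: once credentials are configured, a Bedrock invocation is a single request. A minimal sketch in Python — the model ID, region, and response shape are assumptions (Claude 3 Haiku in us-east-1; availability varies by account and region), and the actual `boto3` call is commented out since it needs AWS credentials:

```python
import json

def build_claude_body(prompt, max_tokens=256):
    # Request body in the Anthropic "messages" format that Bedrock's
    # InvokeModel API expects for Claude models.
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(build_claude_body("Summarize: great battery, slow shipping."))

# With AWS credentials configured, invocation is one API call (sketch only):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

Contrast that with JumpStart, where even the fastest path involves choosing an instance type and standing up a SageMaker endpoint before the first prediction.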

</div><!-- /vg5-body -->

<!-- USE CASES -->
<div class="vg5-usecases-section">
  <p class="vg5-usecases-eyebrow">Real-World Examples</p>
  <h2>Use cases — <em>side by side</em></h2>
  <div class="vg5-uc-columns">

    <div>
      <p class="vg5-uc-col-header js">SageMaker JumpStart Examples</p>
      <div class="vg5-uc-list">
        <div class="vg5-uc-card js vg5-reveal">
          <p class="vg5-uc-industry">E-Commerce · Computer Vision</p>
          <h4>Product Image Classification</h4>
          <p class="vg5-uc-scenario">An e-commerce company needs to automatically categorize product images uploaded by sellers.</p>
          <p class="vg5-uc-solution">Deploy a <strong>pre-trained image classification model</strong> via JumpStart. Products are routed into categories — Electronics, Clothing, Home Appliances — with minimal setup. The model can then be fine-tuned on the company&#8217;s own category taxonomy.</p>
        </div>
        <div class="vg5-uc-card js vg5-reveal vg5-d1">
          <p class="vg5-uc-industry">Retail · NLP</p>
          <h4>Customer Sentiment Analysis</h4>
          <p class="vg5-uc-scenario">A company wants to analyze customer reviews at scale to understand satisfaction trends.</p>
          <p class="vg5-uc-solution">Deploy a <strong>pre-trained sentiment analysis model</strong> that classifies reviews as positive, negative, or neutral. Integrates into customer feedback pipelines — no model training required, fine-tuning available if needed.</p>
        </div>
        <div class="vg5-uc-card js vg5-reveal vg5-d2">
          <p class="vg5-uc-industry">Financial Services · Fraud Detection</p>
          <h4>Real-Time Transaction Fraud Detection</h4>
          <p class="vg5-uc-scenario">A financial institution needs to flag fraudulent transactions in real time.</p>
          <p class="vg5-uc-solution">Use a <strong>fraud detection solution template</strong> from JumpStart. Deploys a ready-to-use model that analyzes transaction patterns and flags suspicious activities for investigation — with a real-time inference endpoint.</p>
        </div>
      </div>
    </div>

    <div>
      <p class="vg5-uc-col-header br">Amazon Bedrock Examples</p>
      <div class="vg5-uc-list">
        <div class="vg5-uc-card br vg5-reveal">
          <p class="vg5-uc-industry">Healthcare · GenAI + RAG</p>
          <h4>Clinical Documentation Assistant</h4>
          <p class="vg5-uc-scenario">A healthcare provider wants to turn lengthy clinical notes into concise discharge summaries and answer staff questions from internal care guidelines.</p>
          <p class="vg5-uc-solution">Call a <strong>foundation model such as Claude</strong> through the Bedrock API to summarize notes, and ground question answering in the provider&#8217;s guidelines with Bedrock Knowledge Bases (managed RAG). No ML infrastructure to operate, and customer data is not used to train the underlying models.</p>
        </div>
        <div class="vg5-uc-card br vg5-reveal vg5-d1">
          <p class="vg5-uc-industry">Retail · Text Generation</p>
          <h4>Product Descriptions at Scale</h4>
          <p class="vg5-uc-scenario">A retail chain needs on-brand product descriptions and marketing copy for tens of thousands of SKUs across its catalog.</p>
          <p class="vg5-uc-solution">Prompt a <strong>text-generation foundation model</strong> via the Bedrock API with each product&#8217;s attributes. Serverless, on-demand invocation scales with catalog size, with no model training or GPU capacity to manage.</p>
        </div>
        <div class="vg5-uc-card br vg5-reveal vg5-d2">
          <p class="vg5-uc-industry">Manufacturing · Conversational AI</p>
          <h4>Maintenance Knowledge Assistant</h4>
          <p class="vg5-uc-scenario">A manufacturer wants engineers to query years of maintenance logs, incident reports, and equipment manuals in plain language.</p>
          <p class="vg5-uc-solution">Index the documents with <strong>Bedrock Knowledge Bases</strong> and expose a conversational assistant backed by a foundation model. Engineers get grounded, cited answers in seconds, without the company managing any inference infrastructure.</p>
        </div>
      </div>
    </div>

  </div>
</div>

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg5-interview-section">
  <p class="vg5-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg5-qa-list">

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Define</span><br>What is Amazon SageMaker JumpStart?</div>
      <div class="vg5-qa-a"><strong>A curated library of pre-built ML solutions and models</strong> within the SageMaker ecosystem. Lets you deploy, fine-tune, and retrain models for tasks like image classification, NLP, and fraud detection — with full control over the ML lifecycle.
        <div class="vg5-pills"><span class="vg5-pill t">Pre-built models</span><span class="vg5-pill t">Fine-tunable</span><span class="vg5-pill">SageMaker ecosystem</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal vg5-d1">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Define</span><br>What is Amazon Bedrock?</div>
      <div class="vg5-qa-a"><strong>A fully managed API service</strong> that provides access to foundation models from multiple providers (Anthropic, AI21 Labs, Cohere, Stability AI, Amazon). No ML infrastructure to manage — you call an API and get generative AI capabilities immediately.
        <div class="vg5-pills"><span class="vg5-pill t">Managed FM API</span><span class="vg5-pill t">Multi-provider</span><span class="vg5-pill a">Serverless</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Compare</span><br>JumpStart vs Bedrock — when do you pick each?</div>
      <div class="vg5-qa-a">Pick <strong>JumpStart</strong> when you need to customize a model on your own data, require full model weight ownership, or are building classical ML pipelines (vision, tabular, NLP). Pick <strong>Bedrock</strong> when you need GenAI features fast, want managed infrastructure, or are integrating LLMs into an application.
        <div class="vg5-pills"><span class="vg5-pill t">JumpStart = custom ML</span><span class="vg5-pill a">Bedrock = GenAI fast</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal vg5-d1">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Explain</span><br>What is a Foundation Model (FM)?</div>
      <div class="vg5-qa-a">A <strong>large model pre-trained on broad data</strong> that can be adapted for many downstream tasks. Foundation models (GPT-4, Claude, Llama) are trained once at massive scale and then fine-tuned or prompted for specific use cases. Bedrock provides access to these FMs as managed APIs.
        <div class="vg5-pills"><span class="vg5-pill t">Pre-trained at scale</span><span class="vg5-pill">Prompt or fine-tune</span><span class="vg5-pill a">Claude, Llama, Titan</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Gotcha</span><br>Can you use RAG with Bedrock? How?</div>
      <div class="vg5-qa-a">Yes — <strong>Bedrock Knowledge Bases</strong> lets you connect S3 data sources to an FM. Documents are chunked, embedded, and stored in a vector store (OpenSearch or Aurora). At inference, relevant chunks are retrieved and injected as context. This is Bedrock&#8217;s native managed RAG pipeline.
        <div class="vg5-pills"><span class="vg5-pill t">Bedrock Knowledge Bases</span><span class="vg5-pill t">S3 → Embed → Retrieve</span><span class="vg5-pill">Native RAG</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal vg5-d1">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Use Case</span><br>Which service suits a regulated industry (healthcare, finance)?</div>
      <div class="vg5-qa-a"><strong>SageMaker JumpStart</strong> for full data control inside your own VPC — no data leaves your environment. <strong>Bedrock</strong> is also enterprise-safe (data not used for model training, VPC endpoints available), but JumpStart gives deeper control for compliance-heavy workloads requiring model auditability.
        <div class="vg5-pills"><span class="vg5-pill t">JumpStart = full VPC control</span><span class="vg5-pill a">Bedrock = enterprise-safe API</span></div>
      </div>
    </div>

    <div class="vg5-qa-item vg5-reveal">
      <div class="vg5-qa-q"><span class="vg5-q-badge">Name</span><br>Three Bedrock model providers and what they&#8217;re known for</div>
      <div class="vg5-qa-a"><strong>Anthropic (Claude)</strong> — safety-focused, long context, strong reasoning. <strong>AI21 Labs (Jurassic)</strong> — instruction-following, enterprise text generation. <strong>Stability AI</strong> — image generation (Stable Diffusion). Amazon&#8217;s own <strong>Titan</strong> models cover embeddings and text generation natively.
        <div class="vg5-pills"><span class="vg5-pill t">Anthropic = safety + reasoning</span><span class="vg5-pill t">Stability = images</span><span class="vg5-pill a">Titan = embeddings</span></div>
      </div>
    </div>

  </div>
</div>
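The chunk → embed → retrieve → inject loop from the RAG answer above can be illustrated with a deliberately toy, dependency-free sketch. The bag-of-words &#8220;embedding&#8221; and the sample chunks are stand-ins for illustration only — a real pipeline would use an embedding model and a vector store, all of which Bedrock Knowledge Bases manages for you:

```python
import math

def embed(text):
    # Toy "embedding": word counts over a tiny fixed vocabulary. A real
    # pipeline would call an embedding model instead.
    vocab = ["refund", "battery", "shipping", "warranty"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# "Documents" already chunked, stored alongside their vectors.
chunks = [
    "battery drains fast on this phone",
    "refund requests are accepted within 30 days",
    "shipping usually takes two weeks",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, k=1):
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: -cosine(q, pair[1]))
    return [chunk for chunk, _ in ranked[:k]]

# The retrieved chunk is injected into the prompt as grounding context.
context = retrieve("how do I get a refund")[0]
prompt = f"Answer using only this context: {context}\nQuestion: ..."
```

The managed version replaces every piece of this: S3 ingestion, chunking, embedding, the vector store, and retrieval all happen inside Knowledge Bases, and you only write the final prompt.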

<!-- FOOTER -->
<div class="vg5-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <a href="https://vijay-gokarn.com" class="vg5-back-btn">Back to Blog ↗</a>
</div>

</div><!-- /vg5 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg5-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg5-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>



<p>The post <a href="https://vijay-gokarn.com/aws-sagemaker-jumpstart-and-aws-bedrock-choosing-the-right-ai-tool-for-your-needs/">AWS Sagemaker Jumpstart and AWS Bedrock Choosing the Right AI Tool for Your Needs</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">160</post-id>	</item>
		<item>
		<title>FAST API</title>
		<link>https://vijay-gokarn.com/fast-api/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=fast-api</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Wed, 17 Jul 2024 08:47:26 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=143</guid>

					<description><![CDATA[<p>FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. Here are several reasons why FastAPI is an excellent choice for building APIs: 1. Speed FastAPI is one of the fastest web frameworks available, thanks to its use of Starlette for the web parts and [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/fast-api/">FAST API</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="">FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. Here are several reasons why FastAPI is an excellent choice for building APIs:</p>



<h3 class="wp-block-heading">1. <strong>Speed</strong></h3>



<p class="">FastAPI is one of the fastest web frameworks available, thanks to its use of Starlette for the web parts and Pydantic for the data parts. It&#8217;s designed to be as fast as possible, ensuring that your API has minimal latency and high throughput.</p>



<h3 class="wp-block-heading">2. <strong>Ease of Use</strong></h3>



<p class="">FastAPI leverages Python&#8217;s type hints, making the code easier to write and understand. It automatically generates interactive API documentation (using Swagger UI and ReDoc), which simplifies testing and understanding of the API endpoints.</p>



<h3 class="wp-block-heading">3. <strong>Data Validation</strong></h3>



<p class="">FastAPI uses Pydantic for data validation and parsing. This ensures that the data sent to the API is correctly formatted and validated before any further processing.</p>
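A quick standalone sketch of what that validation looks like, assuming Pydantic is installed (the `Task` fields here mirror the model built later in this post):

```python
from pydantic import BaseModel, ValidationError

class Task(BaseModel):
    title: str
    description: str = ""
    completed: bool = False

# Valid payloads are parsed, with safe coercion ("true" becomes True):
task = Task(title="write docs", completed="true")

# Invalid payloads raise before any endpoint logic runs:
try:
    Task(completed=False)        # required "title" is missing
    failed = False
except ValidationError as err:
    failed = True
    problems = err.errors()      # structured, field-level error details
```

In FastAPI this same check runs automatically on every request body, and a failed validation returns a 422 response with those field-level details — no manual `if` checks in your endpoint.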



<h3 class="wp-block-heading">4. <strong>Automatic Documentation</strong></h3>



<p class="">FastAPI generates interactive API documentation from your code using OpenAPI. This is extremely helpful for developers, as they can see and interact with the API directly from the browser.</p>



<h3 class="wp-block-heading">5. <strong>Asynchronous Support</strong></h3>



<p class="">FastAPI has first-class support for asynchronous programming, making it easy to write asynchronous endpoints that can handle large numbers of concurrent requests efficiently.</p>
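The payoff of async endpoints is easy to see with a plain-stdlib sketch: ten simulated slow calls run concurrently instead of back to back. (The `asyncio.sleep` stands in for a database or HTTP call; inside FastAPI you would simply declare the endpoint with `async def`.)

```python
import asyncio
import time

async def handle_request(i):
    await asyncio.sleep(0.1)      # stands in for a slow DB or HTTP call
    return i

async def main():
    start = time.perf_counter()
    # All ten "requests" wait concurrently instead of one after another.
    results = await asyncio.gather(*(handle_request(i) for i in range(10)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
# ten 0.1 s waits finish in roughly 0.1 s total, not 1 s
```

The same principle lets one FastAPI worker keep serving other requests while an endpoint awaits I/O.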



<h3 class="wp-block-heading">6. <strong>Dependency Injection</strong></h3>



<p class="">FastAPI provides a powerful dependency injection system that makes it easy to manage and inject dependencies into your endpoints, which can simplify the design of your application and improve its testability.</p>



<h3 class="wp-block-heading">7. <strong>Security</strong></h3>



<p class="">FastAPI includes tools to handle security and authentication, like OAuth2 and JWT tokens, right out of the box.</p>



<h3 class="wp-block-heading">8. <strong>Community and Ecosystem</strong></h3>



<p class="">FastAPI has a growing community and a rich ecosystem of plugins and extensions. It integrates well with other popular Python libraries and tools, such as SQLAlchemy for ORM, Celery for background tasks, and others.</p>



<h3 class="wp-block-heading">Example: Building a Task-List API with FastAPI</h3>






<h3 class="wp-block-heading">1. Imports</h3>



<pre class="wp-block-preformatted"><code>from fastapi import FastAPI, HTTPException<br>from pydantic import BaseModel<br>from typing import List, Optional<br>from uuid import UUID, uuid4<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>FastAPI</strong>: The web framework for building APIs.</li>



<li class=""><strong>HTTPException</strong>: Used to raise HTTP errors.</li>



<li class=""><strong>BaseModel</strong>: A base class from Pydantic for creating data models.</li>



<li class=""><strong>List, Optional</strong>: Type hints for better code readability and type checking.</li>



<li class=""><strong>UUID, uuid4</strong>: For generating unique identifiers for each task.</li>
</ul>



<h3 class="wp-block-heading">2. Initialize FastAPI</h3>



<pre class="wp-block-preformatted"><code>app = FastAPI()<br></code></pre>



<ul class="wp-block-list">
<li class="">Creates an instance of the FastAPI application.</li>
</ul>



<h3 class="wp-block-heading">3. Define the Task Model</h3>



<pre class="wp-block-preformatted"><code>class Task(BaseModel):<br>    id: Optional[UUID] = None<br>    title: str<br>    description: Optional[str] = None<br>    completed: bool = False<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>Task</strong>: This is a data model for tasks with optional <code>id</code>, <code>title</code>, optional <code>description</code>, and <code>completed</code> status.</li>
</ul>



<h3 class="wp-block-heading">4. In-Memory Storage</h3>



<pre class="wp-block-preformatted"><code>tasks = []<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>tasks</strong>: A list to store the tasks. In a real application, this would typically be a database.</li>
</ul>



<h3 class="wp-block-heading">5. Create Task Endpoint</h3>



<pre class="wp-block-preformatted"><code>@app.post("/tasks/", response_model=Task)<br>def create_task(task: Task):<br>    task.id = uuid4()<br>    tasks.append(task)<br>    return task<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>@app.post(&#8220;/tasks/&#8221;)</strong>: This is a POST endpoint to create a new task.</li>



<li class=""><strong>create_task</strong>: A function that accepts a <code>Task</code>, assigns it a unique <code>id</code>, and adds it to the <code>tasks</code> list.</li>
</ul>



<h3 class="wp-block-heading">6. Read All Tasks Endpoint</h3>



<pre class="wp-block-preformatted"><code>@app.get("/tasks/", response_model=List[Task])<br>def read_tasks():<br>    return tasks<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>@app.get(&#8220;/tasks/&#8221;)</strong>: This is a GET endpoint to read all tasks.</li>



<li class=""><strong>read_tasks</strong>: A function that returns the list of all tasks.</li>
</ul>



<h3 class="wp-block-heading">7. Read Task by ID Endpoint</h3>



<pre class="wp-block-preformatted"><code>@app.get("/tasks/{task_id}", response_model=Task)<br>def read_task(task_id: UUID):<br>    for task in tasks:<br>        if task.id == task_id:<br>            return task<br>        <br>    raise HTTPException(status_code=404, detail="Task not found")<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>@app.get(&#8220;/tasks/{task_id}&#8221;)</strong>: This is a GET endpoint to read a specific task by its ID.</li>



<li class=""><strong>read_task</strong>: A function that searches for a task by its ID. If found, it returns the task; otherwise, it raises a 404 error.</li>
</ul>



<h3 class="wp-block-heading">8. Update Task Endpoint</h3>



<pre class="wp-block-preformatted"><code>@app.put("/tasks/{task_id}", response_model=Task)<br>def update_task(task_id: UUID, task_update: Task):<br>    for idx, task in enumerate(tasks):<br>        if task.id == task_id:<br>            updated_task = task.copy(update=task_update.dict(exclude_unset=True))<br>            tasks[idx] = updated_task<br>            return updated_task<br>        <br>    raise HTTPException(status_code=404, detail="Task not found")<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>@app.put(&#8220;/tasks/{task_id}&#8221;)</strong>: This is a PUT endpoint to update an existing task.</li>



<li class=""><strong>update_task</strong>: A function that updates a task&#8217;s information by copying the updated fields and replacing the old task. If the task is not found, it raises a 404 error.</li>
</ul>



<h3 class="wp-block-heading">9. Delete Task Endpoint</h3>



<pre class="wp-block-preformatted"><code>@app.delete("/tasks/{task_id}", response_model=Task)<br>def delete_task(task_id: UUID):<br>    for idx, task in enumerate(tasks):<br>        if task.id == task_id:<br>            return tasks.pop(idx)<br>    <br>    raise HTTPException(status_code=404, detail="Task not found")<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong>@app.delete(&#8220;/tasks/{task_id}&#8221;)</strong>: This is a DELETE endpoint to delete a task by its ID.</li>



<li class=""><strong>delete_task</strong>: A function that removes a task from the list if found; otherwise, it raises a 404 error.</li>
</ul>



<h3 class="wp-block-heading">10. Run the Application</h3>



<pre class="wp-block-preformatted"><code>if __name__ == "__main__":<br>    import uvicorn<br>    uvicorn.run(app, host="0.0.0.0", port=8000)<br></code></pre>



<ul class="wp-block-list">
<li class=""><strong><code>if __name__ == "__main__"</code></strong>: This ensures that the app runs only if the script is executed directly.</li>



<li class=""><strong>uvicorn.run</strong>: Runs the FastAPI app using Uvicorn, a lightning-fast ASGI server.</li>
</ul>



<h3 class="wp-block-heading"><a href="https://github.com/vijaygokarn130/fast-api-example">Git</a></h3>
<p>The post <a href="https://vijay-gokarn.com/fast-api/">FAST API</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">143</post-id>	</item>
		<item>
		<title>Analyzing Wikipedia Articles with Langchain and OpenAI in Databricks</title>
		<link>https://vijay-gokarn.com/analyzing-wikipedia-articles-with-langchain-and-openai-in-databricks/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=analyzing-wikipedia-articles-with-langchain-and-openai-in-databricks</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Tue, 16 Jul 2024 10:59:28 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[pandas]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=140</guid>

					<description><![CDATA[<p>GenAI Mastery Series · NLP · Databricks · LangChain Categorizing Wikipedia at Scale with OpenAI, LangChain &#038; Databricks Datasetwikimedia/wikipedia · 10,000 articles ModelChatOpenAI (GPT-4) Output50-category JSON classifier Stack Databricks Notebook LangChain Core langchain_openai HuggingFace Datasets ChatPromptTemplate Batch Inference JSON Parsing A complete walkthrough of a large-scale text classification pipeline built inside a Databricks notebook — [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/analyzing-wikipedia-articles-with-langchain-and-openai-in-databricks/">Analyzing Wikipedia Articles with Langchain and OpenAI in Databricks</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg6 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  --db-red: #e8353a; --db-dark: #1b1f23;
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg6 *, .vg6 *::before, .vg6 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg6-hero { background: var(--db-dark); padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg6-hero::before {
  content: '{ }'; font-family: 'Cormorant Garamond', serif; font-size: 18rem;
  font-weight: 300; color: rgba(255,255,255,0.025); position: absolute;
  right: 0rem; bottom: -4rem; line-height: 1; pointer-events: none; letter-spacing: -0.05em;
}
.vg6-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg6-eyebrow { font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal-light); font-weight: 500; margin-bottom: 1.25rem; display: flex; align-items: center; gap: 0.75rem; }
.vg6-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--teal-light); }
.vg6-hero h1 { font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem); font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em; margin-bottom: 1.5rem; max-width: 28ch; }
.vg6-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg6-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg6-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg6-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* STACK BAND */
.vg6-stack-band { background: var(--db-red); padding: 1.1rem 4rem; display: flex; gap: 0.75rem; flex-wrap: wrap; align-items: center; }
.vg6-stack-label { font-size: 0.63rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(255,255,255,0.6); font-weight: 400; margin-right: 0.4rem; }
.vg6-stack-pill { font-size: 0.7rem; letter-spacing: 0.05em; padding: 0.28rem 0.85rem; background: rgba(255,255,255,0.12); color: #fff; border: 0.5px solid rgba(255,255,255,0.2); }

/* INTRO */
.vg6-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg6-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg6-intro strong { color: var(--teal); font-weight: 500; }

/* PREREQS */
.vg6-prereqs { background: var(--paper-dark); padding: 2rem 4rem; display: flex; gap: 2rem; flex-wrap: wrap; align-items: center; border-bottom: 0.5px solid var(--border); }
.vg6-prereq-label { font-size: 0.63rem; letter-spacing: 0.18em; text-transform: uppercase; color: var(--muted); font-weight: 500; flex-shrink: 0; }
.vg6-prereq-chips { display: flex; gap: 0.6rem; flex-wrap: wrap; }
.vg6-prereq-chip { font-size: 0.72rem; padding: 0.3rem 0.9rem; border: 0.5px solid var(--border-strong); color: var(--charcoal); background: var(--paper); display: flex; align-items: center; gap: 0.4rem; }
.vg6-prereq-chip::before { content: '✓'; color: var(--teal); font-size: 0.65rem; font-weight: 600; }

/* BODY */
.vg6-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg6-step { margin-bottom: 3.5rem; }
.vg6-step-label { font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg6-step-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg6-step h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.4rem, 3vw, 2rem); font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1rem; }
.vg6-step h2 em { font-style: italic; color: var(--teal); }
.vg6-step p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg6-step p strong { color: var(--ink); font-weight: 500; }
.vg6-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }

/* CALLOUT */
.vg6-callout { background: var(--paper-dark); border-left: 3px solid var(--amber); padding: 1.25rem 1.5rem; margin: 1.25rem 0; font-size: 0.87rem; line-height: 1.8; color: var(--charcoal); }
.vg6-callout strong { color: var(--amber); font-weight: 500; }
.vg6-callout.teal { border-color: var(--teal); }
.vg6-callout.teal strong { color: var(--teal); }

/* ── CODE BLOCKS ── */
.vg6-code-wrap { margin: 1.25rem 0; border: 0.5px solid rgba(255,255,255,0.06); overflow: hidden; }
.vg6-code-header { background: #2d333b; padding: 0.6rem 1.25rem; display: flex; justify-content: space-between; align-items: center; border-bottom: 0.5px solid rgba(255,255,255,0.06); }
.vg6-code-filename { font-family: 'DM Mono', monospace; font-size: 0.68rem; color: rgba(247,244,239,0.45); letter-spacing: 0.04em; }
.vg6-code-lang { font-size: 0.6rem; letter-spacing: 0.14em; text-transform: uppercase; color: var(--teal-light); font-weight: 500; }
.vg6-code-body { background: var(--db-dark); padding: 1.5rem; overflow-x: auto; }
.vg6-code-body pre { margin: 0; }
.vg6-code-body code { font-family: 'DM Mono', monospace; font-size: 0.82rem; line-height: 1.85; color: #e6edf3; white-space: pre; display: block; }

/* Syntax token colours */
.vg6-k  { color: #ff7b72; }   /* keyword: import, def, for, if */
.vg6-s  { color: #a5d6ff; }   /* string */
.vg6-c  { color: #8b949e; font-style: italic; } /* comment */
.vg6-f  { color: #d2a8ff; }   /* function / class call */
.vg6-n  { color: var(--amber-light); } /* number / constant */
.vg6-v  { color: #79c0ff; }   /* variable name */
.vg6-p  { color: #e6edf3; }   /* punctuation */
.vg6-m  { color: var(--teal-light); } /* magic / decorator */

/* ARCHITECTURE DIAGRAM */
.vg6-arch { display: flex; align-items: center; gap: 0; margin: 1.5rem 0; flex-wrap: wrap; }
.vg6-arch-box { background: var(--paper); border: 0.5px solid var(--border-strong); padding: 0.75rem 1.1rem; text-align: center; flex: 1; min-width: 100px; }
.vg6-arch-box .vg6-arch-icon { font-size: 1.25rem; margin-bottom: 0.25rem; }
.vg6-arch-box h5 { font-family: 'Cormorant Garamond', serif; font-size: 0.95rem; font-weight: 400; color: var(--ink); margin-bottom: 0.15rem; }
.vg6-arch-box p { font-size: 0.68rem; color: var(--muted); line-height: 1.4; font-weight: 300; }
.vg6-arch-box.highlight { background: var(--teal); border-color: var(--teal); }
.vg6-arch-box.highlight h5 { color: var(--paper); }
.vg6-arch-box.highlight p { color: rgba(247,244,239,0.6); }
.vg6-arch-arrow { font-size: 1rem; color: var(--muted); padding: 0 0.3rem; flex-shrink: 0; }

/* OUTPUT CARD */
.vg6-output-section { background: var(--db-dark); padding: 4rem; }
.vg6-output-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg6-output-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg6-output-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.2rem); font-weight: 300; color: var(--paper); margin-bottom: 2rem; }
.vg6-output-section > h2 em { font-style: italic; color: var(--amber-light); }
.vg6-output-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 1px; background: rgba(247,244,239,0.06); border: 0.5px solid rgba(247,244,239,0.06); margin-bottom: 2rem; }
.vg6-output-stat { background: var(--db-dark); padding: 1.5rem; }
.vg6-output-stat-n { font-family: 'Cormorant Garamond', serif; font-size: 2.2rem; font-weight: 300; color: var(--teal-light); line-height: 1; margin-bottom: 0.3rem; letter-spacing: -0.02em; }
.vg6-output-stat-l { font-size: 0.65rem; letter-spacing: 0.12em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg6-json-card { background: #161b22; border: 0.5px solid rgba(247,244,239,0.08); overflow: hidden; }
.vg6-json-header { background: #2d333b; padding: 0.55rem 1.25rem; display: flex; justify-content: space-between; }
.vg6-json-header span { font-family: 'DM Mono', monospace; font-size: 0.65rem; color: rgba(247,244,239,0.4); }
.vg6-json-header .vg6-json-tag { color: var(--teal-light); }
.vg6-json-body { padding: 1.5rem; font-family: 'DM Mono', monospace; font-size: 0.83rem; line-height: 1.9; color: #e6edf3; }
.vg6-json-key { color: var(--amber-light); }
.vg6-json-val-str { color: #a5d6ff; }
.vg6-json-val-num { color: #79c0ff; }
.vg6-json-punct { color: rgba(247,244,239,0.4); }

/* CATEGORIES CLOUD */
.vg6-categories { display: flex; flex-wrap: wrap; gap: 0.5rem; margin: 1.5rem 0; }
.vg6-cat-pill { font-size: 0.7rem; letter-spacing: 0.05em; padding: 0.3rem 0.85rem; border: 0.5px solid rgba(247,244,239,0.12); color: rgba(247,244,239,0.55); }
.vg6-cat-pill.active { border-color: var(--teal-light); color: var(--teal-light); background: rgba(29,158,117,0.1); }

/* INTERVIEW */
.vg6-interview-section { background: var(--teal-muted); padding: 4rem; }
.vg6-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg6-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg6-interview-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 2.5rem; }
.vg6-interview-section > h2 em { font-style: italic; color: var(--teal); }
.vg6-qa-list { display: flex; flex-direction: column; }
.vg6-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(14,14,14,0.1); align-items: start; }
.vg6-qa-item:last-child { border-bottom: 0.5px solid rgba(14,14,14,0.1); }
.vg6-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--ink); line-height: 1.4; }
.vg6-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg6-qa-a { font-size: 0.83rem; line-height: 1.8; color: var(--charcoal); font-weight: 300; }
.vg6-qa-a strong { color: var(--teal); font-weight: 500; }
.vg6-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.35rem; color: var(--ink); }
.vg6-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg6-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid var(--border-strong); color: var(--charcoal); }
.vg6-pill.t { border-color: var(--teal); color: var(--teal); background: var(--teal-muted); }
.vg6-pill.a { border-color: var(--amber); color: var(--amber); background: var(--amber-muted); }

/* FOOTER */
.vg6-footer { background: var(--db-dark); padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; border-top: 0.5px solid rgba(247,244,239,0.06); }
.vg6-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg6-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg6-footer-links { display: flex; gap: 1rem; }
.vg6-footer-btn { display: inline-block; padding: 0.65rem 1.75rem; font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 400; }
.vg6-footer-btn.primary { background: var(--teal); color: var(--paper); }
.vg6-footer-btn.ghost { background: transparent; color: rgba(247,244,239,0.55); border: 0.5px solid rgba(247,244,239,0.2); }

/* REVEAL */
.vg6-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg6-reveal.vg6-vis { opacity: 1; transform: translateY(0); }
.vg6-d1 { transition-delay: 0.1s; } .vg6-d2 { transition-delay: 0.2s; }
</style>

<div class="vg6">

<!-- HERO -->
<div class="vg6-hero">
  <div class="vg6-hero-inner">
    <p class="vg6-eyebrow">GenAI Mastery Series · NLP · Databricks · LangChain</p>
    <h1>Categorizing Wikipedia at Scale with <em>OpenAI, LangChain &#038; Databricks</em></h1>
    <div class="vg6-meta-row">
      <p class="vg6-meta">Dataset<span>wikimedia/wikipedia · 10,000 articles</span></p>
      <p class="vg6-meta">Model<span>ChatOpenAI (GPT-4)</span></p>
      <p class="vg6-meta">Output<span>50-category JSON classifier</span></p>
    </div>
  </div>
</div>

<!-- STACK BAND -->
<div class="vg6-stack-band">
  <span class="vg6-stack-label">Stack</span>
  <span class="vg6-stack-pill">Databricks Notebook</span>
  <span class="vg6-stack-pill">LangChain Core</span>
  <span class="vg6-stack-pill">langchain_openai</span>
  <span class="vg6-stack-pill">HuggingFace Datasets</span>
  <span class="vg6-stack-pill">ChatPromptTemplate</span>
  <span class="vg6-stack-pill">Batch Inference</span>
  <span class="vg6-stack-pill">JSON Parsing</span>
</div>

<!-- INTRO -->
<div class="vg6-intro">
  <p>A complete walkthrough of a <strong>large-scale text classification pipeline</strong> built inside a Databricks notebook — from loading 10,000 Wikipedia articles to batch-classifying them into 50 categories using OpenAI&#8217;s language model via LangChain. Every step includes the real working code.</p>
</div>

<!-- PREREQS -->
<div class="vg6-prereqs">
  <span class="vg6-prereq-label">Prerequisites</span>
  <div class="vg6-prereq-chips">
    <span class="vg6-prereq-chip">Databricks Account</span>
    <span class="vg6-prereq-chip">Python (basic)</span>
    <span class="vg6-prereq-chip">OpenAI API Key</span>
    <span class="vg6-prereq-chip">HuggingFace Access</span>
  </div>
</div>

<!-- BODY -->
<div class="vg6-body">

  <!-- ARCHITECTURE -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Overview</p>
    <h2>Pipeline <em>architecture</em></h2>
    <p>The full pipeline runs end-to-end inside a single Databricks notebook. Wikipedia articles are loaded from HuggingFace, cleaned to first-line summaries, batched, and sent to GPT-4 via LangChain&#8217;s chain interface. Responses are parsed from JSON into a DataFrame.</p>
    <div class="vg6-arch">
      <div class="vg6-arch-box"><div class="vg6-arch-icon">📦</div><h5>HuggingFace</h5><p>wikimedia/wikipedia dataset</p></div>
      <div class="vg6-arch-arrow">→</div>
      <div class="vg6-arch-box"><div class="vg6-arch-icon">✂️</div><h5>Clean</h5><p>First-line extraction</p></div>
      <div class="vg6-arch-arrow">→</div>
      <div class="vg6-arch-box highlight"><div class="vg6-arch-icon" style="color:var(--paper)">⛓</div><h5>LangChain</h5><p>Prompt + ChatOpenAI</p></div>
      <div class="vg6-arch-arrow">→</div>
      <div class="vg6-arch-box"><div class="vg6-arch-icon">🔄</div><h5>Batch (8)</h5><p>Rate-limit safe</p></div>
      <div class="vg6-arch-arrow">→</div>
      <div class="vg6-arch-box"><div class="vg6-arch-icon">📊</div><h5>DataFrame</h5><p>id + category</p></div>
    </div>
  </div>

  <hr class="vg6-divider">

  <!-- STEP 1 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 1</p>
    <h2>Install <em>required packages</em></h2>
    <p>In a Databricks notebook, use <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">%pip</code> magic commands to install packages into the cluster. The <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">%restart_python</code> command refreshes the interpreter to pick up the new packages without restarting the whole cluster.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 1</span><span class="vg6-code-lang">Python / Magic</span></div>
      <div class="vg6-code-body"><code><span class="vg6-m">%pip install</span> langchain_openai
<span class="vg6-m">%pip install</span> <span class="vg6-v">--upgrade</span> langchain_core langchain_openai

<span class="vg6-m">%restart_python</span></code></div>
    </div>
  </div>

  <hr class="vg6-divider">

  <!-- STEP 2 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 2</p>
    <h2>Import <em>libraries</em></h2>
    <p>Standard Python utilities (<code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">json</code>, <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">time</code>, <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">os</code>) combined with <strong>LangChain</strong> for the LLM interface, <strong>HuggingFace Datasets</strong> for Wikipedia data loading, and <strong>tqdm</strong> for progress visibility during batch processing.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 2</span><span class="vg6-code-lang">Python</span></div>
      <div class="vg6-code-body"><code><span class="vg6-k">import</span> json
<span class="vg6-k">import</span> time
<span class="vg6-k">import</span> os
<span class="vg6-k">import</span> getpass
<span class="vg6-k">import</span> pandas <span class="vg6-k">as</span> pd

<span class="vg6-k">from</span> datasets <span class="vg6-k">import</span> Dataset, load_dataset
<span class="vg6-k">from</span> tqdm <span class="vg6-k">import</span> tqdm
<span class="vg6-k">from</span> langchain_core.prompts <span class="vg6-k">import</span> ChatPromptTemplate
<span class="vg6-k">from</span> langchain_openai <span class="vg6-k">import</span> ChatOpenAI</code></div>
    </div>
  </div>

  <hr class="vg6-divider">

  <!-- STEP 3 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 3</p>
    <h2>Load &#038; clean <em>the dataset</em></h2>
    <p>The HuggingFace <strong>wikimedia/wikipedia</strong> dataset is massive — we take a 10,000 article slice from the English November 2023 snapshot. The cleaning step extracts only the first line of each article (the summary sentence), which is sufficient for category classification and drastically reduces token usage.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 3</span><span class="vg6-code-lang">Python</span></div>
      <div class="vg6-code-body"><code><span class="vg6-c"># Load the Wikipedia English dataset (Nov 2023 snapshot)</span>
dataset = <span class="vg6-f">load_dataset</span>(<span class="vg6-s">"wikimedia/wikipedia"</span>, <span class="vg6-s">"20231101.en"</span>)

<span class="vg6-c"># Take a 10k article sample</span>
<span class="vg6-v">NUM_SAMPLES</span> = <span class="vg6-n">10000</span>
articles = dataset[<span class="vg6-s">"train"</span>][:<span class="vg6-v">NUM_SAMPLES</span>][<span class="vg6-s">"text"</span>]
ids      = dataset[<span class="vg6-s">"train"</span>][:<span class="vg6-v">NUM_SAMPLES</span>][<span class="vg6-s">"id"</span>]

<span class="vg6-c"># Clean: keep only the first line (article summary) to reduce tokens</span>
articles = [x.<span class="vg6-f">split</span>(<span class="vg6-s">"\n"</span>)[<span class="vg6-n">0</span>] <span class="vg6-k">for</span> x <span class="vg6-k">in</span> articles]

<span class="vg6-c"># Sanity check</span>
<span class="vg6-f">print</span>(<span class="vg6-f">len</span>(articles))   <span class="vg6-c"># → 10000</span>
<span class="vg6-f">print</span>(articles[<span class="vg6-n">99</span>])    <span class="vg6-c"># inspect a sample article</span></code></div>
    </div>
    <div class="vg6-callout teal">
      <strong>Why first line only?</strong> Wikipedia lead sentences are dense and self-contained, while full articles cost roughly 10&#8211;50x more tokens per classification with minimal accuracy gain. At ~20 tokens per first line, 10k articles comes to ~200k input tokens; even a modest 150-token excerpt per article would push that to ~1.5M.
    </div>
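The savings can be sanity-checked without any tokenizer at all, using the common rough heuristic of ~4 characters per token (a sketch with made-up article text; exact counts require a real tokenizer such as tiktoken):

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose
    return max(1, len(text) // 4)

# Made-up stand-in for one Wikipedia article: summary line + long body
article = "Anarchism is a political philosophy and movement.\n" + "Body text. " * 400
first_line = article.split("\n")[0]

full_cost = approx_tokens(article)
short_cost = approx_tokens(first_line)
print(short_cost, full_cost)  # the first line is a small fraction of the full article
```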
  </div>

  <hr class="vg6-divider">

  <!-- STEP 4 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 4</p>
    <h2>Configure <em>OpenAI + LangChain</em></h2>
    <p>Use <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">getpass</code> to securely prompt for the API key without echoing it to the notebook output. Then initialize <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">ChatOpenAI</code> — LangChain&#8217;s wrapper around the OpenAI Chat Completions API.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 4 &#038; 5</span><span class="vg6-code-lang">Python</span></div>
      <div class="vg6-code-body"><code><span class="vg6-c"># Securely enter API key (won't echo to notebook output)</span>
os.environ[<span class="vg6-s">"OPENAI_API_KEY"</span>] = getpass.<span class="vg6-f">getpass</span>(<span class="vg6-s">"Enter your OpenAI API key: "</span>)

<span class="vg6-c"># Initialize the LangChain ChatOpenAI wrapper</span>
llm = <span class="vg6-f">ChatOpenAI</span>()
<span class="vg6-f">print</span>(llm.model_name)  <span class="vg6-c"># → "gpt-3.5-turbo" (default) or your configured model</span></code></div>
    </div>
  </div>

  <hr class="vg6-divider">

  <!-- STEP 5 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 5 — Core Logic</p>
    <h2>Define the <em>prompt template</em></h2>
    <p>The <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">ChatPromptTemplate</code> structures the conversation: a system message sets the classification task with all 50 categories, and the human message carries the article payload. The <strong>double curly braces</strong> <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">&#123;&#123; &#125;&#125;</code> in the JSON schema escape the literal braces so LangChain doesn&#8217;t treat them as template variables.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 6</span><span class="vg6-code-lang">Python</span></div>
      <div class="vg6-code-body"><code>prompt = ChatPromptTemplate.<span class="vg6-f">from_messages</span>([
    (<span class="vg6-s">"system"</span>, <span class="vg6-s">"""Your task is to assess the article and categorize it
into one of the following predefined categories:

'History', 'Geography', 'Science', 'Technology', 'Mathematics',
'Literature', 'Art', 'Music', 'Film', 'Television', 'Sports',
'Politics', 'Philosophy', 'Religion', 'Sociology', 'Psychology',
'Economics', 'Business', 'Medicine', 'Biology', 'Chemistry',
'Physics', 'Astronomy', 'Environmental Science', 'Engineering',
'Computer Science', 'Linguistics', 'Anthropology', 'Archaeology',
'Education', 'Law', 'Military', 'Architecture', 'Fashion',
'Cuisine', 'Travel', 'Mythology', 'Folklore', 'Biography',
'Social Issues', 'Human Rights', 'Technology Ethics',
'Climate Change', 'Conservation', 'Urban Studies', 'Demographics',
'Journalism', 'Cryptocurrency', 'Artificial Intelligence'

Output ONLY a JSON object — no extra text:
{{
    "id": string,
    "category": string
}}"""</span>),
    (<span class="vg6-s">"human"</span>, <span class="vg6-s">"{input}"</span>)
])</code></div>
    </div>
    <div class="vg6-callout">
      <strong>Prompt engineering note:</strong> Listing all valid categories explicitly in the system prompt constrains the model to valid outputs — reducing hallucinated or free-form category names. The strict JSON output instruction combined with downstream <code>json.loads()</code> parsing creates a simple but robust structured output pipeline.
    </div>
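The double-brace escaping behaves exactly like Python's own <code>str.format</code>, so the convention can be checked without LangChain at all:

```python
# {{ and }} render as literal braces; {article_id} is a template variable
template = 'Return JSON only: {{"id": "{article_id}", "category": string}}'
print(template.format(article_id="42"))
# → Return JSON only: {"id": "42", "category": string}
```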
  </div>

  <hr class="vg6-divider">

  <!-- STEP 6 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 6</p>
    <h2>Build the chain &#038; <em>test it</em></h2>
    <p>LangChain&#8217;s pipe operator <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">|</code> composes the prompt template and the LLM into a reusable chain. One call to <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">.invoke()</code> with a single article validates the whole setup before committing to batch processing.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 7</span><span class="vg6-code-lang">Python</span></div>
      <div class="vg6-code-body"><code><span class="vg6-c"># Compose prompt → llm into a reusable chain</span>
chain = prompt | llm

<span class="vg6-c"># Test with article[0] before running the full batch</span>
content  = json.<span class="vg6-f">dumps</span>({<span class="vg6-s">"id"</span>: ids[<span class="vg6-n">0</span>], <span class="vg6-s">"article"</span>: articles[<span class="vg6-n">0</span>]})
response = chain.<span class="vg6-f">invoke</span>(content)
<span class="vg6-f">print</span>(response.content)
<span class="vg6-c"># → {"id": "1", "category": "History"}</span></code></div>
    </div>
  </div>

  <hr class="vg6-divider">

  <!-- STEP 7 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 7 — Core Loop</p>
    <h2>Batch processing <em>with rate-limit handling</em></h2>
    <p>Processing 1,000 articles one-by-one would quickly hit OpenAI&#8217;s requests-per-minute limit. The solution: accumulate inputs into batches of 8 and call <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">.batch()</code> with a <strong>1.5-second sleep</strong> between each batch. <code style="font-family:'DM Mono',monospace;font-size:0.82rem;background:rgba(14,14,14,0.07);padding:0.1rem 0.35rem">tqdm</code> wraps the loop to give live progress in the notebook.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 8</span><span class="vg6-code-lang">Python</span></div>
      <div class="vg6-code-body"><code>results    = []
<span class="vg6-v">BATCH_SIZE</span> = <span class="vg6-n">8</span>
inputs     = []

<span class="vg6-k">for</span> index, article <span class="vg6-k">in</span> <span class="vg6-f">tqdm</span>(<span class="vg6-f">enumerate</span>(articles[:<span class="vg6-n">1000</span>])):

    inputs.<span class="vg6-f">append</span>(
        json.<span class="vg6-f">dumps</span>({<span class="vg6-s">"id"</span>: ids[index], <span class="vg6-s">"article"</span>: articles[index]})
    )

    <span class="vg6-k">if</span> <span class="vg6-f">len</span>(inputs) == <span class="vg6-v">BATCH_SIZE</span>:
        time.<span class="vg6-f">sleep</span>(<span class="vg6-n">1.5</span>)            <span class="vg6-c"># respect rate limits</span>
        response = chain.<span class="vg6-f">batch</span>(inputs)
        results += response
        inputs   = []               <span class="vg6-c"># reset buffer</span>

<span class="vg6-c"># Flush any remaining articles in the last partial batch</span>
<span class="vg6-k">if</span> inputs:
    response = chain.<span class="vg6-f">batch</span>(inputs)
    results += response</code></div>
    </div>
    <div class="vg6-callout">
      <strong>Rate limit strategy:</strong> Batch size 8 with a 1.5s sleep between batches ≈ 0.67 batches/sec ≈ 5 requests/sec, or ~320 requests/min. For the free OpenAI tier (3 RPM), reduce batch size to 1 and increase sleep to 20s. For production use, implement exponential backoff with <code>tenacity</code>.
    </div>
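For readers who prefer to avoid an extra dependency, the same exponential-backoff idea can be sketched with the standard library alone. Here <code>flaky_batch</code> is a hypothetical stand-in for <code>chain.batch</code> that fails twice before succeeding:

```python
import random
import time

def with_backoff(fn, inputs, max_attempts=5, base_delay=1.0):
    """Retry fn(inputs) with exponential backoff plus jitter (stdlib-only sketch)."""
    for attempt in range(max_attempts):
        try:
            return fn(inputs)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.05)

# Hypothetical flaky stand-in for chain.batch: fails twice, then succeeds
calls = {"n": 0}

def flaky_batch(inputs):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: rate limited")
    return [f"ok:{x}" for x in inputs]

print(with_backoff(flaky_batch, ["a", "b"], base_delay=0.01))
# → ['ok:a', 'ok:b']
```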
  </div>

  <hr class="vg6-divider">

  <!-- STEP 8 -->
  <div class="vg6-step vg6-reveal">
    <p class="vg6-step-label">Step 8</p>
    <h2>Parse results into <em>a DataFrame</em></h2>
    <p>Not every LLM response will be valid JSON — network hiccups, model refusals, and malformed outputs all happen at scale. The pattern below separates successful parses from failures so you can inspect and retry the failures without losing the successful results.</p>
    <div class="vg6-code-wrap">
      <div class="vg6-code-header"><span class="vg6-code-filename">Databricks Notebook — Cell 9</span><span class="vg6-code-lang">Python</span></div>
      <div class="vg6-code-body"><code>success = []
failure = []

<span class="vg6-k">for</span> output <span class="vg6-k">in</span> results:
    content = output.content
    <span class="vg6-k">try</span>:
        content = json.<span class="vg6-f">loads</span>(content)
        success.<span class="vg6-f">append</span>(content)
    <span class="vg6-k">except</span> ValueError:
        failure.<span class="vg6-f">append</span>(content)  <span class="vg6-c"># keep for retry / inspection</span>

<span class="vg6-f">print</span>(<span class="vg6-f">f</span><span class="vg6-s">"Success: {len(success)} | Failure: {len(failure)}"</span>)

<span class="vg6-c"># Convert to DataFrame for analysis / export</span>
df = pd.<span class="vg6-f">DataFrame</span>(success)
df.<span class="vg6-f">head</span>(<span class="vg6-n">10</span>)</code></div>
    </div>
  </div>
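The parse-and-partition pattern can be exercised offline with canned responses before wiring it to real LLM output (the strings below are made-up examples):

```python
import json

raw_outputs = [
    '{"id": "1", "category": "History"}',       # valid JSON
    "Sorry, I cannot classify this article.",   # model refusal → parse failure
    '{"id": "3", "category": "Biology"}',
]

success, failure = [], []
for content in raw_outputs:
    try:
        success.append(json.loads(content))     # JSONDecodeError subclasses ValueError
    except ValueError:
        failure.append(content)                 # keep raw text for retry / inspection

print(len(success), len(failure))  # → 2 1
```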

</div><!-- /vg6-body -->

<!-- OUTPUT SECTION -->
<div class="vg6-output-section">
  <p class="vg6-output-eyebrow">Sample Output</p>
  <h2>What the <em>pipeline produces</em></h2>
  <div class="vg6-output-grid vg6-reveal">
    <div class="vg6-output-stat"><div class="vg6-output-stat-n">10k</div><div class="vg6-output-stat-l">Articles Loaded</div></div>
    <div class="vg6-output-stat"><div class="vg6-output-stat-n">1k</div><div class="vg6-output-stat-l">Articles Classified</div></div>
    <div class="vg6-output-stat"><div class="vg6-output-stat-n">50</div><div class="vg6-output-stat-l">Categories</div></div>
    <div class="vg6-output-stat"><div class="vg6-output-stat-n">8</div><div class="vg6-output-stat-l">Batch Size</div></div>
  </div>
  <div class="vg6-json-card vg6-reveal vg6-d1">
    <div class="vg6-json-header">
      <span>LLM Response — Single Article</span>
      <span class="vg6-json-tag">JSON output</span>
    </div>
    <div class="vg6-json-body">
<span class="vg6-json-punct">[</span>
  <span class="vg6-json-punct">{</span>
    <span class="vg6-json-key">&#8220;id&#8221;</span><span class="vg6-json-punct">:</span> <span class="vg6-json-val-str">&#8220;1&#8221;</span><span class="vg6-json-punct">,</span>
    <span class="vg6-json-key">&#8220;category&#8221;</span><span class="vg6-json-punct">:</span> <span class="vg6-json-val-str">&#8220;History&#8221;</span>
  <span class="vg6-json-punct">},</span>
  <span class="vg6-json-punct">{</span>
    <span class="vg6-json-key">&#8220;id&#8221;</span><span class="vg6-json-punct">:</span> <span class="vg6-json-val-str">&#8220;4&#8221;</span><span class="vg6-json-punct">,</span>
    <span class="vg6-json-key">&#8220;category&#8221;</span><span class="vg6-json-punct">:</span> <span class="vg6-json-val-str">&#8220;Computer Science&#8221;</span>
  <span class="vg6-json-punct">},</span>
  <span class="vg6-json-punct">{</span>
    <span class="vg6-json-key">&#8220;id&#8221;</span><span class="vg6-json-punct">:</span> <span class="vg6-json-val-str">&#8220;7&#8221;</span><span class="vg6-json-punct">,</span>
    <span class="vg6-json-key">&#8220;category&#8221;</span><span class="vg6-json-punct">:</span> <span class="vg6-json-val-str">&#8220;Biology&#8221;</span>
  <span class="vg6-json-punct">}</span>
<span class="vg6-json-punct">]</span>
    </div>
  </div>
  <p style="font-size:0.82rem;color:rgba(247,244,239,0.45);margin-top:1.5rem;font-weight:300;">The available classification categories:</p>
  <div class="vg6-categories vg6-reveal vg6-d2">
    <span class="vg6-cat-pill active">History</span><span class="vg6-cat-pill">Geography</span><span class="vg6-cat-pill active">Science</span><span class="vg6-cat-pill">Technology</span><span class="vg6-cat-pill">Mathematics</span><span class="vg6-cat-pill">Literature</span><span class="vg6-cat-pill">Art</span><span class="vg6-cat-pill">Music</span><span class="vg6-cat-pill">Film</span><span class="vg6-cat-pill">Television</span><span class="vg6-cat-pill">Sports</span><span class="vg6-cat-pill active">Politics</span><span class="vg6-cat-pill">Philosophy</span><span class="vg6-cat-pill">Religion</span><span class="vg6-cat-pill">Sociology</span><span class="vg6-cat-pill">Psychology</span><span class="vg6-cat-pill">Economics</span><span class="vg6-cat-pill">Business</span><span class="vg6-cat-pill">Medicine</span><span class="vg6-cat-pill active">Biology</span><span class="vg6-cat-pill">Chemistry</span><span class="vg6-cat-pill">Physics</span><span class="vg6-cat-pill">Astronomy</span><span class="vg6-cat-pill">Environmental Science</span><span class="vg6-cat-pill">Engineering</span><span class="vg6-cat-pill active">Computer Science</span><span class="vg6-cat-pill">Linguistics</span><span class="vg6-cat-pill">Anthropology</span><span class="vg6-cat-pill">Archaeology</span><span class="vg6-cat-pill">Education</span><span class="vg6-cat-pill">Law</span><span class="vg6-cat-pill">Military</span><span class="vg6-cat-pill">Architecture</span><span class="vg6-cat-pill">Fashion</span><span class="vg6-cat-pill">Cuisine</span><span class="vg6-cat-pill">Travel</span><span class="vg6-cat-pill">Mythology</span><span class="vg6-cat-pill">Folklore</span><span class="vg6-cat-pill">Biography</span><span class="vg6-cat-pill">Social Issues</span><span class="vg6-cat-pill">Human Rights</span><span class="vg6-cat-pill active">Artificial Intelligence</span><span class="vg6-cat-pill">Cryptocurrency</span><span class="vg6-cat-pill">Climate Change</span><span 
class="vg6-cat-pill">Conservation</span><span class="vg6-cat-pill">Urban Studies</span><span class="vg6-cat-pill">Journalism</span><span class="vg6-cat-pill">Technology Ethics</span><span class="vg6-cat-pill">Demographics</span>
  </div>
</div>

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg6-interview-section">
  <p class="vg6-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg6-qa-list">

    <div class="vg6-qa-item vg6-reveal">
      <div class="vg6-qa-q"><span class="vg6-q-badge">Define</span><br>What is LangChain and what problem does it solve?</div>
      <div class="vg6-qa-a"><strong>A framework for composing LLM-powered applications</strong> from modular building blocks — prompts, models, chains, memory, tools, and agents. It solves the orchestration problem: how do you connect a prompt template to an LLM, parse the output, and chain multiple steps together cleanly?
        <div class="vg6-pills"><span class="vg6-pill t">Prompt + LLM + Output</span><span class="vg6-pill t">Composable chains</span><span class="vg6-pill">Pipe operator |</span></div>
      </div>
    </div>
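    <p>To make the pipe operator concrete, here is a minimal, dependency-free sketch of the composition idea. This is illustrative only, not LangChain&#8217;s actual <code>Runnable</code> implementation; the fake model response stands in for a real LLM call:</p>

```python
import json

class Runnable:
    """Toy stand-in for a LangChain Runnable: a callable step, chainable with `|`."""
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # The output of this step becomes the input of the next step.
        return Runnable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

prompt = Runnable(lambda topic: f"Classify this article: {topic}")
llm = Runnable(lambda _text: '{"category": "History"}')  # fake model response
parser = Runnable(json.loads)

# Same shape as LCEL: prompt | llm | output parser.
chain = prompt | llm | parser
print(chain.invoke("The Roman Empire"))  # {'category': 'History'}
```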

    <div class="vg6-qa-item vg6-reveal vg6-d1">
      <div class="vg6-qa-q"><span class="vg6-q-badge">Explain</span><br>What is a ChatPromptTemplate?</div>
      <div class="vg6-qa-a">A <strong>reusable message template</strong> that structures the conversation for a chat model. Defines the system role (task instructions) and the human turn (variable input). The <code>{input}</code> placeholder gets filled at runtime. Separating instructions from data is a core prompt engineering best practice.
        <div class="vg6-pills"><span class="vg6-pill t">System = instructions</span><span class="vg6-pill t">Human = data</span><span class="vg6-pill">{input} placeholder</span></div>
      </div>
    </div>
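    <p>The instructions-versus-data split is easy to see in a dependency-free sketch. LangChain&#8217;s <code>ChatPromptTemplate.from_messages</code> handles this for you; the point here is only the shape of the output:</p>

```python
# Fixed instructions live in the system turn; runtime data fills the human turn.
SYSTEM = "Classify each article into exactly one category. Output ONLY JSON."
HUMAN = "Articles:\n{input}"

def format_messages(**kwargs):
    """Fill the {input} placeholder and return chat-style messages."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": HUMAN.format(**kwargs)},
    ]

messages = format_messages(input="1. The Roman Empire ...")
```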

    <div class="vg6-qa-item vg6-reveal">
      <div class="vg6-qa-q"><span class="vg6-q-badge">Explain</span><br>Why use <code>.batch()</code> instead of looping <code>.invoke()</code>?</div>
      <div class="vg6-qa-a"><code>.batch()</code> runs multiple requests <strong>concurrently</strong> (a thread pool under the hood for the sync API; its async sibling <code>.abatch()</code> uses asyncio), while looping <code>.invoke()</code> processes them one at a time. For 8 articles, batching is up to roughly 8x faster. The sleep between batches manages rate limits: concurrency within a batch, pacing across batches.
        <div class="vg6-pills"><span class="vg6-pill t">Concurrent within batch</span><span class="vg6-pill a">Sleep between batches</span><span class="vg6-pill">8x throughput gain</span></div>
      </div>
    </div>
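    <p>The pattern of concurrency within a batch and pacing between batches can be sketched with a plain thread pool. The <code>call_llm</code> function below is a hypothetical stand-in; in practice the inner call would be <code>chain.batch(batch)</code>:</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_llm(article):
    """Stand-in for one model call; replace with a real chain invocation."""
    time.sleep(0.05)  # simulated network latency
    return f"classified:{article}"

def batched(items, size):
    """Yield successive chunks of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

articles = [f"article-{i}" for i in range(16)]
results = []
with ThreadPoolExecutor(max_workers=8) as pool:
    for batch in batched(articles, 8):
        # Concurrency *within* the batch (what .batch() gives you) ...
        results.extend(pool.map(call_llm, batch))
        # ... and pacing *between* batches to respect rate limits.
        time.sleep(0.1)
```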

    <div class="vg6-qa-item vg6-reveal vg6-d1">
      <div class="vg6-qa-q"><span class="vg6-q-badge">Gotcha</span><br>Why separate success and failure lists instead of crashing on parse error?</div>
      <div class="vg6-qa-a">At 1,000+ LLM calls, <strong>some will fail</strong> — network timeouts, content policy refusals, or models that occasionally output extra text before the JSON. A try/except pattern collects failures without losing the successful results. Failures can be inspected and retried separately.
        <div class="vg6-pills"><span class="vg6-pill a">Never crash on parse error</span><span class="vg6-pill">Inspect failures separately</span><span class="vg6-pill t">Retry pattern</span></div>
      </div>
    </div>

    <div class="vg6-qa-item vg6-reveal">
      <div class="vg6-qa-q"><span class="vg6-q-badge">Best Practice</span><br>How do you get reliable structured JSON from an LLM?</div>
      <div class="vg6-qa-a">Three layers: <strong>(1) Constrain in the prompt</strong> — list valid values, specify exact schema, say &#8220;output ONLY JSON&#8221;. <strong>(2) Use LangChain&#8217;s output parsers</strong> (<code>JsonOutputParser</code>) for automatic parsing and retry. <strong>(3) Validate with Pydantic</strong> — define a model and parse the JSON through it to catch type errors.
        <div class="vg6-pills"><span class="vg6-pill t">Constrain schema in prompt</span><span class="vg6-pill t">JsonOutputParser</span><span class="vg6-pill a">Pydantic validation</span></div>
      </div>
    </div>
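    <p>Layer (3) can be sketched with the standard library alone. Pydantic automates these checks; the category set below is a small subset used purely for illustration:</p>

```python
import json

VALID_CATEGORIES = {"History", "Biology", "Computer Science"}  # subset, for illustration

def validate(raw: str) -> list:
    """Parse LLM output and enforce the expected schema, raising on any drift."""
    rows = json.loads(raw)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array")
    for row in rows:
        if set(row) != {"id", "category"}:
            raise ValueError(f"unexpected keys: {sorted(row)}")
        if row["category"] not in VALID_CATEGORIES:
            raise ValueError(f"unknown category: {row['category']}")
    return rows

rows = validate('[{"id": "1", "category": "History"}]')
```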

    <div class="vg6-qa-item vg6-reveal vg6-d1">
      <div class="vg6-qa-q"><span class="vg6-q-badge">Explain</span><br>Why use Databricks for this pipeline?</div>
      <div class="vg6-qa-a">Databricks provides a <strong>managed Spark + Python environment</strong> that scales horizontally. For 10k–10M articles, you can parallelize across a cluster using Spark UDFs or <code>pandas_udf</code>. It also integrates with Delta Lake for storing results, MLflow for experiment tracking, and Unity Catalog for data governance.
        <div class="vg6-pills"><span class="vg6-pill t">Horizontal scale</span><span class="vg6-pill t">Delta Lake storage</span><span class="vg6-pill">MLflow tracking</span></div>
      </div>
    </div>

    <div class="vg6-qa-item vg6-reveal">
      <div class="vg6-qa-q"><span class="vg6-q-badge">Improve</span><br>How would you scale this to 10 million articles?</div>
      <div class="vg6-qa-a"><strong>Three upgrades:</strong> (1) Wrap the chain call in a <strong>Spark pandas_udf</strong> so it runs in parallel across the cluster. (2) Replace <code>time.sleep()</code> with <strong>exponential backoff</strong> via <code>tenacity</code>. (3) Use <strong>LangChain&#8217;s async batch</strong> with <code>chain.abatch()</code> and asyncio for maximum concurrency per node.
        <div class="vg6-pills"><span class="vg6-pill t">Spark pandas_udf</span><span class="vg6-pill t">chain.abatch()</span><span class="vg6-pill a">tenacity backoff</span></div>
      </div>
    </div>
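    <p>Upgrade (2) is simple enough to sketch without <code>tenacity</code>; this hand-rolled version shows what the library automates:</p>

```python
import random
import time

def with_backoff(fn, max_attempts=5, base=1.0, cap=30.0):
    """Retry fn with exponential backoff plus jitter (what tenacity automates)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
            time.sleep(delay)

calls = {"n": 0}
def flaky():
    """Fails twice (simulated rate limit), then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base=0.01))  # ok
```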

  </div>
</div>

<!-- FOOTER -->
<div class="vg6-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <div class="vg6-footer-links">
    <a href="https://github.com/vijaygokarn130" class="vg6-footer-btn ghost">GitHub ↗</a>
    <a href="https://vijay-gokarn.com" class="vg6-footer-btn primary">Back to Blog ↗</a>
  </div>
</div>

</div><!-- /vg6 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg6-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg6-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>
<p>The post <a href="https://vijay-gokarn.com/analyzing-wikipedia-articles-with-langchain-and-openai-in-databricks/">Analyzing Wikipedia Articles with Langchain and OpenAI in Databricks</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">140</post-id>	</item>
		<item>
		<title>RAG: Retrieval-Augmented Generation.</title>
		<link>https://vijay-gokarn.com/rag-retrieval-augmented-generation/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=rag-retrieval-augmented-generation</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Mon, 15 Jul 2024 02:13:23 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=122</guid>

					<description><![CDATA[<p>GenAI Mastery Series · RAG · NLP · HuggingFace · FAISS Retrieval-Augmented Generation — Build a RAG Pipeline from Scratch RetrieverDPR + FAISS GeneratorBART (facebook/bart-large-cnn) StackHuggingFace Transformers · PyTorch Stack transformers faiss-cpu torch DPR Question Encoder DPR Context Encoder BART Generator datasets RAG (Retrieval-Augmented Generation) combines the factual accuracy of information retrieval with the fluent [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/rag-retrieval-augmented-generation/">RAG: Retrieval-Augmented Generation.</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg7 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  --code-bg: #161b22; --code-header: #2d333b; --code-border: rgba(255,255,255,0.06);
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg7 *, .vg7 *::before, .vg7 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg7-hero { background: #0d1117; padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg7-hero::before {
  content: 'RAG'; font-family: 'Cormorant Garamond', serif; font-size: 18rem;
  font-weight: 300; color: rgba(255,255,255,0.025); position: absolute;
  right: -1rem; bottom: -4rem; line-height: 1; pointer-events: none; letter-spacing: -0.05em;
}
.vg7-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg7-eyebrow { font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal-light); font-weight: 500; margin-bottom: 1.25rem; display: flex; align-items: center; gap: 0.75rem; }
.vg7-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--teal-light); }
.vg7-hero h1 { font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem); font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em; margin-bottom: 1.5rem; max-width: 26ch; }
.vg7-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg7-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg7-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg7-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* STACK BAND */
.vg7-stack-band { background: var(--teal); padding: 1.1rem 4rem; display: flex; gap: 0.75rem; flex-wrap: wrap; align-items: center; }
.vg7-stack-label { font-size: 0.63rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(255,255,255,0.6); font-weight: 400; margin-right: 0.4rem; }
.vg7-stack-pill { font-size: 0.7rem; letter-spacing: 0.05em; padding: 0.28rem 0.85rem; background: rgba(255,255,255,0.12); color: #fff; border: 0.5px solid rgba(255,255,255,0.2); }

/* INTRO */
.vg7-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg7-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg7-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg7-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg7-step { margin-bottom: 3.5rem; }
.vg7-step-label { font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg7-step-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg7-step h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.4rem, 3vw, 2rem); font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1rem; }
.vg7-step h2 em { font-style: italic; color: var(--teal); }
.vg7-step p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg7-step p strong { color: var(--ink); font-weight: 500; }
.vg7-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }
.vg7-ic { font-family: 'DM Mono', monospace; font-size: 0.82rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.4rem; color: var(--ink); }

/* CALLOUT */
.vg7-callout { background: var(--paper-dark); border-left: 3px solid var(--amber); padding: 1.25rem 1.5rem; margin: 1.25rem 0; font-size: 0.87rem; line-height: 1.8; color: var(--charcoal); }
.vg7-callout strong { color: var(--amber); font-weight: 500; }
.vg7-callout.teal { border-color: var(--teal); }
.vg7-callout.teal strong { color: var(--teal); }

/* RAG PILLARS */
.vg7-pillars { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1px; background: var(--border); border: 0.5px solid var(--border); margin: 1.5rem 0; }
.vg7-pillar { background: var(--paper); padding: 1.75rem 1.5rem; position: relative; }
.vg7-pillar::before { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 4px; }
.vg7-pillar:nth-child(1)::before { background: var(--muted); }
.vg7-pillar:nth-child(2)::before { background: var(--amber); }
.vg7-pillar:nth-child(3)::before { background: var(--teal); }
.vg7-pillar-letter { font-family: 'Cormorant Garamond', serif; font-size: 3rem; font-weight: 300; line-height: 1; margin-bottom: 0.5rem; letter-spacing: -0.02em; }
.vg7-pillar:nth-child(1) .vg7-pillar-letter { color: var(--muted); }
.vg7-pillar:nth-child(2) .vg7-pillar-letter { color: var(--amber); }
.vg7-pillar:nth-child(3) .vg7-pillar-letter { color: var(--teal); }
.vg7-pillar h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.15rem; font-weight: 400; color: var(--ink); margin-bottom: 0.4rem; }
.vg7-pillar p { font-size: 0.82rem; line-height: 1.7; color: var(--charcoal); font-weight: 300; }

/* WHY RAG */
.vg7-why-grid { display: grid; grid-template-columns: repeat(2, 1fr); gap: 1rem; margin: 1.5rem 0; }
.vg7-why-card { background: var(--paper); border: 0.5px solid var(--border-strong); padding: 1.25rem 1.5rem; display: flex; gap: 1rem; align-items: flex-start; }
.vg7-why-num { font-family: 'DM Mono', monospace; font-size: 0.68rem; color: var(--teal); min-width: 1.5rem; margin-top: 0.15rem; flex-shrink: 0; }
.vg7-why-body h4 { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--ink); margin-bottom: 0.3rem; }
.vg7-why-body p { font-size: 0.8rem; line-height: 1.65; color: var(--charcoal); font-weight: 300; }

/* ARCH DIAGRAM */
.vg7-arch { display: flex; align-items: center; gap: 0; margin: 1.5rem 0; flex-wrap: wrap; }
.vg7-arch-box { background: var(--paper); border: 0.5px solid var(--border-strong); padding: 0.9rem 1rem; text-align: center; flex: 1; min-width: 90px; }
.vg7-arch-box.hl { background: var(--teal); border-color: var(--teal); }
.vg7-arch-box.hl2 { background: var(--ink); border-color: var(--ink); }
.vg7-arch-box h5 { font-family: 'Cormorant Garamond', serif; font-size: 0.95rem; font-weight: 400; color: var(--ink); }
.vg7-arch-box.hl h5 { color: var(--paper); }
.vg7-arch-box.hl2 h5 { color: var(--paper); }
.vg7-arch-box p { font-size: 0.63rem; color: var(--muted); line-height: 1.3; font-weight: 300; margin-top: 0.2rem; }
.vg7-arch-box.hl p { color: rgba(247,244,239,0.6); }
.vg7-arch-box.hl2 p { color: rgba(247,244,239,0.5); }
.vg7-arch-arrow { font-size: 0.9rem; color: var(--muted); padding: 0 0.25rem; flex-shrink: 0; }

/* ── CODE BLOCKS ── */
.vg7-code-wrap { margin: 1.25rem 0; border: 0.5px solid var(--code-border); overflow: hidden; }
.vg7-code-header { background: var(--code-header); padding: 0.6rem 1.25rem; display: flex; justify-content: space-between; align-items: center; border-bottom: 0.5px solid var(--code-border); }
.vg7-code-filename { font-family: 'DM Mono', monospace; font-size: 0.68rem; color: rgba(247,244,239,0.45); letter-spacing: 0.04em; }
.vg7-code-lang { font-size: 0.6rem; letter-spacing: 0.14em; text-transform: uppercase; color: var(--teal-light); font-weight: 500; }
.vg7-code-body { background: var(--code-bg); padding: 1.5rem; overflow-x: auto; }
.vg7-code-body pre { margin: 0; }
.vg7-code-body code { font-family: 'DM Mono', monospace; font-size: 0.82rem; line-height: 1.85; color: #e6edf3; white-space: pre; display: block; }
/* syntax tokens */
.t-k  { color: #ff7b72; }
.t-s  { color: #a5d6ff; }
.t-c  { color: #8b949e; font-style: italic; }
.t-f  { color: #d2a8ff; }
.t-n  { color: #79c0ff; }
.t-v  { color: #ffa657; }
.t-m  { color: var(--teal-light); }
.t-p  { color: rgba(230,237,243,0.5); }
.t-cm { color: var(--amber-light); }

/* TERMINAL / PIP BLOCK */
.vg7-terminal { background: #0d1117; border: 0.5px solid rgba(255,255,255,0.1); padding: 1.25rem 1.5rem; margin: 1.25rem 0; font-family: 'DM Mono', monospace; font-size: 0.83rem; color: var(--teal-light); }
.vg7-terminal .t-prompt { color: rgba(247,244,239,0.25); margin-right: 0.5rem; user-select: none; }

/* INTERVIEW */
.vg7-interview-section { background: var(--ink); padding: 4rem; }
.vg7-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg7-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg7-interview-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2.5rem; }
.vg7-interview-section > h2 em { font-style: italic; color: var(--amber-light); }
.vg7-qa-list { display: flex; flex-direction: column; }
.vg7-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(247,244,239,0.1); align-items: start; }
.vg7-qa-item:last-child { border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg7-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--paper); line-height: 1.4; }
.vg7-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg7-qa-a { font-size: 0.83rem; line-height: 1.8; color: rgba(247,244,239,0.65); font-weight: 300; }
.vg7-qa-a strong { color: var(--amber-light); font-weight: 400; }
.vg7-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(247,244,239,0.08); padding: 0.1rem 0.35rem; color: var(--paper); }
.vg7-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg7-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid rgba(247,244,239,0.15); color: rgba(247,244,239,0.5); }
.vg7-pill.t { border-color: var(--teal-light); color: var(--teal-light); }
.vg7-pill.a { border-color: var(--amber-light); color: var(--amber-light); }

/* FOOTER */
.vg7-footer { background: #0d1117; padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; border-top: 0.5px solid rgba(247,244,239,0.06); }
.vg7-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg7-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg7-footer-links { display: flex; gap: 1rem; }
.vg7-btn { display: inline-block; padding: 0.65rem 1.75rem; font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 400; }
.vg7-btn.primary { background: var(--teal); color: var(--paper); }
.vg7-btn.ghost { background: transparent; color: rgba(247,244,239,0.55); border: 0.5px solid rgba(247,244,239,0.2); }

/* REVEAL */
.vg7-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg7-reveal.vg7-vis { opacity: 1; transform: translateY(0); }
.vg7-d1 { transition-delay: 0.1s; } .vg7-d2 { transition-delay: 0.2s; } .vg7-d3 { transition-delay: 0.3s; }
</style>

<div class="vg7">

<!-- HERO -->
<div class="vg7-hero">
  <div class="vg7-hero-inner">
    <p class="vg7-eyebrow">GenAI Mastery Series · RAG · NLP · HuggingFace · FAISS</p>
    <h1>Retrieval-Augmented Generation — <em>Build a RAG Pipeline from Scratch</em></h1>
    <div class="vg7-meta-row">
      <p class="vg7-meta">Retriever<span>DPR + FAISS</span></p>
      <p class="vg7-meta">Generator<span>BART (facebook/bart-large-cnn)</span></p>
      <p class="vg7-meta">Stack<span>HuggingFace Transformers · PyTorch</span></p>
    </div>
  </div>
</div>

<!-- STACK BAND -->
<div class="vg7-stack-band">
  <span class="vg7-stack-label">Stack</span>
  <span class="vg7-stack-pill">transformers</span>
  <span class="vg7-stack-pill">faiss-cpu</span>
  <span class="vg7-stack-pill">torch</span>
  <span class="vg7-stack-pill">DPR Question Encoder</span>
  <span class="vg7-stack-pill">DPR Context Encoder</span>
  <span class="vg7-stack-pill">BART Generator</span>
  <span class="vg7-stack-pill">datasets</span>
</div>

<!-- INTRO -->
<div class="vg7-intro">
  <p><strong>RAG (Retrieval-Augmented Generation)</strong> combines the factual accuracy of information retrieval with the fluent language generation of large models. Instead of encoding all knowledge into model weights, it retrieves relevant documents at inference time and uses them as live context — producing more accurate, up-to-date, and traceable answers.</p>
</div>

<!-- BODY -->
<div class="vg7-body">

  <!-- WHAT IS RAG -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Concept</p>
    <h2>What is <em>RAG?</em></h2>
    <p>RAG stands for <strong>Retrieval-Augmented Generation</strong>. It&#8217;s a three-stage technique that grounds a language model&#8217;s output in retrieved facts — dramatically reducing hallucinations and enabling access to knowledge that wasn&#8217;t in the model&#8217;s training data.</p>
    <div class="vg7-pillars">
      <div class="vg7-pillar vg7-reveal vg7-d1">
        <div class="vg7-pillar-letter">R</div>
        <h3>Retrieval</h3>
        <p>Search a large knowledge base to find documents relevant to the query. Uses dense vector similarity (DPR + FAISS) or sparse BM25.</p>
      </div>
      <div class="vg7-pillar vg7-reveal vg7-d2">
        <div class="vg7-pillar-letter">A</div>
        <h3>Augmentation</h3>
        <p>Inject the retrieved documents into the prompt as context — supplementing the original query with verified external knowledge.</p>
      </div>
      <div class="vg7-pillar vg7-reveal vg7-d3">
        <div class="vg7-pillar-letter">G</div>
        <h3>Generation</h3>
        <p>The language model generates its response conditioned on both the original query and the retrieved context — grounded, not hallucinated.</p>
      </div>
    </div>
  </div>

  <hr class="vg7-divider">

  <!-- WHY RAG -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Motivation</p>
    <h2>Why RAG <em>matters</em></h2>
    <div class="vg7-why-grid">
      <div class="vg7-why-card vg7-reveal vg7-d1">
        <span class="vg7-why-num">01</span>
        <div class="vg7-why-body"><h4>Reduces Hallucinations</h4><p>Retrieved documents anchor the model to factual content, reducing fabricated answers that plague purely parametric models.</p></div>
      </div>
      <div class="vg7-why-card vg7-reveal vg7-d2">
        <span class="vg7-why-num">02</span>
        <div class="vg7-why-body"><h4>Up-to-Date Knowledge</h4><p>The knowledge base can be updated without retraining the entire model — keeping responses current without a costly fine-tune cycle.</p></div>
      </div>
      <div class="vg7-why-card vg7-reveal vg7-d1">
        <span class="vg7-why-num">03</span>
        <div class="vg7-why-body"><h4>Source Transparency</h4><p>Every response is traceable back to specific retrieved documents — critical for compliance, audit, and trust in enterprise deployments.</p></div>
      </div>
      <div class="vg7-why-card vg7-reveal vg7-d2">
        <span class="vg7-why-num">04</span>
        <div class="vg7-why-body"><h4>Domain Customization</h4><p>Swap the knowledge base to instantly specialize the system for healthcare, legal, finance, or any vertical — without model retraining.</p></div>
      </div>
      <div class="vg7-why-card vg7-reveal vg7-d1">
        <span class="vg7-why-num">05</span>
        <div class="vg7-why-body"><h4>Compute Efficiency</h4><p>It&#8217;s far cheaper to retrieve specific knowledge at inference than to encode it all into billions of model parameters during training.</p></div>
      </div>
      <div class="vg7-why-card vg7-reveal vg7-d2">
        <span class="vg7-why-num">06</span>
        <div class="vg7-why-body"><h4>Contextual Relevance</h4><p>Each query pulls its own specific context — the model isn&#8217;t constrained to a fixed knowledge snapshot; it adapts per query.</p></div>
      </div>
    </div>
  </div>

  <hr class="vg7-divider">

  <!-- ARCHITECTURE -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Architecture</p>
    <h2>Pipeline <em>overview</em></h2>
    <p>This implementation uses <strong>Dense Passage Retrieval (DPR)</strong> for encoding both questions and documents into a shared vector space, <strong>FAISS</strong> for fast nearest-neighbor search, and <strong>BART</strong> as the sequence-to-sequence generator.</p>
    <div class="vg7-arch">
      <div class="vg7-arch-box"><h5>Query</h5><p>User input</p></div>
      <div class="vg7-arch-arrow">→</div>
      <div class="vg7-arch-box hl"><h5>DPR Encoder</h5><p>Question embedding</p></div>
      <div class="vg7-arch-arrow">→</div>
      <div class="vg7-arch-box"><h5>FAISS Search</h5><p>Top-K similar docs</p></div>
      <div class="vg7-arch-arrow">→</div>
      <div class="vg7-arch-box"><h5>Concat</h5><p>Query + context</p></div>
      <div class="vg7-arch-arrow">→</div>
      <div class="vg7-arch-box hl2"><h5>BART</h5><p>Generate response</p></div>
    </div>
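    <p>The retrieval step reduces to maximum inner-product search in the shared embedding space. A toy NumPy sketch makes the idea concrete (these 2-d placeholder vectors stand in for real 768-d DPR embeddings; FAISS&#8217;s <code>IndexFlatIP</code> performs the same ranking at scale):</p>

```python
import numpy as np

# Placeholder embeddings standing in for DPR outputs (real vectors are 768-d).
doc_vecs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]], dtype="float32")
query_vec = np.array([0.85, 0.15], dtype="float32")

# Rank documents by inner product with the query embedding.
scores = doc_vecs @ query_vec
top_k = np.argsort(-scores)[:2]  # indices of the 2 best-matching documents
print(top_k.tolist())  # [0, 2]
```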
  </div>

  <hr class="vg7-divider">

  <!-- STEP 1 -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Step 1</p>
    <h2>Install <em>dependencies</em></h2>
    <p>Four libraries cover everything: <strong>transformers</strong> for DPR and BART models, <strong>datasets</strong> for data loading, <strong>faiss-cpu</strong> for vector indexing, and <strong>torch</strong> as the deep learning backend.</p>
    <div class="vg7-terminal">
      <span class="t-prompt">$</span><span style="color:var(--teal-light)">pip install</span> transformers datasets faiss-cpu torch
    </div>
    <div class="vg7-callout teal">
      <strong>GPU note:</strong> Replace <code>faiss-cpu</code> with <code>faiss-gpu</code> for GPU-accelerated indexing on large corpora. For the retriever and generator models, ensure CUDA is available — BART on CPU is significantly slower for generation.
    </div>
  </div>

  <hr class="vg7-divider">

  <!-- STEP 2 -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Step 2</p>
    <h2>Load models <em>&#038; tokenizers</em></h2>
    <p>RAG uses <strong>two separate encoders</strong> from DPR: a Question Encoder for query embeddings and a Context Encoder for document embeddings. These are trained jointly so their vectors are comparable in the same space. BART is the conditional generation model that produces the final answer.</p>
    <div class="vg7-code-wrap">
      <div class="vg7-code-header"><span class="vg7-code-filename">models.py</span><span class="vg7-code-lang">Python</span></div>
      <div class="vg7-code-body"><pre><code><span class="t-k">from</span> transformers <span class="t-k">import</span> DPRQuestionEncoder, DPRQuestionEncoderTokenizer
<span class="t-k">from</span> transformers <span class="t-k">import</span> DPRContextEncoder, DPRContextEncoderTokenizer
<span class="t-k">from</span> transformers <span class="t-k">import</span> BartTokenizer, BartForConditionalGeneration

<span class="t-c"># ── RETRIEVER: encodes the user's question into a dense vector ──</span>
question_encoder = DPRQuestionEncoder.<span class="t-f">from_pretrained</span>(
    <span class="t-s">'facebook/dpr-question_encoder-single-nq-base'</span>
)
question_tokenizer = DPRQuestionEncoderTokenizer.<span class="t-f">from_pretrained</span>(
    <span class="t-s">'facebook/dpr-question_encoder-single-nq-base'</span>
)

<span class="t-c"># ── RETRIEVER: encodes each document into a dense vector ──</span>
context_encoder = DPRContextEncoder.<span class="t-f">from_pretrained</span>(
    <span class="t-s">'facebook/dpr-ctx_encoder-single-nq-base'</span>
)
context_tokenizer = DPRContextEncoderTokenizer.<span class="t-f">from_pretrained</span>(
    <span class="t-s">'facebook/dpr-ctx_encoder-single-nq-base'</span>
)

<span class="t-c"># ── GENERATOR: seq2seq model that produces the final answer ──</span>
generator_tokenizer = BartTokenizer.<span class="t-f">from_pretrained</span>(<span class="t-s">'facebook/bart-large-cnn'</span>)
generator = BartForConditionalGeneration.<span class="t-f">from_pretrained</span>(<span class="t-s">'facebook/bart-large-cnn'</span>)</code></pre></div>
    </div>
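The loaders above put everything on the CPU by default. A short, optional pattern for GPU placement and inference mode, with `nn.Linear` standing in for the three models loaded above (the `.to(device)` / `.eval()` / `no_grad` calls are the point, not the stand-in):

```python
import torch
from torch import nn

# Choose GPU when available; retriever and generator must share one device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# nn.Linear stands in for question_encoder / context_encoder / generator
model = nn.Linear(768, 768)
model.to(device)
model.eval()  # disable dropout for deterministic inference

with torch.no_grad():  # skip autograd bookkeeping while encoding
    vec = model(torch.zeros(1, 768, device=device))
print(vec.shape)
```

With the real models the same calls apply unchanged; also remember to move tokenizer outputs to the same device before calling the encoders.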
  </div>

  <hr class="vg7-divider">

  <!-- STEP 3 -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Step 3</p>
    <h2>Prepare <em>your documents</em></h2>
    <p>Each document needs a <strong>title</strong> and <strong>text</strong> field. In production this could be chunks from a PDF, database rows, web-scraped articles, or any domain corpus. Keep chunk sizes manageable — 256 to 512 tokens is a common sweet spot for retrieval quality.</p>
    <div class="vg7-code-wrap">
      <div class="vg7-code-header"><span class="vg7-code-filename">data.py</span><span class="vg7-code-lang">Python</span></div>
      <div class="vg7-code-body"><pre><code><span class="t-c"># Each document is a dict with title + text</span>
<span class="t-c"># In production: load from PDF, DB, S3, or a vector store</span>
documents = [
    {<span class="t-s">"title"</span>: <span class="t-s">"Document 1"</span>, <span class="t-s">"text"</span>: <span class="t-s">"This is the text of document 1."</span>},
    {<span class="t-s">"title"</span>: <span class="t-s">"Document 2"</span>, <span class="t-s">"text"</span>: <span class="t-s">"This is the text of document 2."</span>},
    <span class="t-c"># Add your real documents here...</span>
]</code></pre></div>
    </div>
    <div class="vg7-callout">
      <strong>Chunking strategy matters:</strong> For long documents, split into overlapping chunks of ~300 tokens. Too large = diluted retrieval signal. Too small = missing context for the generator. Overlap of ~50 tokens prevents cutting a sentence at a chunk boundary.
    </div>
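As a concrete sketch of that strategy, here is a minimal word-based chunker (words as a rough proxy for tokens; a real pipeline would count tokenizer tokens) that produces the title/text dicts the next step expects:

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into overlapping word chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last chunk reached the end
            break
    return chunks

# a fake 700-"token" document: w0 w1 ... w699
sample = " ".join(f"w{i}" for i in range(700))
docs = [{"title": f"Doc chunk {n}", "text": c}
        for n, c in enumerate(chunk_text(sample))]
print(len(docs))  # → 3
```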
  </div>

  <hr class="vg7-divider">

  <!-- STEP 4 -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Step 4</p>
    <h2>Build the <em>FAISS index</em></h2>
    <p>Each document&#8217;s text is encoded by the DPR Context Encoder into a <strong>768-dimensional vector</strong>. These embeddings are stacked into a numpy array and added to a FAISS <code class="vg7-ic">IndexFlatL2</code> — a flat L2-distance index that does exact nearest-neighbor search.</p>
    <div class="vg7-code-wrap">
      <div class="vg7-code-header"><span class="vg7-code-filename">indexer.py</span><span class="vg7-code-lang">Python</span></div>
      <div class="vg7-code-body"><pre><code><span class="t-k">import</span> faiss
<span class="t-k">import</span> numpy <span class="t-k">as</span> np

context_embeddings = []

<span class="t-k">for</span> doc <span class="t-k">in</span> documents:
    inputs     = <span class="t-f">context_tokenizer</span>(doc[<span class="t-s">'text'</span>], return_tensors=<span class="t-s">'pt'</span>, truncation=<span class="t-k">True</span>, max_length=<span class="t-n">512</span>)
    embeddings = <span class="t-f">context_encoder</span>(**inputs).pooler_output.<span class="t-f">detach</span>().<span class="t-f">numpy</span>()
    context_embeddings.<span class="t-f">append</span>(embeddings[<span class="t-n">0</span>])

<span class="t-c"># Stack into (num_docs, 768) numpy array</span>
context_embeddings = np.<span class="t-f">array</span>(context_embeddings)

<span class="t-c"># Build FAISS flat L2 index — exact search, best for small-medium corpora</span>
index = faiss.<span class="t-f">IndexFlatL2</span>(context_embeddings.shape[<span class="t-n">1</span>])
index.<span class="t-f">add</span>(context_embeddings)

<span class="t-f">print</span>(<span class="t-f">f</span><span class="t-s">"Index built: {index.ntotal} vectors of dim {context_embeddings.shape[1]}"</span>)</code></pre></div>
    </div>
    <div class="vg7-callout teal">
      <strong>Scaling up:</strong> <code class="vg7-ic">IndexFlatL2</code> does exhaustive search — fine for thousands of docs. For millions, switch to <code class="vg7-ic">IndexIVFFlat</code> (inverted file) or <code class="vg7-ic">IndexHNSWFlat</code> (graph-based ANN) for approximate but fast search. Managed options: Pinecone, Weaviate, pgvector.
    </div>
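For intuition about what the index actually does, <code class="vg7-ic">IndexFlatL2</code>'s exact search is just a brute-force distance computation. A NumPy equivalent of the same search, with tiny 2-d vectors in place of 768-d DPR embeddings:

```python
import numpy as np

def flat_l2_search(index_vectors, query, top_k):
    """Exact nearest-neighbor search: what faiss.IndexFlatL2 computes."""
    # squared L2 distance from the query to every stored vector
    dists = ((index_vectors - query) ** 2).sum(axis=1)
    order = np.argsort(dists)[:top_k]
    return dists[order], order

doc_vecs = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
d, i = flat_l2_search(doc_vecs, np.array([0.9, 0.1]), top_k=2)
print(i)  # → [1 0], the two closest "documents"
```

FAISS performs exactly this computation, but in optimized C++ and over batches of queries, which is why the flat index stays practical into the hundreds of thousands of vectors.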
  </div>

  <hr class="vg7-divider">

  <!-- STEP 5 -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Step 5</p>
    <h2>Retrieve <em>relevant documents</em></h2>
    <p>For a given query, the Question Encoder produces a dense vector in the same space as the document embeddings. FAISS finds the <strong>top-K nearest documents</strong> by L2 distance — these are the most semantically similar docs to the query.</p>
    <div class="vg7-code-wrap">
      <div class="vg7-code-header"><span class="vg7-code-filename">retriever.py</span><span class="vg7-code-lang">Python</span></div>
      <div class="vg7-code-body"><pre><code><span class="t-k">def</span> <span class="t-f">retrieve_documents</span>(query, top_k=<span class="t-n">5</span>):
    <span class="t-c"># Encode the query into a dense vector</span>
    inputs            = <span class="t-f">question_tokenizer</span>(query, return_tensors=<span class="t-s">'pt'</span>)
    question_embedding = <span class="t-f">question_encoder</span>(**inputs).pooler_output.<span class="t-f">detach</span>().<span class="t-f">numpy</span>()

    <span class="t-c"># Search FAISS for the top_k nearest document vectors</span>
    _, indices = index.<span class="t-f">search</span>(question_embedding, top_k)

    <span class="t-c"># Return the actual document dicts (FAISS pads with -1 when top_k exceeds the index size)</span>
    <span class="t-k">return</span> [documents[idx] <span class="t-k">for</span> idx <span class="t-k">in</span> indices[<span class="t-n">0</span>] <span class="t-k">if</span> idx != -<span class="t-n">1</span>]


<span class="t-c"># ── Test it ──</span>
query         = <span class="t-s">"What is the text of document 1?"</span>
retrieved_docs = <span class="t-f">retrieve_documents</span>(query)
<span class="t-f">print</span>(retrieved_docs)</code></pre></div>
    </div>
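The first value returned by <code class="vg7-ic">index.search</code> (ignored with <code class="vg7-ic">_</code> above) holds the L2 distances, which are useful for thresholding weak matches. A toy variant with stand-in 2-d vectors in place of DPR embeddings shows the pattern, including clamping <code class="vg7-ic">top_k</code> to the corpus size:

```python
import numpy as np

# toy stand-ins for the real corpus and its DPR embeddings
toy_docs = [{"title": "Doc 1", "text": "alpha"},
            {"title": "Doc 2", "text": "beta"}]
toy_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])

def retrieve_with_scores(query_vec, top_k=5):
    # clamp top_k: FAISS pads with -1 when asked for more hits than it holds
    top_k = min(top_k, len(toy_docs))
    dists = ((toy_vecs - query_vec) ** 2).sum(axis=1)
    order = np.argsort(dists)[:top_k]
    return [(toy_docs[i], float(dists[i])) for i in order]

hits = retrieve_with_scores(np.array([0.9, 0.1]))
print(hits[0][0]["title"], round(hits[0][1], 2))  # → Doc 1 0.02
```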
  </div>

  <hr class="vg7-divider">

  <!-- STEP 6 -->
  <div class="vg7-step vg7-reveal">
    <p class="vg7-step-label">Step 6</p>
    <h2>Generate the <em>response</em></h2>
    <p>The retrieved document texts are concatenated and appended after the query. BART processes this combined input using <strong>beam search</strong> (<code class="vg7-ic">num_beams=4</code>) to generate a fluent, coherent response grounded in the retrieved facts.</p>
    <div class="vg7-code-wrap">
      <div class="vg7-code-header"><span class="vg7-code-filename">generator.py</span><span class="vg7-code-lang">Python</span></div>
      <div class="vg7-code-body"><pre><code><span class="t-k">def</span> <span class="t-f">generate_response</span>(query, retrieved_docs):
    <span class="t-c"># Concat all retrieved doc texts into one context string</span>
    context = <span class="t-s">" "</span>.<span class="t-f">join</span>([doc[<span class="t-s">'text'</span>] <span class="t-k">for</span> doc <span class="t-k">in</span> retrieved_docs])

    <span class="t-c"># Tokenize query + context together (truncate at 1024 tokens)</span>
    inputs = <span class="t-f">generator_tokenizer</span>(
        query + <span class="t-s">" "</span> + context,
        return_tensors = <span class="t-s">'pt'</span>,
        max_length     = <span class="t-n">1024</span>,
        truncation     = <span class="t-k">True</span>
    )

    <span class="t-c"># Generate with beam search for higher quality output</span>
    summary_ids = generator.<span class="t-f">generate</span>(
        inputs[<span class="t-s">'input_ids'</span>],
        attention_mask = inputs[<span class="t-s">'attention_mask'</span>],
        num_beams     = <span class="t-n">4</span>,
        max_length    = <span class="t-n">512</span>,
        early_stopping = <span class="t-k">True</span>
    )

    <span class="t-k">return</span> generator_tokenizer.<span class="t-f">decode</span>(summary_ids[<span class="t-n">0</span>], skip_special_tokens=<span class="t-k">True</span>)


<span class="t-c"># ── End-to-end test ──</span>
query         = <span class="t-s">"What is the text of document 1?"</span>
retrieved_docs = <span class="t-f">retrieve_documents</span>(query)
response      = <span class="t-f">generate_response</span>(query, retrieved_docs)
<span class="t-f">print</span>(response)</code></pre></div>
    </div>
    <div class="vg7-callout">
      <strong>Beam search vs greedy:</strong> <code class="vg7-ic">num_beams=1</code> is greedy — fast but lower quality. <code class="vg7-ic">num_beams=4</code> keeps 4 candidate sequences at each step and returns the highest-scoring one. For production, <code class="vg7-ic">num_beams=4</code> with <code class="vg7-ic">early_stopping=True</code> is the standard BART configuration.
    </div>
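Beam search itself is easy to see on a toy model: a hand-written next-token probability table rather than BART. Greedy decoding commits to the locally best first token and misses the globally better sequence that a wider beam finds:

```python
import math

# hand-written next-token log-probs: previous token -> {token: log p}
LM = {
    "":  {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.6), "y": math.log(0.4)},
    "b": {"x": math.log(0.95), "y": math.log(0.05)},
}

def beam_search(num_beams, length=2):
    beams = [("", 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            state = seq[-1] if seq else ""
            for tok, lp in LM[state].items():
                candidates.append((seq + tok, score + lp))
        # keep only the num_beams highest-scoring partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]

print(beam_search(num_beams=1))  # greedy picks "ax" (p = 0.36)
print(beam_search(num_beams=2))  # wider beam finds "bx" (p = 0.38)
```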
  </div>

</div><!-- /vg7-body -->

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg7-interview-section">
  <p class="vg7-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg7-qa-list">

    <div class="vg7-qa-item vg7-reveal">
      <div class="vg7-qa-q"><span class="vg7-q-badge">Define</span><br>What is RAG in one sentence?</div>
      <div class="vg7-qa-a"><strong>Retrieve relevant documents, inject them as context, then generate a grounded response</strong> — combining the precision of information retrieval with the fluency of a language model to reduce hallucinations and enable access to external knowledge.
        <div class="vg7-pills"><span class="vg7-pill t">Retrieve → Augment → Generate</span><span class="vg7-pill a">Reduces hallucinations</span></div>
      </div>
    </div>

    <div class="vg7-qa-item vg7-reveal vg7-d1">
      <div class="vg7-qa-q"><span class="vg7-q-badge">Explain</span><br>What is DPR and why two encoders?</div>
      <div class="vg7-qa-a"><strong>Dense Passage Retrieval</strong> — two separate BERT-based encoders trained jointly: a Question Encoder and a Context Encoder. They&#8217;re kept separate because questions and documents have different linguistic structure. Joint training ensures their output vectors are comparable in the same embedding space.
        <div class="vg7-pills"><span class="vg7-pill t">Question Encoder</span><span class="vg7-pill t">Context Encoder</span><span class="vg7-pill">Same vector space</span></div>
      </div>
    </div>

    <div class="vg7-qa-item vg7-reveal">
      <div class="vg7-qa-q"><span class="vg7-q-badge">Explain</span><br>What is FAISS and when would you replace it?</div>
      <div class="vg7-qa-a"><strong>Facebook AI Similarity Search</strong> — a library for fast nearest-neighbor search over dense vectors. <code>IndexFlatL2</code> does exact exhaustive search — fine for thousands of docs but O(n) per query. For millions of docs, switch to <code>IndexIVFFlat</code> (approximate, partitioned) or a managed vector DB like Pinecone, Weaviate, or pgvector.
        <div class="vg7-pills"><span class="vg7-pill t">Exact: IndexFlatL2</span><span class="vg7-pill a">Approx: IVFFlat / HNSW</span><span class="vg7-pill">Scale: Pinecone / pgvector</span></div>
      </div>
    </div>

    <div class="vg7-qa-item vg7-reveal vg7-d1">
      <div class="vg7-qa-q"><span class="vg7-q-badge">Compare</span><br>RAG vs fine-tuning — when to use which?</div>
      <div class="vg7-qa-a">Use <strong>RAG</strong> when knowledge changes frequently, you need source attribution, or you want to avoid the cost of retraining. Use <strong>fine-tuning</strong> when you need the model to learn a new style, format, or reasoning pattern that retrieval can&#8217;t provide. In practice, the best systems combine both.
        <div class="vg7-pills"><span class="vg7-pill t">RAG = dynamic knowledge</span><span class="vg7-pill a">Fine-tune = style / reasoning</span><span class="vg7-pill">Combine for best results</span></div>
      </div>
    </div>

    <div class="vg7-qa-item vg7-reveal">
      <div class="vg7-qa-q"><span class="vg7-q-badge">Gotcha</span><br>What is the &#8220;lost in the middle&#8221; problem in RAG?</div>
      <div class="vg7-qa-a">When multiple documents are concatenated as context, LLMs tend to attend best to the <strong>beginning and end</strong> of the input — information in the middle gets underweighted. Mitigations: rerank retrieved docs by relevance before concatenation, or use a model with a longer, more uniform attention mechanism.
        <div class="vg7-pills"><span class="vg7-pill a">Middle docs underweighted</span><span class="vg7-pill t">Rerank before concat</span><span class="vg7-pill">Limit top_k to 3–5</span></div>
      </div>
    </div>

    <div class="vg7-qa-item vg7-reveal vg7-d1">
      <div class="vg7-qa-q"><span class="vg7-q-badge">Explain</span><br>What does beam search do in generation?</div>
      <div class="vg7-qa-a">Instead of greedily picking the single best next token at each step, <strong>beam search keeps the top N candidate sequences</strong> (beams) in parallel and returns the highest-scoring complete sequence. <code>num_beams=4</code> explores 4 paths simultaneously — better quality than greedy, with manageable compute overhead.
        <div class="vg7-pills"><span class="vg7-pill">num_beams=1 = greedy</span><span class="vg7-pill t">num_beams=4 = quality</span><span class="vg7-pill a">Higher beams = slower</span></div>
      </div>
    </div>

    <div class="vg7-qa-item vg7-reveal">
      <div class="vg7-qa-q"><span class="vg7-q-badge">Improve</span><br>How would you modernize this RAG stack for production?</div>
      <div class="vg7-qa-a"><strong>Three upgrades:</strong> (1) Replace DPR + BART with a <strong>single LLM via Bedrock or OpenAI</strong> — better quality, less infrastructure. (2) Replace FAISS with <strong>pgvector or Pinecone</strong> for managed, scalable retrieval. (3) Add a <strong>reranker</strong> (cross-encoder) between retrieval and generation to reorder results by relevance before injecting as context.
        <div class="vg7-pills"><span class="vg7-pill t">Bedrock / OpenAI LLM</span><span class="vg7-pill t">pgvector / Pinecone</span><span class="vg7-pill a">Cross-encoder reranker</span></div>
      </div>
    </div>

  </div>
</div>

<!-- FOOTER -->
<div class="vg7-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <div class="vg7-footer-links">
    <a href="https://github.com/vijaygokarn130" class="vg7-btn ghost">GitHub ↗</a>
    <a href="https://vijay-gokarn.com" class="vg7-btn primary">Back to Blog ↗</a>
  </div>
</div>

</div><!-- /vg7 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg7-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg7-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>
<p>The post <a href="https://vijay-gokarn.com/rag-retrieval-augmented-generation/">RAG: Retrieval-Augmented Generation.</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">122</post-id>	</item>
		<item>
		<title>Pandas Remove Duplicates</title>
		<link>https://vijay-gokarn.com/pandas-remove-duplicates/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=pandas-remove-duplicates</link>
		
		<dc:creator><![CDATA[Vijay Gokarn]]></dc:creator>
		<pubDate>Tue, 09 Jul 2024 11:12:55 +0000</pubDate>
				<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[databricks]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[generative-ai]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[data-analysis]]></category>
		<category><![CDATA[gen-ai]]></category>
		<category><![CDATA[pandas]]></category>
		<guid isPermaLink="false">https://vijay-gokarn.com/?p=119</guid>

					<description><![CDATA[<p>Data Engineering · Python · Pandas · Data Cleaning. Handling Duplicate Rows in Pandas — Identify, Remove &#038; Export Clean Data. Duplicate rows are one of the most common data quality issues — and one of the most damaging to [&#8230;]</p>
<p>The post <a href="https://vijay-gokarn.com/pandas-remove-duplicates/">Pandas Remove Duplicates</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&#038;family=DM+Sans:wght@300;400;500&#038;family=DM+Mono:wght@400&#038;display=swap" rel="stylesheet">

<style>
.vg8 {
  --ink: #0e0e0e; --paper: #f7f4ef; --paper-dark: #ede9e1;
  --teal: #0f6e56; --teal-light: #1d9e75; --teal-muted: #e1f5ee;
  --amber: #ba7517; --amber-light: #fac775; --amber-muted: #faeeda;
  --charcoal: #2c2c2a; --muted: #888780;
  --border: rgba(14,14,14,0.12); --border-strong: rgba(14,14,14,0.25);
  --code-bg: #161b22; --code-header: #2d333b; --code-border: rgba(255,255,255,0.06);
  font-family: 'DM Sans', sans-serif; font-weight: 300;
  color: var(--ink); background: var(--paper); line-height: 1.75; font-size: 16px; overflow-x: hidden;
}
.vg8 *, .vg8 *::before, .vg8 *::after { box-sizing: border-box; margin: 0; padding: 0; }

/* HERO */
.vg8-hero { background: #0d1117; padding: 5rem 4rem 4rem; position: relative; overflow: hidden; }
.vg8-hero::before {
  content: '⊕'; font-family: 'Cormorant Garamond', serif; font-size: 22rem;
  font-weight: 300; color: rgba(255,255,255,0.025); position: absolute;
  right: 1rem; bottom: -5rem; line-height: 1; pointer-events: none;
}
.vg8-hero-inner { position: relative; z-index: 1; max-width: 900px; }
.vg8-eyebrow { font-size: 0.68rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal-light); font-weight: 500; margin-bottom: 1.25rem; display: flex; align-items: center; gap: 0.75rem; }
.vg8-eyebrow::before { content: ''; display: inline-block; width: 1.5rem; height: 1px; background: var(--teal-light); }
.vg8-hero h1 { font-family: 'Cormorant Garamond', serif; font-size: clamp(2.2rem, 5vw, 3.8rem); font-weight: 300; line-height: 1.1; color: var(--paper); letter-spacing: -0.02em; margin-bottom: 1.5rem; max-width: 28ch; }
.vg8-hero h1 em { font-style: italic; color: var(--amber-light); }
.vg8-meta-row { display: flex; gap: 2rem; flex-wrap: wrap; }
.vg8-meta { font-size: 0.7rem; letter-spacing: 0.1em; text-transform: uppercase; color: rgba(247,244,239,0.35); }
.vg8-meta span { color: rgba(247,244,239,0.7); margin-left: 0.4rem; }

/* STACK BAND */
.vg8-stack-band { background: var(--teal); padding: 1.1rem 4rem; display: flex; gap: 0.75rem; flex-wrap: wrap; align-items: center; }
.vg8-stack-label { font-size: 0.63rem; letter-spacing: 0.18em; text-transform: uppercase; color: rgba(255,255,255,0.6); font-weight: 400; margin-right: 0.4rem; }
.vg8-stack-pill { font-size: 0.7rem; letter-spacing: 0.05em; padding: 0.28rem 0.85rem; background: rgba(255,255,255,0.12); color: #fff; border: 0.5px solid rgba(255,255,255,0.2); }

/* INTRO */
.vg8-intro { background: var(--teal-muted); padding: 2.5rem 4rem; border-left: 4px solid var(--teal); }
.vg8-intro p { font-size: 1.05rem; line-height: 1.85; color: var(--charcoal); font-weight: 300; max-width: 80ch; }
.vg8-intro strong { color: var(--teal); font-weight: 500; }

/* BODY */
.vg8-body { max-width: 900px; margin: 0 auto; padding: 4rem; }
.vg8-step { margin-bottom: 3.5rem; }
.vg8-step-label { font-size: 0.63rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg8-step-label::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg8-step h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.4rem, 3vw, 2rem); font-weight: 300; line-height: 1.2; color: var(--ink); margin-bottom: 1rem; }
.vg8-step h2 em { font-style: italic; color: var(--teal); }
.vg8-step p { font-size: 0.93rem; line-height: 1.9; color: var(--charcoal); font-weight: 300; margin-bottom: 1rem; }
.vg8-step p strong { color: var(--ink); font-weight: 500; }
.vg8-divider { border: none; border-top: 0.5px solid var(--border); margin: 3rem 0; }
.vg8-ic { font-family: 'DM Mono', monospace; font-size: 0.82rem; background: rgba(14,14,14,0.07); padding: 0.1rem 0.4rem; color: var(--ink); }

/* CALLOUT */
.vg8-callout { background: var(--paper-dark); border-left: 3px solid var(--amber); padding: 1.25rem 1.5rem; margin: 1.25rem 0; font-size: 0.87rem; line-height: 1.8; color: var(--charcoal); }
.vg8-callout strong { color: var(--amber); font-weight: 500; }
.vg8-callout.teal { border-color: var(--teal); }
.vg8-callout.teal strong { color: var(--teal); }

/* STRATEGY CARDS */
.vg8-strategy-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1.25rem; margin: 1.5rem 0; }
.vg8-strategy-card { background: var(--paper); border: 0.5px solid var(--border-strong); padding: 1.5rem; position: relative; }
.vg8-strategy-card::before { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 4px; }
.vg8-strategy-card:nth-child(1)::before { background: var(--muted); }
.vg8-strategy-card:nth-child(2)::before { background: var(--amber); }
.vg8-strategy-card:nth-child(3)::before { background: var(--teal); }
.vg8-strategy-card .vg8-strat-tag { font-family: 'DM Mono', monospace; font-size: 0.65rem; letter-spacing: 0.1em; text-transform: uppercase; color: var(--muted); margin-bottom: 0.5rem; display: block; }
.vg8-strategy-card:nth-child(2) .vg8-strat-tag { color: var(--amber); }
.vg8-strategy-card:nth-child(3) .vg8-strat-tag { color: var(--teal); }
.vg8-strategy-card h3 { font-family: 'Cormorant Garamond', serif; font-size: 1.15rem; font-weight: 400; color: var(--ink); margin-bottom: 0.4rem; }
.vg8-strategy-card p { font-size: 0.82rem; line-height: 1.7; color: var(--charcoal); font-weight: 300; }

/* PIPELINE */
.vg8-pipeline { display: flex; flex-direction: column; gap: 0; margin: 1.5rem 0; }
.vg8-pipeline-step { display: grid; grid-template-columns: 52px 1fr; gap: 1.5rem; padding: 1.25rem 0; border-top: 0.5px solid var(--border); align-items: start; }
.vg8-pipeline-step:last-child { border-bottom: 0.5px solid var(--border); }
.vg8-pipeline-num { width: 36px; height: 36px; background: var(--teal); display: flex; align-items: center; justify-content: center; font-family: 'Cormorant Garamond', serif; font-size: 1.1rem; font-weight: 300; color: var(--paper); flex-shrink: 0; }
.vg8-pipeline-body h4 { font-family: 'Cormorant Garamond', serif; font-size: 1.1rem; font-weight: 400; color: var(--ink); margin-bottom: 0.3rem; }
.vg8-pipeline-body p { font-size: 0.83rem; line-height: 1.7; color: var(--charcoal); font-weight: 300; }

/* CODE BLOCKS */
.vg8-code-wrap { margin: 1.25rem 0; border: 0.5px solid var(--code-border); overflow: hidden; }
.vg8-code-header { background: var(--code-header); padding: 0.6rem 1.25rem; display: flex; justify-content: space-between; align-items: center; border-bottom: 0.5px solid var(--code-border); }
.vg8-code-filename { font-family: 'DM Mono', monospace; font-size: 0.68rem; color: rgba(247,244,239,0.45); letter-spacing: 0.04em; }
.vg8-code-lang { font-size: 0.6rem; letter-spacing: 0.14em; text-transform: uppercase; color: var(--teal-light); font-weight: 500; }
.vg8-code-body { background: var(--code-bg); padding: 1.5rem; overflow-x: auto; }
.vg8-code-body pre { margin: 0; }
.vg8-code-body code { font-family: 'DM Mono', monospace; font-size: 0.82rem; line-height: 1.85; color: #e6edf3; white-space: pre; display: block; }
/* tokens */
.t8-k { color: #ff7b72; }
.t8-s { color: #a5d6ff; }
.t8-c { color: #8b949e; font-style: italic; }
.t8-f { color: #d2a8ff; }
.t8-n { color: #79c0ff; }
.t8-v { color: #ffa657; }
.t8-b { color: var(--amber-light); }

/* FULL SCRIPT SECTION */
.vg8-full-section { background: var(--paper-dark); padding: 4rem; }
.vg8-full-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--teal); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg8-full-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--teal); }
.vg8-full-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--ink); margin-bottom: 0.75rem; }
.vg8-full-section > h2 em { font-style: italic; color: var(--teal); }
.vg8-full-section > p { font-size: 0.9rem; color: var(--charcoal); font-weight: 300; line-height: 1.8; margin-bottom: 2rem; max-width: 70ch; }

/* INTERVIEW */
.vg8-interview-section { background: var(--ink); padding: 4rem; }
.vg8-interview-eyebrow { font-size: 0.65rem; letter-spacing: 0.22em; text-transform: uppercase; color: var(--amber-light); font-weight: 500; margin-bottom: 0.5rem; display: flex; align-items: center; gap: 0.6rem; }
.vg8-interview-eyebrow::before { content: ''; display: inline-block; width: 1.25rem; height: 1px; background: var(--amber-light); }
.vg8-interview-section > h2 { font-family: 'Cormorant Garamond', serif; font-size: clamp(1.6rem, 3vw, 2.4rem); font-weight: 300; color: var(--paper); margin-bottom: 2.5rem; }
.vg8-interview-section > h2 em { font-style: italic; color: var(--amber-light); }
.vg8-qa-list { display: flex; flex-direction: column; }
.vg8-qa-item { display: grid; grid-template-columns: 1fr 1.4fr; gap: 2rem; padding: 1.5rem 0; border-top: 0.5px solid rgba(247,244,239,0.1); align-items: start; }
.vg8-qa-item:last-child { border-bottom: 0.5px solid rgba(247,244,239,0.1); }
.vg8-qa-q { font-family: 'Cormorant Garamond', serif; font-size: 1.05rem; font-weight: 400; color: var(--paper); line-height: 1.4; }
.vg8-q-badge { font-family: 'DM Mono', monospace; font-size: 0.58rem; letter-spacing: 0.1em; text-transform: uppercase; background: var(--teal); color: var(--paper); padding: 0.15rem 0.5rem; margin-bottom: 0.5rem; display: inline-block; }
.vg8-qa-a { font-size: 0.83rem; line-height: 1.8; color: rgba(247,244,239,0.65); font-weight: 300; }
.vg8-qa-a strong { color: var(--amber-light); font-weight: 400; }
.vg8-qa-a code { font-family: 'DM Mono', monospace; font-size: 0.77rem; background: rgba(247,244,239,0.08); padding: 0.1rem 0.35rem; color: var(--paper); }
.vg8-pills { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.vg8-pill { font-size: 0.67rem; letter-spacing: 0.06em; padding: 0.25rem 0.75rem; border: 0.5px solid rgba(247,244,239,0.15); color: rgba(247,244,239,0.5); }
.vg8-pill.t { border-color: var(--teal-light); color: var(--teal-light); }
.vg8-pill.a { border-color: var(--amber-light); color: var(--amber-light); }

/* FOOTER */
.vg8-footer { background: #0d1117; padding: 3rem 4rem; display: flex; justify-content: space-between; align-items: center; flex-wrap: wrap; gap: 1.5rem; border-top: 0.5px solid rgba(247,244,239,0.06); }
.vg8-footer p { font-size: 0.82rem; color: rgba(247,244,239,0.35); font-weight: 300; }
.vg8-footer p strong { color: rgba(247,244,239,0.65); font-weight: 400; }
.vg8-footer-links { display: flex; gap: 1rem; }
.vg8-btn { display: inline-block; padding: 0.65rem 1.75rem; font-size: 0.7rem; letter-spacing: 0.12em; text-transform: uppercase; text-decoration: none; font-weight: 400; }
.vg8-btn.primary { background: var(--teal); color: var(--paper); }
.vg8-btn.ghost { background: transparent; color: rgba(247,244,239,0.55); border: 0.5px solid rgba(247,244,239,0.2); }

/* REVEAL */
.vg8-reveal { opacity: 0; transform: translateY(20px); transition: opacity 0.55s ease, transform 0.55s ease; }
.vg8-reveal.vg8-vis { opacity: 1; transform: translateY(0); }
.vg8-d1 { transition-delay: 0.1s; } .vg8-d2 { transition-delay: 0.2s; } .vg8-d3 { transition-delay: 0.3s; }
</style>

<div class="vg8">

<!-- HERO -->
<div class="vg8-hero">
  <div class="vg8-hero-inner">
    <p class="vg8-eyebrow">Data Engineering · Python · Pandas · Data Cleaning</p>
    <h1>Handling Duplicate Rows in Pandas — <em>Identify, Remove &#038; Export Clean Data</em></h1>
    <div class="vg8-meta-row">
      <p class="vg8-meta">Library<span>pandas</span></p>
      <p class="vg8-meta">Methods<span>duplicated() · drop_duplicates() · reset_index()</span></p>
      <p class="vg8-meta">Output<span>Cleaned CSV</span></p>
    </div>
  </div>
</div>

<!-- STACK BAND -->
<div class="vg8-stack-band">
  <span class="vg8-stack-label">Stack</span>
  <span class="vg8-stack-pill">Python</span>
  <span class="vg8-stack-pill">pandas</span>
  <span class="vg8-stack-pill">df.duplicated()</span>
  <span class="vg8-stack-pill">drop_duplicates()</span>
  <span class="vg8-stack-pill">reset_index()</span>
  <span class="vg8-stack-pill">to_csv()</span>
</div>

<!-- INTRO -->
<div class="vg8-intro">
  <p>Duplicate rows are one of the most common data quality issues — and one of the most damaging to model accuracy and analysis reliability. <strong>Pandas</strong> gives you precise tools to detect, inspect, and remove duplicates with a single line of code. This guide walks through the full pipeline: load, detect, choose a strategy, clean, and export.</p>
</div>

<!-- BODY -->
<div class="vg8-body">

  <!-- WHY IT MATTERS -->
  <div class="vg8-step vg8-reveal">
    <p class="vg8-step-label">Context</p>
    <h2>Why duplicates <em>matter</em></h2>
    <p>Duplicate rows skew aggregations, inflate record counts, bias ML model training, and produce misleading visualizations. A sales total that counts the same transaction twice, a classifier trained on repeated samples — both produce results that look correct but aren&#8217;t. <strong>Clean data is the foundation everything else is built on.</strong></p>
    <div class="vg8-strategy-grid">
      <div class="vg8-strategy-card vg8-reveal vg8-d1">
        <span class="vg8-strat-tag">keep='first'</span>
        <h3>Keep First</h3>
        <p>Drop all duplicates <em>except</em> the first occurrence. The original record is preserved. Most common default choice.</p>
      </div>
      <div class="vg8-strategy-card vg8-reveal vg8-d2">
        <span class="vg8-strat-tag">keep='last'</span>
        <h3>Keep Last</h3>
        <p>Drop all duplicates <em>except</em> the last occurrence. Useful when later records represent updated values.</p>
      </div>
      <div class="vg8-strategy-card vg8-reveal vg8-d3">
        <span class="vg8-strat-tag">keep=False</span>
        <h3>Drop All</h3>
        <p>Remove every instance of a duplicated row — including the first. Use when any duplicated record is invalid.</p>
      </div>
    </div>
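A small, self-contained example (toy order data, invented for illustration) makes the three strategies concrete; duplicates here are full-row matches:

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [101, 102, 102, 103, 103],
    "amount":   [50,  75,  75,  20,  20],
})

first = df.drop_duplicates(keep="first")   # keeps rows 0, 1, 3
last  = df.drop_duplicates(keep="last")    # keeps rows 0, 2, 4
none  = df.drop_duplicates(keep=False)     # keeps row 0 only
print(len(first), len(last), len(none))    # → 3 3 1
```

Pass <code class="vg8-ic">subset=['order_id']</code> to deduplicate on selected columns instead of the full row.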
  </div>

  <hr class="vg8-divider">

  <!-- PIPELINE OVERVIEW -->
  <div class="vg8-step vg8-reveal">
    <p class="vg8-step-label">Pipeline</p>
    <h2>The four-step <em>deduplication pipeline</em></h2>
    <div class="vg8-pipeline">
      <div class="vg8-pipeline-step vg8-reveal">
        <div class="vg8-pipeline-num">1</div>
        <div class="vg8-pipeline-body"><h4>Load</h4><p>Read the raw CSV into a DataFrame with <code class="vg8-ic">pd.read_csv()</code>.</p></div>
      </div>
      <div class="vg8-pipeline-step vg8-reveal vg8-d1">
        <div class="vg8-pipeline-num">2</div>
        <div class="vg8-pipeline-body"><h4>Detect</h4><p>Use <code class="vg8-ic">df.duplicated()</code> to identify and inspect all duplicate rows before touching the data.</p></div>
      </div>
      <div class="vg8-pipeline-step vg8-reveal vg8-d2">
        <div class="vg8-pipeline-num">3</div>
        <div class="vg8-pipeline-body"><h4>Remove</h4><p>Call <code class="vg8-ic">drop_duplicates(keep=...)</code> with your chosen strategy. Reset the index for a clean sequential result.</p></div>
      </div>
      <div class="vg8-pipeline-step vg8-reveal vg8-d3">
        <div class="vg8-pipeline-num">4</div>
        <div class="vg8-pipeline-body"><h4>Export</h4><p>Write the cleaned DataFrame back to CSV with <code class="vg8-ic">to_csv()</code> for downstream use.</p></div>
      </div>
    </div>
  </div>

  <hr class="vg8-divider">

  <!-- STEP 1 — LOAD -->
  <div class="vg8-step vg8-reveal">
    <p class="vg8-step-label">Step 1</p>
    <h2>Load <em>your dataset</em></h2>
    <p>Start by reading your data into a pandas DataFrame. <code class="vg8-ic">pd.read_csv()</code> is the standard entry point for flat files. From here, all deduplication operations work on the in-memory DataFrame — your source file is never modified.</p>
    <div class="vg8-code-wrap">
      <div class="vg8-code-header"><span class="vg8-code-filename">load_data.py</span><span class="vg8-code-lang">Python</span></div>
      <div class="vg8-code-body"><pre><code><span class="t8-k">import</span> pandas <span class="t8-k">as</span> pd

<span class="t8-c"># Read the raw dataset into a DataFrame</span>
df = pd.<span class="t8-f">read_csv</span>(<span class="t8-s">'your_data_file.csv'</span>)

<span class="t8-c"># Quick shape check before cleaning</span>
<span class="t8-f">print</span>(<span class="t8-f">f</span><span class="t8-s">"Rows: {df.shape[0]:,}  |  Columns: {df.shape[1]}"</span>)</code></pre></div>
    </div>
    <div class="vg8-callout teal">
      <strong>Other sources:</strong> The same deduplication logic applies regardless of how you load your data. Use <code class="vg8-ic">pd.read_excel()</code> for XLSX, <code class="vg8-ic">pd.read_parquet()</code> for Parquet, or query a database with <code class="vg8-ic">pd.read_sql()</code> — all return a DataFrame you can clean the same way.
    </div>
  </div>

  <hr class="vg8-divider">

  <!-- STEP 2 — DETECT -->
  <div class="vg8-step vg8-reveal">
    <p class="vg8-step-label">Step 2</p>
    <h2>Detect <em>&#038; inspect duplicates</em></h2>
    <p><code class="vg8-ic">df.duplicated()</code> returns a boolean Series — <code class="vg8-ic">True</code> for every row that is a duplicate of an earlier row. Always <strong>inspect before you remove</strong> — understanding what the duplicates look like helps you choose the right strategy.</p>
    <div class="vg8-code-wrap">
      <div class="vg8-code-header"><span class="vg8-code-filename">detect_duplicates.py</span><span class="vg8-code-lang">Python</span></div>
      <div class="vg8-code-body"><pre><code><span class="t8-c"># Boolean mask: True for every row that is a duplicate</span>
duplicate_mask = df.<span class="t8-f">duplicated</span>()

<span class="t8-c"># How many duplicates exist?</span>
<span class="t8-f">print</span>(<span class="t8-f">f</span><span class="t8-s">"Duplicate rows found: {duplicate_mask.sum():,}"</span>)

<span class="t8-c"># Inspect the duplicate rows themselves</span>
duplicates = df[df.<span class="t8-f">duplicated</span>()]
<span class="t8-f">print</span>(duplicates)

<span class="t8-c"># See ALL occurrences of duplicated rows (including originals)</span>
all_dupes = df[df.<span class="t8-f">duplicated</span>(keep=<span class="t8-b">False</span>)]
<span class="t8-f">print</span>(all_dupes.<span class="t8-f">sort_values</span>(by=df.columns.<span class="t8-f">tolist</span>()))</code></pre></div>
    </div>
    <div class="vg8-callout">
      <strong>Subset duplicates:</strong> By default <code class="vg8-ic">duplicated()</code> checks all columns. To flag rows that are duplicates only on specific columns (e.g. same customer_id): <code class="vg8-ic">df.duplicated(subset=['customer_id'])</code>. This is useful for finding logical duplicates even when other columns differ.
    </div>
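<p>The subset idea is easiest to see on a tiny hypothetical frame (the <code class="vg8-ic">customer_id</code>/<code class="vg8-ic">amount</code> columns below are made up for illustration): a full-row check finds nothing, while a subset check flags the repeated ID.</p>

```python
import pandas as pd

# Hypothetical data: two rows share a customer_id but differ in amount
df = pd.DataFrame({
    "customer_id": [101, 102, 101, 103],
    "amount": [50, 75, 60, 20],
})

# Full-row check: no row repeats exactly, so nothing is flagged
print(df.duplicated().sum())                        # 0

# Subset check: the second 101 is flagged as a logical duplicate
print(df.duplicated(subset=["customer_id"]).sum())  # 1
```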
  </div>

  <hr class="vg8-divider">

  <!-- STEP 3 — REMOVE -->
  <div class="vg8-step vg8-reveal">
    <p class="vg8-step-label">Step 3</p>
    <h2>Remove duplicates — <em>three strategies</em></h2>
    <p><code class="vg8-ic">drop_duplicates()</code> returns a new DataFrame by default — the original is untouched. The <code class="vg8-ic">keep</code> parameter controls which occurrence survives. After removing, <code class="vg8-ic">reset_index(drop=True)</code> gives you a clean sequential index starting from 0.</p>
    <div class="vg8-code-wrap">
      <div class="vg8-code-header"><span class="vg8-code-filename">remove_duplicates.py</span><span class="vg8-code-lang">Python</span></div>
      <div class="vg8-code-body"><pre><code><span class="t8-c"># ── Strategy 1: keep the FIRST occurrence (default) ──</span>
df_keep_first = df.<span class="t8-f">drop_duplicates</span>(keep=<span class="t8-s">'first'</span>)

<span class="t8-c"># ── Strategy 2: keep the LAST occurrence ──</span>
<span class="t8-c">#    useful when later rows represent updated/corrected records</span>
df_keep_last = df.<span class="t8-f">drop_duplicates</span>(keep=<span class="t8-s">'last'</span>)

<span class="t8-c"># ── Strategy 3: drop ALL occurrences of any duplicated row ──</span>
<span class="t8-c">#    use when any repeated row is invalid data</span>
df_drop_all = df.<span class="t8-f">drop_duplicates</span>(keep=<span class="t8-b">False</span>)

<span class="t8-c"># ── Subset: deduplicate only on specific columns ──</span>
df_subset = df.<span class="t8-f">drop_duplicates</span>(subset=[<span class="t8-s">'customer_id'</span>, <span class="t8-s">'order_date'</span>], keep=<span class="t8-s">'first'</span>)

<span class="t8-c"># ── Reset the index after removal (clean 0-based index) ──</span>
df_cleaned = df_keep_first.<span class="t8-f">reset_index</span>(drop=<span class="t8-b">True</span>)

<span class="t8-c"># Confirm rows removed</span>
<span class="t8-f">print</span>(<span class="t8-f">f</span><span class="t8-s">"Before: {len(df):,}  |  After: {len(df_cleaned):,}  |  Removed: {len(df) - len(df_cleaned):,}"</span>)</code></pre></div>
    </div>
    <div class="vg8-callout teal">
      <strong>inplace vs assignment:</strong> <code class="vg8-ic">drop_duplicates(inplace=True)</code> modifies the DataFrame in place and returns <code class="vg8-ic">None</code>. Prefer the assignment pattern (<code class="vg8-ic">df_cleaned = df.drop_duplicates()</code>) — it preserves the original for comparison and makes your code easier to debug.
    </div>
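<p>A quick sanity check on a toy frame (hypothetical values) makes the three strategies concrete — rows 0 and 2 are exact duplicates, and each <code class="vg8-ic">keep</code> setting survives differently:</p>

```python
import pandas as pd

# Toy frame with one exact duplicate pair (rows 0 and 2) -- hypothetical values
df = pd.DataFrame({"id": [1, 2, 1, 3], "value": ["a", "b", "a", "c"]})

first = df.drop_duplicates(keep="first")   # keeps row 0, drops row 2
last = df.drop_duplicates(keep="last")     # keeps row 2, drops row 0
none = df.drop_duplicates(keep=False)      # drops rows 0 AND 2

print(first.index.tolist())  # [0, 1, 3]
print(last.index.tolist())   # [1, 2, 3]
print(none.index.tolist())   # [1, 3]
```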
  </div>

  <hr class="vg8-divider">

  <!-- STEP 4 — EXPORT -->
  <div class="vg8-step vg8-reveal">
    <p class="vg8-step-label">Step 4</p>
    <h2>Export <em>the clean data</em></h2>
    <p>Write the deduplicated DataFrame back to a CSV. Setting <code class="vg8-ic">index=False</code> prevents pandas from writing the row index as an extra column — your downstream consumers will thank you.</p>
    <div class="vg8-code-wrap">
      <div class="vg8-code-header"><span class="vg8-code-filename">export.py</span><span class="vg8-code-lang">Python</span></div>
      <div class="vg8-code-body"><pre><code><span class="t8-c"># Export to CSV — index=False keeps the file clean</span>
df_cleaned.<span class="t8-f">to_csv</span>(<span class="t8-s">'cleaned_data.csv'</span>, index=<span class="t8-b">False</span>)

<span class="t8-f">print</span>(<span class="t8-s">"Cleaned data exported to cleaned_data.csv"</span>)

<span class="t8-c"># Optional: also export to Parquet for better performance at scale</span>
df_cleaned.<span class="t8-f">to_parquet</span>(<span class="t8-s">'cleaned_data.parquet'</span>, index=<span class="t8-b">False</span>)</code></pre></div>
    </div>
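<p>A cheap habit worth adopting: read the exported file back and assert it is duplicate-free before anything downstream consumes it. A minimal sketch, using a small stand-in frame for <code class="vg8-ic">df_cleaned</code>:</p>

```python
import pandas as pd

# Stand-in for the cleaned frame from Step 3 (hypothetical values)
df_cleaned = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
df_cleaned.to_csv("cleaned_data.csv", index=False)

# Read the file back and confirm no duplicates survived the round trip
check = pd.read_csv("cleaned_data.csv")
assert check.duplicated().sum() == 0
print("No duplicates in exported file")
```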
  </div>

</div><!-- /vg8-body -->

<!-- FULL SCRIPT -->
<div class="vg8-full-section">
  <p class="vg8-full-eyebrow">Complete Reference</p>
  <h2>Full deduplication <em>script</em></h2>
  <p>Everything in one place — load, detect, remove (keep first), reset index, and export.</p>
  <div class="vg8-code-wrap vg8-reveal">
    <div class="vg8-code-header"><span class="vg8-code-filename">deduplicate.py — full script</span><span class="vg8-code-lang">Python</span></div>
    <div class="vg8-code-body"><pre><code><span class="t8-k">import</span> pandas <span class="t8-k">as</span> pd

<span class="t8-c"># ── 1. Load ─────────────────────────────────────────────</span>
df = pd.<span class="t8-f">read_csv</span>(<span class="t8-s">'your_data_file.csv'</span>)
<span class="t8-f">print</span>(<span class="t8-f">f</span><span class="t8-s">"Loaded {len(df):,} rows"</span>)

<span class="t8-c"># ── 2. Detect ────────────────────────────────────────────</span>
duplicates = df[df.<span class="t8-f">duplicated</span>()]
<span class="t8-f">print</span>(<span class="t8-f">f</span><span class="t8-s">"Duplicate rows found: {len(duplicates):,}"</span>)
<span class="t8-f">print</span>(duplicates)

<span class="t8-c"># ── 3a. Keep first occurrence of each duplicate row ──────</span>
df_cleaned = df.<span class="t8-f">drop_duplicates</span>(keep=<span class="t8-s">'first'</span>)

<span class="t8-c"># ── 3b. Keep last occurrence (swap in if preferred) ──────</span>
<span class="t8-c"># df_cleaned = df.drop_duplicates(keep='last')</span>

<span class="t8-c"># ── 3c. Reset the index to a clean 0-based sequence ──────</span>
df_cleaned = df_cleaned.<span class="t8-f">reset_index</span>(drop=<span class="t8-b">True</span>)

<span class="t8-f">print</span>(<span class="t8-f">f</span><span class="t8-s">"Rows after cleaning: {len(df_cleaned):,}"</span>)

<span class="t8-c"># ── 4. Export ─────────────────────────────────────────────</span>
df_cleaned.<span class="t8-f">to_csv</span>(<span class="t8-s">'cleaned_data.csv'</span>, index=<span class="t8-b">False</span>)
<span class="t8-f">print</span>(<span class="t8-s">"Exported to cleaned_data.csv"</span>)</code></pre></div>
  </div>
</div>

<!-- INTERVIEW CHEAT SHEET -->
<div class="vg8-interview-section">
  <p class="vg8-interview-eyebrow">Interview Prep</p>
  <h2>Cheat sheet — <em>quick definitions to remember</em></h2>
  <div class="vg8-qa-list">

    <div class="vg8-qa-item vg8-reveal">
      <div class="vg8-qa-q"><span class="vg8-q-badge">Define</span><br>What does <code>df.duplicated()</code> return?</div>
      <div class="vg8-qa-a">A <strong>boolean Series</strong> the same length as the DataFrame — <code>True</code> for every row that is a duplicate of a previously seen row, <code>False</code> otherwise. The first occurrence is marked <code>False</code> by default.
        <div class="vg8-pills"><span class="vg8-pill t">Boolean Series</span><span class="vg8-pill">True = duplicate</span><span class="vg8-pill a">First = False by default</span></div>
      </div>
    </div>

    <div class="vg8-qa-item vg8-reveal vg8-d1">
      <div class="vg8-qa-q"><span class="vg8-q-badge">Compare</span><br>keep=&#8217;first&#8217; vs keep=&#8217;last&#8217; vs keep=False</div>
      <div class="vg8-qa-a"><strong>first</strong> — keeps the first occurrence, drops all subsequent duplicates. <strong>last</strong> — keeps the final occurrence, useful for updated records. <strong>False</strong> — drops every occurrence of any duplicated row, leaving only rows that were unique to begin with.
        <div class="vg8-pills"><span class="vg8-pill t">first = keep original</span><span class="vg8-pill a">last = keep latest</span><span class="vg8-pill">False = drop all copies</span></div>
      </div>
    </div>

    <div class="vg8-qa-item vg8-reveal">
      <div class="vg8-qa-q"><span class="vg8-q-badge">Explain</span><br>What does the <code>subset</code> parameter do?</div>
      <div class="vg8-qa-a">By default, <code>duplicated()</code> and <code>drop_duplicates()</code> compare <strong>all columns</strong>. The <code>subset</code> parameter restricts the comparison to specific columns — for example <code>subset=['customer_id']</code> finds rows with the same customer ID even if other columns differ.
        <div class="vg8-pills"><span class="vg8-pill t">Default = all columns</span><span class="vg8-pill">subset = logical dedup</span></div>
      </div>
    </div>

    <div class="vg8-qa-item vg8-reveal vg8-d1">
      <div class="vg8-qa-q"><span class="vg8-q-badge">Gotcha</span><br>Why call <code>reset_index(drop=True)</code> after deduplication?</div>
      <div class="vg8-qa-a">After dropping rows, the DataFrame retains the <strong>original row indices</strong> — you&#8217;d have gaps like 0, 1, 4, 7 instead of 0, 1, 2, 3. <code>reset_index(drop=True)</code> renumbers from 0 continuously. <code>drop=True</code> prevents the old index from being added as a column.
        <div class="vg8-pills"><span class="vg8-pill a">Index gaps after drop</span><span class="vg8-pill t">reset_index fixes gaps</span><span class="vg8-pill">drop=True prevents extra col</span></div>
      </div>
    </div>
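<p>The index-gap behavior is easy to demonstrate on a four-row toy frame (hypothetical values):</p>

```python
import pandas as pd

# Four rows; rows 1 and 3 duplicate rows 0 and 2
df = pd.DataFrame({"x": [1, 1, 2, 2]})

deduped = df.drop_duplicates()
print(deduped.index.tolist())  # [0, 2]  <- gap left by the dropped rows

clean = deduped.reset_index(drop=True)
print(clean.index.tolist())    # [0, 1]  <- continuous again
```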

    <div class="vg8-qa-item vg8-reveal">
      <div class="vg8-qa-q"><span class="vg8-q-badge">Gotcha</span><br>inplace=True vs reassignment — which is preferred?</div>
      <div class="vg8-qa-a">Prefer <strong>reassignment</strong> (<code>df_cleaned = df.drop_duplicates()</code>) — it preserves the original DataFrame for comparison and makes pipelines easier to debug. <code>inplace=True</code> modifies the object and returns <code>None</code>, which can cause confusion when chaining operations. Many pandas best-practice guides now recommend avoiding inplace.
        <div class="vg8-pills"><span class="vg8-pill t">Reassignment = safer</span><span class="vg8-pill a">inplace returns None</span></div>
      </div>
    </div>
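<p>The <code class="vg8-ic">None</code> return is the classic trap — assigning the result of an inplace call silently wipes out your reference. A minimal illustration:</p>

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2]})

# inplace=True mutates df and returns None -- easy to assign by mistake
result = df.drop_duplicates(inplace=True)
print(result)   # None
print(len(df))  # 2 -- df itself was modified
```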

    <div class="vg8-qa-item vg8-reveal vg8-d1">
      <div class="vg8-qa-q"><span class="vg8-q-badge">Best Practice</span><br>How do you handle duplicates in a production data pipeline?</div>
      <div class="vg8-qa-a"><strong>Three layers:</strong> (1) <strong>Detect and log</strong> before removing — store duplicate counts as data quality metrics. (2) <strong>Deduplicate at ingestion</strong>, not at query time — clean once, use many times. (3) Add a <strong>unique constraint</strong> in your database or Delta Lake table to prevent duplicates from re-entering at source.
        <div class="vg8-pills"><span class="vg8-pill t">Log before removing</span><span class="vg8-pill t">Clean at ingestion</span><span class="vg8-pill a">DB unique constraints</span></div>
      </div>
    </div>

    <div class="vg8-qa-item vg8-reveal">
      <div class="vg8-qa-q"><span class="vg8-q-badge">Use Case</span><br>When should you NOT remove duplicates?</div>
      <div class="vg8-qa-a">When the repeated rows represent <strong>legitimate repeated events</strong> — a customer placing the same order twice on different days, a sensor reading the same value consecutively, or audit log entries. Always validate with domain knowledge before dropping. Use <code>subset</code> to deduplicate on business keys, not entire rows.
        <div class="vg8-pills"><span class="vg8-pill a">Repeated events = valid</span><span class="vg8-pill t">Use subset= for business keys</span></div>
      </div>
    </div>

  </div>
</div>

<!-- FOOTER -->
<div class="vg8-footer">
  <p><strong>GenAI Mastery Series</strong> — vijay-gokarn.com · Vijay Gokarn</p>
  <div class="vg8-footer-links">
    <a href="https://github.com/vijaygokarn130" class="vg8-btn ghost">GitHub ↗</a>
    <a href="https://vijay-gokarn.com" class="vg8-btn primary">Back to Blog ↗</a>
  </div>
</div>

</div><!-- /vg8 -->

<script>
(function(){
  var obs = new IntersectionObserver(function(e){
    e.forEach(function(x){ if(x.isIntersecting) x.target.classList.add('vg8-vis'); });
  }, {threshold: 0.08});
  document.querySelectorAll('.vg8-reveal').forEach(function(el){ obs.observe(el); });
})();
</script>
<p>The post <a href="https://vijay-gokarn.com/pandas-remove-duplicates/">Pandas Remove Duplicates</a> appeared first on <a href="https://vijay-gokarn.com">Vijay Gokarn</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">119</post-id>	</item>
	</channel>
</rss>
