Custom GPTs, Agents, and RAG Systems¶
Summary¶
This chapter explores building custom AI solutions beyond basic prompting. Students will learn to create custom GPTs for specific business applications, understand AI agents and autonomous systems, and implement Retrieval-Augmented Generation (RAG) to enhance AI accuracy with external knowledge. These skills enable the development of sophisticated AI-powered workflows.
Concepts Covered¶
This chapter covers the following 16 concepts from the learning graph:
- Custom GPT
- GPT Builder
- GPT Actions
- AI Agents
- Autonomous Systems
- Agent Workflows
- No-Code AI Tools
- Low-Code Platforms
- Workflow Automation
- RAG
- Retrieval Systems
- Knowledge Bases
- Vector Database
- Semantic Search
- Similarity Search
- Cosine Similarity
Prerequisites¶
This chapter builds on concepts from:
- Chapter 2: Large Language Model Architecture
- Chapter 3: AI Platform Landscape
- Chapter 4: Prompt Engineering
Learning Objectives¶
After completing this chapter, students will be able to:
- Build custom GPTs for specific business applications
- Design AI agent workflows for automated task completion
- Implement RAG patterns for knowledge-augmented applications
- Use vector databases and embeddings for semantic search
- Evaluate no-code and low-code AI platforms
Introduction¶
Moving beyond direct prompting opens a vast landscape of possibilities for AI-powered applications. This chapter explores three interconnected approaches to building sophisticated AI systems: custom GPTs that package specialized capabilities into reusable assistants, AI agents that can plan and execute multi-step tasks autonomously, and Retrieval-Augmented Generation (RAG) that grounds AI responses in authoritative knowledge sources.
These technologies represent the practical frontier of generative AI deployment. While prompting remains foundational, scaling AI capabilities across an organization requires tools that non-technical users can customize, workflows that execute without constant human oversight, and systems that access proprietary knowledge bases rather than relying solely on training data.
Custom GPTs: Packaging AI Expertise¶
Understanding Custom GPTs¶
A Custom GPT is a specialized version of ChatGPT configured with specific instructions, knowledge, and capabilities for a particular purpose. Introduced by OpenAI in late 2023, Custom GPTs democratize AI customization by enabling non-developers to create tailored AI assistants.
Custom GPTs package together:
- System instructions: Detailed guidance on persona and behavior
- Knowledge files: Documents the GPT can reference
- Actions: API integrations for external system access
- Conversation starters: Suggested prompts for users
| Component | Purpose | Example |
|---|---|---|
| Instructions | Define behavior and expertise | "You are a legal contract analyst specializing in SaaS agreements" |
| Knowledge | Provide reference information | Company policy documents, product catalogs |
| Actions | Enable external integrations | Query CRM, update databases, send emails |
| Starters | Guide user interaction | "Analyze this contract," "What's our refund policy?" |
Building with GPT Builder¶
The GPT Builder is OpenAI's conversational interface for creating Custom GPTs. Rather than configuring settings manually, users describe what they want, and GPT Builder translates those descriptions into configuration.
The GPT Builder process:
- Describe your GPT: Explain its purpose and capabilities in natural language
- Configure details: Refine instructions, add knowledge files
- Define actions: Connect to external APIs if needed
- Test: Interact with the GPT in preview mode
- Publish: Share privately, within an organization, or publicly
GPT Builder conversation example:
User: I want to create a GPT that helps my sales team understand our product features and answer customer objections.
GPT Builder: Great! I'll help you create a sales assistant. What products does your company sell?
User: We sell enterprise project management software called TeamFlow.
GPT Builder: I'll configure this GPT as a TeamFlow product expert. You'll want to upload product documentation and common objection handling guides. Would you like me to set this up?
GPT Actions: Connecting to the World¶
GPT Actions enable Custom GPTs to interact with external systems via API calls. This transforms GPTs from knowledge assistants into operational tools that can retrieve real-time data, trigger workflows, and update external systems.
Actions are defined using OpenAPI specifications:
openapi: 3.0.0
info:
  title: CRM Integration
  version: 1.0.0
paths:
  /customers/{id}:
    get:
      summary: Get customer details
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Customer data
Action use cases:
| Domain | Action Example | Business Value |
|---|---|---|
| Sales | Retrieve customer history from CRM | Personalized conversation context |
| Support | Create support tickets | Automated issue logging |
| HR | Query employee policies | Self-service HR assistance |
| Finance | Look up invoice status | Real-time billing information |
| Operations | Trigger workflow in automation platform | Process automation |
Security Considerations
GPT Actions execute API calls with configured credentials. Implement proper authentication, rate limiting, and audit logging. Avoid exposing sensitive operations without appropriate access controls.
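To make the security note concrete, here is a minimal sketch of what the server side of the `/customers/{id}` operation from the specification above might look like. It is plain Python with no web framework, and it only illustrates authentication and audit logging. The names `handle_get_customer`, `CUSTOMERS`, and `VALID_API_KEYS` are hypothetical and are not part of any OpenAI API.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("crm.audit")

# Hypothetical in-memory data standing in for a real CRM and key store.
CUSTOMERS = {"c-100": {"id": "c-100", "name": "Acme Corp", "tier": "enterprise"}}
VALID_API_KEYS = {"demo-key-123"}


def handle_get_customer(customer_id: str, api_key: Optional[str]) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for GET /customers/{id}."""
    # Authenticate before touching any data.
    if api_key not in VALID_API_KEYS:
        audit_log.warning("denied customer_id=%s reason=bad_key", customer_id)
        return 401, {"error": "invalid or missing API key"}

    customer = CUSTOMERS.get(customer_id)
    if customer is None:
        audit_log.info("miss customer_id=%s", customer_id)
        return 404, {"error": "customer not found"}

    # Record every successful read so access can be audited later.
    audit_log.info("read customer_id=%s", customer_id)
    return 200, customer
```

A production service would also add rate limiting and store keys outside the source code; both are omitted here to keep the sketch short.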
AI Agents: Autonomous Task Execution¶
What Are AI Agents?¶
AI Agents are systems that use language models to reason about tasks, create plans, and execute actions autonomously. Unlike simple prompt-response interactions, agents can decompose complex goals into subtasks, use tools to gather information or take actions, and iterate until objectives are achieved.
Key agent capabilities:
- Planning: Breaking down goals into actionable steps
- Tool use: Invoking external functions, APIs, or code execution
- Memory: Maintaining context across extended interactions
- Reasoning: Deciding next actions based on observations
- Self-correction: Detecting and recovering from errors
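The capabilities listed above can be sketched as a small loop. This is an illustration under simplifying assumptions, not a real agent framework: `decide_next_action` is a hard-coded stand-in for the language model, the two tools in `TOOLS` are hypothetical, and self-correction is left out.

```python
from typing import Callable

# Tools the agent may call. Both are hypothetical stand-ins for real integrations.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_price": lambda item: {"widget": "4.00", "gadget": "10.00"}.get(item, "unknown"),
    "add": lambda expr: str(sum(float(x) for x in expr.split("+"))),
}


def decide_next_action(goal: str, memory: list[str]) -> tuple[str, str]:
    """Stand-in for the language model: pick the next (tool, argument) pair.

    A real agent would prompt an LLM with the goal and the memory so far.
    """
    if len(memory) == 0:
        return "lookup_price", "widget"
    if len(memory) == 1:
        return "lookup_price", "gadget"
    if len(memory) == 2:
        return "add", f"{memory[0]}+{memory[1]}"
    return "finish", memory[-1]


def run_agent(goal: str, max_steps: int = 10) -> str:
    memory: list[str] = []  # observations carried across steps
    for _ in range(max_steps):  # hard cap so a confused agent cannot loop forever
        tool, argument = decide_next_action(goal, memory)  # planning and reasoning
        if tool == "finish":
            return argument
        observation = TOOLS[tool](argument)  # tool use
        memory.append(observation)  # memory
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("Total price of a widget and a gadget"))  # prints: 14.0
```

The `max_steps` cap is a common safeguard: without it, a planner that never returns `finish` would run indefinitely.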
Autonomous Systems¶
Autonomous systems extend agent concepts to operate with minimal human intervention. These systems observe environments, make decisions, and take actions in pursuit of defined objectives.
The autonomy spectrum:
| Level | Description | Human Role | Example |
|---|---|---|---|
| Assistive | AI suggests, human executes | Decision maker | Email drafting suggestions |
| Collaborative | AI executes routine tasks, escalates complex ones | Supervisor | Automated ticket routing with escalation |
| Supervised | AI executes autonomously, human reviews | Auditor | Code review bots with merge approval |
| Autonomous | AI executes and self-evaluates | Monitor | Automated trading within parameters |
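One way to read the table is as a dispatch rule: the same proposed action is handled differently depending on the configured level. The sketch below encodes that idea; the `Autonomy` enum, the `dispatch` function, and the `is_routine` flag are illustrative names chosen here, not an established API.

```python
from enum import Enum


class Autonomy(Enum):
    ASSISTIVE = 1      # AI suggests, human executes
    COLLABORATIVE = 2  # AI executes routine work, escalates complex cases
    SUPERVISED = 3     # AI executes, human reviews afterwards
    AUTONOMOUS = 4     # AI executes and self-evaluates


def dispatch(action: str, level: Autonomy, is_routine: bool = True) -> str:
    """Decide what happens to a proposed action at a given autonomy level."""
    if level is Autonomy.ASSISTIVE:
        return f"SUGGEST to human: {action}"
    if level is Autonomy.COLLABORATIVE and not is_routine:
        return f"ESCALATE to human: {action}"
    if level is Autonomy.SUPERVISED:
        return f"EXECUTE and queue for review: {action}"
    return f"EXECUTE: {action}"
```

For example, `dispatch("route ticket", Autonomy.COLLABORATIVE, is_routine=False)` returns an escalation, while the same call with `is_routine=True` executes directly.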
The Loop: Human In, On, or Out of the Loop
"Human in the loop" systems require human approval for actions. "Human on the loop" systems proceed autonomously but with human monitoring capability. "Human out of the loop" systems operate fully autonomously. Choose the appropriate level based on risk tolerance and regulatory requirements.
Agent Workflows¶
Agent workflows orchestrate multiple agents or agent actions to accomplish complex business processes. These workflows define how agents coordinate, share information, and hand off tasks.
Common workflow patterns:
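Sequential Pipeline
Research Agent → Analysis Agent → Output Agent
Parallel Execution
Dispatcher → (Search Agent | Analysis Agent | Summary Agent) → Aggregator
Supervisor Pattern
Supervisor Agent
├── Worker Agent 1 (Specialized Task)
├── Worker Agent 2 (Specialized Task)
└── Worker Agent 3 (Specialized Task)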
Diagram: Agent Workflow Patterns¶
The following diagram compares three common patterns for organizing multi-agent workflows, each suited to different types of automation scenarios.
flowchart TB
subgraph Pattern3["Pattern 3: Supervisor/Worker"]
direction TB
S3["🎯 Supervisor Agent<br/>Coordinates & Decides"]
W3A["👷 Worker A<br/>Code Generation"]
W3B["👷 Worker B<br/>Testing"]
W3C["👷 Worker C<br/>Documentation"]
S3 <-->|"Tasks"| W3A
S3 <-->|"Results"| W3B
S3 <-->|"Feedback"| W3C
end
subgraph Pattern2["Pattern 2: Parallel Fan-Out"]
direction TB
D2["📤 Dispatcher"]
P2A["🔍 Search Agent"]
P2B["📊 Analysis Agent"]
P2C["📝 Summary Agent"]
A2["📥 Aggregator"]
D2 --> P2A & P2B & P2C
P2A & P2B & P2C --> A2
end
subgraph Pattern1["Pattern 1: Sequential Pipeline"]
direction LR
A1["📥 Research<br/>Agent"] --> B1["🔬 Analysis<br/>Agent"] --> C1["📤 Output<br/>Agent"]
end
style Pattern1 fill:#E3F2FD,stroke:#1565C0,stroke-width:2px
style Pattern2 fill:#E8F5E9,stroke:#388E3C,stroke-width:2px
style Pattern3 fill:#FFF3E0,stroke:#F57C00,stroke-width:2px
| Pattern | Best For | Use Case | Trade-offs |
|---|---|---|---|
| Sequential Pipeline | Linear processes with clear handoffs | Document processing, data transformation | Simple but no parallelism |
| Parallel Fan-Out | Independent subtasks that can run concurrently | Multi-source research, batch processing | Fast but requires aggregation logic |
| Supervisor/Worker | Complex tasks requiring dynamic coordination | Software development, creative projects | Flexible but more complex orchestration |
Choosing the Right Pattern
- Use Sequential when each step depends entirely on the previous step's output
- Use Parallel when you can split work into independent chunks and merge results
- Use Supervisor/Worker when tasks require iteration, feedback loops, or dynamic decision-making
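The three patterns differ mainly in control flow, which a few lines of Python can show. In this sketch an "agent" is simply a function from text to text; `sequential`, `fan_out`, and `supervise` are names invented for the example, and real agents would call a language model instead of appending strings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Agent = Callable[[str], str]  # an "agent" here is just a function from text to text


def sequential(agents: list[Agent], task: str) -> str:
    """Sequential pipeline: each agent consumes the previous agent's output."""
    for agent in agents:
        task = agent(task)
    return task


def fan_out(agents: list[Agent], task: str, aggregate: Callable[[list[str]], str]) -> str:
    """Parallel fan-out: run independent agents concurrently, then merge results."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda agent: agent(task), agents))
    return aggregate(results)


def supervise(worker: Agent, accept: Callable[[str], bool], task: str, max_rounds: int = 3) -> str:
    """Supervisor/worker: re-dispatch the work until the supervisor accepts it."""
    draft = worker(task)
    for _ in range(max_rounds):
        if accept(draft):
            return draft
        draft = worker(draft)  # feedback loop: hand the draft back for another pass
    return draft


# Toy agents, purely for demonstration.
research = lambda t: t + " -> researched"
analyze = lambda t: t + " -> analyzed"

print(sequential([research, analyze], "topic"))           # topic -> researched -> analyzed
print(fan_out([research, analyze], "topic", " | ".join))  # topic -> researched | topic -> analyzed
```

Note that only `fan_out` needs an aggregation function, and only `supervise` needs an acceptance test, which mirrors the trade-offs in the table above.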
No-Code and Low-Code AI Platforms¶
The Democratization of AI Development¶
No-code AI tools enable users without programming skills to build AI-powered applications through visual interfaces, pre-built components, and natural language configuration. Low-code platforms provide visual development with optional code customization for advanced use cases.
This democratization enables:
- Faster prototyping: Ideas to working prototypes in hours
- Business ownership: Domain experts build their own solutions
- Reduced bottlenecks: Less dependency on engineering teams
- Experimentation: Easy testing of AI applications before investment
Platform Categories¶
| Category | Examples | Capabilities |
|---|---|---|
| Custom GPT builders | OpenAI GPTs, Claude Projects | Specialized assistants with knowledge |
| Visual workflow builders | Zapier AI, Make | Connect apps with AI processing |
| Chatbot platforms | Botpress, Voiceflow | Conversational interfaces |
| Content generation | Jasper, Copy.ai | Marketing and content creation |
| Document processing | DocuSign Intelligent Agreement Management | Contract analysis, extraction |
| Analytics | ThoughtSpot, Tableau AI | Natural language data queries |
Workflow Automation¶
Workflow automation platforms connect AI capabilities with business applications, enabling automated data flows and decision-making across systems.
Typical automation patterns:
Trigger: New email received
↓
AI Action: Classify intent and extract entities
↓
Condition: If intent = "support request"
↓
Action: Create ticket in helpdesk
↓
AI Action: Generate initial response
↓
Action: Send email response
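The same flow can be written as a short handler function. This is only a sketch of the control flow: `classify_email` uses keyword rules as a stand-in for the AI classification step, the reply is a fixed template rather than generated text, and the `helpdesk` and `outbox` lists are hypothetical substitutes for real helpdesk and email services.

```python
def classify_email(body: str) -> dict:
    """Stand-in for the AI classification step (keyword rules, not a real model)."""
    text = body.lower()
    intent = "support request" if any(w in text for w in ("error", "broken", "help")) else "other"
    return {"intent": intent, "summary": body[:60]}


def handle_new_email(sender: str, body: str, helpdesk: list, outbox: list) -> None:
    """Trigger handler: classify, branch on intent, then act."""
    result = classify_email(body)  # AI action: classify intent
    if result["intent"] != "support request":  # condition
        return
    ticket_id = len(helpdesk) + 1
    # Action: create a ticket in the helpdesk.
    helpdesk.append({"id": ticket_id, "from": sender, "summary": result["summary"]})
    # AI action: generate an initial response (templated here).
    reply = f"Thanks for contacting us. Your ticket number is {ticket_id}."
    # Action: send the email response.
    outbox.append({"to": sender, "body": reply})


helpdesk, outbox = [], []
handle_new_email("ana@example.com", "Help, the export button is broken", helpdesk, outbox)
handle_new_email("bob@example.com", "Thanks for the great webinar", helpdesk, outbox)
print(len(helpdesk), len(outbox))  # prints: 1 1
```

Visual builders express the same trigger, condition, and action steps as connected boxes rather than code.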
Key automation platforms:
- Zapier: Connect 5,000+ apps with AI-powered automations
- Make (formerly Integromat): Visual workflow builder with AI modules
- Microsoft Power Automate: Enterprise automation with Copilot integration
- n8n: Open-source workflow automation
- Tray.io: Enterprise integration platform
Retrieval-Augmented Generation (RAG)¶
The RAG Architecture¶
Retrieval-Augmented Generation (RAG) is an architecture pattern that enhances LLM responses by retrieving relevant information from external knowledge sources and including it in the prompt context. This addresses fundamental LLM limitations around knowledge currency and hallucination.
The RAG process:
- Query: User submits a question or request
- Retrieve: System searches knowledge base for relevant content
- Augment: Retrieved content is added to the prompt context
- Generate: LLM produces response grounded in retrieved information
Diagram: RAG Architecture¶
The following diagram illustrates the complete RAG (Retrieval-Augmented Generation) pipeline, showing both the offline ingestion process and the real-time query flow.
flowchart LR
subgraph Ingestion["📚 Offline Ingestion Pipeline"]
direction TB
D1["📄 Documents<br/>PDFs, Web, DBs"]
D2["✂️ Chunking<br/>Split into segments"]
D3["🔢 Embedding<br/>Text → Vectors"]
D4[("💾 Vector<br/>Database")]
D1 --> D2 --> D3 --> D4
end
subgraph Query["⚡ Real-Time Query Pipeline"]
direction TB
Q1["❓ User Query"]
Q2["🔢 Query<br/>Embedding"]
Q3["🔍 Similarity<br/>Search"]
Q4["📋 Retrieved<br/>Chunks"]
Q5["🤖 LLM with<br/>Context"]
Q6["💬 Grounded<br/>Response"]
Q1 --> Q2 --> Q3 --> Q4 --> Q5 --> Q6
end
D4 -.->|"Semantic<br/>Matching"| Q3
style Ingestion fill:#E3F2FD,stroke:#1565C0,stroke-width:2px
style Query fill:#E8F5E9,stroke:#388E3C,stroke-width:2px
style D4 fill:#FFCC80,stroke:#F57C00
style Q5 fill:#E1BEE7,stroke:#7B1FA2
| Stage | Component | Function |
|---|---|---|
| Ingestion | Document Sources | Gather content from files, databases, APIs |
| Ingestion | Chunking | Split documents into retrievable segments (commonly 200-1000 tokens) |
| Ingestion | Embedding | Convert text chunks to vector representations |
| Ingestion | Vector Database | Store embeddings for efficient similarity search |
| Query | Query Embedding | Convert user question to same vector space |
| Query | Similarity Search | Find most relevant chunks by cosine similarity |
| Query | LLM Generation | Produce response grounded in retrieved evidence |
Key RAG Insight
The power of RAG lies in semantic matching—the query and document chunks are compared in a learned vector space where similar meanings cluster together, enabling retrieval based on conceptual relevance rather than just keyword matching.
Why RAG Matters¶
RAG addresses critical LLM limitations:
| Limitation | How RAG Helps |
|---|---|
| Knowledge cutoff | Access current information from updated knowledge bases |
| Hallucination | Ground responses in retrieved evidence |
| Domain specificity | Include proprietary organizational knowledge |
| Source attribution | Cite specific documents supporting claims |
| Data privacy | Keep sensitive data in controlled systems |
Knowledge Bases¶
A knowledge base in RAG context is a structured or semi-structured collection of documents that the retrieval system can search. Effective knowledge bases require:
- Comprehensive coverage: Include all relevant information
- Quality content: Well-written, accurate source material
- Appropriate chunking: Documents split into retrievable units
- Metadata: Tags, dates, sources for filtering and attribution
- Currency: Regular updates to maintain relevance
Knowledge base sources:
- Internal documentation (policies, procedures, guides)
- Product catalogs and specifications
- Customer support histories
- Research reports and white papers
- Email archives and communication records
- Database exports and structured data
Vector Databases and Semantic Search¶
Understanding Embeddings for Search¶
Traditional search relies on keyword matching—finding documents containing query terms. Semantic search uses embeddings to find conceptually similar content regardless of exact word matches.
Comparison:
| Query | Keyword Match | Semantic Match |
|---|---|---|
| "How to cancel subscription" | Documents with "cancel" and "subscription" | Also finds documents about "ending membership," "stopping service," "termination process" |
| "Employee vacation policy" | Documents mentioning "employee," "vacation," "policy" | Also finds "PTO guidelines," "time off procedures," "leave entitlement" |
Vector Databases¶
A vector database is a specialized database optimized for storing and searching high-dimensional embedding vectors. Unlike traditional databases that search by field values, vector databases find similar items by mathematical distance between vectors.
Key vector database capabilities:
- High-dimensional indexing: Efficient search over vectors with hundreds to thousands of dimensions
- Approximate nearest neighbor (ANN): Fast similarity search at scale
- Metadata filtering: Combine semantic search with attribute filters
- Real-time updates: Add new vectors without full reindexing
- Scalability: Handle millions to billions of vectors
Popular vector databases:
| Database | Type | Key Features |
|---|---|---|
| Pinecone | Managed | Easy scaling, high performance |
| Weaviate | Open source | Schema support, modules |
| Milvus | Open source | High throughput, GPU acceleration |
| Chroma | Open source | Simple, good for prototyping |
| pgvector | PostgreSQL extension | Integrate with existing PostgreSQL |
| Qdrant | Open source | Filtering, cloud-native |
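To show what these products do at their core, here is a brute-force stand-in written in plain Python. It supports adding vectors, similarity search, and metadata filtering, but it scans every row instead of using an approximate nearest neighbor index, so it is only suitable for illustration. `TinyVectorStore` and its methods are invented for this sketch and do not mirror the API of any database in the table.

```python
import math
from typing import Optional


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Brute-force stand-in for a vector database (no ANN index)."""

    def __init__(self) -> None:
        self._rows: list[tuple[list[float], str, dict]] = []

    def add(self, vector: list[float], text: str, metadata: dict) -> None:
        self._rows.append((vector, text, metadata))

    def search(self, query: list[float], k: int = 3,
               where: Optional[dict] = None) -> list[tuple[float, str]]:
        """Return the k most similar texts, optionally filtered by metadata equality."""
        scored = []
        for vector, text, metadata in self._rows:
            if where and any(metadata.get(key) != value for key, value in where.items()):
                continue  # metadata filter: skip rows that do not match
            scored.append((_cosine(query, vector), text))
        return sorted(scored, reverse=True)[:k]


store = TinyVectorStore()
store.add([1.0, 0.0], "refund policy", {"dept": "finance"})
store.add([0.9, 0.1], "cancellation steps", {"dept": "support"})
store.add([0.0, 1.0], "office map", {"dept": "facilities"})

print([text for _, text in store.search([1.0, 0.0], k=2)])
print([text for _, text in store.search([1.0, 0.0], where={"dept": "support"})])
```

The first query returns the two nearest texts, "refund policy" and "cancellation steps"; the second returns only "cancellation steps" because the filter removes the other rows before scoring.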
Similarity Search and Cosine Similarity¶
Similarity search finds vectors most similar to a query vector. The most common similarity metric is cosine similarity, which measures the angle between two vectors.
The cosine similarity formula:
\[\text{cosine similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{||A|| \, ||B||}\]
Where:
- \(A \cdot B\) is the dot product of vectors A and B
- \(||A||\) and \(||B||\) are the magnitudes (lengths) of the vectors
Cosine similarity ranges from -1 to 1:
- 1: Vectors point in same direction (identical meaning)
- 0: Vectors are orthogonal (unrelated)
- -1: Vectors point in opposite directions (opposite meaning)
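A minimal sketch of the metric in Python, computed as the dot product divided by the product of the two magnitudes. The function name `cosine_similarity` is chosen for this example; numerical libraries provide faster equivalents.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # orthogonal: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite: -1.0
```

The first example shows why magnitude does not matter: `[2.0, 0.0]` is twice as long as `[1.0, 0.0]` but points the same way, so the score is still 1.0.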
Diagram: Vector Similarity Visualization¶
Vector Similarity Concepts
Type: microsim
Purpose: Interactive visualization of how semantic similarity works with embeddings
Bloom Taxonomy: Understand (L2) - Explain how vector similarity captures semantic relationships
Learning Objective: Students should be able to interpret cosine similarity values and understand why semantic search outperforms keyword matching
Canvas layout (responsive, minimum 700x500px):
- Left panel: 2D projection of embedding space with sample words
- Right panel: Similarity calculator and explanation
Visual elements in embedding space:
- Words plotted as points in 2D space
- Clusters of related concepts (colors by category)
- Example clusters: Animals (blue), Vehicles (green), Food (orange)
- Lines showing similarity between selected words
Interactive controls:
- Click any word to select it
- Click second word to see similarity
- Display cosine similarity score
- Show the angle between vectors visually
- Slider to add/remove word clusters
Sample words by cluster:
- Animals: dog, cat, puppy, kitten, pet, mammal
- Vehicles: car, truck, automobile, vehicle, motorcycle
- Food: apple, banana, fruit, vegetable, meal
Behavior:
- Select two words, display similarity score
- Similar concepts (dog, puppy) show high scores (~0.9)
- Related concepts (dog, cat) show moderate scores (~0.7)
- Unrelated concepts (dog, automobile) show low scores (~0.1)
- Animate the angle measurement between vectors
Educational annotations:
- "High similarity: concepts are semantically related"
- "Low similarity: concepts are unrelated"
- "Embeddings capture meaning, not just words"
Implementation: p5.js with clickable elements
Building RAG Applications¶
The RAG Development Process¶
Implementing RAG requires attention to each pipeline stage:
1. Document Processing
- Extract text from various formats (PDF, DOCX, HTML)
- Clean and normalize content
- Split into chunks (typically 200-1000 tokens)
- Handle overlapping for context continuity
2. Embedding Generation
- Select appropriate embedding model
- Generate embeddings for all chunks
- Store embeddings with source metadata
3. Vector Indexing
- Choose vector database
- Configure index parameters (dimension, distance metric)
- Load embeddings and metadata
- Optimize for query patterns
4. Query Processing
- Embed the user query
- Perform similarity search
- Apply metadata filters if needed
- Rank and select top results
5. Response Generation
- Construct prompt with retrieved context
- Generate response using LLM
- Include source citations
- Handle cases with insufficient context
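The five stages can be traced end to end in a toy pipeline. The sketch below makes two loud simplifications: `embed` returns word counts rather than a learned embedding, so matching here is lexical rather than semantic, and the final LLM call is omitted so the pipeline stops at the augmented prompt. All function names and the sample documents are invented for the example.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Stage 1: split into chunks. Here, one chunk per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def embed(text: str) -> Counter:
    """Stage 2: toy 'embedding' as word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: dict[str, str]) -> list[dict]:
    """Stage 3: store each chunk with its vector and source metadata."""
    return [
        {"source": name, "text": piece, "vector": embed(piece)}
        for name, body in documents.items()
        for piece in chunk(body)
    ]


def retrieve(index: list[dict], query: str, k: int = 2) -> list[dict]:
    """Stage 4: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[dict]) -> str:
    """Stage 5 (first half): assemble the augmented prompt with source labels."""
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"


docs = {
    "refunds.md": "Refunds are issued within 14 days of purchase.\nGift cards cannot be refunded.",
    "shipping.md": "Standard shipping takes 5 business days.",
}
index = build_index(docs)
hits = retrieve(index, "How many days do refunds take?", k=1)
print(build_prompt("How many days do refunds take?", hits))
```

With these sample documents the top hit is the first line of `refunds.md`, because it shares both "refunds" and "days" with the query. Swapping `embed` for a real embedding model is what turns this lexical match into semantic search.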
Chunking Strategies¶
Effective chunking balances several concerns:
| Strategy | Description | Trade-offs |
|---|---|---|
| Fixed size | Split by character/token count | Simple but may cut mid-sentence |
| Sentence | Split at sentence boundaries | Maintains coherence, variable sizes |
| Paragraph | Split at paragraph breaks | Natural units, may be too large |
| Semantic | Use embeddings to find topic boundaries | Optimal meaning preservation, complex |
| Recursive | Try large splits first, subdivide if needed | Adaptive, handles variable content |
Chunking best practices:
- Include overlap between chunks (10-20%) to preserve context
- Maintain metadata linking chunks to sources
- Consider hierarchical chunking for long documents
- Test retrieval quality with representative queries
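Fixed-size chunking with overlap, the first practice above, can be sketched in a few lines. The function name `chunk_with_overlap` is invented here, and whitespace-separated words stand in for model tokens; a real pipeline would count tokens with the tokenizer that matches its embedding model.

```python
def chunk_with_overlap(tokens: list[str], size: int = 200, overlap: int = 30) -> list[list[str]]:
    """Fixed-size chunks where each chunk repeats the last `overlap` tokens of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks


words = "one two three four five six seven eight nine ten".split()
for piece in chunk_with_overlap(words, size=4, overlap=1):
    print(piece)
# ['one', 'two', 'three', 'four']
# ['four', 'five', 'six', 'seven']
# ['seven', 'eight', 'nine', 'ten']
```

The defaults of 200 tokens with 30 tokens of overlap give 15% overlap, which falls inside the 10-20% range suggested above.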
Prompt Construction for RAG¶
The augmented prompt must effectively integrate retrieved content:
You are a customer support assistant. Answer questions using ONLY the
information provided in the context below. If the context doesn't contain
relevant information, say "I don't have information about that in my
knowledge base."
CONTEXT:
{retrieved_chunk_1}
{retrieved_chunk_2}
{retrieved_chunk_3}
USER QUESTION: {user_query}
Provide a helpful response, citing the source documents where applicable.
Key prompt design considerations:
- Instruct the model to use only provided context
- Handle missing information gracefully
- Request source citations
- Balance context length with response quality
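These considerations can be folded into a small helper that fills the template shown earlier. This is a sketch under stated assumptions: `build_rag_prompt` and `RAG_TEMPLATE` are names invented for the example, chunks arrive as `(source, text)` pairs already sorted by relevance, and a character budget stands in for a proper token budget.

```python
RAG_TEMPLATE = """You are a customer support assistant. Answer questions using ONLY the
information provided in the context below. If the context doesn't contain
relevant information, say "I don't have information about that in my
knowledge base."

CONTEXT:
{context}

USER QUESTION: {user_query}

Provide a helpful response, citing the source documents where applicable."""


def build_rag_prompt(user_query: str, chunks: list[tuple[str, str]], max_chars: int = 4000) -> str:
    """Fill the template with (source, text) chunks, most relevant first.

    Stops adding chunks once `max_chars` of context is reached. The character
    budget is a rough stand-in for a token budget.
    """
    parts, used = [], 0
    for source, text in chunks:
        entry = f"[Source: {source}]\n{text}"
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    # Make the empty case explicit so the model can follow the fallback instruction.
    context = "\n\n".join(parts) if parts else "(no relevant documents found)"
    return RAG_TEMPLATE.format(context=context, user_query=user_query)
```

Labeling each chunk with its source gives the model something concrete to cite, and the explicit empty-context marker supports the "I don't have information" fallback.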
Integration Patterns¶
Combining Custom GPTs, Agents, and RAG¶
The technologies in this chapter are complementary:
Custom GPT with RAG
- Upload knowledge files directly to Custom GPT
- GPT automatically retrieves from uploaded content
- Limited by file size and format constraints
Agent with RAG
- Agent uses RAG as a tool for knowledge retrieval
- Can decide when to search vs. use existing context
- Enables multi-source, multi-step research
Full Integration
User Query
↓
Custom GPT (conversational interface)
↓
Agent (plans research steps)
↓
RAG System (retrieves relevant knowledge)
↓
LLM (generates grounded response)
↓
Response with citations
Implementation Considerations¶
| Factor | Consideration |
|---|---|
| Latency | RAG adds retrieval time; consider caching |
| Cost | Embedding + retrieval + generation costs compound |
| Accuracy | Retrieval quality bounds generation quality |
| Maintenance | Knowledge bases require regular updates |
| Scale | Vector databases need sizing for content volume |
| Security | Access control for sensitive knowledge |
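As one illustration of the latency row, repeated queries can be served from a cache instead of re-running retrieval. The sketch uses `functools.lru_cache` from the Python standard library; `slow_retrieve` is a hypothetical stand-in for embedding the query and searching a vector database, and the `CALLS` counter exists only to show that the second lookup is served from the cache.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow path actually runs


def slow_retrieve(query: str) -> tuple[str, ...]:
    """Hypothetical stand-in for embedding the query and searching a vector database."""
    CALLS["count"] += 1
    return (f"chunk relevant to: {query}",)


@lru_cache(maxsize=1024)
def cached_retrieve(normalized_query: str) -> tuple[str, ...]:
    return slow_retrieve(normalized_query)


def retrieve(query: str) -> tuple[str, ...]:
    # Normalize so trivially different phrasings share one cache entry.
    return cached_retrieve(" ".join(query.lower().split()))


retrieve("What is the refund policy?")
retrieve("  what is the REFUND policy? ")
print(CALLS["count"])  # prints: 1
```

Caching interacts with the maintenance row: after the knowledge base is updated, stale entries should be discarded, for example by calling `cached_retrieve.cache_clear()`.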
Key Takeaways¶
- Custom GPTs package specialized instructions, knowledge, and capabilities into reusable AI assistants accessible to non-developers
- GPT Builder enables conversational creation of Custom GPTs; GPT Actions connect them to external systems via APIs
- AI agents can plan, use tools, and execute multi-step tasks autonomously, operating along a spectrum of human oversight levels
- Workflow automation platforms enable no-code/low-code integration of AI with business applications
- RAG addresses LLM limitations by retrieving relevant information from knowledge bases before generation
- Vector databases store embeddings for efficient semantic search; cosine similarity measures conceptual relatedness
- Effective RAG requires attention to chunking, embedding model selection, and prompt construction
- These technologies complement each other: agents can use RAG for knowledge; Custom GPTs can wrap agent capabilities
Review Questions¶
How do GPT Actions extend the capabilities of Custom GPTs beyond static knowledge?
GPT Actions connect Custom GPTs to external systems through API calls, enabling: (1) Real-time data retrieval—accessing current information rather than static uploads, (2) Write operations—creating records, triggering workflows, sending notifications, (3) Authentication—accessing protected resources with user credentials, (4) Multi-system integration—connecting to CRMs, databases, internal tools. This transforms Custom GPTs from knowledge assistants into operational tools that can take actions and access live data.
What are the key components of a RAG system and how do they work together?
A RAG system has five key components: (1) Document processing—extracts, cleans, and chunks source documents, (2) Embedding model—converts chunks to vector representations, (3) Vector database—stores embeddings for similarity search, (4) Retrieval system—finds chunks similar to user queries, (5) LLM with augmented prompt—generates responses using retrieved context. The flow is: documents are processed and embedded offline; at query time, the query is embedded, similar chunks are retrieved, and the LLM generates a response grounded in the retrieved content.
Why does cosine similarity work for semantic search, and what do the values mean?
Cosine similarity works because embeddings encode semantic meaning as direction in high-dimensional space. Similar concepts point in similar directions regardless of vector magnitude. The cosine of the angle between vectors captures this directional similarity: (1) Score near 1.0 means vectors point the same direction—semantically very similar, (2) Score near 0 means vectors are orthogonal—semantically unrelated, (3) Score near -1.0 means vectors point opposite directions—semantically opposite (rare in practice). This enables finding conceptually related content even when queries don't share exact words with documents.