Custom GPTs, Agents, and RAG Systems¶

Summary¶

This chapter explores building custom AI solutions beyond basic prompting. Students will learn to create custom GPTs for specific business applications, understand AI agents and autonomous systems, and implement Retrieval-Augmented Generation (RAG) to enhance AI accuracy with external knowledge. These skills enable the development of sophisticated AI-powered workflows.

Concepts Covered¶

This chapter covers the following 17 concepts from the learning graph:

Custom GPT
GPT Builder
GPT Actions
AI Agents
Autonomous Systems
Agent Workflows
No-Code AI Tools
Low-Code Platforms
Workflow Automation
RAG
Retrieval Systems
Knowledge Bases
Vector Database
Semantic Search
Similarity Search
Cosine Similarity

Prerequisites¶

This chapter builds on concepts from:

Learning Objectives¶

After completing this chapter, students will be able to:

Build custom GPTs for specific business applications
Design AI agent workflows for automated task completion
Implement RAG patterns for knowledge-augmented applications
Use vector databases and embeddings for semantic search
Evaluate no-code and low-code AI platforms

Introduction¶

Moving beyond direct prompting opens a vast landscape of possibilities for AI-powered applications. This chapter explores three interconnected approaches to building sophisticated AI systems: custom GPTs that package specialized capabilities into reusable assistants, AI agents that can plan and execute multi-step tasks autonomously, and Retrieval-Augmented Generation (RAG) that grounds AI responses in authoritative knowledge sources.

These technologies represent the practical frontier of generative AI deployment. While prompting remains foundational, scaling AI capabilities across an organization requires tools that non-technical users can customize, workflows that execute without constant human oversight, and systems that access proprietary knowledge bases rather than relying solely on training data.

Custom GPTs: Packaging AI Expertise¶

Understanding Custom GPTs¶

A Custom GPT is a specialized version of ChatGPT configured with specific instructions, knowledge, and capabilities for a particular purpose. Introduced by OpenAI in late 2023, Custom GPTs democratize AI customization by enabling non-developers to create tailored AI assistants.

Custom GPTs package together:

System instructions: Detailed guidance on persona and behavior
Knowledge files: Documents the GPT can reference
Actions: API integrations for external system access
Conversation starters: Suggested prompts for users

Component	Purpose	Example
Instructions	Define behavior and expertise	"You are a legal contract analyst specializing in SaaS agreements"
Knowledge	Provide reference information	Company policy documents, product catalogs
Actions	Enable external integrations	Query CRM, update databases, send emails
Starters	Guide user interaction	"Analyze this contract," "What's our refund policy?"

Building with GPT Builder¶

The GPT Builder is OpenAI's conversational interface for creating Custom GPTs. Rather than configuring settings manually, users describe what they want, and GPT Builder translates those descriptions into configuration.

The GPT Builder process:

Describe your GPT: Explain its purpose and capabilities in natural language
Configure details: Refine instructions, add knowledge files
Define actions: Connect to external APIs if needed
Test: Interact with the GPT in preview mode
Publish: Share privately, within organization, or publicly

GPT Builder conversation example:

User: I want to create a GPT that helps my sales team understand our product features and answer customer objections.

GPT Builder: Great! I'll help you create a sales assistant. What products does your company sell?

User: We sell enterprise project management software called TeamFlow.

GPT Builder: I'll configure this GPT as a TeamFlow product expert. You'll want to upload product documentation and common objection handling guides. Would you like me to set this up?

GPT Actions: Connecting to the World¶

GPT Actions enable Custom GPTs to interact with external systems via API calls. This transforms GPTs from knowledge assistants into operational tools that can retrieve real-time data, trigger workflows, and update external systems.

Actions are defined using OpenAPI specifications:

openapi: 3.0.0
info:
  title: CRM Integration
  version: 1.0.0
paths:
  /customers/{id}:
    get:
      summary: Get customer details
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Customer data

Action use cases:

Domain	Action Example	Business Value
Sales	Retrieve customer history from CRM	Personalized conversation context
Support	Create support tickets	Automated issue logging
HR	Query employee policies	Self-service HR assistance
Finance	Look up invoice status	Real-time billing information
Operations	Trigger workflow in automation platform	Process automation

Security Considerations

GPT Actions execute API calls with configured credentials. Implement proper authentication, rate limiting, and audit logging. Avoid exposing sensitive operations without appropriate access controls.

AI Agents: Autonomous Task Execution¶

What Are AI Agents?¶

AI Agents are systems that use language models to reason about tasks, create plans, and execute actions autonomously. Unlike simple prompt-response interactions, agents can decompose complex goals into subtasks, use tools to gather information or take actions, and iterate until objectives are achieved.

Key agent capabilities:

Planning: Breaking down goals into actionable steps
Tool use: Invoking external functions, APIs, or code execution
Memory: Maintaining context across extended interactions
Reasoning: Deciding next actions based on observations
Self-correction: Detecting and recovering from errors

Autonomous Systems¶

Autonomous systems extend agent concepts to operate with minimal human intervention. These systems observe environments, make decisions, and take actions in pursuit of defined objectives.

The autonomy spectrum:

Level	Description	Human Role	Example
Assistive	AI suggests, human executes	Decision maker	Email drafting suggestions
Collaborative	AI executes routine, escalates complex	Supervisor	Automated ticket routing with escalation
Supervised	AI executes autonomously, human reviews	Auditor	Code review bots with merge approval
Autonomous	AI executes and self-evaluates	Monitor	Automated trading within parameters

The Loop: AI in the Loop vs. Human in the Loop

"Human in the loop" systems require human approval for actions. "Human on the loop" systems proceed autonomously but with human monitoring capability. "Human out of the loop" systems operate fully autonomously. Choose the appropriate level based on risk tolerance and regulatory requirements.

Agent Workflows¶

Agent workflows orchestrate multiple agents or agent actions to accomplish complex business processes. These workflows define how agents coordinate, share information, and hand off tasks.

Common workflow patterns:

Sequential Pipeline

Agent A (Research) → Agent B (Analysis) → Agent C (Report Generation)

Parallel Execution

                → Agent A (Data Source 1) ─┐
User Request ─┤                           ├→ Synthesizer Agent
                → Agent B (Data Source 2) ─┘

Supervisor Pattern

Supervisor Agent
    ├── Worker Agent 1 (Specialized Task)
    ├── Worker Agent 2 (Specialized Task)
    └── Worker Agent 3 (Specialized Task)

Diagram: Agent Workflow Patterns¶

The following diagram compares three common patterns for organizing multi-agent workflows, each suited to different types of automation scenarios.

flowchart TB
    subgraph Pattern3["Pattern 3: Supervisor/Worker"]
        direction TB
        S3["🎯 Supervisor Agent<br/>Coordinates & Decides"]
        W3A["👷 Worker A<br/>Code Generation"]
        W3B["👷 Worker B<br/>Testing"]
        W3C["👷 Worker C<br/>Documentation"]
        S3 <-->|"Tasks"| W3A
        S3 <-->|"Results"| W3B
        S3 <-->|"Feedback"| W3C
    end

    subgraph Pattern2["Pattern 2: Parallel Fan-Out"]
        direction TB
        D2["📤 Dispatcher"]
        P2A["🔍 Search Agent"]
        P2B["📊 Analysis Agent"]
        P2C["📝 Summary Agent"]
        A2["📥 Aggregator"]
        D2 --> P2A & P2B & P2C
        P2A & P2B & P2C --> A2
    end

    subgraph Pattern1["Pattern 1: Sequential Pipeline"]
        direction LR
        A1["📥 Research<br/>Agent"] --> B1["🔬 Analysis<br/>Agent"] --> C1["📤 Output<br/>Agent"]
    end

    style Pattern1 fill:#E3F2FD,stroke:#1565C0,stroke-width:2px
    style Pattern2 fill:#E8F5E9,stroke:#388E3C,stroke-width:2px
    style Pattern3 fill:#FFF3E0,stroke:#F57C00,stroke-width:2px

Pattern	Best For	Use Case	Trade-offs
Sequential Pipeline	Linear processes with clear handoffs	Document processing, data transformation	Simple but no parallelism
Parallel Fan-Out	Independent subtasks that can run concurrently	Multi-source research, batch processing	Fast but requires aggregation logic
Supervisor/Worker	Complex tasks requiring dynamic coordination	Software development, creative projects	Flexible but more complex orchestration

Choosing the Right Pattern

Use Sequential when each step depends entirely on the previous step's output
Use Parallel when you can split work into independent chunks and merge results
Use Supervisor/Worker when tasks require iteration, feedback loops, or dynamic decision-making

No-Code and Low-Code AI Platforms¶

The Democratization of AI Development¶

No-code AI tools enable users without programming skills to build AI-powered applications through visual interfaces, pre-built components, and natural language configuration. Low-code platforms provide visual development with optional code customization for advanced use cases.

This democratization enables:

Faster prototyping: Ideas to working prototypes in hours
Business ownership: Domain experts build their own solutions
Reduced bottlenecks: Less dependency on engineering teams
Experimentation: Easy testing of AI applications before investment

Platform Categories¶

Category	Examples	Capabilities
Custom GPT builders	OpenAI GPTs, Claude Projects	Specialized assistants with knowledge
Visual workflow builders	Zapier AI, Make	Connect apps with AI processing
Chatbot platforms	Botpress, Voiceflow	Conversational interfaces
Content generation	Jasper, Copy.ai	Marketing and content creation
Document processing	DocuSign Intelligent Agreement Management	Contract analysis, extraction
Analytics	ThoughtSpot, Tableau AI	Natural language data queries

Workflow Automation¶

Workflow automation platforms connect AI capabilities with business applications, enabling automated data flows and decision-making across systems.

Typical automation patterns:

Trigger: New email received
  ↓
AI Action: Classify intent and extract entities
  ↓
Condition: If intent = "support request"
  ↓
Action: Create ticket in helpdesk
  ↓
AI Action: Generate initial response
  ↓
Action: Send email response

Key automation platforms:

Zapier: Connect 5,000+ apps with AI-powered automations
Make (Integromat): Visual workflow builder with AI modules
Microsoft Power Automate: Enterprise automation with Copilot integration
n8n: Open-source workflow automation
Tray.io: Enterprise integration platform

Retrieval-Augmented Generation (RAG)¶

The RAG Architecture¶

Retrieval-Augmented Generation (RAG) is an architecture pattern that enhances LLM responses by retrieving relevant information from external knowledge sources and including it in the prompt context. This addresses fundamental LLM limitations around knowledge currency and hallucination.

The RAG process:

Query: User submits a question or request
Retrieve: System searches knowledge base for relevant content
Augment: Retrieved content is added to the prompt context
Generate: LLM produces response grounded in retrieved information

Diagram: RAG Architecture¶

The following diagram illustrates the complete RAG (Retrieval-Augmented Generation) pipeline, showing both the offline ingestion process and the real-time query flow.

flowchart LR
    subgraph Ingestion["📚 Offline Ingestion Pipeline"]
        direction TB
        D1["📄 Documents<br/>PDFs, Web, DBs"]
        D2["✂️ Chunking<br/>Split into segments"]
        D3["🔢 Embedding<br/>Text → Vectors"]
        D4[("💾 Vector<br/>Database")]
        D1 --> D2 --> D3 --> D4
    end

    subgraph Query["⚡ Real-Time Query Pipeline"]
        direction TB
        Q1["❓ User Query"]
        Q2["🔢 Query<br/>Embedding"]
        Q3["🔍 Similarity<br/>Search"]
        Q4["📋 Retrieved<br/>Chunks"]
        Q5["🤖 LLM with<br/>Context"]
        Q6["💬 Grounded<br/>Response"]
        Q1 --> Q2 --> Q3 --> Q4 --> Q5 --> Q6
    end

    D4 -.->|"Semantic<br/>Matching"| Q3

    style Ingestion fill:#E3F2FD,stroke:#1565C0,stroke-width:2px
    style Query fill:#E8F5E9,stroke:#388E3C,stroke-width:2px
    style D4 fill:#FFCC80,stroke:#F57C00
    style Q5 fill:#E1BEE7,stroke:#7B1FA2

Stage	Component	Function
Ingestion	Document Sources	Gather content from files, databases, APIs
Ingestion	Chunking	Split documents into retrievable segments (200-500 tokens)
Ingestion	Embedding	Convert text chunks to vector representations
Ingestion	Vector Database	Store embeddings for efficient similarity search
Query	Query Embedding	Convert user question to same vector space
Query	Similarity Search	Find most relevant chunks by cosine similarity
Query	LLM Generation	Produce response grounded in retrieved evidence

Key RAG Insight

The power of RAG lies in semantic matching—the query and document chunks are compared in a learned vector space where similar meanings cluster together, enabling retrieval based on conceptual relevance rather than just keyword matching.

Why RAG Matters¶

RAG addresses critical LLM limitations:

Limitation	How RAG Helps
Knowledge cutoff	Access current information from updated knowledge bases
Hallucination	Ground responses in retrieved evidence
Domain specificity	Include proprietary organizational knowledge
Source attribution	Cite specific documents supporting claims
Data privacy	Keep sensitive data in controlled systems

Knowledge Bases¶

A knowledge base in RAG context is a structured or semi-structured collection of documents that the retrieval system can search. Effective knowledge bases require:

Comprehensive coverage: Include all relevant information
Quality content: Well-written, accurate source material
Appropriate chunking: Documents split into retrievable units
Metadata: Tags, dates, sources for filtering and attribution
Currency: Regular updates to maintain relevance

Knowledge base sources:

Internal documentation (policies, procedures, guides)
Product catalogs and specifications
Customer support histories
Research reports and white papers
Email archives and communication records
Database exports and structured data

Vector Databases and Semantic Search¶

Understanding Embeddings for Search¶

Traditional search relies on keyword matching—finding documents containing query terms. Semantic search uses embeddings to find conceptually similar content regardless of exact word matches.

Comparison:

Query	Keyword Match	Semantic Match
"How to cancel subscription"	Documents with "cancel" and "subscription"	Also finds documents about "ending membership," "stopping service," "termination process"
"Employee vacation policy"	Documents mentioning "employee," "vacation," "policy"	Also finds "PTO guidelines," "time off procedures," "leave entitlement"

Vector Databases¶

A vector database is a specialized database optimized for storing and searching high-dimensional embedding vectors. Unlike traditional databases that search by field values, vector databases find similar items by mathematical distance between vectors.

Key vector database capabilities:

High-dimensional indexing: Efficient search across thousands of dimensions
Approximate nearest neighbor (ANN): Fast similarity search at scale
Metadata filtering: Combine semantic search with attribute filters
Real-time updates: Add new vectors without full reindexing
Scalability: Handle millions to billions of vectors

Popular vector databases:

Database	Type	Key Features
Pinecone	Managed	Easy scaling, high performance
Weaviate	Open source	Schema support, modules
Milvus	Open source	High throughput, GPU acceleration
Chroma	Open source	Simple, good for prototyping
pgvector	PostgreSQL extension	Integrate with existing PostgreSQL
Qdrant	Open source	Filtering, cloud-native

Similarity Search and Cosine Similarity¶

Similarity search finds vectors most similar to a query vector. The most common similarity metric is cosine similarity, which measures the angle between two vectors.

The cosine similarity formula:

\[\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{||A|| \times ||B||}\]

Where: - \(A \cdot B\) is the dot product of vectors A and B - \(||A||\) and \(||B||\) are the magnitudes (lengths) of the vectors

Cosine similarity ranges from -1 to 1: - 1: Vectors point in same direction (identical meaning) - 0: Vectors are orthogonal (unrelated) - -1: Vectors point in opposite directions (opposite meaning)

Diagram: Vector Similarity Visualization¶

Vector Similarity Concepts

Type: microsim

Purpose: Interactive visualization of how semantic similarity works with embeddings

Bloom Taxonomy: Understand (L2) - Explain how vector similarity captures semantic relationships

Learning Objective: Students should be able to interpret cosine similarity values and understand why semantic search outperforms keyword matching

Canvas layout (responsive, minimum 700x500px): - Left panel: 2D projection of embedding space with sample words - Right panel: Similarity calculator and explanation

Visual elements in embedding space: - Words plotted as points in 2D space - Clusters of related concepts (colors by category) - Example clusters: Animals (blue), Vehicles (green), Food (orange) - Lines showing similarity between selected words

Interactive controls: - Click any word to select it - Click second word to see similarity - Display cosine similarity score - Show the angle between vectors visually - Slider to add/remove word clusters

Sample words by cluster: - Animals: dog, cat, puppy, kitten, pet, mammal - Vehicles: car, truck, automobile, vehicle, motorcycle - Food: apple, banana, fruit, vegetable, meal

Behavior: - Select two words, display similarity score - Similar concepts (dog, puppy) show high scores (~0.9) - Related concepts (dog, cat) show moderate scores (~0.7) - Unrelated concepts (dog, automobile) show low scores (~0.1) - Animate the angle measurement between vectors

Educational annotations: - "High similarity: concepts are semantically related" - "Low similarity: concepts are unrelated" - "Embeddings capture meaning, not just words"

Implementation: p5.js with clickable elements

Building RAG Applications¶

The RAG Development Process¶

Implementing RAG requires attention to each pipeline stage:

1. Document Processing - Extract text from various formats (PDF, DOCX, HTML) - Clean and normalize content - Split into chunks (typically 200-1000 tokens) - Handle overlapping for context continuity

2. Embedding Generation - Select appropriate embedding model - Generate embeddings for all chunks - Store embeddings with source metadata

3. Vector Indexing - Choose vector database - Configure index parameters (dimension, distance metric) - Load embeddings and metadata - Optimize for query patterns

4. Query Processing - Embed the user query - Perform similarity search - Apply metadata filters if needed - Rank and select top results

5. Response Generation - Construct prompt with retrieved context - Generate response using LLM - Include source citations - Handle cases with insufficient context

Chunking Strategies¶

Effective chunking balances several concerns:

Strategy	Description	Trade-offs
Fixed size	Split by character/token count	Simple but may cut mid-sentence
Sentence	Split at sentence boundaries	Maintains coherence, variable sizes
Paragraph	Split at paragraph breaks	Natural units, may be too large
Semantic	Use embeddings to find topic boundaries	Optimal meaning preservation, complex
Recursive	Try large splits first, subdivide if needed	Adaptive, handles variable content

Chunking best practices:

Include overlap between chunks (10-20%) to preserve context
Maintain metadata linking chunks to sources
Consider hierarchical chunking for long documents
Test retrieval quality with representative queries

Prompt Construction for RAG¶

The augmented prompt must effectively integrate retrieved content:

You are a customer support assistant. Answer questions using ONLY the
information provided in the context below. If the context doesn't contain
relevant information, say "I don't have information about that in my
knowledge base."

CONTEXT:
{retrieved_chunk_1}

{retrieved_chunk_2}

{retrieved_chunk_3}

USER QUESTION: {user_query}

Provide a helpful response, citing the source documents where applicable.

Key prompt design considerations:

Instruct the model to use only provided context
Handle missing information gracefully
Request source citations
Balance context length with response quality

Integration Patterns¶

Combining Custom GPTs, Agents, and RAG¶

The technologies in this chapter are complementary:

Custom GPT with RAG - Upload knowledge files directly to Custom GPT - GPT automatically retrieves from uploaded content - Limited by file size and format constraints

Agent with RAG - Agent uses RAG as a tool for knowledge retrieval - Can decide when to search vs. use existing context - Enables multi-source, multi-step research

Full Integration

User Query
    ↓
Custom GPT (conversational interface)
    ↓
Agent (plans research steps)
    ↓
RAG System (retrieves relevant knowledge)
    ↓
LLM (generates grounded response)
    ↓
Response with citations

Implementation Considerations¶

Factor	Consideration
Latency	RAG adds retrieval time; consider caching
Cost	Embedding + retrieval + generation costs compound
Accuracy	Retrieval quality bounds generation quality
Maintenance	Knowledge bases require regular updates
Scale	Vector databases need sizing for content volume
Security	Access control for sensitive knowledge

Key Takeaways¶

Custom GPTs package specialized instructions, knowledge, and capabilities into reusable AI assistants accessible to non-developers
GPT Builder enables conversational creation of Custom GPTs; GPT Actions connect them to external systems via APIs
AI agents can plan, use tools, and execute multi-step tasks autonomously, operating along a spectrum of human oversight levels
Workflow automation platforms enable no-code/low-code integration of AI with business applications
RAG addresses LLM limitations by retrieving relevant information from knowledge bases before generation
Vector databases store embeddings for efficient semantic search; cosine similarity measures conceptual relatedness
Effective RAG requires attention to chunking, embedding model selection, and prompt construction
These technologies complement each other: agents can use RAG for knowledge; Custom GPTs can wrap agent capabilities

Review Questions¶

How do GPT Actions extend the capabilities of Custom GPTs beyond static knowledge?

GPT Actions connect Custom GPTs to external systems through API calls, enabling: (1) Real-time data retrieval—accessing current information rather than static uploads, (2) Write operations—creating records, triggering workflows, sending notifications, (3) Authentication—accessing protected resources with user credentials, (4) Multi-system integration—connecting to CRMs, databases, internal tools. This transforms Custom GPTs from knowledge assistants into operational tools that can take actions and access live data.

What are the key components of a RAG system and how do they work together?

A RAG system has five key components: (1) Document processing—extracts, cleans, and chunks source documents, (2) Embedding model—converts chunks to vector representations, (3) Vector database—stores embeddings for similarity search, (4) Retrieval system—finds chunks similar to user queries, (5) LLM with augmented prompt—generates responses using retrieved context. The flow is: documents are processed and embedded offline; at query time, the query is embedded, similar chunks are retrieved, and the LLM generates a response grounded in the retrieved content.

Why does cosine similarity work for semantic search, and what do the values mean?

Cosine similarity works because embeddings encode semantic meaning as direction in high-dimensional space. Similar concepts point in similar directions regardless of vector magnitude. The cosine of the angle between vectors captures this directional similarity: (1) Score near 1.0 means vectors point the same direction—semantically very similar, (2) Score near 0 means vectors are orthogonal—semantically unrelated, (3) Score near -1.0 means vectors point opposite directions—semantically opposite (rare in practice). This enables finding conceptually related content even when queries don't share exact words with documents.