Quiz 2: LLM Architecture
Test your understanding of large language model architecture and training concepts.
Questions
Question 1 (Remember)
What is a token in the context of large language models?
- A) A security credential
- B) The basic unit of text processed by the model
- C) A type of neural network
- D) An API parameter
Answer
B) The basic unit of text processed by the model - Tokens are typically words or subwords that LLMs process. The model predicts the next most likely token based on the input sequence.
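To make this concrete, here is a minimal tokenization sketch using the open-source tiktoken library (the library and encoding are illustrative choices; any subword tokenizer demonstrates the same idea):

```python
# Minimal sketch: splitting text into tokens with tiktoken (pip install tiktoken).
# "cl100k_base" is the encoding used by GPT-4-era models; other models differ.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Large language models process text as tokens.")
print(len(tokens))                        # how many tokens the sentence costs
print([enc.decode([t]) for t in tokens])  # the individual subword pieces
```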
Question 2 (Remember)
What neural network architecture underlies most modern LLMs?
- A) Convolutional Neural Network (CNN)
- B) Recurrent Neural Network (RNN)
- C) Transformer
- D) Perceptron
Answer
C) Transformer - The transformer architecture, introduced in the 2017 paper "Attention Is All You Need," is the foundation of GPT, Claude, Gemini, and other modern LLMs.
Question 3 (Understand)
What is the primary function of the attention mechanism in transformers?
- A) To reduce model size
- B) To allow the model to focus on relevant parts of input
- C) To speed up training
- D) To encrypt data
Answer
B) To allow the model to focus on relevant parts of input - Attention mechanisms enable the model to weigh the importance of different input tokens when generating each output token, capturing relationships regardless of distance.
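The core computation is scaled dot-product attention. A minimal NumPy sketch, with illustrative shapes and random values:

```python
# Minimal sketch of scaled dot-product attention (Vaswani et al., 2017).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how relevant each key is to each query
    weights = softmax(scores, axis=-1)  # per-token importance, rows sum to 1
    return weights @ V                  # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query tokens, embedding dim 8
K = rng.normal(size=(6, 8))  # 6 key tokens
V = rng.normal(size=(6, 8))  # one value vector per key token
print(attention(Q, K, V).shape)  # (4, 8): one context-mixed vector per query
```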
Question 4 (Understand)
What does RLHF stand for and why is it important?
- A) Rapid Learning from Historical Files - speeds up training
- B) Reinforcement Learning from Human Feedback - aligns outputs with human preferences
- C) Recursive Language Handling Framework - improves grammar
- D) Real-time Learning for Higher Fidelity - improves accuracy
Answer
B) Reinforcement Learning from Human Feedback - aligns outputs with human preferences - RLHF is a training method where human evaluators rate model outputs, and the model learns to produce responses that align with human preferences.
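One common formulation trains a reward model on pairwise human preferences. A minimal sketch of that loss, with made-up scores for illustration:

```python
# Minimal sketch: the pairwise (Bradley-Terry style) preference loss used to
# train an RLHF reward model. r_chosen / r_rejected are hypothetical scalar
# scores for the human-preferred vs. human-rejected response.
import numpy as np

def preference_loss(r_chosen, r_rejected):
    # Lower loss when the preferred response already scores higher.
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

print(preference_loss(2.0, 0.5))  # small: reward model agrees with the human
print(preference_loss(0.5, 2.0))  # large: reward model disagrees
```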
Question 5 (Apply)
If a model has a 128K token context window and you want to process a 200-page document (~100K tokens), what approach should you use?
- A) Process the document directly - it fits within the context window
- B) Use a model with a larger context window
- C) Split the document and process separately
- D) Convert to images first
Answer
A) Process the document directly - A 100K-token document fits comfortably within a 128K-token context window, leaving room for instructions and the response. Models like GPT-4 Turbo (128K) and Claude 3 (200K) can handle long documents in a single pass.
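The arithmetic is easy to sanity-check in code (the reserved-output figure below is an assumption; exact counts require a real tokenizer):

```python
# Minimal sketch: does a document fit the context window in one call?
CONTEXT_WINDOW = 128_000     # model's context window, in tokens
RESERVED_FOR_OUTPUT = 4_000  # room for the model's reply (assumed figure)
DOC_TOKENS = 100_000         # ~200 pages, per the question

budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
print(DOC_TOKENS <= budget)  # True: process the document directly
```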
Question 6 (Apply)
You notice that LLM responses are too predictable and lack creativity. Which parameter should you adjust?
- A) Increase max tokens
- B) Increase temperature
- C) Decrease context window
- D) Add more system prompts
Answer
B) Increase temperature - Temperature controls randomness. Higher values (0.7-1.0) produce more creative, varied outputs. Lower values (0-0.3) produce more consistent, predictable responses.
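Mechanically, temperature divides the logits before the softmax. A small sketch with made-up logits for three candidate tokens:

```python
# Minimal sketch: how temperature reshapes the next-token distribution.
import numpy as np

def next_token_probs(logits, temperature):
    scaled = np.asarray(logits) / temperature  # temperature divides the logits
    e = np.exp(scaled - scaled.max())          # numerically stable softmax
    return e / e.sum()

logits = [2.0, 1.0, 0.5]              # hypothetical scores for 3 candidates
print(next_token_probs(logits, 0.2))  # peaked: near-deterministic choice
print(next_token_probs(logits, 1.0))  # flatter: more varied, creative sampling
```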
Question 7 (Analyze)
Compare pre-training and fine-tuning in terms of data requirements and purpose:
- A) Both require the same amount of data
- B) Fine-tuning requires more data than pre-training
- C) Pre-training uses massive general data; fine-tuning uses smaller task-specific data
- D) Pre-training is optional; fine-tuning is required
Answer
C) Pre-training uses massive general data; fine-tuning uses smaller task-specific data - Pre-training on billions of tokens teaches general language understanding. Fine-tuning with smaller datasets adapts the model to specific tasks or domains.
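For a sense of the scale contrast: a fine-tuning set may be only a few thousand task-specific examples. A sketch of one such example in the chat-style JSONL format many fine-tuning APIs accept (the format and content here are illustrative):

```python
# Minimal sketch: one task-specific fine-tuning example (chat-style JSONL).
# Fine-tuning files contain thousands of lines like this, versus the
# billions of tokens consumed during pre-training.
import json

example = {
    "messages": [
        {"role": "system", "content": "You are a contract-review assistant."},
        {"role": "user", "content": "Flag the indemnification clause."},
        {"role": "assistant", "content": "Clause 7.2 indemnifies the vendor..."},
    ]
}
print(json.dumps(example))  # one JSON object per line in the training file
```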
Question 8 (Analyze)
Why does multi-head attention provide advantages over single-head attention?
- A) It requires less computation
- B) It uses fewer parameters
- C) It captures different types of relationships simultaneously
- D) It eliminates the need for training
Answer
C) It captures different types of relationships simultaneously - Multiple attention heads can learn different aspects of relationships (syntax, semantics, coreference) in parallel, providing richer representations.
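A minimal sketch using PyTorch's built-in module, where 8 heads split a 64-dimensional embedding into 8 subspaces of 8 dimensions each (sizes are illustrative):

```python
# Minimal sketch: multi-head self-attention with PyTorch.
import torch

mha = torch.nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
x = torch.randn(1, 10, 64)   # batch of 1, sequence of 10 tokens, dim 64
out, weights = mha(x, x, x)  # self-attention: queries = keys = values
print(out.shape)             # torch.Size([1, 10, 64])
print(weights.shape)         # torch.Size([1, 10, 10]), averaged over heads
```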
Question 9 (Evaluate)
A company wants to process sensitive legal documents. They're comparing a 7B parameter open-source model versus GPT-4. What is the most important consideration?
- A) The open-source model is always better for privacy
- B) GPT-4 is always more accurate
- C) Privacy requirements, accuracy needs, and deployment options must be balanced
- D) Parameter count is the only factor that matters
Answer
C) Privacy requirements, accuracy needs, and deployment options must be balanced - The decision involves trade-offs: open-source allows on-premise deployment for privacy, but may have lower accuracy. GPT-4 offers better performance but sends data to external servers.
Question 10 (Create)
You need to design a system that generates consistent customer support responses. Which combination of architecture decisions is most appropriate?
- A) High temperature, no system prompt, minimal context
- B) Random temperature, long context, no fine-tuning
- C) Low temperature, structured system prompt, relevant context, possibly fine-tuned
- D) Maximum tokens, no constraints, creative mode
Answer
C) Low temperature, structured system prompt, relevant context, possibly fine-tuned - Consistent responses require low temperature (predictability), clear system prompts (behavior guidance), relevant context (accuracy), and potentially fine-tuning on support examples.
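Putting those decisions together, here is a sketch of such a request using the OpenAI Python SDK as one example (the model name, system prompt, and retrieved context are placeholders):

```python
# Minimal sketch: a consistent-support-response request. The OpenAI SDK is
# shown as one example; model name and prompt content are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",   # placeholder model name
    temperature=0.2,  # low temperature -> predictable, consistent wording
    messages=[
        {"role": "system",
         "content": "You are a support agent for AcmeCo. Answer only from "
                    "the provided context, in a friendly, concise tone."},
        {"role": "user",
         "content": "Context: <retrieved policy text>\n\n"
                    "Question: How do I reset my password?"},
    ],
)
print(response.choices[0].message.content)
```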
Score Interpretation
- 9-10 correct: Excellent understanding of LLM architecture
- 7-8 correct: Good grasp, review missed concepts
- 5-6 correct: Fair understanding, revisit chapter sections
- Below 5: Re-read Chapter 2 before proceeding