CME295 Lecture Notes
This folder collects Korean lecture notes based on the CME295 lecture videos. Each lecture README includes a lecture summary, key concepts, practical notes, review questions with answers, and Mermaid/SVG diagrams.
Lectures
| Lecture | Topic | Notes | Source |
|---|---|---|---|
| 01 | Transformer basics | lec-01/README.md | YouTube |
| 02 | Transformer-based models and tricks | lec-02/README.md | YouTube |
| 03 | LLMs, decoding, prompting, and inference | lec-03/README.md | YouTube |
| 04 | LLM training, fine-tuning, and efficient adaptation | lec-04/README.md | YouTube |
| 05 | LLM tuning and human preferences | lec-05/README.md | YouTube |
| 06 | LLM reasoning and GRPO | lec-06/README.md | YouTube |
Learning Path
```mermaid
%%{init: {"theme": "base", "themeVariables": {"background": "#171717", "primaryColor": "#232323", "primaryTextColor": "#f5f5f5", "primaryBorderColor": "#d0d0d0", "lineColor": "#cfcfcf", "fontFamily": "Inter, Arial, sans-serif"}}}%%
flowchart LR
    A[Lecture 1<br/>Transformer basics] --> B[Lecture 2<br/>Transformer variants]
    B --> C[Lecture 3<br/>LLM inference]
    C --> D[Lecture 4<br/>LLM training]
    D --> I[Lecture 5<br/>Preference tuning]
    I --> J[Lecture 6<br/>LLM reasoning]
    A --> E[Attention<br/>Q/K/V]
    B --> F[Position, norm,<br/>BERT, KV cache]
    C --> G[MoE, decoding,<br/>prompting]
    D --> H[Pre-training, ZeRO,<br/>SFT, LoRA]
    I --> K[RLHF, PPO,<br/>DPO]
    J --> L[GRPO, pass@K,<br/>DeepSeek-R1]
    classDef primary fill:#232323,stroke:#d0d0d0,color:#f5f5f5,stroke-width:2px;
    classDef secondary fill:#3b2f20,stroke:#d0d0d0,color:#f5f5f5,stroke-width:2px;
    classDef note fill:#52676b,stroke:#d0d0d0,color:#f5f5f5,stroke-width:2px;
    classDef accent fill:#62164d,stroke:#d0d0d0,color:#f5f5f5,stroke-width:2px;
    class A,B,C,D,I,J primary
    class E,F,G,K,L note
    class H accent
```
Lecture Overview
Lecture 1: Transformer
Lecture 1 moves through NLP tasks, tokenization, representation learning, the limits of RNNs/LSTMs, and the need for attention, then explains the Transformer encoder-decoder architecture. The key point is that self-attention computes dependencies between tokens in parallel and, through its query/key/value structure, behaves like information retrieval.
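The retrieval view of query/key/value can be made concrete with a small sketch of scaled dot-product attention. This is an illustrative pure-Python version, not code from the lecture: the function names `softmax` and `attention` are hypothetical, and the toy example skips the learned projections by assuming Q = K = V = the input vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Each query scores every key (the "retrieval" step), the scores are
    normalized with softmax, and the output is the weighted sum of values.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: three token vectors. Assumption: identity projections,
# so queries, keys, and values are all the raw token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
```

Each query row is processed independently of the others, which is why the computation parallelizes across tokens, unlike the step-by-step recurrence of an RNN.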
Key diagrams:
Lecture 2: Transformer-Based Models and Tricks
Lecture 2 covers what is needed to extend the Transformer into real model families: positional encoding, RoPE, layer normalization, RMSNorm, MHA/MQA/GQA, encoder-only models, and BERT pre-training. Where Lecture 1 explains the basic principles of the architecture, Lecture 2 organizes the design choices that make modern Transformers stable and efficient.
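As one illustration of these design choices, the sketch below applies RoPE to a vector by rotating consecutive coordinate pairs by position-dependent angles. It is a minimal version under stated assumptions: the helper names `rope` and `dot` are hypothetical, the pairing of adjacent coordinates is one common layout, and `base=10000.0` is a commonly used default rather than a value taken from the lecture.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate pairs (x0, x1), (x2, x3), ... by angles that grow with position.

    Pair i (element index 2*i) uses frequency base ** (-2*i / d), so early
    pairs rotate quickly and later pairs rotate slowly.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (2) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
```

The two scores agree up to floating-point error, which shows the property that makes RoPE attractive: the attention score depends on the relative offset between query and key positions, not on their absolute positions.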
Key diagrams:
Lecture 3: Large Language Models, Decoding, Prompting, and Inference
Lecture 3 covers how the decoder-only Transformer scales up into an LLM. It centers on inference-time behavior and serving optimization: Mixture of Experts, next-token decoding, greedy/beam/sampling, temperature, prompt structure, in-context learning, chain of thought, KV cache, PagedAttention, and speculative decoding.
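The difference between greedy decoding and temperature sampling can be sketched for a single next-token step. The function names and the toy logits below are illustrative assumptions, not taken from the lecture, and a real decoder would apply this step repeatedly over a model's output.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Pick the index of the highest logit (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [1.0, 3.0, 2.0]  # hypothetical scores for a 3-token vocabulary
sharp = softmax_with_temperature(toy_logits, temperature=0.5)
flat = softmax_with_temperature(toy_logits, temperature=2.0)
token = sample(toy_logits, temperature=0.8, rng=random.Random(0))
```

As the temperature approaches zero, sampling concentrates on the top token and behaves like greedy decoding; higher temperatures flatten the distribution and increase diversity.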
Key diagrams:
Lecture 4: LLM Training, Fine-Tuning, and Efficient Adaptation
Lecture 4 explains how LLMs are trained and adapted. The key topics are pre-training, scaling laws, FLOPs/FLOP/s, GPU memory footprint, data parallelism, ZeRO, model parallelism, FlashAttention, mixed precision, SFT, instruction tuning, evaluation, alignment, LoRA, and QLoRA.
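To show why LoRA counts as efficient adaptation, the sketch below forms the adapted weight W + (alpha / r) · B · A and compares trainable parameter counts. It is a minimal illustration: the helper names are hypothetical, and the 4096 × 4096 layer with rank 8 is an assumed example size, not a configuration from the lecture.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, with B of shape (d_out, r) and A of shape (r, d_in)."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def lora_param_counts(d_out, d_in, r):
    """Return (full fine-tuning parameters, LoRA trainable parameters) for one matrix."""
    return d_out * d_in, r * (d_in + d_out)


# Assumed example: one 4096 x 4096 projection with rank-8 adapters.
full, lora = lora_param_counts(4096, 4096, 8)

# With B initialized to zeros, the adapted weight starts equal to the frozen W.
w = [[1.0, 2.0], [3.0, 4.0]]
a = [[0.5, -0.5]]        # r = 1, d_in = 2
b = [[0.0], [0.0]]       # d_out = 2, r = 1
adapted = lora_weight(w, a, b, alpha=2.0, r=1)
```

Only A and B receive gradients while W stays frozen, so optimizer state and gradient memory scale with the small adapter size rather than with the full matrix.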
Key diagrams:
Lecture 5: LLM Tuning and Human Preferences
Lecture 5 explains preference tuning, which adjusts an SFT model to match human preferences. The key topics are pairwise preference data, the reward model, the Bradley-Terry formulation, RLHF, PPO clip/KL penalty, reward hacking, best-of-N, and DPO.
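The three objectives named here can be written down for a single example as a sketch. These are the standard textbook forms as commonly stated, not code from the lecture: the function names, `beta=0.1`, and `eps=0.2` are illustrative assumptions, and the KL penalty term is omitted.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bradley_terry_loss(reward_chosen, reward_rejected):
    """Reward-model loss: -log P(chosen beats rejected) under Bradley-Terry."""
    return -log_sigmoid(reward_chosen - reward_rejected)


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss on one preference pair, using log-prob ratios against a reference model."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample (to be maximized)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Equal rewards mean the model is indifferent: the loss is log(2).
indifferent = bradley_terry_loss(0.0, 0.0)
# A probability ratio far above 1 + eps gets no extra credit.
capped = ppo_clip_objective(ratio=1.5, advantage=1.0)
```

DPO has the same log-sigmoid shape as the Bradley-Terry loss, with the explicit reward replaced by a scaled log-probability ratio between the policy and the reference model, which is why it can be trained in a supervised style without a separate reward model.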
Key diagrams:
Lecture 6: LLM Reasoning and GRPO
Lecture 6 treats a reasoning model as an LLM that generates a reasoning chain before its answer, and explains how to train reasoning behavior with verifiable rewards and GRPO on tasks such as math and code where the answer can be checked. The key topics are pass@K, sampling temperature, reasoning token cost, output length growth, the DeepSeek-R1-Zero/R1 training pipeline, and reasoning distillation.
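Two of these ideas reduce to short formulas, sketched below. The advantage function follows the group-relative normalization usually associated with GRPO, and `pass_at_k` uses the widely cited unbiased estimator 1 - C(n - c, k) / C(n, k). The function names, the use of the population standard deviation, and the `eps` term are assumptions made for this illustration, not details confirmed by the lecture.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sample group.

    Assumption: population standard deviation, with eps to avoid division
    by zero when every sample in the group gets the same reward.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def pass_at_k(n, c, k):
    """Estimate pass@k from n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Verifiable rewards for four sampled answers to one prompt: 1 = correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
half = pass_at_k(n=10, c=5, k=1)
```

Because the baseline comes from the other samples in the same group, this formulation does not need a separately trained value model, which is the main contrast with PPO.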
Key diagrams:
Concept Map
| Concept | First covered | Later use |
|---|---|---|
| Tokenization | Lecture 1 | pre-training data, SFT loss masking |
| Self-attention | Lecture 1 | MHA, KV cache, FlashAttention |
| Q/K/V | Lecture 1 | MHA/MQA/GQA, RoPE, KV cache |
| Positional information | Lecture 2 | long context, RoPE, context rot |
| Transformer families | Lecture 2 | decoder-only LLMs |
| Decoder-only LLM | Lecture 3 | pre-training and SFT |
| MoE | Lecture 3 | expert parallelism |
| KV cache | Lecture 3 | inference memory and throughput |
| Scaling laws | Lecture 4 | model/data/compute allocation |
| ZeRO | Lecture 4 | distributed training memory |
| FlashAttention | Lecture 4 | exact attention with lower HBM IO |
| LoRA/QLoRA | Lecture 4 | efficient fine-tuning |
| Preference tuning | Lecture 5 | human preference alignment after SFT |
| Reward model | Lecture 5 | RLHF and best-of-N scoring |
| PPO | Lecture 5 | RLHF policy optimization |
| DPO | Lecture 5 | supervised-style preference optimization |
| Chain of thought | Lecture 3 | reasoning chains and test-time compute |
| Preference tuning / PPO | Lecture 5 | GRPO comparison and reasoning RL |
| pass@K | Lecture 6 | coding/math reasoning evaluation |
| GRPO | Lecture 6 | reasoning model RL training |
| DeepSeek-R1 | Lecture 6 | multi-stage reasoning training pipeline |
Repository Notes
- Lecture notes are written in Korean, while technical terms are often kept in English when that is clearer.
- SVG diagrams use the shared editorial style from AGENTS.md: restrained palette, thin line boxes, minimal fill, and accent red only for critical paths.
- Mermaid diagrams include local `classDef` styling so they follow the same visual scheme in GitHub-rendered markdown.
- Each lecture README ends with review questions and answers for quick self-checking.