---
license: mit
datasets:
- MuzzammilShah/people-names
language:
- en
model_name: Bigram Character-Level Language Model
library_name: pytorch
tags:
- makemore
- bigram
- language-model
- andrej-karpathy
---

# Bigram Character-Level Language Model: Makemore (Part 1)

This repository explores the **training**, **sampling**, and **evaluation** of a bigram character-level language model. Model quality is assessed with the **Negative Log Likelihood (NLL)** loss.
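
Concretely, for a dataset of $N$ observed bigrams $(c_i, c_{i+1})$, the loss is the average negative log probability the model assigns to each next character:

$$
\text{NLL} = -\frac{1}{N} \sum_{i=1}^{N} \log P(c_{i+1} \mid c_i)
$$

Lower is better: a model that assigned probability 1 to every observed bigram would reach a loss of 0.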

## Overview

The model was trained in two distinct ways, both yielding identical results:

1. **Frequency-Based Approach**: directly counting bigram occurrences and normalizing each row of the counts matrix into probabilities.
2. **Gradient-Based Optimization**: optimizing a matrix of log-counts with gradient descent, guided by minimizing the NLL loss.

Both routes are sketched below.
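
A minimal sketch of the frequency-based route (a simplification, not the repository's exact code), assuming a `words` list of lowercase names, where a tiny placeholder sample stands in for the MuzzammilShah/people-names data, and the makemore convention of a `.` token marking the start and end of each word:

```python
import torch

# Placeholder sample; in the real setup, `words` would hold the full
# list of names from the dataset.
words = ["emma", "olivia", "ava"]

# 27 tokens: '.' (index 0) marks both the start and the end of a word.
stoi = {s: i + 1 for i, s in enumerate("abcdefghijklmnopqrstuvwxyz")}
stoi["."] = 0
itos = {i: s for s, i in stoi.items()}

# Count every observed bigram into a 27x27 matrix N.
N = torch.zeros((27, 27), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Normalize each row into probabilities; the +1 smoothing keeps every
# probability nonzero, so the NLL never hits log(0).
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# Sampling: start at '.', repeatedly draw the next character from the
# current row of P, and stop when '.' is drawn again.
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```

And a sketch of the gradient-based route under the same assumptions. The entire "network" is a single 27x27 weight matrix `W`, interpretable as log-counts: exponentiating and row-normalizing it (a softmax) recovers a probability matrix like `P` above.

```python
import torch
import torch.nn.functional as F

words = ["emma", "olivia", "ava"]  # same placeholder sample as above
stoi = {s: i + 1 for i, s in enumerate("abcdefghijklmnopqrstuvwxyz")}
stoi["."] = 0

# Training set of bigrams: current-character index -> next-character index.
xs, ys = [], []
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

g = torch.Generator().manual_seed(2147483647)
W = torch.randn((27, 27), generator=g, requires_grad=True)
xenc = F.one_hot(xs, num_classes=27).float()  # one-hot rows select rows of W

for step in range(300):
    # Forward pass: logits -> softmax probabilities -> average NLL.
    logits = xenc @ W
    probs = logits.exp() / logits.exp().sum(dim=1, keepdim=True)
    loss = -probs[torch.arange(len(ys)), ys].log().mean()
    # (Adding a small term like 0.01 * (W**2).mean() to the loss would play
    # the same smoothing role as the +1 count in the counting version.)

    # Backward pass and a plain gradient-descent update.
    W.grad = None
    loss.backward()
    W.data += -10 * W.grad  # learning rate chosen for this tiny sample

print(f"NLL after training: {loss.item():.4f}")
```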

**Both methods converge to the same bigram probabilities**, and hence the same NLL loss, demonstrating that counting-and-normalizing and gradient descent are two routes to the same result.
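
To check that equivalence numerically, the same NLL formula can be evaluated on the counting model; this hypothetical snippet reuses `P`, `xs`, and `ys` from the sketches above:

```python
# Average NLL of the counting model over the training bigrams:
# P[xs, ys] picks out P(next char | current char) for every example.
nll = -P[xs, ys].log().mean()
print(f"counting-model NLL: {nll.item():.4f}")
```

With enough steps, the gradient-based loss settles near this value; the L2 term noted in the code comment plays the same smoothing role as the +1 count.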

## Documentation

For a better reading experience and detailed notes, visit my **[Road to GPT Documentation Site](https://muzzammilshah.github.io/Road-to-GPT/Makemore-part1/)**.

## Acknowledgments

Notes and implementations inspired by the **Makemore - Part 1** video by [Andrej Karpathy](https://karpathy.ai/).

For more of my projects, visit my [Portfolio Site](https://muhammedshah.com).