
Diving Deep: My First Public Project - Training a GPT-2 Level LLM from Scratch!

Hey everyone!

So, this is it. My first ever blog post, and also the first time I’m really putting a project out there in the open. Super exciting!

Behind the Scenes
February 24, 2025

I actually wrote and rewrote this intro about four times! It's surprisingly demanding to put your thoughts out in public, but I'm glad I finally went for it. This blog has been on my to-do list for almost a year now.

Lately, I’ve been completely consumed by the world of Large Language Models (LLMs). It’s been a bit of a rabbit hole, to be honest, bouncing between research papers, blog posts, and just trying to wrap my head around how these things really work under the hood.

And that’s where this project comes in. I’ve decided to take the plunge and train a GPT-2 level LLM completely from scratch.

Why GPT-2? Well, it feels like a really solid starting point. It’s complex enough to be incredibly interesting, showcasing the power of transformers, but not so massive that it’s completely out of reach for someone like me working on my own (for now at least!). Plus, understanding GPT-2 feels like a crucial step before tackling newer architectures like MAMBA that are causing such a buzz. I’m really curious to see how they stack up, performance-wise, but we have to build that foundation first, right?

Technical Confession
March 02, 2025

After two weeks of work, I’ve discovered my initial training approach was way too ambitious. I started by trying to implement everything at once and got completely stuck debugging attention mechanisms. I’ve now taken a step back and am building it up layer by layer, testing each component thoroughly.

My whiteboard is completely covered in equations and diagrams - my roommate thinks I’ve gone a bit mad!
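To keep that layer-by-layer plan honest, I’ve started writing tiny sanity checks for each piece as I go. Here’s the kind of thing I mean for the attention block: a minimal scaled dot-product attention with a shape check and a causality check (this is just my own sketch, not the final implementation):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, causal=True):
    # q, k, v: (batch, heads, seq_len, head_dim)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5            # (B, H, T, T)
    if causal:
        T = q.size(-2)
        future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))   # block attention to the future
    return F.softmax(scores, dim=-1) @ v                     # (B, H, T, head_dim)

# Shape check: output must match the value tensor's shape.
q, k, v = (torch.randn(2, 4, 8, 16) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
assert out.shape == (2, 4, 8, 16)

# Causality check: changing future values must not change the first position's output.
v2 = v.clone()
v2[:, :, 1:, :] = torch.randn_like(v2[:, :, 1:, :])
out2 = scaled_dot_product_attention(q, k, v2)
assert torch.allclose(out[:, :, 0, :], out2[:, :, 0, :], atol=1e-6)
print("attention sanity checks passed")
```

Nothing fancy, but catching a wrong mask or a transposed matmul at this size is a lot less painful than discovering it mid-training.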

My main motivation here isn’t necessarily to create the next ground-breaking model (though, who knows, maybe someday!). It’s really about understanding. I want to truly grasp the transformer architecture, and especially the maths behind it.

Honestly, the maths is what initially drew me in. The attention mechanism, backpropagation through these massive networks, the intricacies of layer normalization… it’s all so fascinating! I’ve been spending a lot of time brushing up on my linear algebra and calculus, and I’m hoping this project will be the perfect way to solidify that theoretical knowledge and see it in action.
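To give a taste of what I mean, the scaled dot-product attention at the heart of the transformer fits on one line (this is the standard formulation from the original transformer paper, nothing of my own):

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
$$

where $Q$, $K$ and $V$ are the query, key and value matrices and $d_k$ is the key dimension. Working out the gradients through that softmax by hand is exactly the kind of exercise I’m hoping this project will force me to do.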

Think of it as… cooking a complex dish for the first time. You can read recipes all day, but until you actually start chopping onions, mixing spices, and feeling the dough in your hands, you don’t really understand how it all comes together. That’s how I feel about LLMs right now. Time to get cooking!

Project Shutdown
March 10, 2025

Today, I made the tough call to put this project on hold: burnout 😞. My days have been packed with tasks and responsibilities, leaving little room for the mental energy this project demands. I realized I was pushing myself too hard, and it started to take a toll. For now, I’m stepping back to focus on my studies and well-being. This isn’t the end, just a pause to recharge and come back stronger when the time is right.

So, for the next who-knows-how-long, I’ll be documenting this first personal journey here.

Expect posts about:

  • Transformer architecture deep dives: Breaking down each component and trying to implement it myself.
  • The maths!: Explaining the key mathematical concepts as I encounter them, hopefully in a way that’s actually understandable (even to myself!).
  • Training challenges: Seeing the estimates for training GPT-2, and knowing my trusty ThinkPad laptop doesn’t even have a dedicated GPU… things are definitely going to be challenging! I’m anticipating a lot of research and optimizations to find the best and cheapest way (if not free 😃) to get the job done (there’s a rough sizing sketch just after this list to show what I’m up against). But hey, that’s part of the fun, learning through the pain!
  • And hopefully, MAMBA!: Once I feel like I have a good handle on GPT-2, I’m really keen to explore MAMBA and see how it compares to my GPT-2 baseline.
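Since that training-challenges bullet is the one that worries me most, here’s the rough sizing sketch I mentioned. The architecture numbers are the published GPT-2 small config (12 layers, 768-dim embeddings, 12 heads, 50257-token vocabulary); the memory figures are ballpark estimates for full-precision training with Adam, not measurements:

```python
# Back-of-the-envelope sizing for GPT-2 small (~124M parameters).
# Config values are the published ones; memory numbers are rough estimates only.

n_layer, n_head, n_embd = 12, 12, 768
vocab_size, block_size = 50257, 1024

# Embeddings: token + positional (the output head is tied to the token embeddings).
embed_params = vocab_size * n_embd + block_size * n_embd

# Per transformer block: attention (QKV + output projection), 4x MLP, two LayerNorms.
attn_params = 4 * n_embd * n_embd + 4 * n_embd                  # weights + biases
mlp_params = 2 * n_embd * (4 * n_embd) + 4 * n_embd + n_embd    # weights + biases
ln_params = 2 * 2 * n_embd                                      # scale + shift, twice
block_params = attn_params + mlp_params + ln_params

total = embed_params + n_layer * block_params + 2 * n_embd      # + final LayerNorm

print(f"parameters:                  {total / 1e6:.1f}M")             # ~124M
print(f"fp32 weights:                {total * 4 / 1e9:.2f} GB")       # ~0.5 GB
print(f"weights + grads + Adam fp32: {total * 16 / 1e9:.2f} GB")      # before activations!
```

Half a gigabyte of weights sounds harmless, but once you add gradients, optimizer state, and activations, it’s pretty clear why my GPU-less ThinkPad isn’t going to do this on its own.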

Research Update
April 04, 2025

Quick update: I’ve been exploring Google Colab for training and it’s been a game-changer! With their free T4 GPUs, I can at least experiment with smaller models. For anyone following along who’s also on a budget, I highly recommend this approach. Next post will dive into my Colab setup and optimization tricks.
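For anyone trying the same thing, the first cell I run in a fresh Colab notebook is just a quick device check before touching any model code (a minimal PyTorch sketch; nothing Colab-specific beyond hoping a T4 got allocated):

```python
import torch

# Use the GPU if Colab allocated one (e.g. a free T4), otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("using:", device)

if device.type == "cuda":
    props = torch.cuda.get_device_properties(0)
    print("GPU:", torch.cuda.get_device_name(0))
    print(f"VRAM: {props.total_memory / 1e9:.1f} GB")

# A tiny matmul to confirm the device actually works end to end.
x = torch.randn(1024, 1024, device=device)
print("matmul ok, output shape:", tuple((x @ x).shape))
```

The point is just to fail fast if a session comes up CPU-only instead of silently training at a crawl.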

Platform Discovery
April 15, 2025

I’ve just discovered RunPod and it completely blows Colab out of the water! While Colab was a good starting point, RunPod offers so much more flexibility with pay-as-you-go GPU access, more powerful hardware options, and longer runtimes without disconnects. I’ve been experimenting with their A100 instances and the training speed difference is incredible. The interface is intuitive too - you can create custom templates and save environments. Definitely switching my workflow over to RunPod for all future experiments!

This is all very new to me – blogging, sharing projects publicly, the whole writing-in-public experiment. So, bear with me as I figure things out. If you have any tips, suggestions, or just want to follow along, I’d be thrilled to have you! Consider this my little corner of the internet where I’ll be nerding out about technology and mathematics.

Let’s see where this adventure takes us!

This post is licensed under CC BY 4.0 by the author.