Copy And Learn, Don't Paste

Last year I wrote about what to learn if you already know something about programming (On Fundamentals). But what do you do if you are just starting? How can you tell what good code looks like if there is no mentor around to help you? I recently reflected on my own path (I taught myself programming as a teenager), and here’s a piece of advice that I don’t hear very often.

The idea is embarrassingly simple - whatever you try to learn, be it a programming language, a framework, a library - pick an existing open source project that uses it, and try to rewrite it. Open two editors side by side, one empty and one showing the reference code, and go ahead. Copy it, rewriting line by line. On the surface, this sounds boring, but I assure you the benefits are worth it.

The standard advice when learning programming is to think of a small project to write from scratch. That is perfectly fine, but if this is your first endeavour into programming, you will very likely underestimate how long something will take, and thus lose focus. Not finishing small projects can be demoralising and eventually suck out enthusiasm and motivation (“This is too hard for me”). And even if you do finish, there is no good reference point to judge whether the code is of good quality.

Now consider an existing project, written by more established engineers, maybe open-source. Immediately you get the guardrails. Whatever you (re)write will have a clearly defined finish line and will be of comparable quality to the reference. Sure, you may not pick up on details immediately, but gradually, as you do this copying and later as you write your own code, things will fall into place. You will find yourself having “Aha, that’s why!” moments over and over again.

Pasting the code would be useless, but the slower rewrite process gives more time and space to ask better questions. Instead of “How do I write it well?”, which for a beginner may be hard to judge, you start with good code and ask, “Why is it written the way it is?” Go on to documentation, search for opinion pieces, maybe even politely ask the author. It will help you better internalise why the code works the way it does.

Everyone in their programming career has stumbled upon some terrible code and thought, “What is this piece of spaghetti?” or “Who in their right mind has written this?” Starting programming by copying and understanding other people’s code teaches appreciation and humility. If you are a beginner reading an established project and the code does not make sense, it is more likely that there is something for you to learn than that the code is bad.

This becomes very useful later when joining an established project. Old code is often written off as a haunted graveyard with sweeping statements such as, “Oh, that legacy system is a mess, better not touch it.” Copying existing code teaches us how to navigate “other people’s code.” Being able to quickly grasp the structure and navigate through someone else’s codebase is truly a beginner’s superpower.

Copying other people’s code can teach you to connect theoretical concepts to their practical application. Object-oriented programming, functional programming, typed code vs untyped code - all sound good in theory. Applying them without fully understanding the practicalities will not teach much. However, through rewriting the code you get a chance to see them all interacting. And while you may not get the theory at the start, you’ll understand the practical application.

The original code is someone else’s, but the one you rewrite is yours. You completely own the creative licence. If you feel like renaming a variable, class or function, go ahead. This simple deviation will teach you how to trace that part throughout the code, because you will want to keep the naming consistent. Those skills are practical and valuable in real-world situations.
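
To make the renaming exercise concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the function names and the invoice example are invented for illustration and do not come from any real project. The first half stands in for the reference code, the second half for your rewrite with names of your own choosing.

```python
# Reference code, as it might appear in the project you are copying.
# (Hypothetical example; all names are invented for illustration.)
def calc(p, q, r):
    t = p * q
    return t + t * r


def reference_invoice(items, r):
    return sum(calc(p, q, r) for p, q in items)


# Your rewrite: same logic, but with names you chose yourself.
def line_total(unit_price, quantity, tax_rate):
    subtotal = unit_price * quantity
    return subtotal + subtotal * tax_rate


def invoice_total(items, tax_rate):
    # The call site had to change as well: finding every use of the
    # old name is exactly the tracing exercise described above.
    return sum(line_total(price, qty, tax_rate) for price, qty in items)


items = [(10.0, 2), (4.0, 5)]

# A rename must not change behaviour, so both versions should agree.
assert invoice_total(items, 0.25) == reference_invoice(items, 0.25)
```

Comparing your version against the reference, as the last line does, is a cheap way to confirm that a rename changed the names and nothing else.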

With time, your confidence in your own skills will grow. You might think, “I know how to do this part better.” Excellent! Go ahead and do it! One of two things will happen: you will improve the code, or stumble into why the original code was written the way it was. Both are valuable lessons.

There is another way to raise the learning bar. Consider, if you were to start building that reference project from scratch, what would you leave out or keep in the first version? Your learning will be twofold. First, you will learn to strip the codebase to the bare essentials. Second, you will learn how to evolve the code so that with each incremental step it continues to work and is better than the previous version. One of the common denominators in the industry nowadays is “small incremental changes”, as opposed to “large, big bang, once a month changes.” How to be flexible with this change size while still keeping the code working is another superpower that can be learned early.

Finally, having a working reference project guarantees the end result is functional. You do not have to understand everything all at once. You can pick and choose, reimplement components or a small part of the system, and copy the rest as is. You will still learn.

My Personal Path

I got into programming wanting to be a game developer, and back in the late 1990s, there were not many books or enthusiast forums. There weren’t many opportunities for a teenager in Eastern Europe to find a mentor. I spent a lot of my learning time rewriting any code I could get my hands on (open source wasn’t as widespread as it is now) and talking with other enthusiastic people.

Fond memories: I rewrote something called ‘The Nebula Device’, adding class prefixes. I also got my hands on the leaked Half-Life 2 codebase. It was my first insight into the structure of an actual commercial codebase. Each of these was followed by my own attempts at writing something similar, mimicking aspects of all the systems I copied to learn.

On occasion, when I tried to improve the code, I would soon run into a problem and understand why it wasn’t written the way I thought was better. I also did not understand the reference code fully; for example, I couldn’t grasp the difference between heap and stack memory, or how pointers worked in C/C++. I learned those concepts years later in a computer architecture course at university. University enabled me to connect the theoretical parts to all my practice until that point.

The aspects I understood I could command and use for my own purposes. The aspects I did not understand - I copied. I assumed that if people smarter than me wrote code a certain way, there must have been a good reason. Eventually, I understood most of those reasons and the trade-offs behind them.

One of the most frequent pieces of positive feedback I received early in my career was how quickly I could find my way around a new codebase. To this day, I rely on the skills I learned from copying. These include:

  • the ability to break down changes
  • the ability to build a working system in incremental steps (I wrote on this topic here and here)
  • lack of fear of legacy systems
  • knowing how to ask questions and find answers in documentation
  • the ability to adjust my coding style to whatever codebase I am working on

This experience also inspired my love of Chesterton’s fence principle. Put simply, “don’t ever take a fence down until you know the reason why it was put up.” It is also part of why “boring technology” is so appealing to me: understand existing systems first, before building new ones.

Final Thoughts

The internet of the 1990s and early 2000s was much more innocent. There was no GitHub, and there was less entitlement, more collaboration and more implicitly positive intent (at least in the places I used to frequent). I say this not out of nostalgia but as a word of caution.

If you were to plainly copy a project in public and advertise it as your own without stating your intent, you might get bullied on Twitter or GitHub. At least that is what I would expect if I did something similar now. I would keep the code private and just for myself. Maybe this is primarily why no one talks about this way of learning.

This concept works in a commercial setting as well. If you are overwhelmed by legacy systems at work, trying to rewrite one line by line might give you a much deeper understanding of how it works and how to bend it to your will.

Good luck copying and learning!
