How Source Code Control Works
You’ve got your team of developers ready to get going on the new project. One of the first things they ask you to do is set up an account on Github for the project. “Boss, we’re going to need at least 2 private repos, so make sure you get the right account.” You nod your head and say “Sure, I totally knew we needed that”, and as you walk away you think “What in the heck is Github, what is a repo, and why do we need any of this?”
Welcome to the world of source code control.
What is Source Code Control?
In one sentence, source code control manages who gets to write the code for your project, and tracks all of the changes to the code in an orderly manner.
Over the years there have been lots of different source code control platforms, from CVS to Subversion to Mercurial to Git, and lots of others in between. All of these platforms have at least one thing in common: they provide version control for your code. Some do it better than others. We’ll focus on Git for most of this article, since it’s far and away the most heavily used today, but first here’s a simple example (don’t worry, we’ll get into all of the terminology in detail later).
- Someone on your team creates the home page for your website and calls it index.html. They do what’s called a “check in” or “commit” and create version 1 of the file.
- Later that day, they make a change to the same file and do another commit / check-in, this time also creating a new file called contact.html. The index.html file is now on version 2 and contact.html is on version 1. Because both changes went in with the same commit, they are grouped together in what is called a change set.
- And so it goes. Each time a developer commits a new change, each file in that change is given a new version, and is tracked as part of a larger set of changes.
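To make the idea of versions and change sets concrete, here’s a toy model in Python. It’s a hypothetical sketch, not how Git works internally: Git actually stores snapshots of the whole project and identifies each commit by a hash rather than by per-file version numbers. The names `ToyRepo` and `ChangeSet` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """One commit: a message plus the version each touched file reached."""
    message: str
    files: dict[str, int]  # file name -> version number after this commit


@dataclass
class ToyRepo:
    """Hypothetical per-file versioning model; not Git's real storage format."""
    versions: dict[str, int] = field(default_factory=dict)
    history: list[ChangeSet] = field(default_factory=list)

    def commit(self, message: str, changed_files: list[str]) -> ChangeSet:
        touched = {}
        for name in changed_files:
            # A file seen for the first time starts at version 1.
            self.versions[name] = self.versions.get(name, 0) + 1
            touched[name] = self.versions[name]
        # All files in this commit are recorded together as one change set.
        change_set = ChangeSet(message, touched)
        self.history.append(change_set)
        return change_set


repo = ToyRepo()
repo.commit("Add home page", ["index.html"])
second = repo.commit("Update home page, add contact page", ["index.html", "contact.html"])
print(second.files)  # {'index.html': 2, 'contact.html': 1}
```

The second commit bumps index.html to version 2, starts contact.html at version 1, and keeps both in one change set, matching the walkthrough above.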
What’s actually happening under the covers is quite a bit more complex, but from the outside looking in, this is how the process works. Your developers make changes, then save those changes to the source code control system, and the source code control system keeps a record of every change and combines everyone’s work, flagging any spots where two people’s changes collide.
Make sense? Even if it doesn’t right now, by the time you’re done reading this article and completing your assignment, it will be much more clear.
Let’s dig in.
Gitting Started
Today the most popular source control platform is Git. Git came on the scene in the mid-2000s and was created by Linus Torvalds, the same guy who created Linux. It gradually picked up steam and now it’s as close to ubiquitous as one gets in the software world. The two most popular Git hosting platforms are Github and Bitbucket. If you haven’t heard of Github, you no doubt will as you start building your product.
Whether you’re using Github or Bitbucket or even running your own Git server, the basic concepts and the day-to-day workflow are the same.
Let’s do a quick overview of the terms you’re going to hear all of the time from your developers.
- Repository – A repository (often called a repo) is a fancy way to describe a big bucket or folder where all of the source code and other documentation for a project are stored. Repositories can be public or private. If they’re public, the world can see and download your code; open-source projects are good examples of public repositories. If they’re private, you control who can download your code. Typically you would create a repository for each part of your product. For example, if your product had both an iOS app and a website, you would create one repository for the iOS app and a separate one for the website. There are a couple of reasons to separate different kinds of code into different repositories: (1) most of the tools your team will use expect a repository to contain one discrete app, and (2) you can manage which developers have access to what by separating projects along logical lines. One important thing to remember is that with Git each developer has a full copy of the repository on their local machine, in addition to the “master” copy that you maintain on Github or wherever you host your code.
- Commit – As developers work on different parts of the app, they arrive at logical stopping points along the way. For example, let’s say that a developer is fixing a reported bug. After they’ve made the changes required to fix the bug, they would “commit” their work to their local copy of the repository. Each commit is made up of a logical grouping of changes that address something specific in your product. When a developer makes a commit, they add a short note, called a commit message, explaining why they’re making the change. In our example, the commit message might say “Fixed bug with login page” or something like that. This allows any other developer working on the project to see both who made the commit and why they did it.
- Pull – Throughout the day, developers will want to make sure they stay up to date with what everyone else has done to the source code. So, periodically, they do a “pull” from the project repository on Github. At a high level, Git checks the main repository, sees what’s missing from the developer’s local repository, and brings those changes down to the developer’s machine. It only brings down the changes since the last pull, not the whole project every time. During a pull, Git also merges the remote changes with the developer’s local changes. Git is remarkably good at combining changes automatically; only when two people have changed the same lines does it stop and ask a developer to resolve the conflict by hand.
- Push – Once a developer has made a commit or series of commits, and done a pull to ensure that they’re up to date with everyone else, they can then “push” all of their changes to the repository that everyone is using on Github. What’s really happening is that the developer is saying “Here are all of my changes since my last push.” Once a developer has pushed their changes, other developers working on the project can then pull those changes and everyone stays in sync. This push / pull process happens many times per day across all of the developers on your team.
- Branch – Branching can be a complicated topic, but for our purposes, think of a branch as a working version of the entire application. In Git, most projects will have a branch called ‘master’ (newer projects often call it ‘main’), which is the “gold copy” of the project. The master branch is what you’re typically shipping to customers or deploying out to your servers for users to use. As developers are working on big new features (or even small ones), they will often create new branches to ensure that they don’t break something on the master branch while they’re figuring the whole problem out. Once the work on a branch is completed, it is then merged back into the master branch for inclusion in the shippable version of the product.
- Pull Request – A pull request (often called a PR) is a way for developers to submit a logical group of work that someone else can then review, comment on, and eventually merge into the project when it’s ready. Pull requests are vital when a team is distributed and working on lots of different things at the same time. They (at least in theory) help to keep bad code from finding its way into your product. Pull requests are used extensively in open source projects as a way to manage contributions from outsiders.
- Merge – The process of combining multiple branches into a single branch. Merges are handled automatically by source code control systems unless there is a conflict. A conflict simply means that two people changed the same lines of code, and the source code control system needs someone to manually decide which changes should persist.
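To show what “handled automatically unless there is a conflict” means, here’s a toy three-way merge in Python. It compares each developer’s version against the common starting version (the base): a line changed on only one side is taken automatically, and a line changed differently on both sides is a conflict. This is a simplified sketch that assumes no lines were added or removed; real Git first lines the files up with a diff algorithm, and its conflict markers are labeled differently. The function name `toy_three_way_merge` is hypothetical.

```python
def toy_three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, line by line (hypothetical toy).

    Simplifying assumption: no lines were added or removed, so line i in
    each version corresponds to line i in the others.
    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("toy merge only handles files with the same number of lines")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:    # only "theirs" changed this line
            merged.append(t)
        elif t == b:    # only "ours" changed this line
            merged.append(o)
        else:           # both changed the same line differently: a human must decide
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i + 1)  # report 1-based line numbers
    return merged, conflicts


base = ["<h1>Welcome</h1>", "<p>Call us</p>", "<p>Open 9-5</p>"]
ours = ["<h1>Welcome!</h1>", "<p>Call us</p>", "<p>Open 9-5</p>"]    # edits line 1
theirs = ["<h1>Welcome</h1>", "<p>Email us</p>", "<p>Open 9-5</p>"]  # edits line 2

merged, conflicts = toy_three_way_merge(base, ours, theirs)
print(merged)     # ['<h1>Welcome!</h1>', '<p>Email us</p>', '<p>Open 9-5</p>']
print(conflicts)  # []

ours_conflicting = ["<h1>Welcome</h1>", "<p>Text us</p>", "<p>Open 9-5</p>"]
print(toy_three_way_merge(base, ours_conflicting, theirs)[1])  # [2]
```

Because the two developers in the first merge touched different lines, both edits survive with no human involvement; in the second, both rewrote line 2, so that line is flagged for someone to resolve by hand.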
Putting It All Together
When your project gets rolling, here’s a simple flow for how your team will manage their work every day:
- Pull the latest changes from the Github repository
- Write some new code to create a feature or fix a bug
- Commit their changes to their machine’s local repository
- Pull changes from the Github repository again, and Merge those changes with their new code
- Push their commits to the Github repository
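The pull-then-push rhythm above can be sketched with another toy Python model. Each developer’s `Clone` holds a full copy of the history, `pull` copies only the commits it’s missing, and `push` is refused while the shared repository has commits the developer hasn’t pulled yet, which mirrors Git’s default behavior of rejecting a push that would overwrite someone else’s work. The class names are hypothetical, commits are just unique message strings (Git uses hashes), and a real pull would also merge the two histories rather than simply appending.

```python
class Remote:
    """Stands in for the shared repository on a hosting service."""

    def __init__(self) -> None:
        self.commits: list[str] = []


class Clone:
    """A developer's local copy of the repository (hypothetical toy model)."""

    def __init__(self, remote: Remote) -> None:
        self.remote = remote
        self.commits = list(remote.commits)  # start with a full copy of history

    def commit(self, message: str) -> None:
        self.commits.append(message)  # recorded locally only

    def pull(self) -> list[str]:
        # Fetch only the commits we don't already have, not the whole project.
        missing = [c for c in self.remote.commits if c not in self.commits]
        self.commits.extend(missing)
        return missing

    def push(self) -> list[str]:
        # Refuse if the remote has work we haven't pulled yet.
        if any(c not in self.commits for c in self.remote.commits):
            raise RuntimeError("push rejected: pull the latest changes first")
        new = [c for c in self.commits if c not in self.remote.commits]
        self.remote.commits.extend(new)
        return new


github = Remote()
alice, bob = Clone(github), Clone(github)

alice.commit("Fix login bug")
alice.push()

bob.commit("Add contact page")
try:
    bob.push()
except RuntimeError as err:
    print(err)  # push rejected: pull the latest changes first

bob.pull()
bob.push()
print(github.commits)  # ['Fix login bug', 'Add contact page']
```

Bob’s first push fails because Alice pushed in the meantime; once he pulls, his push goes through, which is why the flow above pulls again before pushing.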
In real life, it probably would look a lot like this:
- Pull the latest changes from the Github repository
- Create a new Branch for the feature or bug they’re about to work on, e.g. “login-page”
- Work on the new feature and Commit changes to their machine’s local repository
- When the feature is complete, the developer Pushes their branch to the Github repository and creates a Pull Request to Merge the ‘login-page’ branch into the master branch
- A second developer reviews the Pull Request and suggests any changes that should be considered. Once the Pull Request is agreed upon, the Pull Request is Merged into the master branch
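The branch-based flow can be sketched the same way. In this hypothetical toy model, a branch is just a named list of commits copied from master when it’s created, so work on ‘login-page’ can’t break master, and `merge` refuses to run until the pull request is marked approved. Real Git branches are lightweight pointers into a shared history, and review-before-merge rules are enforced by the hosting service, not by Git itself; the class and parameter names here are made up for illustration.

```python
class ToyBranchingRepo:
    """Hypothetical model: each branch is a named list of commit messages."""

    def __init__(self) -> None:
        self.branches: dict[str, list[str]] = {"master": []}

    def create_branch(self, name: str, start_from: str = "master") -> None:
        # A new branch starts as a copy of an existing one.
        self.branches[name] = list(self.branches[start_from])

    def commit(self, branch: str, message: str) -> None:
        self.branches[branch].append(message)

    def merge(self, source: str, target: str = "master", approved: bool = False) -> list[str]:
        # Stand-in for a pull request: nothing is merged until a reviewer approves.
        if not approved:
            raise PermissionError(f"pull request for {source!r} is not approved yet")
        new = [c for c in self.branches[source] if c not in self.branches[target]]
        self.branches[target].extend(new)
        return new


repo = ToyBranchingRepo()
repo.commit("master", "Initial website")
repo.create_branch("login-page")
repo.commit("login-page", "Add login form")
repo.commit("login-page", "Validate password field")
print(repo.branches["master"])  # ['Initial website']  (master is untouched)

repo.merge("login-page", approved=True)
print(repo.branches["master"])
# ['Initial website', 'Add login form', 'Validate password field']
```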
When all of your developers are following a process similar to the one described above, a project takes on a life of its own. It is like a symphony when it’s done properly. Of course, getting your team to work at peak performance is not just about the work going into source code control systems, but having a well-defined methodology for how your team will work is a key ingredient in achieving maximum performance.
When Should I Run My Own Source Control Server?
Unless you have very specific requirements (such as government contracts or similar), you should never try to run your own source code control server. Use one of the hosting platforms we’ve talked about (Github, Bitbucket). These companies are focused maniacally on providing the best performance, backups and security for your project.
The other reason to use a hosted service is that they’ve typically built integrations with other services like Slack or Trello that make it easier to manage your entire workflow. We’ll get into the details of how you manage the entire workflow of your project in future posts.
Wrapping Up
Just Remember:
- Proper source code management is a critical component of your product’s success. Choose the platform wisely. Most everyone on your team will know and understand Git, so it’s a safe and wise choice.
- You should have an account on your source code control platform and should understand, even at a base level, what’s happening every day. We’ll cover how to understand the activity on Github in an upcoming post.
- The larger your team, the more formal your source control processes will need to be. In the beginning, focus on speed and don’t create artificial barriers or unnecessary bloat.
- Unless you’ve got very good reasons, do not self-host your source control. Use reputable services like Github or Bitbucket, not only for security but also for convenience and future integrations.
Your Assignment
- Create an account on Github if you don’t already have one
- Create a new public repository called ‘test’
- Click on the README link where it says “We recommend that every repository include a README”
- This will open up a text editor where you can enter some text. Just type in “This is my first ever project on Git”
- Scroll to the bottom and type in a Commit message. Make it something fun like “I’m learning how to write code and this is my first commit.” Once you’re done, click the ‘Commit’ button
- Congratulations! You’ve just made your first commit to a Git repository. You can go make some additional changes to the README file right in Github and commit those changes, then see if you can figure out how to view the differences between the two changes
- Once your project is up and running, log in to Github regularly and check out the changes happening in your project. Though you may not understand all of it, be curious and click around. Ask questions about what each part of the system does. The more you understand, the less mystery there will be.
Have more questions about this or other topics? I want to hear from you! Contact me for additional resources or help, and stay tuned for more posts on this topic.