Work on any software project, especially inside an organisation that manages large codebases, and you will almost certainly come across a Version Control System (VCS). Knowing how to use a VCS is essential in any software developer role, as it’s by far the most practical way to build software collaboratively. Once you understand what Git (the VCS discussed here) offers for collaborating on code within teams, and for keeping a version history that guards against catastrophe, you’ll wonder how you ever managed without it.
Git is an open-source, distributed version control system. It’s a smart, modern way of collaborating on and developing software. Two of the best-known hosting services built around Git are GitHub (by the company of the same name) and Bitbucket, by Atlassian.
What makes Git different to other version control software, and why use a VCS in the first place, aside from potentially saving you hours of work fixing regression bugs? Let’s examine the reasons.
Snap Happy
When you’re working on code, each time you implement a new feature, fix a bug, or otherwise modify code, you want those changes recorded somewhere. They should also form a running history, with each change saved on top of the previous one. This is what Git does, but with a difference. Instead of saving only the base version of each file plus a set of changes, each time you make a commit Git takes a ‘snapshot’ of how every file in the project looks at that moment, so it knows how the whole “filesystem” fits together. You might think this wastes storage space, but if a file’s contents haven’t changed, Git simply links to the identical copy it has already stored. Doing things this way comes in very handy for branching and the like (discussed in the terms section).
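To make the snapshot idea concrete, here is a minimal Python sketch of a store that keeps each distinct file content only once. It is a toy model and not Git’s actual implementation: the `SnapshotStore` class and its method names are made up for illustration, and real Git adds headers, compression and directory objects on top of the same basic idea.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store: identical file contents are kept only once."""

    def __init__(self):
        self.blobs = {}      # content hash -> file contents
        self.snapshots = []  # one {filename: content hash} mapping per commit

    def commit(self, files):
        """Record a snapshot of every file in the project; return its index."""
        snapshot = {}
        for name, content in files.items():
            digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
            # setdefault stores the content only if this hash is new,
            # so an unchanged file just points at the existing blob.
            self.blobs.setdefault(digest, content)
            snapshot[name] = digest
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, index):
        """Rebuild the full project as it looked at the given snapshot."""
        return {name: self.blobs[digest]
                for name, digest in self.snapshots[index].items()}


store = SnapshotStore()
first = store.commit({"app.py": "print('hi')", "README": "My project"})
second = store.commit({"app.py": "print('hello')", "README": "My project"})

print(len(store.blobs))  # 3 blobs, not 4: README is stored once
print(store.snapshots[first]["README"] == store.snapshots[second]["README"])  # True
```

Both snapshots describe the whole project, yet the unchanged README costs no extra storage the second time around.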
Remember that Git is the tool and GitHub is a hosting service built on it. There are also any number of GUI clients that will do the Git-ing for you, including Visual Studio, Atlassian’s excellent SourceTree utility, and SyntEvo’s great equivalent, SmartGit, which also runs on Linux. To use Git, you can either go hardcore and use console commands, or manage your projects using one of these graphical clients instead.
Data-Conscious
We South Africans struggle with data usage problems; those of us who have uncapped ADSL generally lord it over the people having to make do with capped packages or, worse still, mobile 3G dongles. That’s why you’d think a version control system nannying over your code each time it’s modified sounds incredibly data-hungry. With Git, however, a database of your project’s entire history is stored on your local computer. What’s more, new changes you commit are added to this local history too. This means you can compare your current code to a historical copy, and commit changes as you go, without an internet connection at all.
Once you’re connected to the internet again, you can push all your local changes to your repository in the cloud. It’s much faster this way, and doesn’t bind you to constantly needing a connection, so coding with version control when Eskom turns the lights out, or you’re somewhere in the Drakensberg without signal, is a definite possibility!
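Comparing current code to a historical copy boils down to a text diff between two versions that both live on your machine. The sketch below illustrates the idea with Python’s standard `difflib` module; it is not how Git computes diffs internally, and the `compare_versions` helper and the sample file contents are invented for this example.

```python
import difflib


def compare_versions(old_text, new_text, filename="example.txt"):
    """Return a unified diff between a historical and a current version."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{filename} (historical)",
        tofile=f"{filename} (current)",
        lineterm="",
    )
    return list(diff)


# Both versions are plain local strings; no network access is involved.
historical = "def greet():\n    print('hi')\n"
current = "def greet():\n    print('hello')\n"

diff_lines = compare_versions(historical, current, "greet.py")
print("\n".join(diff_lines))
```

Lines starting with `-` exist only in the historical version and lines starting with `+` only in the current one, which is the same convention Git’s own diff output uses.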
Git-speak
Before we examine the workflow in Git, it’s important to understand some key terms used when considering Git and version control:
- Repository, or repo: This is the collection of files that defines a project. It has the code files, any of their dependencies, and Git-specific files that keep track of what’s what and what’s where. A project usually has two kinds of repo: a local one stored on each collaborator’s computer, and a (mostly) identical remote one hosted online. Much of what a Git client does is keep each collaborator’s local repo in sync and playing nice with the one online repo.
- Push/Pull: Pull updates your local repo with changes others have made in the online repo, and Push does the opposite, sending your changes to the online repo. Git rejects a push if the online repo contains changes you haven’t pulled yet, so you generally pull before pushing. This makes sure that if more than one person has changed the same file, nobody’s work gets overwritten.
- Sync: An option offered by some Git GUI clients: a “smart” combination that pulls and then pushes in one step.
- Commit: The action of recording changes to code in a project. It’s usually accompanied by a message you write, describing what the changes achieve (e.g. “Fixed bug where program would hang” or “Implemented saving feature”). The commit makes the snapshot of the changed files and the rest of the project, so you can roll back to an earlier one should you mess up later.
- Branch: The part of Git for fiddling. Say you’d like to take your current codebase and design new features, experiment with the code, or work on a different feature from the other developers in your team, without committing things to the current codebase that could break it down the line. That’s what a branch is for: a parallel development “line” where you can make your own commits on the side without affecting the so-called master branch. On most software projects, the rule of thumb is that any code in master should always be working.
- Merge: The action of reintegrating a branch’s code into the master branch (or into any other branch). This usually happens once a feature has been developed and tested. It’s also how you combine your code with that of others who are working on different features on different branches. See Pull Request below.
- Pull Request: When the programmer of the code in a branch wishes to integrate it into the project’s master branch, they open a pull request on the hosting service. This allows other collaborators to check the code, discuss any issues and make sure it’s ready before the merge is done. This often happens when you’re helping to add features to big open-source projects; you’d open a pull request and the main developers would check your code to make sure it was OK before merging it.
- Fork: Creates a new repo that uses the repo you choose to fork as its basis. This allows you to use existing code as a base and develop on top of that. Development in a fork doesn’t affect the original project unless you offer it back, typically through a pull request. On a large scale, forking is often used with Linux distros, where development continues in a fork of an older, unmaintained distro or codebase.
- Release: For most of its life, a Git repo is just source code that keeps changing. Making a release, a feature of hosting services such as GitHub, is a way of “freezing” the code at a certain point and packaging it into a ZIP file for others to download. A release doesn’t change afterwards and is stored with info on what it contains. Application version releases work this way.
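The branch and merge terms above are easier to picture with a small model. The following Python sketch treats a branch as nothing more than a name pointing at a commit. It is a deliberate simplification: `ToyRepo` and its methods are hypothetical names, and it handles only the simplest “fast-forward” kind of merge, whereas real Git can also combine branches that have diverged.

```python
class ToyRepo:
    """Simplified model: a branch is just a name pointing at a commit."""

    def __init__(self):
        self.commits = {}                 # commit id -> (parent id, message)
        self.branches = {"master": None}  # branch name -> latest commit id
        self.current = "master"

    def commit(self, message):
        commit_id = len(self.commits) + 1
        self.commits[commit_id] = (self.branches[self.current], message)
        self.branches[self.current] = commit_id  # only the current branch moves
        return commit_id

    def branch(self, name):
        # A new branch starts at the same commit as the current one.
        self.branches[name] = self.branches[self.current]
        self.current = name

    def switch(self, name):
        self.current = name

    def is_ancestor(self, ancestor, commit_id):
        """Walk back through parents looking for `ancestor`."""
        while commit_id is not None:
            if commit_id == ancestor:
                return True
            commit_id = self.commits[commit_id][0]
        return ancestor is None

    def merge(self, source):
        """Fast-forward the current branch to `source` when that is possible."""
        target_tip = self.branches[self.current]
        source_tip = self.branches[source]
        if not self.is_ancestor(target_tip, source_tip):
            raise ValueError("branches have diverged; a real merge is needed")
        self.branches[self.current] = source_tip


repo = ToyRepo()
repo.commit("Initial commit")              # commit 1 on master
repo.branch("feature")
repo.commit("Implemented saving feature")  # commit 2 on feature only
print(repo.branches)  # {'master': 1, 'feature': 2}

repo.switch("master")
repo.merge("feature")
print(repo.branches)  # {'master': 2, 'feature': 2}
```

Until the merge, master keeps pointing at commit 1, which is why work on a branch cannot break the code everyone else relies on.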
Three Stages of Git
Let’s look at the three states a change passes through in Git (the working directory, the staging area, and the repository) to help make sense of all the jargon. These define how things get done when you’re working with version control in the background.
Let’s examine a scenario:
- You want to start work on an existing project, but one that you haven’t got on your local device yet. You clone the project from the remote server repository.
- The code is downloaded into your working directory, ready for you to start working.
- You make code edits to the project files yourself, and now you’re ready to integrate them back into the project that everyone else can see. You stage the files that have changed, adding them to the staging area.
- A commit action, well, commits the staged changes to your local repository (the Git directory).
- Up to this point, this could all have been offline. You then push (or sync, in some GUI clients) to make the changes you’ve stepped through “stick” in the remote repository.
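The scenario above can be sketched as a small Python model in which each step moves a change one state further along. This is an illustration only: the `ThreeStates` class is hypothetical, the comments name the Git console commands that roughly correspond to each step, and the sketch assumes nobody else has pushed to the remote in the meantime.

```python
class ThreeStates:
    """Toy model of the working directory, staging area, and repositories."""

    def __init__(self, remote_commits):
        # Roughly `git clone <url>`: copy the remote history to this machine.
        self.remote = remote_commits
        self.local = list(remote_commits)
        self.working = dict(self.local[-1]["files"]) if self.local else {}
        self.staged = {}

    def edit(self, name, content):
        # The change exists only in the working directory so far.
        self.working[name] = content

    def stage(self, name):
        # Roughly `git add <name>`: copy the change into the staging area.
        self.staged[name] = self.working[name]

    def commit(self, message):
        # Roughly `git commit -m <message>`: the new snapshot is the
        # previous commit plus whatever was staged.
        files = dict(self.local[-1]["files"]) if self.local else {}
        files.update(self.staged)
        self.local.append({"message": message, "files": files})
        self.staged = {}

    def push(self):
        # Roughly `git push`: send the commits the remote doesn't have yet.
        self.remote.extend(self.local[len(self.remote):])


remote = [{"message": "Initial commit", "files": {"app.py": "v1"}}]
project = ThreeStates(remote)

project.edit("app.py", "v2")
project.stage("app.py")
project.commit("Fixed bug where program would hang")
print(len(project.local), len(remote))  # 2 1: committed locally, not yet shared

project.push()
print(len(project.local), len(remote))  # 2 2: the remote now has the commit
```

Everything before `push()` touches only local data, which matches the point that the whole edit, stage and commit cycle can happen offline.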
Git-ting Creative
This is a very basic introduction to the workings of a version control system, but knowing this much will allow you to keep your work organised and recover quickly if you end up breaking something. Git is much more powerful than this, enough to fill a large book, but this primer should help get you started.
Share your views on this article in the comments section below, and follow Hyperion Hub if you’d like to see more articles like this for the South African market.