There is no excuse for a digital creative person to not use some sort of version control or source control. In the past disk space was too dear, version control systems were too expensive and software was not powerful enough; this is no longer the case. Unless your work is worthless, both back it up and version control it. We will demonstrate a minimal set of version control commands that will one day save your bacon.
First: acquire a good modern decentralized version control system. For this writeup we will use Git (there are other good choices, but we are only going to demonstrate one).
Most Git tutorials talk endlessly about branching, patches and a bunch of other crap that should not be considered until you have been using source control for quite some time. You may need somebody to help you install and configure Git. We will also assume you know how to run an interactive shell on your computer (on Linux and OSX you tend to use “bash” as your shell, on Windows you can install Cygwin).
To get most of the benefit of Git you only need to become familiar with five commands:
- git init .
- git add -A .
- git commit
- git status
- git log
Now any time you start a new digital creative endeavor (writing, coding, digital imagery, data science and so on) do the following:
- Start the project in a new directory. And place any work in either this directory or sub-directories.
- Once and only once move your interactive shell into this directory and type “git init .”. It is okay if you have already started producing work and there are already important files present.
You can check if you have already performed the init step by typing “git status”. If the init has not been done you will get a message similar to: “fatal: Not a git repository (or any of the parent directories): .git.” If the init has been done you will get a status message telling you something like “on branch master” and listing facts about many files.
The init step sets up in your directory a single hidden file tree called “.git” and prepares you to keep extra copies of every file in your directory (including sub-directories). Keeping all of these extra copies is called “versioning” and is what is meant by “version control.” You can now start working on your project. Save everything related to your work in this directory or some sub-directory of this directory.
Again, you only need to init a project once. And do not worry about accidentally running “git init .” a second time: it is harmless.
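The init step and the status check can be tried safely in a throwaway directory. The following is a minimal sketch, not a required procedure: the scratch directory made by mktemp is a hypothetical stand-in for your real project directory.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory standing in for a real project.
project=$(mktemp -d)
cd "$project"

# Before init: "git status" fails because there is no repository yet.
if git status >/dev/null 2>&1; then
    before="repository"
else
    before="no repository"
fi
echo "before init: $before"

# The init step; it creates the hidden .git directory.
git init .

# After init: "git status" succeeds.
if git status >/dev/null 2>&1; then
    after="repository"
else
    after="no repository"
fi
echo "after init: $after"

# A second init is harmless: Git reports that it reinitialized the
# existing repository and leaves your files alone.
git init .
ls -d .git
```

When you are done experimenting you can simply delete the scratch directory.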
As often as practical enter the following two commands into an interactive shell in your project directory:
git add -A
git commit
The second command should bring up an editor where you enter a comment as to what you are up to. Until you are a “git expert” allow yourself easy comments like: “update”, “going to lunch”, “just added a paragraph” or “corrected spelling.” Run this pair of commands after every minor accomplishment on your project. Run these commands every time you leave your project (to go to lunch, to go home or to work on another project). Do not fret if you forget to do this, just run the commands next time you remember. The “add” command schedules files and edits to be added to your history and the “commit” command completes the action. This split into two stages has some advantages, but for now just consider the two commands as always going together.
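Here is a small sketch of two add/commit cycles in a scratch project. Several things are assumptions made only so the example runs unattended: the mktemp directory, the file name notes.txt, the example author identity, and the use of “git commit -m” to supply the comment on the command line instead of through the editor.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch project; in real use you are already in your
# project directory and have run "git init ." once.
project=$(mktemp -d)
cd "$project"
git init -q .

# Assumption: set an identity for this repository only, so commits work
# even on a machine where Git has never been configured.
git config user.name "Example Author"
git config user.email "author@example.com"

# Some work, then one add/commit cycle.
echo "first paragraph" > notes.txt
git add -A .
git commit -q -m "update"

# A little more work, then another cycle.
echo "second paragraph" >> notes.txt
git add -A .
git commit -q -m "just added a paragraph"

# Count the commits recorded so far.
commits=$(git rev-list --count HEAD)
echo "commits so far: $commits"
```

Each cycle records one more snapshot of the whole directory, which is all the preparation the later recovery tasks need.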
Any time you want to know about your work progress type either “git status” to see if there are any edits you can put through the add/commit cycle or “git log” to see the history of your work (from the viewpoint of the add/commit cycles).
For example here is the “git status” from my experimental logistic regression project:
$ git status
# On branch master
nothing to commit (working directory clean)
And the “git log” from the same project:
commit 4edc2ab58df5b806a4aac201541711676b288ef1
Author: John Mount
Date:   Sun Jul 29 15:17:08 2012 -0700

    confirm balance is varying (they have the usual all outcomes introduce a dep

commit 75ee8ff5dac447f2171c6afb2a947ae7b2bf5b99
Author: John Mount
Date:   Sun Jul 29 15:05:04 2012 -0700

    try to clean up synthetic variable code

commit ba9c49ff37f083114450ec0b585728948847e108
Author: John Mount
Date:   Fri Jul 27 15:33:08 2012 -0700

    fix up logging a bit

The indented lines are the text I entered at the git commit step; the dates are tracked automatically.
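To see how status and log change across one add/commit cycle, here is a sketch in a scratch project. The file name, the author identity and the compact “--short” and “--oneline” output options are choices made for this illustration; plain “git status” and “git log” show the same facts in longer form.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch project with a repository-local identity.
project=$(mktemp -d)
cd "$project"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# New work that has not been through add/commit yet.
echo "draft" > chapter1.txt

# Short status marks the untracked file with "??".
before=$(git status --short)
echo "before the cycle: $before"

git add -A .
git commit -q -m "start chapter 1"

# After the cycle the short status is empty: nothing left to commit.
pending=$(git status --short)
echo "after the cycle: '$pending'"

# One line per commit, newest first.
git log --oneline
```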
And that is it. That is all you need to know, until something goes wrong. But if you both follow the above procedure and also back-up regularly you have actually prepared for something going wrong.
When something does go wrong (e.g. you delete a file you should not have, you wish you hadn’t made an edit you did, you want to know where you were in the project last Thursday, you need to share files with a new collaborator) you can at that time get some help or read a manual and be assured Git is already ready to solve your problem (as long as you have been issuing the add/commit commands often enough). This is our point: you don’t need to know how to recover a file until you need to recover a file; but you do need to have the file ready to be recovered (hence running the add/commit pair very very often). When you need last Tuesday’s file back do the following: slow down, take a breath, budget an hour for learning how to recover the file and get help or read a longer Git tutorial.
Until you need to undo a change or compare two revisions of your project you do not need to know any more about source control than the following: Git keeps a complete copy of all of your files at each time you successfully enter the pair of add/commit lines (Git usually requires a non-empty comment to consider a commit successful). So if you add/commit often enough Git is already ready to help you with any of the following tasks:
- Tracking your work over time.
- Recovering a deleted file.
- Comparing two past versions of a file.
- Finding when you added a specific bit of text.
- Recovering a whole file or a bit of text from the past (undo an edit).
- Sharing files with collaborators.
- Publicly sharing your project (à la GitHub).
- Maintaining different versions (branches) of your work.
- Excluding large or sensitive files from Git tracking.
Each of the above tasks is a special situation, a special need and requires special knowledge to perform. However you don’t need that knowledge until you want to perform the task. If you have been performing enough add/commit cycles Git has already prepared for these tasks (and many more). It is just a matter of finding help at this point. If you have not prepared with add/commit cycles none of these tasks are possible (so you might as well prepare, just in case).
The point is the habit of putting projects under version control and performing many add/commit cycles has huge value. In many cases it can recover a lost file for you (keep in mind: for many situations only backups can recover files; version control and backups are complementary, not competitive). You don’t need to know how to recover a file until you have the need, but you must prepare (start the add/commit pattern) before you need to recover a file. You should not wait to “learn Git” to start using Git. That is why I intentionally left so much out of this minimal guide to version control. This is all you need to know until you have an additional problem or need.
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
How to share projects with collaborators (without the danger of losing things).
1) git init (as above) and work in the above add/commit pattern.
2) tell your collaborator to run “git clone ssh://USERNAME@MACHINE/PATHTOWORK” on their machine in a directory of their choice.
3) they can now use “git pull ssh://USERNAME@MACHINE/PATHTOWORK” on their machine to get your committed changes. And you can pull back from them.
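The clone and pull steps can be rehearsed on one machine. In this sketch a local directory path stands in for the ssh://USERNAME@MACHINE/PATHTOWORK address, and the directory names “mine” and “theirs”, the file name and the author identity are all hypothetical.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)

# Your project, built with the usual add/commit pattern.
mkdir "$base/mine"
cd "$base/mine"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"
echo "v1" > report.txt
git add -A .
git commit -q -m "first draft"

# Your collaborator clones it (a local path stands in for the ssh URL).
git clone -q "$base/mine" "$base/theirs"

# You keep working and commit again.
echo "v2" >> report.txt
git add -A .
git commit -q -m "second draft"

# The collaborator pulls your new commit into their copy.
cd "$base/theirs"
git pull -q "$base/mine"
cat report.txt
```

After the pull, the collaborator's report.txt contains both lines.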
For deep collaboration use “git clone --bare” (brings over only the repository info, not the usable copies of files). This allows you to push to the repository from your own. Or set up at https://github.com to have a hosted repository (public for free, private for a fee).
At any time you can add a symbolic name for the remote repository and automatic tracking of how far ahead your local copy is:
1) Once (at any time) type: “git remote add origin ssh://USERNAME@MACHINE/PATHTOWORK”
2) After commits you now can push with the command: “git push -u origin master” and git status now tells you if you have a push to do.
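A sketch of the bare-repository and push workflow, again on one machine. The local paths stand in for a real server address and the names are hypothetical. One assumption is worth calling out: newer Git installations may not call the default branch “master”, so the sketch looks the branch name up instead of hard-coding it.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)

# Your working project.
mkdir "$base/work"
cd "$base/work"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"
echo "first draft" > readme.txt
git add -A .
git commit -q -m "first draft"

# A bare clone to act as the shared repository (local path as a stand-in).
git clone -q --bare "$base/work" "$base/shared.git"

# Once: give the shared repository the symbolic name "origin".
git remote add origin "$base/shared.git"

# More work and another add/commit cycle.
echo "second draft" >> readme.txt
git add -A .
git commit -q -m "second draft"

# Look up the current branch name rather than assuming "master".
branch=$(git symbolic-ref --short HEAD)

# Push and set up tracking so status can report how far ahead you are.
git push -q -u origin "$branch"

# Confirm the shared repository now holds your latest commit.
local_head=$(git rev-parse HEAD)
shared_head=$(git --git-dir="$base/shared.git" rev-parse "$branch")
echo "local:  $local_head"
echo "shared: $shared_head"
git status --short --branch
```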
Update: in using such a workflow you are going to have to type “git fetch” at random times to get rid of stupid “Your branch is ahead of ‘origin/master’ by k commits.” messages that push and pull do not clear. My understanding of it is not complete; roughly, “git pull” is not equivalent to “git fetch; git merge” when there are arguments present and something needs to be resolved (so the naked fetch may do something different or might not be backed out in some cases). The issue comes up again and again:
http://stackoverflow.com/questions/2432579/git-your-branch-is-ahead-by-x-commits
http://stackoverflow.com/questions/277077/why-is-git-telling-me-your-branch-is-ahead-of-origin-master-by-11-commits-a
To make this easier I have added the following aliases to my .profile (OSX/bash environment):
# add some git convenience aliases
alias gitstatus='git status'
alias gitaddall='git add -A .'
alias gitcommit='git commit'
#alias gitpull='git pull --rebase'
alias gitpull='git fetch origin; git merge -m pull master origin/master'
alias gitpush='git push -u origin master'
alias gitlog='git log --name-status --graph'
How to recover a deleted file:
1) git init (as above) and work in the above add/commit pattern.
2) notice you are missing a file by typing “git status”.
3) get the latest committed copy of the file: “git checkout FILENAME” (FILENAME shown in status).
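The recovery steps above can be rehearsed before you ever need them. This is a sketch in a scratch project with a hypothetical file essay.txt and author identity. The “--” before the file name is an addition to the command as written above; it only tells Git that what follows is a file name.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch project with a repository-local identity.
project=$(mktemp -d)
cd "$project"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Work that has been through an add/commit cycle.
echo "important words" > essay.txt
git add -A .
git commit -q -m "update"

# Oops: the file gets deleted by accident.
rm essay.txt

# Status notices the missing file (" D essay.txt" in the short form).
missing=$(git status --short)
echo "status says:$missing"

# Get the latest committed copy back.
git checkout -- essay.txt
cat essay.txt
```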
How to check history:
1) git init (as above) and work in the above add/commit pattern.
2) read up on “git diff” and “git log” (use web search or any of the tutorials or documents listed above).
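As a starting point for that reading, here is a sketch comparing the two most recent committed versions of a file. The file name and author identity are hypothetical; “HEAD” names the latest commit and “HEAD~1” the one before it.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch project with a repository-local identity.
project=$(mktemp -d)
cd "$project"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Two versions of the same file, each through an add/commit cycle.
echo "first version" > plan.txt
git add -A .
git commit -q -m "first version"

echo "second version" > plan.txt
git add -A .
git commit -q -m "second version"

# The history, one line per commit, newest first.
git log --oneline

# What changed in plan.txt between the previous commit and the latest.
# Removed lines start with "-" and added lines start with "+".
changes=$(git diff HEAD~1 HEAD -- plan.txt)
echo "$changes"
```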
How to find out who wrote what and when:
1) git init (as above) and work in the above add/commit pattern.
2) type “git blame FILENAME”. Each line of the file is printed prepended with version, author and date. Equivalent function is “git annotate FILENAME”, but blame is the better word. Example output:
How to see every change in a file:
“git log -p FILENAME”
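Both commands can be tried on a tiny two-commit history. This is a sketch only: the file story.txt and the author identity are hypothetical, and the hashes and dates you see will differ from run to run.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch project with a repository-local identity.
project=$(mktemp -d)
cd "$project"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Two commits, each adding one line to the same file.
echo "line one" > story.txt
git add -A .
git commit -q -m "start story"

echo "line two" >> story.txt
git add -A .
git commit -q -m "add second line"

# Who wrote each line and when: one output line per line of the file,
# prefixed with the commit, author and date.
blame=$(git blame story.txt)
echo "$blame"

# Every change ever made to the file, newest commit first, with diffs.
git log -p story.txt
```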
And a nice example of using branches (not the first thing you should do, but eventually you will want to):
http://annejsimmons.com/2012/11/14/git-for-beginners-a-sample-workflow/