<p><strong>Git Is Your Friend not a Foe Vol. 1: Distributed</strong><br />By Edward. Published 2010-01-17, updated 2010-03-03. <a href="http://hades.name/blog/2010/01/17/git-your-friend-not-foe/">http://hades.name/blog/2010/01/17/git-your-friend-not-foe/</a></p>
<p>Recently, I&#8217;ve been preaching <a href="http://git-scm.com">Git</a> to everyone who uses inferior version control software (like <span class="caps">SVN</span> or, pardon me, <span class="caps">CVS</span>). But for some reason the main obstacle I see is that these people are <em>so</em> used to the <span class="caps">SVN</span> workflow that they don&#8217;t see the magnificence and flexibility Git offers. Most of them are able to read <a href="http://whygitisbetterthanx.com/">http://whygitisbetterthanx.com/</a> and acknowledge the fact that more and more projects have been switching over to&nbsp;it.</p>
<p>But still, many of them don&#8217;t grasp the benefits Git gives, falling back to the classic centralized edit-and-commit-to-server workflow of <span class="caps">SVN</span> and whining that &#8220;this stupid Git didn&#8217;t commit changes in that file; this stupid Git complains about &#8216;non fast-forward&#8217;; this stupid Git ate my kittens; etc.&#8221; I would like to clear a few things up and introduce them to a better&nbsp;world.</p>
<p>First of all, Git is a <em>distributed</em> version control system. What does that mean? In a classic <span class="caps">VCS</span> you have a single holy place called The Repository, where all the project&#8217;s history is kept. Developers get only a small fraction of the information in it: the actual files from the latest revision (termed the &#8220;working copy&#8221;, which is obviously an exaggeration). Basically, the <strong>only</strong> thing the <span class="caps">SVN</span> client is able to do is compare your files with the latest revision and send this diff to the server.
In <span class="caps">SVN</span>, communication is possible only between The Repository and the puny client with the working&nbsp;copy.</p>
<p><img src="http://hades.name/media/git/svn.png" alt="SVN" /></p>
<p>In contrast, Git does not differentiate His Holiness The Repository from mere mortal working copies. Everyone gets a repository of his own. Everyone can do anything they want with it. <strong>Each developer can communicate with any other developer</strong>. This gives a developer so much freedom that he often does not grasp it, and simply asks&nbsp;this:</p>
<p>Uhm, an entire development history? With every working copy? Man, that will eat a lot of disk space! And I can&#8217;t even imagine how long it will take to check out that&nbsp;repository!</p>
<p>Well, first, not <em>checkout</em>, but <em>clone</em>. Checkout in Git is a somewhat different operation, and that is the Git club entry fee: you need to lose your centralized <span class="caps">VCS</span> habits and get used to new terms and ways. This can be painful at first, but it pays off in the end. You&#8217;ll thank me&nbsp;later.</p>
<p>So, back to the repository size. Yes, Git requires you to have the whole repository on your person. Yes, it does increase your project directory size. But Git is extremely efficient at packing stuff, so that increase should not hurt you. In fact, a whole Git repository (with full project history) <a href="http://blog.emptyway.com/2008/03/31/using-git-for-rubyjruby-development/">can take less space than an <span class="caps">SVN</span> checkout</a>. And <span class="caps">SVN</span>&#8217;s checkout process is <em>so</em> inefficient that for many projects a Git clone takes less time than an <span class="caps">SVN</span>&nbsp;checkout.</p>
<p>Okay, now the next question is: what is so cool about having the whole repository along with the project files?
Well, the most basic advantage is that <strong>a developer can do everything without access to the server</strong>,&nbsp;i.e.:</p>
<ul> <li>view the revision log starting from the very first&nbsp;commits;</li> <li>browse old versions of the&nbsp;project;</li> <li>and more importantly, <strong>commit his changes</strong>.</li> </ul>
<p>Being able to browse the history without Internet access is a nice feature for people with a slow link, or for people who travel a lot. But being able to commit things without asking anyone&#8217;s permission is <em>so</em> important that it&#8217;s worth a separate paragraph. Here it&nbsp;goes.</p>
<p>Most software teams recognize two simple principles that a developer should follow: keep commits atomic and don&#8217;t commit bad stuff. The problem is that a centralized <span class="caps">VCS</span> makes these principles incompatible, because every commit goes straight to the shared server. People just don&#8217;t work in a linear, discrete fashion; instead they tend to steer between several things: a touch there, a refactor here, an occasional stupid bug fix. In the end you get a working tree with a bunch of unrelated, uncommitted and untested changes. In Git you can commit as often as you want because <em>commits are local to your repository; no one sees them but you until you publish them!</em> You can commit total rubbish and test everything later: you can edit every single commit before publishing it, without fear of embarrassment and humiliation. You can find out that the way you started to implement this killer feature everyone wanted is totally wrong and start from scratch, without spoiling the project version&nbsp;history.</p>
<p>The second advantage is that developers can exchange their revisions with each other without the central server. Imagine John has reworked the main loop of a nuclear reactor coolant control computer.
He doesn&#8217;t want to incorporate this change into a live system, so he asks Fred to pull the changes from John&#8217;s repository and test them on Fred&#8217;s own nuclear plant in a less populated area. Having heard no loud explosions, John knows that at least one plant survived the&nbsp;change.</p>
<p>You can also benefit from this even if you are the only developer. Imagine you have several different computers (for example Mac, Linux x86 and Linux amd64). You have developed something on your Mac box, tested it thoroughly, and are ready to push it to the main repository. But you can first push it to your Linux boxes and test it there. In <span class="caps">SVN</span> you would have to generate a patch, transfer it to the boxes, and apply it. All manually. So you most probably wouldn&#8217;t bother at all, would discover the nasty bug that occurs only on 64-bit machines two months later, and would lose your&nbsp;job.</p>
<p><img src="http://hades.name/media/git/git.png" alt="Git" /></p>
<p>Finally, the concept of a &#8220;central repository&#8221; may be eliminated altogether. Every developer gets a &#8220;public&#8221; repository where he keeps the stuff he is not ashamed of and a private repository where he works as he wants. Or a bunch of private repositories. The developers exchange their work by pulling commits from each other&#8217;s public repositories. Or they can have a single lead developer, who collects the good commits, and use his repository as a &#8220;blessed&#8221; repository. The lead developer either watches for changes in other public repositories, or waits for a &#8220;merge request&#8221;. A merge request is a message (traditionally an e-mail) that says something along the lines of &#8220;Hey, Sam, I&#8217;ve implemented the automatic road crosser for blind one-legged homosexuals, &#8216;git pull git://acmesoftware.com/~dave/shiny.git crosser&#8217;, love, Dave&#8221;.
Sam copies and pastes the command, which fetches Dave&#8217;s <em>crosser</em> branch and merges it into Sam&#8217;s current branch; he tests the result and then pushes it to his blessed public&nbsp;repository.</p>
<p>For large projects (for example, <a href="http://kernel.org">Linux</a>) the lead developer has several people responsible for specific subsystems (the so-called lieutenants). They collect the small commits from their fellow developers, test them and forward them to Linus, who aggregates all the good stuff in his own repository. This ensures that the code is seen by at least one other person before it gets stored in the repository and completely&nbsp;forgotten.</p>
<p>The aforementioned site has a <a href="http://whygitisbetterthanx.com/#any-workflow">nice section about different Git workflows</a>, with pictures (see under <strong>Any&nbsp;workflow</strong>).</p>
<p>Also, a nice side effect of Git being a distributed system is that every repository is essentially a backup of the main repository. It doesn&#8217;t mean you should not do backups (you should!); it just means that if everything crashes and burns, any developer can provide you with the full revision history, not only the recent project&nbsp;files.</p>
<p>There are some more things that confuse novice users, especially branches and the staging area. I shall cover them in the following posts; stay&nbsp;tuned!</p>
<p>Next&nbsp;posts:</p>
<ul> <li><a href="http://hades.name/blog/2010/01/22/git-your-friend-not-foe-vol-2-branches/">Volume 2, on branches and&nbsp;merging</a></li> <li><a href="http://hades.name/blog/2010/01/28/git-your-friend-not-foe-vol-3-refs-and-index/">Volume 3, on refs and staging&nbsp;area</a></li> <li><a href="http://hades.name/blog/2010/03/03/git-your-friend-not-foe-vol-4-rebasing/">Volume 4, on cherry-picking and&nbsp;rebasing</a></li> </ul>
<p><a href="http://hades.name/blog/tag/git/">All posts about&nbsp;Git</a></p>
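<p>For the impatient, here is a minimal sketch of the John-and-Fred exchange described above: a local commit made without any server, then a pull straight from another developer&#8217;s repository. The scratch directory, the identity settings, the commit messages and the use of empty commits are my assumptions for illustration; a real exchange would go over a network <span class="caps">URL</span> rather than a local path.</p>

```shell
set -e

# Hypothetical identity, only so the sketch runs on an unconfigured machine.
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com
export GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com

# Hypothetical scratch directory holding both repositories.
work=$(mktemp -d)
cd "$work"

# John creates a repository and commits; no server is involved.
# (Empty commits keep the sketch short; real commits would carry file changes.)
git init -q john
cd john
git commit -q --allow-empty -m "Initial import"

# Fred clones John's repository and gets the full history, not just the latest files.
cd "$work"
git clone -q john fred

# John commits again; the commit stays local until somebody pulls it.
cd "$work/john"
git commit -q --allow-empty -m "Rework the main loop"

# Fred pulls directly from John's repository, developer to developer.
cd "$work/fred"
git pull -q --ff-only ../john

# Fred's history now lists both of John's commits.
git log --oneline
```

<p>Note that Fred&#8217;s repository never talks to anything but John&#8217;s; a &#8220;blessed&#8221; repository would be just one more place to pull from or push&nbsp;to.</p>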