
Optimizing for git

- Even Alander

I've run into the huge-git-repo trap a couple of times, and each time I ended up deleting my git repo and initializing a new one just to avoid getting into a rebase.

Git works (in my simplistic view) by taking all the files in a directory, storing them in blobs of some sort, and storing the differences between changes in new blobs. This is awesome: every version of all the code is kept for every commit, creating a story of all the changes. It also means deleted code is still in git, since the changes are stored and the code is still there in an earlier blob. Rebase allows changing this history, but it is sort of complex.
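If you're curious, git itself can show you those old blobs. Something like this should list every object it still keeps around, and point out the biggest ones sitting in the pack files:

$ git rev-list --objects --all | wc -l       # every commit, tree and blob, deleted files included
$ git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -n | tail -3   # the three largest packed objects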

I use git for publishing this page, and everything that goes online was kept in this repository. Since I build the page using a static site generator, there is also some duplication happening. Git does a fairly good job of compressing, and as long as the content is just text I can generally ignore this. Without thinking about it too much, I added all my media files: images, music and videos.

I noticed it took a while to publish, and found some commands to see how large the repository was.

$ du -hs .git
289M    .git
$ git gc
Counting objects: 3151, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2084/2084), done.
Writing objects: 100% (3151/3151), done.
Total 3151 (delta 1379), reused 626 (delta 323)

The network transfer was never too bad since git only uploads the differences (adding a video could take some time), but all the files get copied into a docker image, which did take some time. The whole folder was essentially more than double the 289MB, because the media folder was in both the src and publish directories. This was hugely wasteful, and I'm a sucker for optimization.

AWS S3

Amazon Web Services S3 (Simple Storage Service) is a really popular service for storing content for the web: good uptime, great speeds and pretty cheap. I've looked into it before but found it a bit complicated to set up. I usually gave up when I started struggling with access rights and such, not being sure whether it was my code or my settings at AWS that were wrong.

There is, I found, a grunt tool for S3: grunt-aws-s3.

// grunt.initConfig({ ...
aws_s3: {
    options: {
        accessKeyId: '<%= aws.AWSAccessKeyId %>',
        secretAccessKey: '<%= aws.AWSSecretKey %>',
        region: 'eu-central-1',
        uploadConcurrency: 5,
        downloadConcurrency: 5,
        bucket: 'iameven',
        differential: true
    },
    up: {
        files: [
            { expand: true, cwd: './media/', src: ['**'] }
        ]
    },
    down: {
        files: [
            { cwd: './media/', dest: '/', action: 'download' }
        ]
    }
}
// ... })

I get my keys from a separate json file which I don't include in git, to keep my storage safe (I do need to keep that file safe and duplicated somewhere, though; since my repo is private I could probably just keep it in git, but that is not recommended). Region is Frankfurt, written in amazon zone speak; I think mainland Germany beats island Ireland, but I'm not sure. The upload and download concurrencies are the number of simultaneous operations; it's fast enough at 5, and I'm not sure I even need that many. Bucket is my storage location, which is just a string included in the URL. The real magic happens with the differential flag, which makes sure only new and changed files are uploaded or downloaded, saving me bandwidth.
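The keys file itself is tiny. Something along these lines (the filename here is just an example), kept out of git and read into the aws object with grunt.file.readJSON in the Gruntfile:

$ cat .gitignore
aws-keys.json
$ cat aws-keys.json
{ "AWSAccessKeyId": "...", "AWSSecretKey": "..." }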

I've created two tasks: up for when I've added something, and down for when I need to sync up while working on a different computer. It's sort of like how I do an npm and bower install to avoid having all those files in my repository.
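Running them is just a matter of pointing grunt at the right target, roughly like this:

$ grunt aws_s3:up      # push new or changed media to the bucket
$ grunt aws_s3:down    # pull the media back down on another machine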

End results

$ du -hs .git
928K    .git
$ git gc
Counting objects: 284, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (162/162), done.
Writing objects: 100% (284/284), done.
Total 284 (delta 74), reused 271 (delta 70)

As I said, I just deleted my whole .git folder to avoid doing the rebase work required, losing all my history in the process, but I do think it was worth it. Removing the duplicated media reduced my .git folder from 289MB to 928KB, and the object count from 3151 to 284. Network transfers are now negligible, and build times on the server are heavily reduced.
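For the record, the fresh start itself is nothing fancy, roughly this (the commit message is just an example):

$ rm -rf .git
$ git init
$ git add .
$ git commit -m "Fresh start, media now lives in S3"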
