
iameven.com - Optimizing for git

- Even Alander

I've run into the huge-git-repo trap a couple of times, where I end up deleting my git repository and initializing a new one just to avoid getting into a rebase.

Git works (in my simplistic view) by taking all the files in a directory, storing them in a blob of some sort, and storing the differences between changes in new blobs. This is awesome: every version of all the code is kept for every commit, creating a story of all the changes. It also means deleted code is still in git, since the changes are stored and the code is still there in an earlier blob. Rebase allows changing this history, but it is sort of complex.
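A couple of standard git commands illustrate the point (this is just an illustration, and <commit> and the path are placeholders): the first lists every commit that deleted files, the second prints a deleted file's contents from the commit just before the deletion.

$ git log --diff-filter=D --summary
$ git show <commit>^:path/to/deleted-file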

I use git for publishing this page, and everything that goes online is kept in this repository. Since I build the page using a static site generator, there is also some duplication happening. Git does a fairly good job of compressing, and as long as the content is just text I can generally ignore this. Without thinking about it too much, I added all my media files: images, music and videos.

I noticed it took a while to publish, and found some commands to see how large the repository was.

$ du -hs .git
289M    .git
$ git gc
Counting objects: 3151, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2084/2084), done.
Writing objects: 100% (3151/3151), done.
Total 3151 (delta 1379), reused 626 (delta 323)

The network transfer was never too bad since git only uploads the differences (adding a video could take some time), but all the files get copied into a Docker image, which did take some time. The whole folder was essentially more than double the 289MB, because the media folder lived in both the src and the publish directory. This was hugely wasteful, and I'm a sucker for optimization.

AWS S3

Amazon Web Services S3 (Simple Storage Service) is a really popular service for storing content for the web: good uptime, great speeds and pretty cheap. I've looked into it before, but found it a bit complicated to set up. Usually I gave up when I started struggling with access rights and such, never sure whether it was my code or my settings at AWS that were wrong.

There is, I found, a Grunt tool for S3: grunt-aws-s3.

// grunt.initConfig({ ...
aws_s3: {
    options: {
        accessKeyId: '<%= aws.AWSAccessKeyId %>',
        secretAccessKey: '<%= aws.AWSSecretKey %>',
        region: 'eu-central-1',
        uploadConcurrency: 5,
        downloadConcurrency: 5,
        bucket: 'iameven',
        differential: true
    },
    // upload new and changed files from ./media/ to the bucket
    up: {
        files: [
            { expand: true, cwd: './media/', src: ['**'] }
        ]
    },
    // download new and changed files from the bucket into ./media/
    down: {
        files: [
            { cwd: './media/', dest: '/', action: 'download' }
        ]
    }
}
// ... });

I get my keys from a separate JSON file which I don't include in git, to keep my storage safe (I do need to keep that file safe and backed up, though; since my repo is private I could probably just keep it in git, but that is not recommended). The region is Frankfurt, written in Amazon zone speak; I figure mainland Germany beats the island of Ireland, but I'm not sure. The upload and download concurrencies are the number of simultaneous transfers; it's fast enough at 5, though I'm not sure I even need that many. The bucket is my storage location, just a string that is included in the URL. The real magic happens with the differential flag, which makes sure only new and changed files are uploaded or downloaded, saving me bandwidth.
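For reference, wiring that JSON file into the config could look something like the sketch below; the file name aws-keys.json is my example, and the field names follow from the template strings above.

// Gruntfile.js, abbreviated sketch
module.exports = function (grunt) {
    grunt.initConfig({
        // aws-keys.json is listed in .gitignore and holds something like
        // { "AWSAccessKeyId": "...", "AWSSecretKey": "..." }
        aws: grunt.file.readJSON('aws-keys.json'),
        aws_s3: { /* the config shown above */ }
    });
    grunt.loadNpmTasks('grunt-aws-s3');
};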

I've created two tasks, as sketched below: up for when I've added something, and down for when I need to sync up while working on a different computer. Sort of like how I do an npm and bower install to avoid having all those files in my repository.
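If up and down are meant to run as plain grunt commands, simple task aliases would do it; this is a sketch, not necessarily the exact setup.

grunt.registerTask('up', ['aws_s3:up']);
grunt.registerTask('down', ['aws_s3:down']);

With those in place, grunt up should push new media to the bucket, and grunt down should pull it onto a fresh machine.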

End results

$ du -hs .git
928K    .git
$ git gc
Counting objects: 284, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (162/162), done.
Writing objects: 100% (284/284), done.
Total 284 (delta 74), reused 271 (delta 70)

As I said, I just deleted my whole .git folder to avoid doing the rebase work required, losing all my history in the process, but I do think it was worth it. Removing the duplicated media folder reduced my .git folder from 289MB to 928KB, and the object count from 3151 to 284. Network transfers are now negligible, and build times on the server are heavily reduced.
