
Optimizing for git

- Even Alander

I've run into the huge-git-repo trap a couple of times, and each time I've ended up deleting my git repo and initializing a new one just to avoid getting into rebase.

Git works (in my simplistic view) by taking all the files in a directory, storing them in blobs of some sort, and storing the differences between changes in new blobs. This is awesome: every version of all the code is kept for every commit, creating a story of all the changes. It also means deleted code is still in git, since the changes are stored and the code is still there in an earlier blob. Rebase allows changing this history, but it is sort of complex.
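
As an example, a file deleted long ago can still be dug back out of an earlier commit with something like this (a sketch; the commit and path are just placeholders):

$ git log --diff-filter=D --summary            # lists commits that deleted files
$ git show <commit>^:media/old-video.mp4 > old-video.mp4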

I use git for publishing this page, and everything that goes online is kept in this repository. Since I build the page using a static site generator, there is also some duplication happening. Git does a fairly good job of compressing, and as long as the content is just text I can generally ignore this. Without thinking about it too much, I added all my media files: images, music and videos.

I noticed it took a while to publish, and found some commands to see how large the repository was.

$ du -hs .git
289M    .git
$ git gc
Counting objects: 3151, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2084/2084), done.
Writing objects: 100% (3151/3151), done.
Total 3151 (delta 1379), reused 626 (delta 323)

The network transfer was never too bad since git only uploads the difference (though adding a video could take some time), but all the files get copied into a docker image, which did take some time. The whole folder was essentially more than double the 289 MB, because the media folder was in both the src and publish directories. This was hugely wasteful, and I'm a sucker for optimization.

AWS S3

Amazon Web Services S3 (Simple Storage Service) is a really popular service for storing content for the web: good uptime, great speeds and pretty cheap. I've looked into it before but found it a bit complicated to set up. Usually I gave up when I started struggling with access rights and such, not being sure whether it was my code that was wrong or my settings at AWS.

There is, I found, a Grunt tool for S3: grunt-aws-s3.

// grunt.initConfig({ ...
aws_s3: {
    options: {
        accessKeyId: '<%= aws.AWSAccessKeyId %>',
        secretAccessKey: '<%= aws.AWSSecretKey %>',
        region: 'eu-central-1',
        uploadConcurrency: 5,
        downloadConcurrency: 5,
        bucket: 'iameven',
        differential: true
    },
    up: {
        // Upload everything in ./media/ to the bucket
        files: [
            { expand: true, cwd: './media/', src: ['**'] }
        ]
    },
    down: {
        // Download the bucket contents into ./media/
        files: [
            { cwd: './media/', dest: '/', action: 'download' }
        ]
    }
}
// ... });

I get my keys from a separate JSON file, which I don't include in git to keep my storage safe (I do need to keep the file safe and duplicated though, and since my repo is private I could probably just use git for it, but that is not recommended). Region is Frankfurt, written in Amazon zone speak; I think mainland Germany beats island Ireland, but I'm not sure. The upload and download concurrencies are simultaneous operations; it's fast enough at 5, though I'm not sure I need it. Bucket is my storage location, which is just a string included in the URL. The real magic here happens with the differential flag, which makes sure only new and changed files are uploaded or downloaded, saving me bandwidth.
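
For reference, this is roughly how the keys can be wired in; the file name aws-keys.json and the surrounding Gruntfile skeleton are placeholders, not my exact setup:

module.exports = function (grunt) {
    grunt.initConfig({
        // Read the credentials from a JSON file that is kept out of git;
        // the <%= aws.AWSAccessKeyId %> templates above resolve against this object
        aws: grunt.file.readJSON('aws-keys.json'),
        aws_s3: {
            // ... the configuration shown above ...
        }
    });

    // Makes the aws_s3 task (and its up/down targets) available
    grunt.loadNpmTasks('grunt-aws-s3');
};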

I've created two tasks: up for when I've added something, and down for when I need to sync up if working on a different computer. Sort of like how I do an npm and bower install to avoid having all those files in my repository.
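
The two targets can be run directly as grunt aws_s3:up and grunt aws_s3:down, or aliased to friendlier names; the alias names here are just placeholders:

// Convenience aliases for the two targets
grunt.registerTask('media-up', ['aws_s3:up']);
grunt.registerTask('media-down', ['aws_s3:down']);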

End results

$ du -hs .git
928K    .git
$ git gc
Counting objects: 284, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (162/162), done.
Writing objects: 100% (284/284), done.
Total 284 (delta 74), reused 271 (delta 70)

As I said, I just deleted my whole .git folder to avoid doing the rebase work required, losing all my history in the process, but I do think it was worth it. Removing the duplicate media folder reduced my .git folder from 289 MB to 928 KB, and the object count from 3151 to 284. Network transfers are now negligible, and build times on the server are heavily reduced.
