
Optimizing for git - iameven.com

- Even Alander

I've fallen into the huge-git-repo trap a couple of times, and ended up deleting my git repo and initializing a new one just to avoid getting into rebase.

Git works (in my simplistic view) by taking all the files in a directory, storing them in blobs of some sort, and storing the differences between changes in new blobs. This is awesome: every version of all the code is kept for every commit, creating a story of all the changes. It also means deleted code is still in git, since the changes are stored and the code is still there in an earlier blob. Rebase allows changing this history, but it is sort of complex.
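As a quick aside (these commands aren't part of my workflow, just a way to peek at the object store yourself), git's own plumbing can show what it keeps:

$ git count-objects -vH                   # how many objects the repo stores and their size on disk
$ git cat-file -t HEAD                    # the type of an object: commit, tree or blob
$ git rev-list --objects --all | wc -l    # count every object reachable from any ref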

I use git for publishing this page, and everything that goes online was kept in this repository. Since I build the page using a static site generator, there is also some duplication happening. Git does a fairly good job of compressing, and as long as the content is just text I can generally ignore this. Without thinking about it too much, I added all my media files: images, music and videos.

I noticed it took a while to publish, and found some commands to see how large the repository was.

$ du -hs .git
289M    .git
$ git gc
Counting objects: 3151, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2084/2084), done.
Writing objects: 100% (3151/3151), done.
Total 3151 (delta 1379), reused 626 (delta 323)

The network transfer was never too bad since git only uploads the differences (adding a video could take some time), but all the files got copied into a docker image, which did take some time. The whole folder was essentially more than double the 289MB, because the media folder was in both the src and publish directories. This was hugely wasteful, and I'm a sucker for optimization.

AWS S3

Amazon Web Services S3 (Simple Storage Service) is a really popular service for storing content for the web: good uptime, great speeds and pretty cheap. I've looked into it before but found it a bit complicated to set up. Usually I gave up when I started struggling with access rights and such, never sure whether it was my code that was wrong or my settings at AWS.

There is, I found, a grunt plugin for S3: grunt-aws-s3.

// grunt.initConfig({ ...
aws_s3: {
    options: {
        accessKeyId: '<%= aws.AWSAccessKeyId %>',
        secretAccessKey: '<%= aws.AWSSecretKey %>',
        region: 'eu-central-1',
        uploadConcurrency: 5,
        downloadConcurrency: 5,
        bucket: 'iameven',
        differential: true  // only transfer new or changed files
    },
    // upload the local media folder to the bucket
    up: {
        files: [
            { expand: true, cwd: './media/', src: ['**'] }
        ]
    },
    // pull the bucket contents back down into ./media/
    down: {
        files: [
            { cwd: './media/', dest: '/', action: 'download' }
        ]
    }
}
// ... });

I get my keys from a separate JSON file, which I don't include in git to keep my storage safe (I do need to keep that file safe and backed up, though; since my repo is private I could probably just keep it in git, but that isn't recommended). The region is Frankfurt, written in Amazon zone speak; I think mainland Germany beats the island of Ireland for me, but I'm not sure. The upload and download concurrencies are the number of simultaneous operations; it's fast enough at 5, and I'm not sure I even need it. The bucket is my storage location, just a string that is included in the URL. The real magic happens with the differential flag, which makes sure only new and changed files are uploaded or downloaded, saving me bandwidth.
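Roughly how the keys can be wired in (just a sketch; aws-keys.json is a made-up file name, but grunt.file.readJSON is the standard way to read a JSON file into the config):

// Gruntfile.js (sketch)
module.exports = function (grunt) {
    grunt.initConfig({
        // aws-keys.json is an example name; keep it out of git, e.g. via .gitignore
        aws: grunt.file.readJSON('aws-keys.json'),
        aws_s3: { /* the config shown above */ }
    });
    grunt.loadNpmTasks('grunt-aws-s3');
};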

I've created two tasks: up for when I've added something, and down for when I need to sync if I'm working on a different computer. Sort of like how I do an npm and bower install to avoid having all those files in my repository.
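Running them is then just a matter of calling the targets from the command line (example invocations; the names follow the config above):

$ grunt aws_s3:up      # push new and changed media to the bucket
$ grunt aws_s3:down    # pull the media down on a fresh checkout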

End results

$ du -hs .git
928K    .git
$ git gc
Counting objects: 284, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (162/162), done.
Writing objects: 100% (284/284), done.
Total 284 (delta 74), reused 271 (delta 70)

As I said, I just deleted my whole .git folder to avoid doing the rebase work required, losing all my history in the process, but I do think it was worth it. Removing the media files from the repository reduced my .git folder from 289MB to 928KB, and the object count from 3151 to 284. Network transfers are now negligible, and build times on the server are heavily reduced.
