Act I. Disaster Strikes
It was a regular Sunday afternoon. I'd spent the morning working on some open-source projects before deciding to commit some time to my side project. I fired up a new tab and typed in the URI for our self-hosted GitLab instance ...
Damn! What the fuck :frowning:
Act II. Denial
Confused, I thought about the time and effort spent automating and safeguarding this system. Six months and not a single issue with GitLab before. How could this be?
- GitLab is running in a Docker container
- All files are stored on an AWS EFS share, replicated across multiple AZs
- The Docker container is torn down every night and created afresh, at 02:00
That's when it hit me. I knew it was risky when I set it up, but it'd been going so well!
Act III. Anger
Why didn't I just leave the Docker container running on the same image every day? Why did I always have to pull "gitlab/gitlab-ee:latest" every night for the new fucking "shiny"? They've fucked it, haven't they? They must have!
Act IV. Bargaining
OK. It's not their fault. Why was I pulling the latest every night? Silly.
After reading the logs, I decided that I could fix this. I've managed PostgreSQL before; I can get this running again.
First things first, though! I should take a backup ... I'm not making that mistake again :smirk:
It was copying at around 0.3 MiB/s. Then it struck me: perhaps the data is fine and it's just EFS having some problems :relieved:
So I quickly scurried over to the AWS Status Page.
It's not that I don't trust you, Amazon, but we've been here before. I ditched their status page and ran to Twitter, the single source of truth for Amazon outages. Sadly, nothing came up. I'm all alone!
Act V. Depression
Losing the will to code, I reach for my last resort and browse to CloudWatch :pensive:
I select every EFS metric, because I have no idea what any of them are ... ish :joy:
A few seconds later, a shiny graph is presented to me ... revealing only one thing:
What the hell are burst credits?
Act VI. Acceptance
I consider myself very proficient with Amazon Web Services, but this was my first foray into Elastic File System. Today, I learnt what burst credits are. Let me break it down for you:
When you create an Elastic File System, you are "gifted" 2 TiB of "burst credits". These burst credits are spent whenever you exceed your baseline throughput, and that baseline is based on the amount of storage utilised. My GitLab instance is only hosting about 1.7 GiB of data, which entitles me to a little over 50 KiB/s on my EFS, so I've been bursting pretty much all of the time, burning through my gifted credits. Those have now run out and my EFS is completely unusable. It's going to take me about 12 hours to copy my data to an EBS volume to get my GitLab instance running again.
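To put rough numbers on that copy, here's a back-of-the-envelope sketch. It only uses figures from this post (about 1.7 GiB of data, roughly 50 KiB/s once the credits are gone, and the 0.3 MiB/s seen during the backup). The name `copy_hours` is made up for illustration, and the steady-rate model ignores per-file overhead, so treat the result as a ballpark rather than a measurement.

```python
KIB_PER_GIB = 1024 * 1024  # 1 GiB = 1024 MiB = 1,048,576 KiB


def copy_hours(size_gib: float, rate_kib_per_s: float) -> float:
    """Hours needed to copy size_gib of data at a steady rate_kib_per_s."""
    if rate_kib_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gib * KIB_PER_GIB / rate_kib_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Figures taken from the post; the steady-rate model is an assumption.
    print(f"at 50 KiB/s:  {copy_hours(1.7, 50):.1f} h")
    print(f"at 0.3 MiB/s: {copy_hours(1.7, 0.3 * 1024):.1f} h")
```

At the throttled rate that works out to roughly ten hours, which is in the same ballpark as the 12-hour estimate.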
I'm currently copying my data and hopefully I can code tomorrow. It's not the end of the world, but after a quick search on Twitter ... it seems that not many people have had this issue, or even know burst credits are a thing, so I decided to write about it.
Act VII. Conclusion
GitLab, I'm sorry. I reacted in haste and it was all my own fault. You do great work and I love your software (except the new floating menu). So thank you for making my developer life a little easier each day.
AWS ... although your EFS documentation does mention burst credits, I didn't find it all that simple to grok and, from what I can tell, your only "solution" to this problem is to:
"Therefore, if your application needs to burst more (that is, if you find that your file system is running out of burst credits), you should increase the size of your file system.
There’s no provisioning with Amazon EFS, so to make your file system larger you need to add more data to it."
I don't really fancy writing 2 TiB of random data to the EFS partition. Perhaps you could allow me to purchase more burst credits?
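For a sense of scale, here's a minimal sketch of how much data you'd have to store to earn a given baseline. It assumes the baseline grows linearly with stored data at 50 KiB/s per GiB. That rate is an assumption here and worth checking against the current EFS documentation. `required_gib` is a hypothetical helper, not part of any AWS tooling.

```python
ASSUMED_KIB_PER_S_PER_GIB = 50.0  # assumption: baseline earned per GiB stored


def required_gib(target_kib_per_s: float,
                 rate_per_gib: float = ASSUMED_KIB_PER_S_PER_GIB) -> float:
    """GiB of stored data needed for a target baseline, under a linear model."""
    if rate_per_gib <= 0:
        raise ValueError("rate_per_gib must be positive")
    return target_kib_per_s / rate_per_gib


if __name__ == "__main__":
    for target_mib in (1, 10):
        gib = required_gib(target_mib * 1024)
        print(f"{target_mib} MiB/s baseline needs about {gib:.1f} GiB stored")
```

Under that assumption, a steady 1 MiB/s needs about 20 GiB of padding sitting on the file system. That's far less than 2 TiB, but it's still storage you pay for and never read.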
Until next time 😫