Krishnan Chandra, Senior software engineer, Reddit

My name's Krishnan and today we're going to be talking about how Reddit migrated its entire data infrastructure between two different AWS regions and how Terraform made that process easier.

Before we get started, a small show of hands: how many of you use Reddit? It's like almost everybody. But for the small number of you who don't, Reddit is a network of communities where individuals can find experiences and communities built around their interests and passions. I know DevOps and sysadmin are very popular on Reddit and I'm sure many of you look at those as well.

Since so many of you use Reddit, you probably have an idea of what the scale is like, but here are some numbers depicting exactly what the scale is actually like. Right now, we have over 330 million monthly active users, 12 million posts per month, and 2 billion votes per month. You guys are all very active and we appreciate that.

Today's talk is mostly about three things. What is this migration and why did we do it? How did Terraform help us execute this migration? What are the lasting benefits that we've gained by managing our infrastructure using Terraform?

To give you some background on what our data infrastructure looked like before, essentially we had all Reddit services in a single AWS account, but in different regions, and the data infra was in one region while all of the rest of Reddit's infra was in another region. Specifically, the data infra was in West and the Reddit infra was in East. And if you asked me why that is, I actually don't know, because a lot of this happened before I joined the company.

But the crucial problems that we ran into with that were that the data infra was sort of oriented around data scientists, most of whom are based here in San Francisco. That turned out to be pretty useful for them to have the compute infrastructure near them. It was less so for app developers who were working off the Reddit infra in us-east.

The other problem is that data transfer is really, really expensive, and if any of you have tried to transfer data, even between AZs in AWS, let alone across AWS regions, you've probably run into this issue before. Doing so made a really high barrier for entry and meant that it was really hard for other teams to actually utilize our data.

The problem statement was, what if we moved our primary data infrastructure to another account in us-east-1? That way, we get a little bit of isolation, we get the data infrastructure on its own, and it's now accessible to everybody else who's running in us-east-1 as well.

To give you some background on the organizational context behind why this was necessary, when I joined Reddit it was about 25 engineers. There are only a handful of us doing that. Even fewer of those doing DevOps or infrastructure type work. Since then, we're now at well over 100 engineers, a lot of whom are trying to use the data infra, and they're in this weird position where we all share AWS resources and that leads to a lot of contention, as well as they want to make site features based on real-time data, like search indexing. If you post on Reddit it should be indexable into search almost immediately. But before, the way we were doing that was we just had a daily ETL that would index all the data into search. If you made a post, you couldn't search for it until the day after that, which wasn't so great.

Operational problems

An overarching theme of the bad stuff in our old setup is manual maintenance.