[CloudAcademy] Managing Cloud Networking at Scale – Chalk Talk with Aviatrix
Join cloud experts Neel Kamal and Mike McLaughlin from Aviatrix for a technical chalk talk on how you can solve some of the common issues that can occur when running cloud networking at scale. This group of chalk talks and technical demonstrations provides a practical reference for how to solve complex cloud networking challenges. First, we outline the common architectures and issues faced when scaling cloud architectures, then we workshop a transitive architecture use case defining best practices and design patterns. We discuss multi-cloud implementation, provider limits, hub and spoke architecture patterns, VPN and connectivity. Next, we set up a transitive controller in the AWS console with two instructional demos.
- Recognize and explain the common issues that occur when running complex cloud networks
- Describe and implement transitive architecture designs using a hub and spoke model
- Implement and maintain VPC connectivity at scale
This course will suit anyone running or planning to run cloud services at scale.
An understanding of cloud networking and the AWS Virtual Private Cloud will help you gain the most from this Chalk Talk.
We recommend completing the AWS Networking & Content Delivery learning path in order to gain practical knowledge and hands-on experience if you are not familiar with cloud networking and the virtual private cloud.
First, we outline the common architectures and issues faced when scaling cloud architectures, then we workshop a transitive architecture and design pattern. Next, we set up a transitive hub in the AWS console with a hands-on demo, and discuss the following:
- Cloud Networking – The Common Journey
- The Common Patterns with VPC Design
- Designing a Transitive VPC Architecture
- Managing Network Security at Scale
- DEMO – Setting up a Transitive Controller
- DEMO – Setting up a Transitive Hub
– Hi, Andy Larkin here from Cloud Academy with an advanced networking chalk talk. And today I’m joined by Mike McLaughlin and Neel Kamal.
– Hi everyone, I’m Mike.
– Hi everyone, I’m Neel Kamal. I’m with Aviatrix.
– Okay, let’s go solve some problems. So, Neel, you must see a lot of challenges in the networking space, like the virtual private cloud. You start off quite simply with something and then, you know, after a while it can get more complicated. So, can you give me an idea of what sort of things you see in the market when people start building cloud solutions?
– Yeah, sure. You know, cloud and adoption of cloud and networking to go with the cloud is a journey. It starts at a place, but it doesn’t stay always like that. And the practitioners, the cloud engineers, the cloud architects, the network engineers, they have to deal with different challenges at different points in time.
– Yep, yep.
– A simple way of observing that period of, let’s say, five-year adoption, could be the following. Let’s look at a VPC. In the beginning, when you are just dipping your toes with AWS or Azure or Google, your requirement will be, you have these VPCs.
– It always starts so simply, doesn’t it?
– Yeah, yeah, it does.
– No one else–
– It’s as simple as that, right? You have my VPCs, and I have a few developers. And somebody suggested that we give cloud a go, and we test it out. So we got three VPCs, we have a few applications there. I need my developers to have access to these VPCs. And these accesses are, what? These are SSL VPN.
– Which is the easy way to start.
– That’s the easy way to start. You might be doing a jump host or something like that, but that’s all you need. And this is the starting point. Three months into it, now you realize that, well, I still have my AD on-prem, which is here. And this is your on-prem. And I have my databases here. And whatever applications I have built over there, I need them to be connected to my on-prem. This is problem number two, let’s call it P2. And this was my P1, right? Now, P2 can be solved multiple ways. You would build an IPsec tunnel to on-prem, you might order your Direct Connect. You will have a couple of those, you might have one on-prem, a few on-prem. You know, by this time you are six to eight months into the journey in the cloud, you’re happy, your developers are able to access, your applications are running fine. That’s when the security team catches wind of, oh, you’re using cloud. Well, cloud was not secured to begin with. So let me look into it. And they put on their set of lenses, and they realize you are egressing into the internet without any control.
– So easy to do, right? You just start off thinking that, with no security policy in place, everybody’s going out through different VPCs. Quite easy to make that mistake.
– Yes. And, you know, they will try to solve it. This is my way of drawing internet, maybe not the best, but… Why do these VPCs need internet? Well, you know, you still have your Puppet, Chef, and you might have Microsoft Office 365 or Salesforce, or maybe iCall. Or even patching, you might have your Linux servers that need to download patches. So you would need internet access. Now, unfettered access is a big no-no for security guys. So they will say, well, you need to put controls over here. And that becomes your problem number three. And by this time, you as a network architect or network engineer are, what, one and a half years into the journey? And you solve P3. And you might have the third or fourth tool introduced into the mix. And you are happy, right? You have your developer access, you have your on-prem access, everything is secure, and now you have taken control of the internet, and things start settling down. But not really. Because then, you realize you need to transfer it over to the operations team. Because it’s one and a half years since you took over. You are an architect, so you were not supposed to be running this 24/7. But you go have a meeting with the operations team and you go explain to them, well, this is the environment. We would like you to take it over. And they say, how do I look at what is connected to what? And what is the latency, and what is the bandwidth?
– No one ever writes it down, do they? You know, you’re just running with scissors, basically.
– That’s right, and they will say, well, Neel, you know how to do it, but we don’t, so we can’t take it over. And you are thinking in your mind that I need to move on to the next project, and I need to take vacations, and I need other things than 24/7 operations. So problem number four is, how do you put in a set of operational metrics? Metrics, P4, in this environment. Now, this is not easy, because it’s not one thing. You have your monitoring tools, you have your alerting tools, which are going into your alert management system, and you have your troubleshooting tools, and they need to be integrated with the cloud. And the traditional on-prem tools did not work with the cloud, so you might introduce a new set of tools.
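The operations team's first question, "what is connected to what," can be answered from a simple connectivity inventory. The following is a minimal sketch only: the `Link` record, the `neighbors` helper, and the sample entries are hypothetical names chosen for illustration and do not come from the talk or from any AWS or Aviatrix API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One connection between two network endpoints (hypothetical record)."""
    a: str
    b: str
    kind: str  # e.g. "ipsec", "direct-connect", "ssl-vpn", "egress"


def neighbors(links, node):
    """Return the sorted names of everything directly connected to node."""
    found = set()
    for link in links:
        if link.a == node:
            found.add(link.b)
        elif link.b == node:
            found.add(link.a)
    return sorted(found)


# Illustrative inventory loosely following the whiteboard story (P1 to P3).
links = [
    Link("developers", "vpc-1", "ssl-vpn"),
    Link("vpc-1", "on-prem", "ipsec"),
    Link("vpc-2", "on-prem", "direct-connect"),
    Link("vpc-3", "internet", "egress"),
]

print(neighbors(links, "on-prem"))
```

Latency and bandwidth could be added as extra fields on each record; in practice those numbers would come from monitoring tools rather than a hand-kept list.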
– So you’re thinking CloudTrail, CloudWatch.
– [Neel] CloudTrail, CloudWatch. So, let’s say that takes another six to eight months. Now you are two, two and a half years into the journey in the cloud.
– And that could be much shorter, right? It doesn’t have to be two years. Sometimes it can happen really fast.
– That’s true too, that’s true too. Depending upon the urgency, it can happen. But by this time, you feel like you have got things under control. Your developers are happy, your on-prem stuff is connected, your internet is secure, your access is controlled. Your operations team is running it, you’re not being bothered. And then you realize that this is no longer three VPCs. This thing, which started as three VPCs, one, two, has now become 300. And now you are hitting provider limits. The route table limits in AWS, the security policy limits and things like that. And by the time you have to either design these to be isolated so you can solve it, or figure out another way to get over the provider limit, you’re calling AWS support and asking them to increase these limits again and again, and it’s never sufficient. And so that’s problem number five.
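The jump from three VPCs to 300 can be made concrete with some rough arithmetic. This is a sketch under stated assumptions: it assumes every VPC needs one route to every other VPC (a full mesh), and `ASSUMED_ROUTE_LIMIT` is a placeholder value for illustration, not a quota quoted in the talk. The hub-and-spoke count is included only for contrast, since the course description names that pattern.

```python
# Placeholder per-route-table quota, assumed for illustration only.
ASSUMED_ROUTE_LIMIT = 50


def full_mesh_links(vpc_count: int) -> int:
    """Connections needed if every VPC links directly to every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def routes_per_vpc(vpc_count: int) -> int:
    """Routes each VPC needs in a full mesh: one per other VPC."""
    return max(vpc_count - 1, 0)


def hub_and_spoke_links(vpc_count: int) -> int:
    """Connections needed if each VPC attaches once to a shared hub."""
    return vpc_count


def over_limit(vpc_count: int, limit: int = ASSUMED_ROUTE_LIMIT) -> bool:
    """True when the full-mesh route count exceeds the assumed quota."""
    return routes_per_vpc(vpc_count) > limit


for n in (3, 30, 300):
    print(n, full_mesh_links(n), hub_and_spoke_links(n), over_limit(n))
```

With these assumptions, 3 VPCs need 3 full-mesh connections, while 300 VPCs need 44,850 connections and 299 routes each, well past the assumed quota.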
– Plus, you’ve just got a whole lot of complexity in there.
– You have a lot of complexity, your people are asking. So this is about scaling.
– Yeah, I think the limits are always there. You know, the hard constraint people get to. They only find out about those limits when they’ve reached them. So you can’t really design for that if you’re likely to only discover it at the wrong time. But just the depth of management that’s required if you do suddenly find yourself going from, even 10 to 30 VPCs. That’s a lot of security groups, that’s a lot of network access control lists.
– You know, that’s a massive overview, right?
– Exactly. And then this is the time when you start thinking about, well, I can’t manage this using the CLI anymore. I need Terraform, CloudFormation. Because going from three to 300 is all about automation. So it’s about another set of tools to automate everything, and then you have to have them call the API of each one of these other tools so that they can work automatically. And then it’s also about solving the provider limits, like I was saying, route table limits and security policy. This takes you from the two-year to the three-year mark. Now you are three years in, your hair is a little bit gray, you have become the veteran in the cloud. You’re known to be that, which is great, but yet the journey of the cloud is now three years old, and you have just settled things down to run a good scale-based AWS environment. And that’s when the next thing hits you. This is when the CIO or the CEO or executive management decide, well, we can’t put all eggs in the same basket and it can’t be just about AWS. We need to also do Azure and Google Cloud, which is multi-cloud. And then you start thinking, well, I just got done with this in three years. It’s gonna take me another three years with Azure, another three years with Google. And now the world starts looking impossible. And that’s the fifth, or sixth problem. To do it times three. Multi-cloud.
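As one small illustration of why three to 300 is an automation problem, address planning alone is hard to do by hand at that size. The sketch below generates non-overlapping CIDR blocks with Python's standard `ipaddress` module; the `plan_vpc_cidrs` helper, the `vpc-NNN` naming, the `10.0.0.0/8` supernet, and the `/20` block size are all assumptions for illustration. Its output is the kind of input a Terraform or CloudFormation template might consume, not a replacement for those tools.

```python
import ipaddress
from itertools import islice


def plan_vpc_cidrs(count, supernet="10.0.0.0/8", new_prefix=20):
    """Assign one non-overlapping CIDR block per VPC from a shared supernet.

    Names such as "vpc-001" are hypothetical labels, not real resource IDs.
    """
    network = ipaddress.ip_network(supernet)
    # subnets() yields the blocks lazily; take only as many as needed.
    blocks = list(islice(network.subnets(new_prefix=new_prefix), count))
    if len(blocks) < count:
        raise ValueError(f"{supernet} cannot hold {count} /{new_prefix} blocks")
    return {f"vpc-{i + 1:03d}": str(block) for i, block in enumerate(blocks)}


plan = plan_vpc_cidrs(300)
print(plan["vpc-001"], plan["vpc-002"], len(plan))
```

Because the blocks are carved from a single supernet, they cannot overlap, which keeps later routing between VPCs and on-prem unambiguous.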
– I can see that being a massive problem, because, you know, it makes a lot of sense to have a multi-cloud strategy. You’re right, you’re just gonna have to replicate everything you’ve done here over again, right? So you’re creating a lot of work for yourself.
– That’s right. And that’s the journey of the cloud engineer for network and network security. And many parts of this are complex, and you have some provider tools, AWS has a few things which will help you, you have a partner set of tools that will help you, but this is a journey that everybody is taking.
– Well, certainly as an architect, I would be thinking about how to get the solution as quickly as I could, based on what the business is asking for. So I’m not going to be too concerned about the complexity of the network. That, as you say, is going to end up in the lap of the network engineers, or the sysops team, or devops team. So I love the fact we’re getting into this, because I think everyone needs to be thinking about how they solve this up front so that they don’t end up in this situation, where you’re at P5, and suddenly being asked to do the impossible, and you know you’ve just got a bunch of spaghetti in your networks.
Sales Page: https://cloudacademy.com/course/managing-networking-at-scale-aviatrix/aviatrix-lecture-one/