In case you are not familiar, Amazon Web Services’ Virtual Private Cloud (VPC) feature is a way to logically group the related resources of a particular application or organization into an isolated network zone (virtually, of course). The VPC feature gives you complete flexibility to define the entire virtual network posture, from how the components can interact with other AWS availability zones or regions to which inbound and outbound connections are allowed.
A rather typical scenario here would be to place an Aurora MySQL database in the virtual private cloud and define inbound rules for that database that restrict access to devices signed into the organization’s own Virtual Private Network (VPN). Beyond that, the database would not be publicly accessible from the web, despite living on Amazon’s cloud infrastructure.
VPCs are no doubt a powerful feature, and their inbound and outbound rule restrictions can significantly enhance the security posture of an application or an organization with resources on the Amazon cloud. Well… that’s the sales pitch for VPCs, but me personally? I hate them.
I have a personal preference for simplicity, which is why I like to design applications to be strictly serverless. Frankly, you can get a lot of the isolation benefits in a purely serverless model without the hassle of defining a Virtual Private Cloud with its NAT gateways, Elastic Network Interfaces, Security Groups and all that stuff. If that sounds daunting to you, well, that’s because it is.
But as with many things in life, sometimes you have to work with what you have. Organizations in particular are loath to change things radically or quickly, and existing production applications tend to remain untouched; that is all part of the deal in software engineering. So as much as I might dislike VPCs, I work with them a lot, and if you are here to learn how to make Lambdas work with them, as I had to, hopefully this will be a good enough reference for you.
Why do Lambdas need to be in a VPC to access resources in it?
The oversimplified explanation of how a Lambda works is this: when an invocation is triggered by some external source, say an API Gateway endpoint call, AWS fires up the Lambda from its pre-packaged definition. In doing so, it is basically assigning some virtual space somewhere on its vast hardware empire in the Lambda’s AWS region and running the function there. If your Lambda is a standard, non-VPC Lambda, it can be spun up on any available hardware in the region, which is great from a scalability point of view and a big part of what makes Lambdas so powerful.
This, though, runs into a network firewall when your Lambda attempts to reach a resource inside a VPC, like our stereotypical use case of accessing an Aurora database that is part of the VPC. The Lambda will simply be refused access because of the various security implications involved and the way the VPC itself is designed. You can define inbound rules, say, the range of virtual IPs allowed to reach your database, but unless the database is completely open (which defeats the purpose of putting it inside a VPC), this will not work, because the Lambda could get spun up anywhere.
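To make the IP-range point concrete, here is a deliberately simplified Python model of an inbound rule using the standard ipaddress module. The CIDR ranges and addresses are hypothetical examples, and real security group evaluation also involves ports, protocols and security group references, so treat this as an illustration of the reasoning rather than how AWS implements it.

```python
import ipaddress

# Hypothetical VPC range that the database's inbound rule allows
ALLOWED_CIDRS = [ipaddress.ip_network("10.0.0.0/16")]


def is_allowed(source_ip: str, allowed_cidrs=ALLOWED_CIDRS) -> bool:
    """Return True if source_ip falls inside any allowed CIDR range.

    A simplified stand-in for an inbound rule; it ignores ports and protocols.
    """
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in allowed_cidrs)


# A Lambda launched inside the VPC gets an address from one of its subnets
print(is_allowed("10.0.1.25"))    # True
# A non-VPC Lambda could be running anywhere, e.g. behind this public address
print(is_allowed("203.0.113.7"))  # False
# Opening the rule to 0.0.0.0/0 admits everything, which defeats the purpose
print(is_allowed("203.0.113.7", [ipaddress.ip_network("0.0.0.0/0")]))  # True
```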
So in this case the Lambda really needs to be defined as part of the Virtual Private Cloud that contains the resources it needs to access.
Process of defining a Lambda inside a VPC
Right, so just before we get into the how: this post is not going to cover how to create a Virtual Private Cloud. That is a rather involved effort that would distract from the specific topic of this post, so I will touch on it in a future article.
Assuming you already have a VPC set up in your account, a few things will now be available in your account (and specifically in the VPC’s region):
1. The name of the VPC
2. The Subnets for the VPC
3. The Security Groups for the VPC
The subnets define the ranges of IPs where AWS resources can be launched within the VPC, and this includes launching Lambdas. A security group defines the access rules for resources in the VPC; every VPC has a default security group, usually alongside others you define. For example, there could be a security group that allows access to your Aurora database from the VPC’s subnet IP range.
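If it helps to picture how subnets relate to the VPC, the sketch below carves a hypothetical 10.0.0.0/16 VPC range into /24 subnets with Python’s ipaddress module and counts how many addresses each subnet can actually hand out. AWS reserves five addresses in every subnet (the first four and the last), so a /24 leaves 251 usable IPs for resources such as the network interfaces that VPC Lambdas use.

```python
import ipaddress
from itertools import islice

# AWS reserves the first four addresses and the last address of every subnet
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int):
    """Return the first `count` /new_prefix subnets of vpc_cidr with their usable IP counts."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [
        (subnet, subnet.num_addresses - AWS_RESERVED_PER_SUBNET)
        for subnet in islice(vpc.subnets(new_prefix=new_prefix), count)
    ]


# Two /24 subnets (for example one per availability zone) in a hypothetical VPC
for subnet, usable in plan_subnets("10.0.0.0/16", new_prefix=24, count=2):
    print(subnet, usable)
# 10.0.0.0/24 251
# 10.0.1.0/24 251
```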
So if your Lambda needs access to your database, it must be configured so that:
1. It is in the VPC where the database is
2. It has the subnets where it can be spun up
3. It is associated with a security group that grants it access to that database
Line up all three of these settings and your Lambda will have access to the VPC database.
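The three-part checklist above can be expressed as a small Python check. This is a hypothetical helper, not an AWS API: it assumes you have already collected the Lambda’s VpcConfig (using the same SubnetIds and SecurityGroupIds keys Lambda uses), a lookup of subnet IDs to VPC IDs, and the IDs of the security groups the database’s inbound rules allow. All IDs in the example are made up.

```python
def check_vpc_access(vpc_config, subnet_vpcs, db_vpc_id, db_allowed_groups):
    """Return a list of problems; an empty list means all three conditions hold.

    vpc_config:        {"SubnetIds": [...], "SecurityGroupIds": [...]}
    subnet_vpcs:       hypothetical lookup of subnet ID -> VPC ID
    db_vpc_id:         VPC ID that the database lives in
    db_allowed_groups: security group IDs the database's inbound rules allow
    """
    problems = []
    subnets = vpc_config.get("SubnetIds", [])
    groups = vpc_config.get("SecurityGroupIds", [])

    # Conditions 1 and 2: the Lambda has subnets, all inside the database's VPC
    if not subnets:
        problems.append("no subnets configured")
    for subnet in subnets:
        if subnet_vpcs.get(subnet) != db_vpc_id:
            problems.append(f"subnet {subnet} is not in {db_vpc_id}")

    # Condition 3: at least one attached security group is let through to the database
    if not set(groups) & set(db_allowed_groups):
        problems.append("no security group grants access to the database")
    return problems


subnet_vpcs = {"subnet-aaa": "vpc-123", "subnet-bbb": "vpc-123", "subnet-ccc": "vpc-999"}
config = {"SubnetIds": ["subnet-aaa", "subnet-bbb"], "SecurityGroupIds": ["sg-lambda"]}
print(check_vpc_access(config, subnet_vpcs, "vpc-123", {"sg-lambda"}))  # []
```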
How to configure a Lambda to be in a VPC via the AWS console
This is not my preferred option (in my experience, manually doing stuff via the console gets old by the second deployment), but there are always exceptions to the rule, and I too use this option when I need to configure legacy Lambdas that were not deployed as part of a CloudFormation stack.
See the next section for how to define it in a CloudFormation YAML template.
For the manual configuration, navigate to the Lambda’s Configuration tab and open the VPC section.
Then select the VPC, subnets and security groups as described in the previous section. The console will then show the inbound and outbound rules that will apply to your Lambda based on the security groups you select.
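For more than a handful of legacy Lambdas, a short script can be less tedious than clicking through the console. The sketch below targets boto3’s update_function_configuration call, which accepts a VpcConfig with SubnetIds and SecurityGroupIds. The function name and IDs are placeholders, boto3 is imported lazily so the request builder can be tried without it, and it assumes your credentials may update the function and that the function’s execution role can manage network interfaces (for example via the AWSLambdaVPCAccessExecutionRole managed policy).

```python
def build_vpc_update(function_name, subnet_ids, security_group_ids):
    """Build keyword arguments for Lambda's UpdateFunctionConfiguration call."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("a VPC Lambda needs at least one subnet and one security group")
    return {
        "FunctionName": function_name,
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def apply_vpc_update(request, region_name):
    """Send the update with boto3 (imported here so build_vpc_update works without it)."""
    import boto3

    client = boto3.client("lambda", region_name=region_name)
    response = client.update_function_configuration(**request)
    # The response echoes the applied VPC settings, including the resolved VpcId
    return response["VpcConfig"]
```

Calling apply_vpc_update(build_vpc_update("legacy-lambda", ["subnet-aaa", "subnet-bbb"], ["sg-lambda"]), region_name="us-east-1"), with your own values in place of these placeholders, would then apply the same settings the console form does.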
Defining a VPC Lambda via CloudFormation
For a more scalable solution, prefer infrastructure as code: you can deploy updates seamlessly and also recreate your stack in other regions. All you need to do is use CloudFormation parameters as placeholders, which are replaced at deployment time with your region-specific VPC values by passing the "--parameter-overrides" option to the CloudFormation deploy command.
Here is how it is defined in a CloudFormation template:
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: VPC Lambda Configuration Demo

Parameters:
  VPCSecurityGroup:
    Type: String
  VPCSubnet1:
    Type: String
  VPCSubnet2:
    Type: String

Resources:
  VPCLambda:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.9
      CodeUri: ./vpc-lambda/
      Handler: vpc-lambda.vpc_handler
      Description: Lambda with VPC access
      FunctionName: vpc-lambda
      VpcConfig:
        SecurityGroupIds:
          - Ref: VPCSecurityGroup
        SubnetIds:
          - Ref: VPCSubnet1
          - Ref: VPCSubnet2
```
In the CloudFormation definition, there is no need to specify the VPC, as it is implied by the subnets and set appropriately when CloudFormation deploys the Lambda.
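To fill those parameters per region with --parameter-overrides, a small helper can keep the region-specific values in one place. The Python sketch below turns a per-region mapping into the Key=Value pairs that --parameter-overrides expects and assembles an aws cloudformation deploy command. The IDs, stack name and packaged.yaml file name are hypothetical; it assumes the template has already been packaged (for example with sam package or aws cloudformation package) so the local CodeUri has been uploaded, and it passes CAPABILITY_IAM because the serverless transform creates an execution role for the function.

```python
import shlex

# Hypothetical per-region VPC values; the keys match the template's Parameters
REGION_PARAMETERS = {
    "us-east-1": {
        "VPCSecurityGroup": "sg-0123456789abcdef0",
        "VPCSubnet1": "subnet-0aaaaaaaaaaaaaaaa",
        "VPCSubnet2": "subnet-0bbbbbbbbbbbbbbbb",
    },
}


def parameter_overrides(values):
    """Format a dict as the Key=Value strings accepted by --parameter-overrides."""
    return [f"{key}={value}" for key, value in sorted(values.items())]


def deploy_command(region, stack_name="vpc-lambda-demo", template_file="packaged.yaml"):
    """Build the aws cloudformation deploy command for one region."""
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", template_file,
        "--stack-name", stack_name,
        "--region", region,
        "--capabilities", "CAPABILITY_IAM",
        "--parameter-overrides", *parameter_overrides(REGION_PARAMETERS[region]),
    ]


print(shlex.join(deploy_command("us-east-1")))
# Pass the list to subprocess.run(..., check=True) to actually run the deployment
```

Adding another region is then just another entry in REGION_PARAMETERS, while the template itself stays untouched.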
The Lambda code itself needs no special handling beyond whatever is needed to access the resource in the VPC.
Just for completeness, and sticking with the example use case of accessing a database: define the database connection as you normally would, passing all the parameters to whatever library you are using. In Python, for example, the pymysql package can be used inside the Lambda.
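```python
import os

import pymysql

# Connection details come from environment variables set on the Lambda
db_connection = pymysql.connect(
    host=os.environ["HOST"],
    user=os.environ["USER"],
    password=os.environ["PASSWORD"],
    database='example',
    charset='utf8mb4',
    cursorclass=pymysql.cursors.DictCursor,
    autocommit=True,
)
```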
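To show where that connection fits in a handler, here is a minimal sketch matching the vpc-lambda.vpc_handler entry point from the template. Creating the connection once and caching it in a module-level variable lets warm invocations in the same execution environment reuse it instead of reconnecting each time. The query, the response shape and the injectable connect parameter (there only so the caching can be exercised without a database) are assumptions for illustration; pymysql is imported lazily for the same reason.

```python
import json
import os

_connection = None  # cached per execution environment, reused by warm invocations


def _connect():
    import pymysql  # imported lazily so the caching logic can be tried without pymysql

    return pymysql.connect(
        host=os.environ["HOST"],
        user=os.environ["USER"],
        password=os.environ["PASSWORD"],
        database="example",
        charset="utf8mb4",
        cursorclass=pymysql.cursors.DictCursor,
        autocommit=True,
    )


def get_connection(connect=_connect):
    """Return the cached connection, creating it on first use."""
    global _connection
    if _connection is None:
        _connection = connect()
    return _connection


def vpc_handler(event, context):
    # Hypothetical health-check query; replace with your own SQL
    with get_connection().cursor() as cursor:
        cursor.execute("SELECT 1 AS ok")
        row = cursor.fetchone()
    return {"statusCode": 200, "body": json.dumps(row)}
```

A cached connection can time out between invocations; pymysql’s Connection.ping(reconnect=True) is one way to revive it before use if that becomes a problem.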
Downsides to VPC Lambdas?
This is a somewhat moot discussion, since you more or less need VPC Lambdas to use resources in your VPC from Lambdas; you are accepting the downsides in exchange for the benefits (or necessity) of having a VPC in the first place. Still, it is good to be aware of some of the negatives.
For one, I typically find updates to VPC Lambdas to be significantly slower than updates to regular Lambdas. On a related note, which I haven’t actually researched to confirm, I assume cold starts likewise take longer for VPC Lambdas, because AWS is restricted to the allowed subnets when spinning them up, which I presume takes more time than using the first and most easily available spot in the region.
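If you want to check that cold start assumption for your own functions, one common trick is a module-level flag: module code runs once per execution environment, so only the first invocation in that environment sees the flag still set. The sketch below is illustrative (the handler name and returned field are made up). Lambda’s REPORT log line also includes an Init Duration on cold starts, so comparing that value for a VPC and a non-VPC deployment of the same code is another way to measure the difference.

```python
import json

# Module-level code runs once per execution environment, during initialization
_cold_start = True


def cold_start_handler(event, context):
    """Log and return whether this invocation hit a freshly initialized environment."""
    global _cold_start
    was_cold, _cold_start = _cold_start, False
    # A structured log line that can be filtered in CloudWatch Logs
    print(json.dumps({"cold_start": was_cold}))
    return {"cold_start": was_cold}
```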
More relevant is the fact that certain AWS services will not be able to work with your Lambda if it is inside a VPC without significant extra handling. For instance, Cognito triggers don’t seem to work easily. I typically don’t try to fix this problem directly; instead I use a standard (non-VPC) proxy Lambda, which AWS services can invoke just fine, and route the invocation from it to the VPC Lambda. It is an extra hop, but depending on your use case it is probably acceptable in exchange for the ease of setup.
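A proxy of that kind can be quite small. The sketch below is a standard (non-VPC) Lambda that forwards the incoming event synchronously to the VPC Lambda using boto3’s invoke call and returns whatever the VPC Lambda returns, which for a Cognito trigger should be the (possibly modified) event. The VPC_FUNCTION_NAME environment variable and the injectable client are assumptions for this example. The proxy’s execution role needs lambda:InvokeFunction on the target, and since Cognito waits only about five seconds for a trigger to respond, the extra hop has to fit within that budget.

```python
import json
import os

_lambda_client = None  # created once per execution environment


def _default_client():
    global _lambda_client
    if _lambda_client is None:
        import boto3  # boto3 ships with the Lambda Python runtime

        _lambda_client = boto3.client("lambda")
    return _lambda_client


def proxy_handler(event, context, lambda_client=None):
    """Forward an AWS service event (e.g. a Cognito trigger) to a VPC Lambda."""
    client = lambda_client or _default_client()
    response = client.invoke(
        FunctionName=os.environ["VPC_FUNCTION_NAME"],  # hypothetical env var
        InvocationType="RequestResponse",  # wait for the VPC Lambda's result
        Payload=json.dumps(event).encode("utf-8"),
    )
    result = json.loads(response["Payload"].read())

    # Surface failures instead of handing an error payload back as a normal result
    if "FunctionError" in response:
        raise RuntimeError(f"VPC Lambda failed: {result}")
    return result
```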