While re-designing our backend and moving to better architecture, we got to the point of thinking of our database and moving it from EC2 instance to some fully-managed databases, going Serverless all the way.
The fact that our backend will be fully deployed on Serverless architecture, we have to think about which database technology to use.
Our main backend will be built on AWS. So let’s investigate the available options on AWS.
AWS offers different types of databases including Relational, Key-value, In-memory, Document, Graph, Time series, and Ledger. Pick the one that fits your needs and start coding, you don’t have to manage or maintain these databases. what a nice life!
|Database Type||Use cases||AWS Service|
|Relational||Traditional applications, ERP, CRM, e-commerce||Aurora RDS Redshift|
|Key-Value||High-traffic web apps, e-commerce systems, gaming applications||DynamoDB|
|In-memory||Caching, session management, gaming leaderboards, geospatial applications||ElastiCache for Memcached ElastiCache for Redis|
|Document||Content management, catalogs, user profiles||DocumentDB|
|Graph||Fraud detection, social networking, recommendation engines||Neptune|
|Time series||IoT applications, DevOps, industrial telemetry||Timestream|
|Ledger||Systems of record, supply chain, registrations, banking transactions||QLDB|
In this article we are going to dive into presistent storage databases that are used for normal uses: Relational and Key-Value databases. And to be more precise Serverless ones and investigate if it’s applicable to use non-serverless database.
In order to talk about Amazon Aurora Serverless, we have to talk about Amazon Aurora. It is a MySQL and PostgreSQL-compatible relational database built for the cloud, that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.
Now we can say that Aurora Serverless is an on-demand, auto-scaling configuration for Amazon Aurora that automatically starts, scales, and shuts down database capacity with per-second billing for applications with infrequent, intermittent, or unpredictable usage patterns. Aurora Serverless offers database capacity without the need to provision, scale, and manage any servers. Aurora Serverless brings the power of a MySQL-compatible database built for the cloud to applications with unpredictable or intermittent usage patterns.
- Relational Database System – RDS.
- It’s built on top of the most known RDS:
- MySQL version 5.6 compatibility.
- PostgreSQL version 10.7 compatibility: which supports JSONB type, as you know, this type supports Indexes.
- No cooldown period for scaling up.
- Immediate Consistency: you always get the up-to-date data when querying.
- Data API: it’s simply RESTful API
- Max table size varys:
- for PostreSQL: 32TiB
- for MySQL: 64TiB
- Can’t load or Save data from and to S3
- Can’t invoke Lambda
- Doesn’t support restoring a snapshot from a MySQL DB instance.
- Doesn’t support Amazon RDS Performance Insights: you can’t monitor your database using this amazing RDS feature.
- Doesn’t support Backtrack: you are not able to move quickly to a prior point in time without needing to restore data from a backup.
- Huge Cold start: from users experience, some say it takes 10+ seconds to connect when it’s paused, others say about 30 seconds. So to solve this problem at least 1 ACU must be running 24/7 which leads to huge cost.
- It scales to zero capacity when there are no connections for a 5-minute period. (can be changed)
- And it don’t offer Free Tier.
Database capacity is measured in Aurora Capacity Units (ACUs) where 1 ACU has approximately 2 GB of memory with corresponding CPU and networking, similar to what is used in Aurora user-provisioned instances.
A good thing is it’s billed per second, so you only pay when it’s on and running. However, unluckily no free tier for Aurora.
||Database Capacity||Storage||I/O Rate|
|MySQL||$0.06 per ACU Hour|| $0.121 per GB-month||$0.242 per 1 million requests|
|PostgreSQL||$0.06 per ACU Hour|| $0.121 per GB-month||$0.242 per 1 million requests|
It also comes in Provisioned mode and the price is dependent on the instance type, and here are some of them:
Price Per Hour
|MySql||db.t3.small||$0.048||$0.121 per GB-month||$0.242 per 1 million requests|
|MySql||db.t3.medium||$0.097||$0.121 per GB-month||$0.242 per 1 million requests|
|MySql||db.r5.large||$0.352||$0.121 per GB-month||$0.242 per 1 million requests|
|PostgreSQL||db.t3.small||Not supported||$0.121 per GB-month||$0.242 per 1 million requests|
|PostgreSQL||db.t3.medium||$0.097||$0.121 per GB-month||$0.242 per 1 million requests|
|PostgreSQL||db.r5.large||$0.352||$0.121 per GB-month||$0.242 per 1 million requests|
Instance Type summary:
|Model||Core Count||vCPU||Mem (GiB)|
|db.r5.large (memory optimized)||1||2||16|
All instances have the following specs:
- 2.5 GHz Intel Scalable Processor
- Intel AVX, Intel AVX2, Intel Turbo
- EBS Optimized
- Enhanced Networking
For more info about instance types, visit this page.
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s also a fully managed, multiregion, multimaster, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second.
- Highly scalable.
- NoSQL database.
- Very low latency if designed the right way.
- Delivers single-digit millisecond performance at any scale.
- Predictable performance and cost.
- Trigger on CRUD operations.
- RESTful HTTP API: one of the marjor things needed in Serverless world as it’s hard to connect using Connection Pool.
- No limit for Data Storage: as your wallet can have.
- Difficult to support complex queries: you have to design your tables carefully in order to support the queries you need.
- Maximum local secondary indexes: 5
- Global secondary indexes per table: 20 (can be increased)
- Max attribute size: 400 KB
- Scan: 1MB per call
- Query: 1MB per call
- Hard Sorting: it always requires sorting key, otherwise you can’t implement sorting in DynamoDB.
- Bad Filtering: As for filtering, you need a hash key so you can filter quickly, else it will take much time using Scan.
- Poor Pagination: DynamoDB doesn’t support the known pagination using page and limit as in RDS. it returns data up to a maximum of some pre-set size and it also returns a key, you can get the next chunk of data using this key.
- Doesn’t support Ad-hoc query.
- Key values: that are used in hashing, must be sufficiently random to guarantee balanced access across the dataset.
- RESTful HTTP API only: there is no other way to connect.
- No Foreign keys: so no joins.
- Indexes cant be changed: After creating primary indexes for your table, you can’t change them.
- Costs more when using Global Secondary Key
- No regex when querying: you can’t query using Regex, nor query using Like for hash keys.
- Eventual Consistency: Immediate Consistency can be enabled for read operations.
DynamoDB charges for reading, writing, and storing data in your DynamoDB tables, along with any optional features you choose to enable. DynamoDB has two capacity modes and those come with specific billing options for processing reads and writes on your tables: on-demand and provisioned.
|25 WCUs||25 RCUs||25 GB|
|Price||$0.00065 per WCU||$0.00013 per RCU||$0.25 per GB|
|On-Demand|| Write requests units ||Read requests units|
|–||2.5 million read requests||25 GB|
|Price||$1.25 per million||$0.25 per million||$0.25 per GB|
If you want to know if DynamoDB is suitable for your use case, read this article, here is a summary:
Before deciding to use DynamoDB, you should be able to answer “Yes” to most of the following evaluation questions:
- Can you organize your data in hierarchies or an aggregate structure in one or two tables?
- Is data protection important?
- Are traditional backups impractical or cost-prohibitive because of table update rate or overall data size?
- Does your database workload vary significantly by time of day or is it driven by a high growth rate or high-traffic events?
- Does your application or service consistently require response time in the single milliseconds, regardless of loading and without tuning effort?
- Do you need to provide services in a scalable, replicated, or global configuration?
- Does your application need to store data in the high-terabyte size range?
- Are you willing to invest in a short but possibly steep NoSQL learning curve for your developers?
Some unsuitable workloads for DynamoDB include:
- Services that require ad hoc query access. Though it’s possible to use external relational frameworks to implement entity relationships across DynamoDB tables, these are generally cumbersome.
- Online analytical processing (OLAP)/data warehouse implementations. These types of applications generally require distribution and the joining of fact and dimension tables that inherently provide a normalized (relational) view of your data.
SQL in AWS Lambda:
When it comes to use pure MySQL or PostgreSQL in AWS Lambda, We have to think twice. Because, and as you know, SQL accepts persistence connections whereas AWS Lambda doesn’t use persistence connections nor connections pools as it’s stateless therefore we are going to face two major problems:
- Creating new connection every invocation means spending too much time i.e. increases latency and affects performance.
- Reaching max_connections in SQL: SQL allows number of connections known as max_connections, and because Lambda creates new connection every request, this will lead to reaching the maximum connections.
Unluckily, according to AWS Forum, this feature isn’t implemented. And there are no new updates for 3 years or more now.
This is where Aurora Serverless shines with its Data API where you use HTTP connections instead of presistent connections. “The Data API doesn’t require a persistent connection to the DB cluster. Instead, it provides a secure HTTP endpoint and integration with AWS SDKs”, AWS Aurora Data API doc stated.
On the other hand, Aurora Data API is slow compared to DynamoDB.
Luckily, some guys developed a Data API dialect in order to get AWS Aurora working with SQLAlchemy which is one of the best ORM for SQL in Python.
And on yet another hand, you can get SQL to run with Serverless, AWS Lambda, by following these two simple points:
- Let the connection pool be global in AWS Lambda to allow Lambda to reuse the connection when possible.
- Set the pool size to one: which is not practical but it works ?♂️
Others mentioned that they got it working by using MySQL Proxy which is “a simple program that sits between your client and MySQL server(s) and that can monitor, analyze or transform their communication. Its flexibility allows for a wide variety of uses, including load balancing, failover, query analysis, query filtering and modification, and many more”.
Mixing Technologies – Aurora and DynamoDB:
By using different db technologies for different purposes, you can get the best of both words. Maybe use DyanmoDb to store unstructured data and link that into a SQL table? Yes, sure. You can always mix SQL and NoSQL if you need to.
On the other hand, you have to pay for both, and deal with the drawbacks. For example: the huge cold start of Aurora Serverless that you have to fix anyway.
Mixing technologies in serverless is possible. Especially if you use HTTP requests as in DynamoDB and Aurora Serverless. Simply, when you need to query, make an HTTP request and that’s it.
The Proposed Database Technology to Use in Almeta
To choose a database technology, we need to analyze our needs in almeta.io:
- Ad-hoc queries: filtering, sorting, and pagintion are the main features for us as we want to offer the ability to sort, filter, and paginate the news based on different metrics, and later joining multiple news together in order to get the related news and so on. This suggests using MySQL or PostgreSQL. On the other hand, implementing these features in DynamoDB may require many GSI.
- Rapidly changing schema: due to the evolving of our AI services, the need to change the schema arises. As a result, we need JSON type, which suggests DynamoDB or PostgreSQL.
- Scalablility: Aurora Serverless has a rapid scalability, so is DynamoDB.
- Cost efficiency: Aurora Serverless is amazing but it’s pretty expensive due to the huge cold start as a result we have to keep at least 1 ACU on 24/7 which costs about $0.06 x 24 hours x 30 days = $43.2. On the other hand DynamoDB is cheaper and is included in AWS free tier.
According to the aforementioned points and despite the cost, AWS Aurora Serverless with PostgreSQL is what we need in order to support different types of queries including joins, sorting, filtering, and pagination. On the other hand, dealing with our unstructured JSON objects pushes us to use a mix of structured and unstructured db technologies.
Amazon’s doing its best to provide useful Serverless services including Aurora and DynamoDB.
However, when it comes to production, you must choose the best database that fits your needs. DynamoDB is one of the best NoSQL database, so simple to learn, and use. Aurora on the other hand is built on top of the leading RDS out there: MySQL and PostgreSQL. However, in production, you may keep an eye open on Aurora Cold Start especially if your website doesn’t have high traffic as it may lead to huge downtime.
- Auorara Serverless – the good, the bad, and the scalable
- Aurora – Limits
- Aurora Serverless – Limits
- How Aurora Serverless work
- Aurora – Pricing
- RDS – Instance Types
- DynamoDB – Limit
- DyanmoDB – Provisioned Pricing/
- DynamoDB – On-demand Pricing
- DynamoDB vs Cloud Store
- MongoDB vs DynamoDB
- Lamda and RDS
- Aurora Serverless – Data API
- Best Practices for Serverless and Connection pooling
- AWS Lamda with RDS Connection Pooling
- Manage RDS connections from AWS Lambda