Azure Table Storage Tips for the RDBMS Developer
Suppose you only have relational database management system (RDBMS) experience and are new to Azure Table storage. It's common to find yourself "thinking in SQL" and trying to solve database modeling requirements with a SQL approach before translating them to a key-value mindset.
In this tutorial, I'll cover some fundamentals of Azure Table storage to get you more familiar with this type of storage service, including property types and table limits, with your RDBMS background in mind (but I welcome IT pros from all backgrounds to use this guide and learn something new).
Then, I'll dive into common questions you might find yourself asking about how to use Azure Table storage and provide some best practices with several query examples based on potential real-world scenarios you might experience on the job.
Where code samples or references are applicable in this blog post, I'll be focusing on .NET and using the Azure SDK (specifically relating to the Microsoft.Azure.Cosmos.Table NuGet package).
If you have a RDBMS background and are new to Azure Table Storage, it's common to find yourself "thinking in SQL" and trying to solve database modeling requirements with a SQL approach before then trying to translate that to a key-value mindset. In this blog post, I'll cover some of the fundamentals of Azure Table Storage and dive into some common questions you might find yourself asking about Azure Table Storage. Where code samples or references are applicable in this blog post, we'll be focusing on .NET and using the Azure SDK (specifically relating to the Microsoft.Azure.Cosmos.Table nuget package).
You can use the table of contents below to more easily jump to the specific area of the guide you're interested in:
Table of Contents
- Azure Table Storage Explained
- Azure Table Storage FAQ for the RDBMS Developer
- What kind of NoSQL store is Azure Table storage?
- How do I create custom indexes on an Azure Table Storage table?
- How do I optimize querying by different properties if I can't have secondary indexes?
- Azure Table storage query example: How to search for an employee by employee id or domain username using PartitionKey and RowKey
- Can I query by properties other than the PartitionKey and RowKey?
- How can I insert/update multiple entities in an atomic operation?
- How do I store entities with different schema in the same table?
- How can I perform a LIKE query for values starting with a given value?
- Azure Table storage query example: How to retrieve a distinct list of employees using Range Query
- How do I enforce unique constraints?
- How can I include special characters in the PartitionKey or RowKey?
- How do I perform case-insensitive queries?
- How can I perform an "ORDER BY Date DESC" query?
- Azure Table storage query example: How to look up orders in both ascending and descending order of OrderDate
- How do I limit the fields I pull back in a query (SELECT )?
- Azure Table storage vs. CosmosDB
- How to Start Using Azure Table Storage + Additional Resources Recommendation
Azure Table Storage Explained
First, let's cover some of the fundamental constructs and features of Azure Table storage, including the main limitations and differences versus what you might be used to as an RDBMS developer.
Azure Table Storage Properties and Size Limits
- An individual table can contain a collection of entities with different properties.
- A table can contain up to 255 properties.
- Each property can be up to 64KB in size.
- Each entity can be up to 1MB in size.
- You can have as many tables as you want, up to the storage capacity of an Azure Storage Account (500TB).
- Every entity has a PartitionKey, RowKey, and a Timestamp. The Timestamp property is maintained server-side as the time the entity was last modified and is used to provide optimistic concurrency, so you cannot directly modify it.
- PartitionKey and RowKey are both strings and can each be up to 1KB in size, and the combination of both must be unique.
Azure Table Storage Indexes
- Only the PartitionKey and RowKey properties are indexed.
- Secondary indexes on other properties cannot be created.
- These values are indexed in ascending order.
- Having a solid understanding of your query requirements is important, as there are strategies for dealing with the apparent limitation on indexes, which I'll cover in some common scenarios below.
Azure Table Storage Transactions
- Atomic updates can be performed as long as the entities are all within the same table partition (same PartitionKey).
- You cannot perform cross-table or cross-partition transactions.
- A maximum of 100 operations can be performed within the same atomic operation.
- The total payload of the batch must be no more than 4MB.
Azure Table Storage High Availability / Data Redundancy Comparison Chart
Use the comparison chart below to choose the level of replication you may need for Azure Table storage:
Azure Table Storage Pricing
- Azure Table storage is very cost-effective.
- You’re charged based on the amount of storage and the number of storage transactions made against the service (e.g., an individual operation against the storage API).
- For example, in December 2021, for 1000GB of storage and 100 million storage transactions/month, the approximate costs in U.S. regions are as follows:
- LRS redundancy: $45.04/month
- RA-GRS redundancy: $75.04/month
- RA-GZRS (where offered): $126.54/month
Azure Table Storage FAQ for the RDBMS Developer
What kind of NoSQL store is Azure Table storage?
Azure Table storage is a NoSQL key-value PaaS data store designed to be a great option for highly scalable, highly available systems. It supports storing petabytes of data and a flexible data schema, meaning different entities in the same table can have different schemas.
References to NoSQL databases having "flexible schema" or having a "schemaless design" can give the impression database schema design is a thing of the past, and you can bypass it to focus more on the application code. The reality is, even in this NoSQL data world, schema design is very important and if you don't give it due care and attention, it can come back to bite you.
How do I create custom indexes on an Azure Table Storage table?
The only index existing on a table is on the PartitionKey and RowKey properties. Secondary indexes are not supported.
How do I optimize querying by different properties if I can't have secondary indexes?
Storing duplicate copies of the data with different PartitionKey and RowKeys properties is the way to go. Storage is inexpensive. You can use the in-partition batch support to insert/modify the copies in an atomic manner. Just bear in mind the limitations on batches—e.g., 100 operations max, 4MB limit on total payload size—but for most scenarios, this is not a concern.
Azure Table storage query example: How to search for an employee by employee id or domain username using PartitionKey and RowKey
In this scenario, we have the following data in the Employee table:
PartitionKey | RowKey | FirstName | LastName | EmployeeId | DomainUsername |
Employee | Id_012345 | Joe | Bloggs | 012345 | jbloggs |
Employee | Uname_jbloggs | Joe | Bloggs | 012345 | jbloggs |
Here, we're storing two copies of an Employee entity—all the custom properties are the same (FirstName, LastName, etc.). However, to facilitate our query requirements, we’ll use a different RowKey for each.
A query by Employee Id would be an efficient Point Query (a query for a single entity identified by its PartitionKey and RowKey).
A query by domain username would also be an efficient Point Query. The above query is a shortened convenience wrapper for single-row lookups by PartitionKey and RowKey. However, you can also use ExecuteQuerySegmentedAsync (as shown below), which is what you'd use for queries returning multiple entities (complete example).
Can I query by properties other than the PartitionKey and RowKey?
Yes. But just like in an RDBMS, when you query on a field that is not indexed, it will perform a scan. If you're searching on a custom property within a specific partition, it will be a Partition Scan. If you're searching across all partitions, a Table Scan will be the most inefficient query to perform.
How can I insert/update multiple entities in an atomic operation?
Using the previous employee example, how do you go about creating both entities atomically when adding a new employee to the database? Well, as shown in the code below, you can perform multiple operations as part of a TableBatchOperation—just recall the limitations I mentioned in the fundamentals section: a maximum of 100 operations per batch and all entities must have the same PartitionKey and a maximum batch payload size of 4MB.
How do I store entities with different schema in the same table?
There's nothing special here—just create a new entity with different properties and insert it into the same table.
The following example demonstrates this for Order and OrderItem entities, storing the order header and order item entities in the same table.