Rethink database design with Microsoft Azure

I am definitely not a database expert, so what I am writing here is only from my experience migrating all my applications to Microsoft Azure. If you think that I am doing something wrong or you know a better way, please feel free to leave a comment, so that I can learn more about living in the Cloud.

In this blog post I will write a little bit about what I have been using for a very long time before moving to Azure. Then I will write about my experience with SQL Azure and Azure Table Storage, specifically the problems that I have when migrating from SQL Azure (or MS SQL in general) to Table Storage. You can go directly to the last part if you would like.

Life before Azure

We (me and one of my friend) have subscribed for a hosting plan for around 80 USD a year. This plan covered unlimited shared hosting for websites and surprisingly unlimited MS SQL databases as well. Hence, we got used to store every structural data into SQL databases including telemetry and logs (which I soon realized not a good idea at all).

Inside the application, I always used Entity Framework Code First and Linq to access the data. This made my code look really nice, but adding layers of abstraction sometimes make bugs harder to be identified.

The shared host and shared SQL server did not have proper virtualization (basically multiple web applications on the same IIS and multiple database on the same SQL server with different credentials), so a bad bug on one application could consume all the resources for other applications. Besides that, the shared SQL server also looked quite scary: we could see all the databases of other users (although we cannot open those databases). This did not only slow down SQL Management Studio every time we opened the server to see the databases, but this also meant everyone else using the same SQL server could see all our databases.

Moving to Azure

So when Windows Azure (now Microsoft Azure) got the new portal with Azure Website and SQL Azure becoming much easier to access, we decided to move on.

In the first few months, I have been trying a lot to minimize the cost on the Azure Websites and other resources, but had overlooked SQL Azure, partly because of the danger of changing database codes and the more complex pricing scheme of SQL Azure. Then I learned more and more about NoSQL as it gets more popular, and I started to gain more interest in Azure Table Storage. At that time I realized that I have been wasting money every month for storing data unnecessarily in SQL Azure.

SQL Azure vs. Azure Table Storage

This will not be a thorough comparison between SQL and NoSQL because honestly I am not an expert in NoSQL yet. Also, I will not list here the general pros and cons between SQL and NoSQL databases (e.g. ACID vs. BASE). This is more about what I have learned when migrating some of my apps’ databases from SQL Azure to Azure Table Storage.

What Azure Table Storage can do better

  • Much cheaper. Storing GBs of data inside a Table Storage is still much cheaper than storing a hundred MBs in SQL Azure.
  • Flexible data structure. You can insert any object to a Table Storage and every property will become a column. No migration is needed when you add or remove properties to/from a TableEntity object (can be either a good or a bad thing).
  • Fast for appropriate query (single row or single partition query for example). Try it yourself. This depends on how your application query data from the database.
  • Distributed and scalable. You can store 150 GBs in a SQL Azure database, but you can store thousands of TBs in an Azure Table Storage.

What might be a problem using Azure Table Storage

There are 2 and always 2 indices: Partition key and Row key. You must be careful in your database design because you will not be able to create new index or change the index column afterward. These 2 keys are the only indices you have and will affect any database query in your application, so most of the considerations start from here.

  • Random column filtering is limited. You must always specify a partition key in your query; otherwise, a full table scan is triggered. It means you cannot simply do a cross partition query without forcing the database to go through each data row, which has terrible performance on large database.
    One possible way to overcome this if you need to have multiple different indices is to implement a separate index table (very much like mimicking what SQL servers do), and have a scheduled jobs to update these indices periodically.
    Another way (which I am using for one of my apps) is to implement a hybrid database structure: use one Azure SQL database to store all the indexable data (ID, date time, foreign key, object type, etc.) and use an Azure Table Storage to store all other columns. So all queries into the Table Storage is just single row lookup (and can be done asynchronously as well).
  • Sort order cannot be defined at runtime. If you are used to the ORDER BY command in SQL (or OrderBy in Linq), you should be aware that there is no such thing in Azure Table Storage. All the row are sorted by increasing lexicon order of the Partition key and then the Row key. If you want to sort by other columns, you have to do it in memory, which means you have to be aware of the query constraint above: always specify Partition key. However, there are tricks on designing the Row key, so that your rows will automatically in your desired order (e.g. decreasing order of inserting time http://blog.liamcavanagh.com/2011/11/how-to-sort-azure-table-store-results-chronologically/). Using a hybrid database structure as above is also another solution, where all the ordering is done in the SQL database.
  • Primary keys and Row keys are strings, but not any strings. You should be aware that there are certain restrictions in the characters of the keys. You should read about all the restrictions here (http://msdn.microsoft.com/en-us/library/dd179338.aspx) and design a proper DAL right from the start.
  • No Entity Framework support at the moment. You will have to change your DAL if you are using Entity Framework to access the databases. You may already know that EF7 will support Azure Table Storage, but I believe the architecture changes from EF6 to EF7 may also require proper migration in your existing DAL logic.

Conclusion

There are many benefits associating with Azure Table Storage (and NoSQL database in general). However, you should also be aware of the limitations comparing to other relational databases, and have a proper design, so as not to making a mess down the road.

And if you have any thing to add, or any better solution for my problems above, please feel free to leave a comment.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.