AWS DynamoDB at Buzzvil


One of Buzzvil's main products is Honeyscreen, a lockscreen rewards app. Honeyscreen users earn points by participating in advertisement campaigns on our lockscreen or offerwall, or simply by unlocking their device. Points can be spent on goods in the store. Since this "points system" is the core of the product, it is very important that we store accurate data about how many points each user has earned and spent.

Originally we ran the points system on MySQL (RDS). However, as the service grew, the number of points-related requests to the database increased rapidly and put a very heavy load on it. Since this kind of load can cause outages, we searched for an alternative and found DynamoDB, a NoSQL database provided by AWS. DynamoDB's biggest advantage is that it is very easy to scale, so we now run the points system on DynamoDB, which lets us manage it more flexibly. In this article, I want to explain how we designed the tables and operations in DynamoDB to suit the points system's needs.

There were two kinds of tables in the original MySQL database. Simplified versions of their structures are as follows:

Amount table

This table records every change to a user's points; in other words, it holds the history of a user's deposits and withdrawals. When a user clicks the "details" button in the app, the app queries this table.


Sum table

This table stores the total points each user currently has. When the app shows the total points on the main page, it queries this table.


However, we soon realized that keeping this design in DynamoDB causes serious problems. The most critical one is that DynamoDB does not support atomic transactions across tables, so the two tables cannot guarantee mutually consistent data. An item might be inserted into the Amount table while the Sum table is left un-updated because the operation failed in the middle. Because of this, it became clear that we needed to merge the two tables into one. Our first attempt was as follows:

Amount_sum table


In contrast to the two original tables, a single item in this table holds both the amount of a change and the resulting sum of a user's points. When points need to be deposited, the process is as follows. First, the client fetches the user's most recent item to get the current sum; to do so, it queries with the "Limit" option set to 1 and "ScanIndexForward" set to false, which scans the partition in reverse chronological order, since items in a partition are sorted by their sort key – date. Second, it computes a new sum by adding the amount about to be deposited. Last, it inserts a new item with the new values into the table. In short, it first reads from the table and then inserts a new item. Since the read does not change any data, the only operation that changes the table's state is the insert. With two tables there were two such operations – an insert into the Amount table and an update to the Sum table – but now there is only one, so the operation is effectively atomic.
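The read-then-insert flow above can be sketched as follows. This is a minimal in-memory stand-in, not our production code: the `table` dict plays the role of the Amount_sum table (one chronologically ordered list per user_id partition), and the function names are illustrative. A real client would use boto3's `query` (with `Limit=1`, `ScanIndexForward=False`) and `put_item`.

```python
# In-memory stand-in for the Amount_sum table: one list of items per
# user_id partition, appended in chronological order of the "date" sort key.
table = {}

def get_latest_item(user_id):
    # Equivalent of query(..., ScanIndexForward=False, Limit=1):
    # fetch the newest item in the user's partition.
    items = table.get(user_id, [])
    return items[-1] if items else None

def deposit(user_id, amount, now):
    # Step 1: read the current sum from the most recent item.
    latest = get_latest_item(user_id)
    current_sum = latest["sum"] if latest else 0
    # Step 2: compute the new sum; step 3: insert the new item (put_item).
    new_item = {"user_id": user_id, "date": now,
                "amount": amount, "sum": current_sum + amount}
    table.setdefault(user_id, []).append(new_item)
    return new_item

deposit("u1", 1000, "2016-01-01T00:00:00")
item = deposit("u1", 100, "2016-01-01T00:01:00")
print(item["sum"])  # 1100
```

Note that nothing here guards against two clients running `deposit` at the same time; that is exactly the gap the next two sections close.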

But some problems remained. The first is overwriting. Because the table's partition key and sort key are user_id and date, concurrent requests for one user can collide: the item inserted by one request overwrites the other's. This is easily solved with DynamoDB's conditional write feature. If we add a condition to the insert so that an item is written only when no item with the same user_id and date combination exists, overwriting is prevented. If such a combination already exists, we retry after increasing "date" by one second (or even one millisecond) until the operation succeeds. This introduces a minor inaccuracy in the timestamps, but it guarantees that every item is inserted and none is overwritten.
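A sketch of the conditional insert with the date-bumping retry, again against an in-memory stand-in (dates are simplified to integer epoch seconds, and the helper names are hypothetical; the real write would be a `put_item` with a `ConditionExpression` such as `attribute_not_exists(user_id) AND attribute_not_exists(#d)`):

```python
table = {}  # {user_id: {date: item}} – in-memory stand-in for the table

def conditional_put(user_id, item):
    # Mimics a conditional write: refuse the insert if an item with the
    # same (user_id, date) key already exists, instead of overwriting it.
    partition = table.setdefault(user_id, {})
    if item["date"] in partition:
        raise KeyError("ConditionalCheckFailedException")
    partition[item["date"]] = item

def put_with_retry(user_id, item, max_retries=5):
    for _ in range(max_retries):
        try:
            conditional_put(user_id, item)
            return item
        except KeyError:
            # Key taken: bump the timestamp by one second and try again.
            item = dict(item, date=item["date"] + 1)
    raise RuntimeError("could not insert after retries")

put_with_retry("u1", {"date": 100, "amount": 50})
shifted = put_with_retry("u1", {"date": 100, "amount": 70})
print(shifted["date"])  # 101: moved forward one second instead of overwriting
```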

The second problem concerns the consistency of data in the table. For example, suppose two concurrent requests each deposit 100 points to a user whose up-to-date sum is 1000. Following the process above, both requests read the sum 1000 at almost the same time. Each then computes a new sum of 1100 by adding the 100 points it is about to deposit. After that, each inserts an item with a sum of 1100. As a result, although two requests each deposited 100 points, the user's sum increased by only 100. This happens when requests run concurrently but not when they run serially, so it is a problem of transaction isolation. It can be solved by adding a "version" attribute to each item, as below.
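The lost update can be traced in a few lines. The interleaving below is the problematic one: both requests read before either writes.

```python
stored_sum = 1000       # the user's current sum in the table

read_a = stored_sum     # request A reads the latest sum
read_b = stored_sum     # request B reads it too, before A has written

stored_sum = read_a + 100   # A inserts an item with sum 1100
stored_sum = read_b + 100   # B also inserts sum 1100, losing A's deposit

print(stored_sum)  # 1100 instead of the correct 1200
```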

Amount_sum_version table


 

This table uses version, instead of date, as the sort key. The version can be of any type that plays the same role, but in this example let's say it is a Number starting from 0. Now let's revisit the example above with this table. Two requests, each depositing 100 points, are sent at the same time. Both read from the table simultaneously, each getting a sum of 1000 and a version of, say, 21. Both then try to insert an item with a sum of 1100 and a version of 22. However, because our insert uses the conditional write feature, there cannot be more than one item with the same user_id–version combination, so one of the requests fails. To retry, the failed request reads from the table again, this time getting a sum of 1100 and a version of 22, and inserts a new item with a sum of 1200 and a version of 23. If it fails again, it repeats the process until it succeeds (or hits a retry limit set by the programmer). This solves the isolation problem while still preserving the chronological order of items, since a higher version is always created after a lower one.
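This optimistic-concurrency loop can be sketched as below. It reuses the same in-memory conditional-put idea as before, now keyed on (user_id, version); the helper names are illustrative, and the "concurrent" requests run sequentially here, so the retry path is shown by the structure of the loop rather than exercised by an actual race.

```python
table = {}  # {user_id: {version: item}} – in-memory stand-in for the table

def conditional_put(user_id, item):
    # Fails (like ConditionalCheckFailedException) if an item with the
    # same (user_id, version) key already exists.
    partition = table.setdefault(user_id, {})
    if item["version"] in partition:
        raise KeyError("ConditionalCheckFailedException")
    partition[item["version"]] = item

def latest(user_id):
    # Read the item with the highest version (the newest state).
    partition = table.get(user_id, {})
    if not partition:
        return {"sum": 0, "version": -1}
    return partition[max(partition)]

def deposit(user_id, amount, max_retries=10):
    for _ in range(max_retries):
        cur = latest(user_id)
        item = {"user_id": user_id, "amount": amount,
                "sum": cur["sum"] + amount, "version": cur["version"] + 1}
        try:
            conditional_put(user_id, item)
            return item
        except KeyError:
            continue  # another request took this version: re-read and retry
    raise RuntimeError("gave up after retries")

# Seed the example: sum 1000 at version 21, then two 100-point deposits.
table["u1"] = {21: {"user_id": "u1", "amount": 0, "sum": 1000, "version": 21}}
deposit("u1", 100)
deposit("u1", 100)
print(latest("u1"))  # sum 1200 at version 23
```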

Now all the problems with creating new items are solved, but one issue remains with reading the table's data. With the structure described above, it is very inefficient to read all data in chronological order, because the items are scattered across millions of user_id partitions. The options are to scan the whole table and then sort by time, or to iterate over every user_id partition collecting data within the time range. To read the timeline more efficiently, we added one more attribute and a Global Secondary Index (GSI), as below.

Amount_sum_version_scatter table


Scatter-date-index


It would be very simple to set an arbitrary attribute as the GSI's partition key and date as its sort key, and then assign the same value to that attribute on every item so that all data gathers in one partition. However, this causes the so-called "hot partition" problem: the database throttles writes when every item lands in the same partition. To prevent this, the scatter attribute is assigned a random value within a certain range, so items are distributed over several partitions and the chance of throttling drops significantly. With the table set up this way, we can query the timeline by iterating through the "scatter" partitions. This is much more efficient than querying every user_id partition, because the number of scatter partitions (100 or fewer) is far smaller than the number of users, and it is completely under the programmer's control.
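The scatter scheme can be sketched as follows; the partition count and attribute names are illustrative (the article only says 100 or fewer partitions), and the GSI query is emulated over a plain list, whereas a real client would issue one `query` per scatter value against the Scatter-date-index and merge the pages.

```python
import random

N_SCATTER = 10  # number of scatter partitions (assumption; the post says <= 100)

def new_item(user_id, date, amount, total):
    # Assign a random scatter value at write time, spreading the GSI's
    # items over N_SCATTER partitions instead of one hot partition.
    return {"user_id": user_id, "date": date, "amount": amount,
            "sum": total, "scatter": random.randrange(N_SCATTER)}

def query_timeline(gsi_items, start, end):
    # Emulate reading the Scatter-date-index: one range query per scatter
    # partition, then merge the partial results by date.
    results = []
    for s in range(N_SCATTER):
        results.extend(i for i in gsi_items
                       if i["scatter"] == s and start <= i["date"] <= end)
    return sorted(results, key=lambda i: i["date"])

items = [new_item(f"u{k}", k, 10, 10) for k in range(50)]
timeline = query_timeline(items, 10, 19)
print([i["date"] for i in timeline])  # dates 10..19 in chronological order
```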

Conclusion

That was an introduction to how we use DynamoDB to manage Honeyscreen's points system. Since there is already plenty of well-written information on DynamoDB, this article dealt only with the problems we ran into while implementing the points system. DynamoDB has its own advantages and disadvantages, but it is constantly improving and overcoming its weaknesses with the help of AWS and other developers. For example, the conditional write feature was introduced in 2014. You can check the official AWS blog to learn about new services and features. At Buzzvil, we utilize a variety of AWS services. We will keep you posted.
