When we started TrueAccord in 2013, we used MySQL to store our data in pretty traditional way. As business requirements came in, we found ourselves continuously migrating our table schemas to add more columns and more tables. Before MySQL 5.6, these schema changes would lock down the database for the entire duration of a change causing a brief downtime. When the company was smaller and just starting out, this was tolerable, but as we grew the increase in schema complexity was getting harder to manage via SQL migration scripts.
We were looking for an alternative, something like Big Table, the key-value store that I used back at Google. Using a key-value store enables storing an entire document as a value, and thus eliminating the need for migrations. We investigated several publicly available key-value stores, but none of them met our major requirements at the time. As a small engineering team, we wanted a hosted fully managed database solution, so that backups and server migrations are taken care of for us. Additionally we wanted security features like encryption at rest. DynamoDB came the closest to matching our requirements, but was missing encryption at rest.
We came across this old post from FriendFeed that describes at a high-level design that meets our requirements which inspired our implementation. First, we chose to use MySQL (now Aurora) managed by Amazon RDS as our backing datastore. This solves the requirement for a hosted, managed, encrypted database, and this is a battle-tested database. Then for the key-value interface (to avoid schema migrations), we built a thin library called Repos that provides a key-value interface implemented on top of MySQL. Now we have something that allows us to move quickly on top of a reliable datastore.
Each repo represents a map from a UUID (key) to an arbitrary array of bytes representing the value. Each repo is stored in MySQL using two tables. The first table is the log table. Every time we wanted to insert or update an entity, we will insert it to this table.
|pk||bigint(20)||Auto incremented primary key|
|uuid||binary(16)||Unique id for each entry|
|format||char(1)||Describes the format of the entry_bin column.|
We always append to this table, never updating an existing row. By doing so, we get the full history of every object. This has proven to be really handy for debugging why a change has occurred, and when.
The format column can take two possible values: ‘1’ means the value in entry_pb is a serialized protocol buffer, and ‘2’ means it is compressed using Snappy (a compression scheme that aims for high speed and reasonable compression)
To optimize look-ups, we have another table, the “latest” table, with the following format:
|parent_pk||bigint(20)||PK of this entry in the log table.|
|uuid||binary(16)||The unique id of the entry(here it is a primary key)|
|format||Char(1)||Describes the format of the entry_bin column.|
Whenever we insert an element to the log table, we also upsert it to this table so it always has the latest inserted element. We do this as a transaction to ensure the tables are always in sync.
Secondary Index Implementation
The first hurdle when going in this route is secondary indexes. For example, if your Repo maps a user id to his account information (email, hashed password, full name), how would you look up an account by email? To do so, we implemented index tables. An index table maps the values in the key value store to a primitive value that MySQL can index. A single repo may have multiple indexes, and each one goes to its own table. Index tables have the following layout:
|parent_pk||bigint(20)||PK of this entry in the log table.|
|uuid||binary(16)||Random id for each entity (here it is a primary key)|
|value||*||The indexed value (for example, the email address of the user)|
We always insert to the secondary index. Therefore, over time, the index will contain stale values. To solve that, when querying, we join the uuid and parent_pk with the latest value and return the result only if there is a match.
For example, if we have a person with id “idA” and he changed his email, the log table would look like this:
|pk||uuid||time_msec||value (format, entry_bin)|
The latest table, would have only the updated row:
|parent_pk||uuid||value (format, entry_bin)|
The email index table would have the email value, for each version of the object:
Now, to find an account whose latest email value is “firstname.lastname@example.org”, the Repos library would build a query similar to this:
SELECT l.uuid, l.format, l.entry_bin FROM latest AS l, email_index AS e WHERE e.value = email@example.com" AND e.uuid = l.uuid AND e.parent_pk = l.parent_pk
Our Repo library provides a nice Scala api for querying by index. For example,
Would return all the accounts that have this email address.
Using Table Janitor to Manage Our Tables and Indexes
The table janitor is a process implemented as an Akka actor that runs on our JVMs. This actor is responsible for two main tasks:
- Ensuring that the underlying MySQL tables are created.It does this by reflecting all of the Repos and indices defined in the code and then creating the corresponding MySQL tables. This makes adding a new repo or adding an index as simple as just defining it in the code.
- Ensuring that the indices are up to date. This is necessary since when a new index gets added, there may still be servers that run old version of the code and do not write into the new index. The table janitor regularly monitors the log tables and (re-)indexes every new record. Adding an index to an existing repo is easy – we just declare it in the code.
How we do Analytics
We use AWS data pipeline to incrementally dump our log tables into S3. We then use Spark (with ScalaPB) for Bigdata processing. We also upload a snapshot of it to Google’s Bigquery. As all our repos use Protocol buffers as their value type, we can automatically generate Bigquery schemas for each repo.
Pros and Cons of Our Approach
By writing repos and have all our database access go through it, we get a lot of benefits:
- Uniformity: having all our key-value maps being repos has the advantage that every optimization and every improvement applies to all our tables. For example, when we build a view that shows an object history, it works for all of our repos.
- Schema evolution is free when using protocol buffers as values. We can just add optional fields, rename existing fields, or convert an optional to a repeated and it just works.
- Security: storing data securely on RDS is a breeze. Encryption at rest? Click a checkbox. Require data encryption in transit? SSL is supported by default.
- Reliability: We never had the RDS MySQL (later Aurora) instances go down (besides rare scheduled maintenance windows which require the instances to be rebooted). We have never lost data. Additionally we can recover the database to any given snapshot in time with RDS by replaying binary logs on top of a snapshot.
- Ease of use: adding a Repo or an index is trivial. All of our ~60 or so Repos work in exactly the same way, and accessed through the same programmatic interface, our engineers can easily work with any of them using the same programming interface.
- Optimization/Monitoring/debugging: Since MySQL is a mature and well-understood technology, there is a plethora of documentation on how to tune it, how to debug problems. In addition, AWS provides a lot of metrics for monitoring how an RDS instance is doing.
However, there are also downsides:
- Storing binary data in MySQL limits what can be done using the command line MySQL client. We had to write a command line tool (and a UI) to look up elements by key so we can debug. For more complex queries, we use Spark and BigQuery for visibility into our data.
- Being a homegrown solution, we occasionally had to spend time tuning our SQL queries when our repos grew in size. On the positive side, scaling up due to business growth is a good problem to have and fixing it for one repo, made an improvement for all others.
- JDBC has Multiple Layers: JDBC/HikariCP/Mysql connector: we had quite a few issues where it was tricky to pinpoint the source of the problem.
Alternatives: What the Future Looks Like
As much as we’d like our homegrown solution, we are continuously thinking what our next storage solution will be like.
- Current versions of both MySQL and Postgres come with built-in support for indexing JSON documents.
- Google now offers a publicly hosted version of Bigtable.
- We are moving towards having our data represented as a stream of events which may benefit from a different data store.
The Repos implementation has enabled our engineering team to quickly develop a lot of new functionality, as well as iterating over the data schema. By implementing on top of RDS, we have the peace of mind that our data is safe and our servers are up to date with all the security patches. At the same time, having full control over the implementation details of repos allowed us to quickly implement additional security measure so we can satisfy the stringent requirements of card issuers and other financial institutions, without sacrificing development speed.