When we started working on TrueAccord, we had a limited understanding of various technical aspects of the problem. Naturally, one of those unclear aspects was the data model: which data entities we would need to track, what their relationships would be (one-to-one, one-to-many, and so on), and how easy it would be to change the data model as business requirements became known and our domain expertise grew.
Using Protocol Buffers to model the data your service uses for storage or messaging is great for a fast-changing project:
- changing the schema is trivial: adding and removing fields, turning an optional field into a repeated field, and so on. If we had modeled our data using SQL, we would be constantly migrating our database schema.
- the data schema (the proto file) serves as always up-to-date reference documentation for the service’s data structures and messages. People from different teams can easily generate parsers for almost every programming language, and access the same data.
We started by using Google’s official Protocol Buffers generator for Java, but that introduced a few issues. First, as we code in Scala, we make extensive use of Scala’s Option[T] type, which represents a value that is either of type T or missing. This led us to write code like “val x: Option[T] = if (proto.hasX) Some(proto.getX) else None” to translate the Java protocol buffer into our application’s internal data structures. Second, we wanted to avoid another Javaism: creating Builders and working with nested builders. We wanted a nice syntax for updating nested fields in our messages. Lastly, a lot of our messages describe mutually exclusive situations. We had comments in our proto files saying something along the lines of “// at most one of these fields may be set:”. We started converting them to one-ofs, which were introduced in Protocol Buffers 2.6.0, but we wanted the generated code to have a Scala type that captures the mutual exclusivity and lets us pattern match on the one-of, with the compiler warning us when we forget to handle a case.
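To make the first issue concrete, here is a minimal sketch of the translation boilerplate. ContactProto is a hypothetical hand-written stand-in for a Java-generated message (real generated classes are much larger), and Contact is an assumed internal model; only the hasX/getX shape is taken from the text above.

```scala
// Hypothetical stand-in for a Java-generated message with one optional field.
// Only the hasNickname/getNickname shape matters for this illustration.
final class ContactProto(name: String, nickname: String, nicknameSet: Boolean) {
  def getName: String = name
  def hasNickname: Boolean = nicknameSet
  // Like a Java protobuf getter, returns a default ("") when the field is unset.
  def getNickname: String = if (nicknameSet) nickname else ""
}

// Assumed internal application model, using Option for the optional field.
final case class Contact(name: String, nickname: Option[String])

object OptionBoilerplate {
  // The translation that has to be repeated for every optional field.
  def fromProto(p: ContactProto): Contact =
    Contact(
      name = p.getName,
      nickname = if (p.hasNickname) Some(p.getNickname) else None
    )
}
```

With a single optional field this is tolerable; with dozens of messages and fields, the same if/else line gets written over and over.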
Other protocol buffer generators for Scala solve the optional fields issue, but do not offer a nice syntax for updating nested fields or deal with the one-ofs. So we went ahead and created our own Protocol Buffer code generator for Scala, ScalaPB.
What’s unique about ScalaPB is that it is written as a plugin for Google’s protocol buffer compiler. This means that the heavy lifting of parsing the proto definition is done by the official protocol buffer compiler. Anecdotally, some of our proto files could not even be compiled by the existing tools (since they use their own proto parsers). This also allows us to take advantage of changes in the proto language without developing our own parser for it.
To work with nested fields, ScalaPB uses a functional programming concept called lenses to create a pleasant way to mutate messages or create new ones (messages in ScalaPB are actually immutable, so when we use the terms “update” or “mutate” we mean creating a copy of the original message with some modifications, while the original message stays intact). This allows us to update nested fields like this: person.update(_.address.city := "San Francisco"). Without lenses, using only the methods available on case classes, you would write person.copy(address = person.address.copy(city = "San Francisco")).
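For readers new to the idea, the following is a minimal sketch of what a lens is. It is not ScalaPB's implementation or API: Lens, andThen, LensSketch and the Person/Address classes are illustrative names assumed for this example, and the generated update syntax shown above is considerably nicer than calling set directly.

```scala
final case class Address(street: String, city: String)
final case class Person(name: String, address: Address)

// A lens pairs a getter with a "setter" that returns a modified copy.
final case class Lens[A, B](get: A => B, set: (A, B) => A) {
  // Compose with a lens on the inner value to reach a nested field.
  def andThen[C](inner: Lens[B, C]): Lens[A, C] =
    Lens[A, C](
      a => inner.get(get(a)),
      (a, c) => set(a, inner.set(get(a), c))
    )
}

object LensSketch {
  val address: Lens[Person, Address] =
    Lens[Person, Address](_.address, (p, a) => p.copy(address = a))

  val city: Lens[Address, String] =
    Lens[Address, String](_.city, (a, c) => a.copy(city = c))

  // One composed lens replaces the nested copy(...) calls.
  val personCity: Lens[Person, String] = address.andThen(city)
}
```

Calling LensSketch.personCity.set(person, "San Francisco") returns a new Person with only the city changed and leaves the original untouched. The nesting of copy calls is written once, inside the lens composition, instead of at every call site.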
To solve the mutual exclusivity problem, ScalaPB generates a sealed trait for each one-of type, and each one-of option is a case class that extends that trait. This is a standard idiom to represent a “union” type.
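The idiom can be sketched by hand as follows. This only illustrates the sealed-trait pattern and is not actual ScalaPB output: PaymentMethod, its cases, and the Empty case for "nothing set" are hypothetical names assumed for the example.

```scala
// Hypothetical one-of: a payment is made by card or by bank transfer, or is unset.
sealed trait PaymentMethod
object PaymentMethod {
  // Assumed extra case representing "no field of the one-of is set".
  case object Empty extends PaymentMethod
  final case class Card(lastFour: String) extends PaymentMethod
  final case class BankTransfer(routingNumber: String) extends PaymentMethod
}

object OneofSketch {
  import PaymentMethod._

  // Because the trait is sealed, the compiler can warn if a case is missing here.
  def describe(m: PaymentMethod): String = m match {
    case Empty                 => "no payment method set"
    case Card(lastFour)        => s"card ending in $lastFour"
    case BankTransfer(routing) => s"bank transfer via routing number $routing"
  }
}
```

Removing any one of the three case lines in describe makes the match non-exhaustive, which the Scala compiler reports as a warning. That is the check a “// at most one of these fields may be set:” comment cannot provide.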
If you also have to deal with mutually exclusive fields and want a more functional PB generator, ScalaPB is probably for you. Take a look, and let us know what you think.