Overview
The Universal Database is a piece of infrastructure I had dreamed of for quite some time while working at a company where our information came from a wealth of different sources. Product information came from our ERP, eCommerce information came from Shopify, and contextual working data came from many different Airtable workstations. Most operating procedures occurred in the workstations I had set up, so building a system that could use Airtable’s infrastructure and integrate natively was preferable to something like an external SQL server. Airtable has rudimentary data-flow features built in, but nothing that could support the scale of our operations. I identified the points where Airtable could be exploited to maximize the scalability of data flow and used them to build one final database to rule them all.
The Chink in the Armor
Airtable’s API has a pretty strict rate limit of 5 requests per second. My apps use the API to move and edit data constantly in my workstations, and there is not enough bandwidth to sustain a frequent full data pull alongside that traffic. There is, however, one very important caveat: the rate limit is enforced strictly on a per-base (database) basis. While I can’t pull data from my workstations without interrupting their usual API operations, I can use Airtable’s sync feature to sync their data to additional bases that I use as “Launch Pads” to draw the data from.

Theoretically, if you wanted to maximize throughput, you could sync segments of one workstation’s data to a multitude of different Launch Pads and query them all at once with a multi-threaded approach, scaling almost without limit. Doing this is not against the terms of service, but it’s an exploitative approach that should be discussed with Airtable before pursuing. Personally, I use only one Launch Pad per workstation, for the sake of separating my data pulling from my base operations.

You might be asking why we can’t just sync every base to every other base. The issue is that the syncs don’t converge on unique primary keys, so the records of your database are all duplicated and you hit the 50,000-record limit extremely fast. The feature also requires lots of API- or automation-driven linking and is simply not worth the headache.
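The multi-threaded Launch Pad pull can be sketched in a few lines of Java. Here `fetchRecords` is a hypothetical stand-in for the real Airtable REST call (which would page through records and respect the 5 req/s limit of its own base); the point is that each base is queried on its own thread, so the per-base rate limits never contend with each other.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

public class LaunchPadHarvest {
    // Hypothetical stand-in for a GET against an Airtable Launch Pad base.
    // The real call would page through the table and stay under 5 req/s.
    static Map<String, String> fetchRecords(String baseId) {
        return Map.of("recA", "data from " + baseId);
    }

    // Query every Launch Pad base on its own thread. Since Airtable enforces
    // the rate limit per base, the pulls do not slow each other down.
    static Map<String, Map<String, String>> harvest(List<String> baseIds) {
        ExecutorService pool = Executors.newFixedThreadPool(baseIds.size());
        try {
            Map<String, Map<String, String>> results = new ConcurrentHashMap<>();
            var futures = baseIds.stream()
                .map(id -> pool.submit(() -> results.put(id, fetchRecords(id))))
                .toList();
            for (var f : futures) f.get(); // surface any fetch failure
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```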
The Harvest
Every Airtable workstation is assigned a Launch Pad base that is effectively just a proxy. My app queries all of these Launch Pads simultaneously, gathering large amounts of data in a short amount of time. Meanwhile, the app also uses GraphQL to query the relevant values of interest from Shopify, and parses the automatic exports from our ERP. The Universal Database itself is queried as well, since we will update only the values that need changing. Once all the data has been collected, every key/value pair is placed into a JSONObject, ready to be compared against the previous database iteration.
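Conceptually, the collection step folds the field maps from every source into one candidate record. A minimal sketch, using plain Maps in place of JSONObjects (the field names and merge order here are illustrative — later sources win on key collisions):

```java
import java.util.HashMap;
import java.util.Map;

public class RecordMerge {
    // Fold field maps from several sources (Launch Pads, Shopify, ERP export)
    // into one candidate record for comparison against the previous iteration.
    // putAll means a later source overwrites an earlier one on key collisions.
    @SafeVarargs
    static Map<String, Object> merge(Map<String, ?>... sources) {
        Map<String, Object> candidate = new HashMap<>();
        for (Map<String, ?> source : sources) {
            candidate.putAll(source);
        }
        return candidate;
    }
}
```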
Adjudication
Airtable fields come in a few different varieties, so we need to make sure we’re comparing them with a valid methodology. The Adjudicator is a lot like overriding a .equals() method, though we will also be using the key to gain information about an object, rather than just the value. The current comparisons are as follows:
- Number Comparison – If the object is a Long, Integer, Double, or Float, convert it to BigDecimal (a robust arbitrary-precision number container in Java) and compare it against the original, also converted. Comparing via compareTo() rather than equals() obviates scale issues like 3.4 not equaling 3.40.
- Date Comparison – If the field header contains the string “Date”, parse the value into a Java Date object and compare it with the original.
- Attachment Comparison – Attachments in Airtable are handled as JSONObjects with descriptor values like URL, filename, and size. All attachment field headers drawn from Launch Pads are tagged with a special emoji, which this procedure uses to identify attachments and apply the following methodology: iterate through the attached objects and deem them unequal if the combination of filename and size differs between the new candidate and the original. We cannot compare the URL, because when we provide the Launch Pad’s URL in the first place, Airtable downloads that image and rehosts it, giving it a new URL in the Universal Database. (Note: you could also identify attachments without the emoji by testing for a certain key structure.)
- String Comparison – All other data structures, including arrays, can be compared via a typical string .equals().
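One subtlety in the number comparison above: BigDecimal.equals() is scale-sensitive, so 3.4 and 3.40 only compare equal through compareTo(). A minimal sketch of the numeric branch (the exact conversion logic here is illustrative, not the production code):

```java
import java.math.BigDecimal;

public class NumberAdjudicator {
    // Normalize any supported numeric type to BigDecimal before comparing.
    // BigDecimal.equals() considers scale (3.4 != 3.40), so compareTo()
    // is the right check for value equality.
    static boolean numbersEqual(Object a, Object b) {
        return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
    }

    static BigDecimal toBigDecimal(Object value) {
        if (value instanceof BigDecimal bd) return bd;
        if (value instanceof Long || value instanceof Integer)
            return BigDecimal.valueOf(((Number) value).longValue());
        if (value instanceof Double || value instanceof Float)
            return new BigDecimal(value.toString()); // avoids binary-float artifacts
        throw new IllegalArgumentException("not a supported number: " + value);
    }
}
```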
All key/value candidates that differ from the original are pushed to a new JSONObject that serves as the patching record. If this record has any fields left after adjudication, a difference has been found, and it is loaded into the patch-pile.
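The adjudicate-then-patch step amounts to a diff between the candidate and the original record. In this sketch, Objects.equals() stands in for the full Adjudicator described above, and plain Maps stand in for JSONObjects:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class PatchBuilder {
    // Compare the freshly harvested candidate against the previous database
    // iteration; only keys whose values differ go into the patch record.
    // Objects.equals() is a placeholder for the type-aware Adjudicator.
    static Map<String, Object> buildPatch(Map<String, ?> original,
                                          Map<String, ?> candidate) {
        Map<String, Object> patch = new HashMap<>();
        for (Map.Entry<String, ?> e : candidate.entrySet()) {
            if (!Objects.equals(original.get(e.getKey()), e.getValue())) {
                patch.put(e.getKey(), e.getValue());
            }
        }
        return patch; // an empty patch means no update is needed for this record
    }
}
```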
Bundle SKU Handling
In order to sell multiple different products together, say for example a left sectional with two middle pieces and a corner, we use a combinative SKU like “SKU*Left + SKU*Middle x 2 + SKU*Corner”. When I said I wanted this project to be the final database to rule them all, I knew I had to accommodate these troublesome bundle SKUs. The method of parsing them is quite simple: split the SKU on the reserved delimiter " + ", then split each of the resulting disparate SKUs on " x " to determine its multiplier. From there I created a list of additive fields whose values must be the sum of the disparate parts times their multipliers. Price, for example, is going to be SKU*Left’s price + SKU*Middle’s price * 2 + SKU*Corner’s price. Other special cases may be handled through logic as well, such as the inventory type; we chose to default the bundle’s inventory type to the weakest value of the group. Standard values are simply copied from one of the disparate SKUs, and the result is a dynamic and robust catalog of Bundle SKUs – not even our ERP could do this!
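The parsing and additive-field summation above can be sketched as follows, with price as the additive field (the price map is illustrative):

```java
import java.math.BigDecimal;
import java.util.Map;

public class BundleSku {
    // Parse a combinative SKU like "SKU*Left + SKU*Middle x 2 + SKU*Corner":
    // split on the reserved " + " delimiter, then on " x " for multipliers,
    // and sum an additive field (here, price) over the parts.
    static BigDecimal bundlePrice(String bundleSku, Map<String, BigDecimal> prices) {
        BigDecimal total = BigDecimal.ZERO;
        for (String part : bundleSku.split(" \\+ ")) {
            String[] skuAndQty = part.split(" x ");
            BigDecimal qty = skuAndQty.length > 1
                ? new BigDecimal(skuAndQty[1])
                : BigDecimal.ONE; // no " x N" suffix means a quantity of one
            total = total.add(prices.get(skuAndQty[0]).multiply(qty));
        }
        return total;
    }
}
```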
Propagation
Within two minutes, the Universal Database has been updated with fields from a multitude of different Airtable workstations, our inventory ERP, and our eCommerce platform. This base can be synced to all the workstations, which now have access to all the data they’d ever need. Every 5 minutes they “send” their own data and receive back everyone else’s. The utility of being able to access information from any workstation in the company has had a profound effect on the structures and tools I’ve been able to create for my teams. There’s no more need for custom apps transferring data here and there, no more need for specific hack-job syncs between bases that become a nightmare as things scale. The world is at our fingertips.