The Hard Count: To Bill is to Build a Distributed System.

To bill customers is to build a distributed system. Plan accordingly.

Prolegomenon

Twelve years ago, I worked for a regional startup in the exurbs of what my brother-in-law calls “the One Day City.” My contemporaries and I were all using AWS for the first time. The biggest local shops ran mutual funds or sold insurance using bare metal and the JVM. The smaller ones, never funded by more than a regional VC, were using various high level languages (Ruby, Python, Clojure), almost always deployed on the same three-tier (app, memcache, MySQL) architecture.

But one summer, on a top floor in the highest building in town, fed by union-made pizza and in a reading club that no longer exists¹, a dozen of us read papers on distributed systems. We started with “The Byzantine Generals Problem”, then moved onto “Paxos Made Simple” and a sirenic parade of papers on live implementations: Dynamo, BigTable, Chubby, Cassandra.

I think for all of us, it felt like reading notes from a future that had not yet been evenly distributed. For a decade already then, but in places that might as well have been another country, businesses were deploying software on thousands of servers at a time. But east of the Ohio and west of the Delaware, we were running our small-sized applications on tens of almost snowflaked machines. It felt grandiose to call anything we built a distributed system. Indeed, the only distributed service we used regularly was S3.

So Gillian Welch sings of seeing a band on tour:

But I watched them walk
through the bottom land
and I wished that I played
in a rock and roll band

What’s present was already past

The decade since, however, has brought the use, deployment, and design of distributed systems to a great many programmers, as Mark Cavage announced in his 2013 paper, “There’s Just No Getting around It: You’re Building a Distributed System”. This change has come with some rather typified expectations:

To get or process data, you will interact with many nodes running the same software,
These nodes will be masterless or else coordinated by a (usually Paxos/raft/Zab) designated master,
Management of said nodes is ideally somebody else’s problem (AWS, the SRE team, etc), but practically, it’s still your problem, too, and
All of this software will be new.

For years in my own work, I found those expectations validated. I shipped sessionization with Python and Hadoop, suffered through Pig and Hive, helped operationalize a time series database backed by HBase, and eventually was working with Kafka, Zookeeper, and various distributed stream processors and data stores.

By the time I took a job in billing in 2020, I thought I’d seen it all. But in fact, I hadn’t seen anything yet.

A Billing Archipelago

As with any new job, I spent my first weeks in billing trying to understand my work’s data and where it came from. It was no easy thing. Every person I talked to had a different answer:

For salespeople, everything that mattered lived in Salesforce (“the CRM”). The end result of their work was a final quote, which described what the customer was buying and for how much.
For the order management team in finance, everything that mattered was in Dropbox (where the actual sales orders, consisting of the final quote plus contractual terms, were stored as PDFs).
For accounts receivable, also in finance, it was complicated…but at least most invoices were generated in Netsuite (“the ERP”), sometimes after a lot of work in Excel, using usage data from the data warehouse…and those sales order PDFs.
For engineering, some things were in the main configuration DB (“control-plane”), customer data such as credit cards were stored in Stripe, provisioning data was stored in a “compliance” DB, and usage data was in a “usage” DB.
Revenue accounting, again in finance, was probably the most complicated: to close books monthly and quarterly in the ERP (aka “Enterprise Resource Planning”), they depended on data from Salesforce, Stripe, our bank, and the internal data warehouse.

All told except for tax, the flow of data looked something like this:

Billing: it starts with prices and lawyers and ends in accounting.

I had expected to find at least a couple different systems, but I was quickly approaching a half-dozen, many with human-mediated replication processes in between them! Much of this complexity pertained to sales-assisted billing², a domain that seemed to be fast approaching the space of Problems No Software Engineer Should Try to Solve. But even for self-service billing (i.e. “sign up with a credit card on a website and be billed automatically”),³ it was clear that plenty of the distribution was either intrinsic or the engineers’ own doing.

There wasn’t a canonical distributed system in sight. But it was starting to feel quite unpleasantly…distributed. The data flowed, emerged, and coalesced along a chain of isolated islands, rarely even in the same cloud provider, loosely connected, and usually ferried by humans.

A working example: self-service signups and invoicing

If you’ve come this far you’ll want a more specific example.

We’ll pick the simplest story, and one for which a programmer is at least most substantially responsible: the story of a customer signing up on your website for a paid plan, supplying a credit card, and being billed at the end of the month based on their usage charges. Let’s also assume you happen to use a modern payment processor and billing vendor, such as Stripe.

What are the systems of record and flows of data that happen in order to go from signup to money collected? I imagine the steps like this:

The customer – let’s call her Alice – visits the website’s plans and pricing page. She reads the terms and clicks to sign up for the monthly “Starter” plan, which includes a fixed fee plus additional usage-based charges.
As part of her signup, Alice also supplies her full name, credit card, and postal code. These are collected from the website, but they’re actually sent directly to the payment processor using Stripe Elements.
The payment processor stores Alice’s credit card and returns a token to the website’s frontend. The frontend then sends this token, along with Alice’s email address and the plan she selected, to the site’s backend.
The backend creates a new customer and subscription for Alice in your billing vendor (again, Stripe). It also creates the records needed locally, so that anything in your own infrastructure can know that Alice has access to the various features (we might say “entitlements”) that come with the “Starter” plan.
Every day, the backend sends new usage data to Stripe via its Usage API.
One month from when Alice signed up, Stripe creates an invoice. Tax is assessed (let’s assume via Stripe’s new tax offering), and then Stripe charges her credit card.
At some point, possibly after the invoice is issued, or possibly only after it’s paid, your accounting team recognizes the revenue from Alice’s invoice.

It’s subtle, but even just in this signup flow, the moving pieces are already many, and chances are high that you’ve not persisted everything you need to, or handled all your edge cases. What happens to Alice when you launch new commercial terms or new pricing for your “Starter” plan? Does she remain on her old plan, or is there a flow to move her onto the new pricing? How do you guard against cases where you create Alice’s subscription in Stripe, but then fail to provision her account in your own database? (Or vice versa, if you do things in the opposite order.) How do you handle the fact that Alice’s subscription starts at 1:05PM, but your usage data is aggregated only with daily resolution?

These complications expand as additional business cases come into scope, including:

tax exemptions, remmittances, and reporting
trials
upgrading and downgrading
cancellation
failed invoices (“non-payment”)
refunds
discounts and credits
introduction of new plans and sunsetting of old plans
addition or removal of features (we often call these “entitlements”) to and from existing plans

The good news in all of this is that if you work in billing, you’re likely to have plenty of work to do. The bad news is that you, or your predecessors, probably haven’t thought enough about the different systems of record from the get-go. You might not have a place where you keep a product catalog (“but isn’t the website’s pricing page enough?”). You might not have machine readable sales orders for your sales-assisted customers (“can’t the order management team just read the documents generated by Salesforce?”). And so on.

In fact, we’ve reached expectations essentially opposite of the four ones typified above:

Instead of interacting with many nodes running the same software, you’ll mostly interact with a fairly small number of nodes running different software. (Small because most billing problems, at least outside of utilities, are not high scale – different because the systems of record are different software.)
Instead of relying on the distributed system to provide coordination, you will be doing the coordination. Most importantly, you will be in charge of making your operations idempotent and concurrent safe across multiple systems of record.
Instead of having an SRE team to manage these services (or managing them yourself), you will be interacting primarily with vendors to manage your system of record. These vendors will generally be long-established companies… the kind who rarely update documentation, have just recently replaced SOAP with REST, and are quite comfortable rolling out a 2 hour maintenance window that maybe expands to 4 or 8 hours.
You guessed it! Even if your vendors are SaaS based, most of the time, you will not be running the most current version of their software. This happens because on the one hand, Enterprise SaaS tends to support years of earlier versions, and (relatedly) because upgrading usually has impacts reaching across many parts of a company, from sales to engineering to multiple functions in finance. So, be prepared to be many years behind a vendor’s latest release. It happens more often than you’d think.

Conclusion

Thus we close at last with our statement at the beginning: Whether you want it to be or not: to bill customers is to build a distributed system. It won’t be the sort of distributed system you expect, and it probably won’t be one you’ll feel excited to put on your resume. But it will have great challenges, and even some of them will be technical.

We’ll talk about the first of these – defining a product catalog – in the next installment. You’ll find it in about 2 weeks under “The Hard Count” on the main page, but if you prefer not to poll, you can also subscribe to the free site mailing list or site rss feed.

It was a chapter of the long-since archived a NOSQL summer. But I think Papers We Love does something similar today, and more yet.
↰
Throughout this series, I’ll use the term “sales-assisted” to describe any billing where an actual salesperson is instrumental in making the sale. You might also see this called “enterprise”, “business tier”, or “invoiced” billing.
↰
Throughout this series, I’ll use the term “self-service” to describe billing for customers who sign up for a SaaS offering online, who generally pay by credit card, and who handle all upgrades, downgrades, and cancellations through the company’s website. You might also hear people call this called “digital,” “online” or “credit-card” billing.
↰