Enterprise software delivery is hard, but there’s a better way. (part 2)

Part 2: What’s the ideal?


In an ideal world, your enterprise customers would simply trust your multi-tenant SaaS product to handle everything. Unfortunately, for many enterprise use cases, this wishful thinking collides with harsh realities: data sets that can’t be moved cost-effectively, sensitive data that’s too risky to hand over, and regulatory constraints that make data movement impossible. Add to that the rising importance of private AI, where data gravity and security are non-negotiable, and it’s clear: simply “trusting SaaS” is not viable for a significant portion of the enterprise market. 

Our previous post in this series walked through enterprise software delivery models and why they are so difficult and expensive to implement: the compounding problems of deployment, observability, operations, and customer control, plus the additional complications introduced by managed cloud services and customer security and operational requirements.

In this post, we will:

  1. break down why a gap exists between multi-tenant SaaS and customer-hosted delivery

  2. explore an idealized solution that will turn out to be subtly wrong

  3. settle on the right model for how software delivery into customer environments ought to work.

Our upcoming and final post in this series (part 3) will dive into the technical details of the approach we’re taking at Tensor9 to help you deliver your software to enterprise customers.

An impractical ideal: customers just trust SaaS

Many SaaS vendors wishfully assume that customers will simply trust their standard multi-tenant (or possibly single-tenant) product. But for many enterprise use cases, this model is simply not viable, typically due to:

  • Data gravity: Large datasets are expensive and slow to move into the vendor's SaaS environment. This is a common issue for companies in big data analytics, machine learning model training, and genomics.

  • Sensitive data & intellectual property: Moving highly sensitive, proprietary, or business-critical data outside the customer's own controlled environment introduces risks many are unwilling to take. This is particularly true for verticals like financial services (banking, insurance, investment firms) with sensitive financial records, manufacturing with proprietary design and operational data, and technology companies developing core intellectual property.

  • Regulated data: Legal and regulatory frameworks (like GDPR in Europe for personal data, HIPAA in the US for healthcare, PCI-DSS for payment information, and various national data sovereignty laws) often impose strict requirements on data residency, processing, and security. This makes it challenging or illegal to move certain data to a generic third-party SaaS environment. This heavily impacts vendors selling to healthcare, financial services, telecommunications, and government.

  • Private AI: Increasingly, enterprises want to leverage the power of AI on their own proprietary datasets. These datasets are often the crown jewels of the company and cannot leave their secure perimeter. "Private AI" or "AI on your data" initiatives mean that the AI models (whether developed by the vendor or the customer) must run where the data resides – within the customer's secure environment. Asking these customers to move petabytes of sensitive data to a vendor's SaaS for AI processing is often a non-starter. This trend affects any software vendor incorporating AI/ML capabilities for industries like enterprise search, life sciences (drug discovery, genomics), financial services (fraud detection, risk modeling), and advanced R&D sectors. See this article by Equinix CTO Justin Dustinzadeh for a deeper exploration of this trend.

If you’re serving these types of customers, the "just trust our SaaS" model can be an immediate disqualifier for a significant portion of your addressable market. These customers require solutions that respect their data boundaries.

An ideal that is subtly wrong

Another seemingly ideal solution: a single-tenant, vendor-hosted product that is "automatically transformed" into a customer-hosted product. The transformation would aim to hide the various problem combinations and shield you from customer requirements. In theory, you would have a single product and could operate as though it were entirely vendor-hosted.

This is subtly wrong because it obscures the underlying constraints.  We've seen this mistake before with technologies like DCOM and Java RMI. The illusion of local objects, where remote objects behaved exactly like local ones, was not high-fidelity enough because it tried to completely obscure the underlying network reality.  One obvious reason this illusion breaks down is that network latency in commodity network settings is significantly higher and far more variable than local in-process method dispatch. The industry has, thankfully, largely given up on that "everything is a local object" approach. Network calls are now generally made explicit in code, and application developers are responsible for handling the latency, performance, and availability variability inherent in distributed systems.
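
To make that contrast concrete, here's a minimal Python sketch of the explicit style the industry settled on. The function name, URL handling, retry count, and backoff policy are illustrative, not from any particular framework; the point is that the network call is visible at the call site, and the caller owns timeouts, retries, and failure behavior rather than pretending the remote object is local.

```python
import time
import urllib.error
import urllib.request

def fetch_with_retries(url: str, attempts: int = 3, timeout: float = 2.0) -> bytes:
    """An explicit remote call: timeouts, retries, and backoff are the
    caller's responsibility; nothing is hidden behind a local-looking
    method dispatch the way DCOM/RMI attempted."""
    last_error = None
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as err:
            last_error = err
            time.sleep(2 ** attempt)  # back off: latency and availability vary
    raise RuntimeError(f"remote call failed after {attempts} attempts") from last_error
```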

The key challenge with this approach is that you simply can’t assume underlying customer environments are identical. There are many idiosyncrasies across the public cloud vendors (not to mention on-prem or air-gapped environments) that make an “automatic transformation” impossible, and day-2 operations of your app across customers introduce a host of complexities. For example:

  • Some customers won’t be able to use your public cloud of choice. Maybe their data set is too large to move, or too expensive to access across public clouds. Or maybe they have invested heavily in one particular cloud and have built operational muscle around it. Or maybe they have a custom, discounted pricing plan with their cloud provider. Whatever the reason, this requires you to replace your use of managed services (e.g., AWS S3, AWS RDS) with the closest equivalents available in the customer’s environment.

  • Some customers want to dictate software update schedules. This constraint forces you to negotiate a tradeoff between customer control and support cost: the more outdated versions of your software are out in the wild, the higher your support costs.

  • Some customers want to control what data is transmitted back to you. This requires you to negotiate what data leaves their environment and to give customers an enforcement mechanism, such as a firewall and/or audit logs.

  • Some customers won’t want their software to phone home to you, the vendor. This requires a different support and operations model for disconnected and air-gapped scenarios.

It is unreasonable to try to hide from these customer-driven constraints, so this model won’t work.

The right ideal

So what would a better ideal look like?

It would still involve a single-tenant vendor-hosted product architecture that is transformed into a customer-hosted product. However, critically, the transformation would abstract the various problem combinations (see part 1) rather than trying to completely hide them. The abstraction would expose customer constraints to you when necessary, allowing you to make informed tradeoffs and concessions regarding your product's behavior in various customer environments.

For example:

  • If a customer doesn’t want to use your public cloud of choice, the ideal solution would surface that incompatibility to you and present you with options. It would make it easy for you to try Google Cloud Storage in lieu of AWS S3, or Google Cloud SQL in lieu of AWS RDS, and to understand the performance implications of that replacement (see the storage sketch after this list).

  • If a customer wants change-management controls, the ideal solution would provide a mechanism for them to approve, schedule, or reject software updates from a dashboard. For you, the platform would clearly track the version status of each customer, helping you manage the support costs and risks associated with maintaining multiple versions in the wild.

  • If a customer needs to control what data is transmitted back to you, the ideal solution would provide them with a clear interface to inspect and filter all egress traffic. It would allow them to set granular, auditable policies on what telemetry (e.g., specific logs, metrics, events) is permitted to leave their environment, while making it clear to your support team what data they are not receiving (see the egress sketch after this list). Further, you would be able to easily learn if a code change violated the negotiated data egress contract between you and your customers.

  • If a customer doesn’t want their software to phone home to you at all, the ideal solution would allow you to package updates and operational commands into a bundle that can be securely transferred and applied by the customer themselves. It would also provide a mechanism for the customer to export relevant health and performance data, so they can share it with your support team through offline channels, enabling support for even fully air-gapped environments. 
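
To make the first bullet concrete, here's a minimal sketch of the kind of storage abstraction such a platform could swap per customer. The ObjectStore interface and the store_for wiring are hypothetical names invented for this illustration; the SDK calls themselves (boto3's put_object, google-cloud-storage's upload_from_string) are real.

```python
from abc import ABC, abstractmethod

# Hypothetical interface, not from any real product: object storage is
# abstracted so S3 can be swapped for GCS on a per-customer basis.
class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class S3Store(ObjectStore):
    def __init__(self, bucket: str):
        import boto3  # imported lazily: only needed for AWS customers
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

class GCSStore(ObjectStore):
    def __init__(self, bucket: str):
        from google.cloud import storage  # only needed for GCP customers
        self._bucket = storage.Client().bucket(bucket)

    def put(self, key: str, data: bytes) -> None:
        self._bucket.blob(key).upload_from_string(data)

# Per-customer wiring: the platform, not your app code, picks the backend.
def store_for(customer_cloud: str, bucket: str) -> ObjectStore:
    return {"aws": S3Store, "gcp": GCSStore}[customer_cloud](bucket)
```

Your application code only ever sees ObjectStore; the per-customer choice of backend, and its performance characteristics, becomes an explicit, inspectable decision instead of a hidden one.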
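
And for the third bullet, here's an equally minimal sketch of an egress gate. The category names and file-based audit log are invented for illustration: telemetry is checked against a customer-set allowlist, and every decision, sent or blocked, is written to an audit log the customer can inspect independently.

```python
import json
import time

# Customer-controlled allowlist of telemetry categories that may leave
# their environment. Category names here are purely illustrative.
ALLOWED_CATEGORIES = {"metrics", "error_logs"}

def egress(event: dict, audit_log_path: str = "egress_audit.jsonl") -> bool:
    """Return True if the event may be transmitted to the vendor.
    Every decision is appended to an audit log for the customer."""
    allowed = event.get("category") in ALLOWED_CATEGORIES
    with open(audit_log_path, "a") as audit:
        audit.write(json.dumps({
            "ts": time.time(),
            "category": event.get("category"),
            "action": "sent" if allowed else "blocked",
        }) + "\n")
    return allowed

egress({"category": "metrics", "cpu_pct": 42})      # permitted telemetry
egress({"category": "customer_records", "id": 7})   # blocked and audited
```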

The ideal is for you to have a single core codebase that can intelligently adapt to individual customers’ operational needs, without taking 6-12 months to rearchitect your product. You don’t have to manually build and maintain dozens of disparate product variants yourself; instead, the platform handles the adaptation, empowering you to make a judgment call when there is a fundamental tradeoff to deal with. This approach acknowledges that customer environments are different and provides you with the levers to manage those differences, without the pain of a complete rewrite for each major customer segment. You can still leverage managed services in your own SaaS offering, while the transformation layer provides well-defined interfaces and extension points to handle variations in customer environments, security needs, and choice of underlying cloud or on-prem infrastructure.

Tensor9 takes steps toward this ideal

This ideal was our north star when designing Tensor9.

We’ve built a platform that helps you take steps toward this ideal by enabling you to securely deploy and manage your existing product across a variety of customer-hosted environments - from different public clouds to on-premises and even air-gapped settings - without the typical complexity and cost. Instead of customers using a shared instance, they interact with their own dedicated private version. Tensor9 continuously monitors your existing SaaS stack in your cloud and synchronizes it to each customer's environment. Tensor9 is highly opinionated about embracing customer constraints and never hides them: when there is an incompatibility between your stack and a customer’s environment, Tensor9 surfaces it to you as a tradeoff to make.

Here’s the high-level picture:

This approach allows you to deploy, observe, and operate your product across customer-hosted environments. It also meets customers where they are, giving them the controls they need to meet their security and operational requirements:

  • Deploy: Tensor9 enables you to deploy your product to customer environments with the same deployment tooling you use today (e.g. AWS CodePipeline, GitHub Actions, CircleCI, Octopus Deploy). A low-cost digital twin mirrors the deployment state of each customer’s stack. Tensor9 synchronizes changes in your stack to your digital twins, triggering deployments to customers (a conceptual sketch of the twin idea follows this list).

  • Observe: Tensor9 enables you to observe customer environments by synchronizing logs, metrics, and hardware failures back to your digital twin. This enables you to observe, debug, and support customers as if they were using SaaS. For example, if you typically use CloudWatch to alert you when a load balancer has unhealthy services behind it, you can continue to do so against your digital twin.

  • Operate: Tensor9 enables you to operate customer environments remotely using the support tooling you already use today. This goes beyond observability, meaning that changes you make to your digital twins affect customer environments. For example, if you typically use Ansible to issue commands to your fleet of containerized services or virtual machines, you can continue to do so against your digital twin.

  • Customer Control: Customers have customizable change management controls and can set firewall policy to control the data, if any, that leaves their environment.  All data that leaves the customer’s environment goes into an audit log so customers can independently verify the data egress policy for themselves.
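
To give a feel for the digital-twin idea, here's a conceptual sketch, invented for this post; it is not Tensor9's actual implementation or API. The twin is a vendor-side mirror of one customer's deployment state; syncing diffs your current stack against it and yields the deployments that need to happen in that customer's environment.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """Vendor-side mirror of one customer's deployed stack (conceptual)."""
    customer: str
    deployed_versions: dict = field(default_factory=dict)

    def sync(self, vendor_stack: dict) -> list:
        """Diff the vendor stack against the twin; return pending actions."""
        actions = []
        for service, version in vendor_stack.items():
            if self.deployed_versions.get(service) != version:
                actions.append(f"deploy {service}@{version} to {self.customer}")
                self.deployed_versions[service] = version
        return actions

# You push one change to your stack; each twin works out what its customer
# environment needs, and version skew stays visible per customer.
twin = DigitalTwin("acme-corp", {"api": "2.4.0", "worker": "2.4.1"})
print(twin.sync({"api": "2.4.1", "worker": "2.4.1"}))
# -> ['deploy api@2.4.1 to acme-corp']
```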

In the next blog post, we’ll describe how Tensor9 works. We’ll get into technical detail on: 

  1. how digital twins work

  2. integrations with familiar CI/CD, observability, and support tooling

  3. how customer constraints are surfaced to you

  4. how it helps you meet customers where they are by giving them controls and auditability.
