Things change quickly at a startup, software architecture included. The tech stack and infrastructure that scale your business will look nothing like what you started with on Day 1. Our Co-founder and CTO Vinod Chandru (the whatever-it-takes guy!) discusses initial design choices for Kloudless’ architecture and how it evolved with the growing needs of the company.
What factors were considered in choosing the right design, infrastructure, and tech stack for Kloudless on Day One?
We did not choose the right design, infrastructure, or tech stack on Day One. We made a best guess as to what it would be based on industry best practices for building a scalable web application service to handle API requests, with sane choices like a Django web server, PostgreSQL database, and Redis cache running in AWS. Plenty of other businesses have similar tech stacks. Early on, we learned from blog posts about Instagram’s stack, which was similar at the time.
How has this changed as the business has grown?
We tried to optimize for some scale early on by separating out the components in the stack. The Kloudless stack is responsible both for receiving API requests and, on the backend, for making API requests to other third-party services. We made the decision to have that entire stack be asynchronous throughout, so the components are decoupled from one another and there are fewer bottlenecks, at least in the network stack. That has changed as we evolved the scalability of the stack.
For example, we needed to track all the activity happening in an account, so we initially just saved all of that in our database. But with very large customers, trying to track all activity in a Google Drive or a SharePoint account and storing all that data in a database results in a lot of excess load on the database, so we ended up rearchitecting that. We improved various items on the scalability side, and that’s the part that has really changed the most in being able to scale to support hundreds of millions of API requests per Kloudless cluster.
I think one design decision that was the right choice early on was to optimize for building the entire solution so that it can be run by anyone, not just us. As a result, we have been able to deploy on-premises, that is, in the customer’s own environment, as well as package the solution into customer deployments when they deploy to their own customers. This has been a major benefit that allowed us to avoid spending the time and money on certifying against regulatory compliance requirements. We simply provide customers with the Kloudless Docker container if they have any security needs.
In terms of what else has changed, we initially started with a Cloud Storage API, so a lot of the functionality behind the scenes was geared towards tracking files. That’s evolved a lot since we expanded beyond Cloud Storage into other API categories like Calendar. We grew to have more stringent requirements for real-time activity monitoring as well as more functionality behind the scenes to enable features that help with a variety of use cases. We built out more UI tools as well. Overall, the biggest part that has changed was how we scaled and managed the Kloudless cluster.
What were some of the milestones that signaled change was needed?
The obvious milestones are usually customer-oriented, where we have a high-paying customer or a new prospect that indicates a certain feature is required for their product. Stepping back from this for a bit, since the Kloudless APIs are functioning as a middle layer, a customer’s product often isn’t able to be built without Kloudless supporting certain functionality.
Customers who use Kloudless for business-critical features naturally end up a lot more invested in the products in terms of their own engineering time, their expenditure on the Kloudless stack, and their own product road map. If we are unable to support certain functionality or requirements around latency and performance, that becomes a big concern. This is what drives some of these architectural changes.
These were the types of things that we thought about as we scaled the cloud version. It’s a multi-tenant system, so we support a lot of folks on it. That resulted in us rearchitecting how we scaled to multiple regions.
Can you give us a summary of what the Kloudless technology stack looks like today?
The API server and developer portal use Django, and handle API requests using background processes. We use a PostgreSQL DB and Redis for caching and inter-server communication.
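As a rough illustration of the pattern described here, where the web tier accepts an API request and hands the work to a background process, the following minimal Python sketch uses only the standard library. The in-process queue is a stand-in for whatever connects the servers in a real deployment (the answer above mentions Redis for inter-server communication), and the names handle_api_request, worker, and fetch_from_third_party are hypothetical, not Kloudless code.

```python
import queue
import threading

# In-process stand-in for the broker between the web tier and the workers.
jobs = queue.Queue()
results = {}


def fetch_from_third_party(account_id, path):
    # Hypothetical placeholder for a call to an upstream service.
    return f"contents of {path} for account {account_id}"


def handle_api_request(request_id, account_id, path):
    """Web tier: enqueue the work and return immediately."""
    jobs.put((request_id, account_id, path))
    return {"request_id": request_id, "status": "queued"}


def worker():
    """Background process: handle jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        request_id, account_id, path = job
        results[request_id] = fetch_from_third_party(account_id, path)
        jobs.task_done()


thread = threading.Thread(target=worker)
thread.start()
ack = handle_api_request("req-1", "acct-42", "/reports/q1.pdf")
jobs.put(None)  # ask the worker to stop after the queued job is handled
thread.join()
```

Because the request handler only enqueues work, a slow upstream service delays the background worker rather than the component that accepts requests, which is the kind of decoupling the answer describes.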
Is this different between the Cloud and the Self-hosted versions of Kloudless?
No, we offer the same Kloudless Docker container to our customers that we run in our cloud.
What are some of the big architecture challenges today?
It’s not so much a challenge as it is a continued area of improvement. A big part of why customers use Kloudless is to track activity occurring in third-party accounts so that they can sync data into their applications or otherwise process the changes happening in a user’s connected account, such as a cloud storage or calendar account, or perhaps process Slack messages as they arrive.
One of the core goals of Kloudless is to provide that activity feed in a timely manner, with as little latency as possible given architectural restrictions around how each service makes that data available. For example, some services support webhooks. That’s just a way for the services we support, like Dropbox, to notify Kloudless of a change. Similarly, Google notifies Kloudless when data in a Google Drive account has changed, so that we can stay on top of those changes. Otherwise, if the Kloudless server needs to check for changes itself, we have to poll for them.
Polling takes up a lot more resources, and in some cases we have to check the entire set of data for any changes and map it to our internal representation of metadata about the state of the user’s account. We have those capabilities, but for several systems we continue to get requests around improving the reliability and performance of this kind of activity monitoring. A core goal of Kloudless is to fill in the gaps where third-party systems do not provide all of the features that the Kloudless API requires for a certain capability.
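To make the cost of polling concrete: when a service offers no change notifications, one generic approach is to list the account’s items, compare them against the previously stored snapshot, and emit an event for every difference. The sketch below assumes each item exposes some version marker (such as a modified time or revision ID). The function and field names are illustrative and are not taken from the Kloudless implementation.

```python
def diff_snapshots(previous, current):
    """Compare two {item_id: version_marker} snapshots of an account.

    Returns a sorted list of (event_type, item_id) tuples.
    """
    events = []
    for item_id, marker in current.items():
        if item_id not in previous:
            events.append(("added", item_id))
        elif previous[item_id] != marker:
            events.append(("modified", item_id))
    for item_id in previous:
        if item_id not in current:
            events.append(("removed", item_id))
    return sorted(events)


# Snapshot stored after the last poll versus the listing from this poll.
previous = {"file-1": "v1", "file-2": "v1", "file-3": "v7"}
current = {"file-1": "v1", "file-2": "v2", "file-4": "v1"}
events = diff_snapshots(previous, current)
```

Every poll has to touch each item in both snapshots even when nothing changed, which is why this approach uses more resources than waiting for a webhook notification.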
When Kloudless takes steps to fill in the gaps, the extent to which we are able to do so varies by service. Thankfully we have a lot of experience at this point with the most popular services and frequently review how to improve this implementation. We’ll continue to do that in the new release of our activity monitoring API in the next couple of weeks.
Another major area of improvement is our internal connector platform, so that we can iterate on more connectors faster. We want to improve it to the point where we can integrate very quickly with apps. We’ve definitely improved the speed at which we’re able to plug third-party APIs into our backend, and we have the necessary building blocks to layer on the functionality Kloudless provides very quickly.