In this post, we’ll discuss an edge case with the Box API that occurs when refreshing OAuth Bearer Tokens. It is often overlooked since it is only likely to arise when several concurrent API requests are made.
The Kloudless unified Storage API enables developers to access users’ files in any cloud storage provider with a single implementation. Accessing data in a user’s Box account requires the same implementation as for any other supported cloud storage service. The first step in this process is obtaining access to the end-user’s cloud storage account.
Since the Kloudless API is an abstraction layer, it also abstracts the authentication mechanism each service uses. Developers building with Kloudless only need to follow the Kloudless OAuth flow, regardless of the specifics of authenticating with a particular service.
We’re not the only application provider to choose OAuth. As you may have guessed, OAuth 2.0 is the most common standard for token-based authentication we see. Applications can obtain access to data in storage services like Box by guiding users through a three-legged OAuth 2.0 authorization flow. The result is usually a Bearer token with a short expiry time as well as a Refresh token with a longer expiration that can be used to obtain future Bearer tokens. Here’s an example response from the Box API docs:
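As a rough illustration of the shape of such a response, here is a sketch that uses the standard OAuth 2.0 token response field names with placeholder values. It is not copied from the Box docs, and `parse_token_response` is a hypothetical helper we introduce only to show how the expiry time can be derived.

```python
import json
import time

# Placeholder values only; real tokens are opaque strings issued by the service.
RAW_RESPONSE = """
{
  "access_token": "ACCESS_TOKEN_PLACEHOLDER",
  "expires_in": 3600,
  "refresh_token": "REFRESH_TOKEN_PLACEHOLDER",
  "token_type": "bearer"
}
"""


def parse_token_response(body, now=None):
    """Extract the token pair and convert the relative expiry to an absolute time."""
    data = json.loads(body)
    now = time.time() if now is None else now
    return {
        "access_token": data["access_token"],
        "refresh_token": data["refresh_token"],
        # expires_in is a lifetime in seconds, counted from when the response was issued
        "expires_at": now + data["expires_in"],
    }


tokens = parse_token_response(RAW_RESPONSE)
```

Storing an absolute `expires_at` rather than the relative `expires_in` makes it easier for several workers to agree on when a refresh is due.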
Edge cases when refreshing Box Bearer tokens
The Box OAuth flow is well documented and is one of the better implementations we’ve seen. Even so, there are some pitfalls to watch out for when accessing the Box API. One of these arises when refreshing tokens.
New Bearer tokens can be obtained by making a request with a valid Refresh token. However, using a new Box Bearer token obtained via a refresh immediately invalidates any other tokens that were obtained with that Refresh token. It also invalidates the Refresh token itself.
It is important to note that the newly obtained token must first be used in an API request before Box invalidates the other ones. This means:
- Merely obtaining a token is insufficient to guarantee that it remains valid. Another process could obtain a refreshed token and actually use it first. Box tries to guard against this scenario by returning the same set of tokens for every refresh attempt made with a given Refresh token.
- The previous token can continue to be used until it either expires or the new token is used.
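To make these rules concrete, here is a toy in-memory model of the behavior as described in this post. It is not Box's implementation: `ToyTokenServer`, `InvalidToken`, and the method names are all hypothetical, and token expiry is ignored to keep the sketch short.

```python
import secrets


class InvalidToken(Exception):
    """Raised when a token is unknown or has been invalidated."""


class ToyTokenServer:
    """Toy model of the refresh rules described above (not Box's real logic)."""

    def __init__(self):
        self.access, self.refresh_token = self._new_pair()
        # refresh token -> (new access, new refresh) that was issued but not yet used
        self._pending = {}

    @staticmethod
    def _new_pair():
        return secrets.token_hex(8), secrets.token_hex(8)

    def refresh(self, refresh_token):
        if refresh_token != self.refresh_token:
            raise InvalidToken("refresh token is no longer valid")
        # Repeated refreshes with the same Refresh token return the same pair.
        if refresh_token not in self._pending:
            self._pending[refresh_token] = self._new_pair()
        return self._pending[refresh_token]

    def use(self, access_token):
        if access_token == self.access:
            return "ok"  # the current token keeps working until a new one is used
        pending = self._pending.get(self.refresh_token)
        if pending and pending[0] == access_token:
            # First use of the new token: it becomes current, and the old
            # Bearer token and old Refresh token stop working.
            self.access, self.refresh_token = pending
            self._pending.clear()
            return "ok"
        raise InvalidToken("access token is no longer valid")


server = ToyTokenServer()
old_access, old_refresh = server.access, server.refresh_token
new_access, new_refresh = server.refresh(old_refresh)
server.use(old_access)  # still accepted: the new token has not been used yet
server.use(new_access)  # from here on, old_access and old_refresh are rejected
```

The point to take away is that the switch happens on first use of the new token, not when the refresh response is issued.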
Usually, this does not pose a concern. However, highly concurrent workloads spread across different machines can turn this into a larger issue that needs special consideration.
Consider the case where there are multiple distinct processes on separate machines performing concurrent API requests to Box. This may be the case when crawling data in a Box account, or when processing a large number of files. The Box bearer token expires after an hour, so it is inevitable that a refresh is needed at some point during processing. When the token expires, the most straightforward implementation would be to attempt to refresh the token, save the new token to a database, and then proceed with API requests.
This is in fact the implementation Kloudless originally used by default. However, several issues begin to appear:
- All processes notice that the token has expired at the same time, causing each to attempt to refresh the token.
- A large number of database writes occur as each process attempts to save its newly refreshed token to the database.
- Some refresh attempts are slightly delayed relative to the moment the application logic decides it must refresh the token. In the meantime, the process may fetch a newly refreshed token that a different process has just saved to the database, and then refresh it once again.
- In a variation of the scenario above, the first process that fetched the newly refreshed token proceeds to use that token, causing it to become valid and invalidating all others, including the most recent token saved to the database by the second process. This causes the token that is now valid to no longer exist in the database and thus be lost forever. This is the worst possible outcome: the end-user now needs to re-authorize access to their Box account.
We eventually arrived at an implementation that works around the issues above:
- Randomize when token refreshes occur, shortly before expiration, rather than exactly at the expiration time.
- When attempting to save a new bearer token to the database:
  - First ensure the database is accessible and writable.
  - Then use the token in an API request to guarantee it is valid and others are invalidated.
  - Save the token to the database atomically.
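The steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual Kloudless code: it assumes SQLite as the database, a hypothetical `tokens` table, and a hypothetical `verify_fn` callable that makes one cheap authenticated API request and raises if the token is rejected. Holding the write lock while the token is verified is our own simplification.

```python
import random
import sqlite3


def pick_refresh_time(expires_at, min_lead=30.0, max_lead=300.0, rng=random):
    """Pick a refresh time a random number of seconds before expiry,
    so that workers do not all attempt a refresh at the same instant."""
    return expires_at - rng.uniform(min_lead, max_lead)


def save_refreshed_token(conn, account_id, access_token, refresh_token, verify_fn):
    """Persist a freshly refreshed token pair using the three steps above.

    `conn` must be opened with isolation_level=None so that transactions
    are controlled explicitly by the statements below.
    """
    # Step 1: check the database is reachable and writable by taking the
    # write lock before the token is ever used.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Step 2: use the token once, so it becomes the valid one.
        verify_fn(access_token)
        # Step 3: save both tokens in a single transaction.
        conn.execute(
            "INSERT OR REPLACE INTO tokens (account_id, access_token, refresh_token) "
            "VALUES (?, ?, ?)",
            (account_id, access_token, refresh_token),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE tokens (account_id TEXT PRIMARY KEY, "
    "access_token TEXT NOT NULL, refresh_token TEXT NOT NULL)"
)
save_refreshed_token(conn, "acct-1", "new-access", "new-refresh", verify_fn=lambda token: None)
```

Each worker would call `pick_refresh_time` once when it loads a token, which spreads refresh attempts over a window instead of clustering them at the expiry time.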
This has proven to be a more robust solution that avoids loss of token data when interacting with the Box API, even under highly concurrent workloads.
Alternate authorization mechanisms
Check out this previous blog post for an alternate form of server-to-server authorization using JSON Web Tokens that is supported by both Box and Kloudless. The advantage of this approach is that it removes the need to use a refresh token altogether. Bearer tokens can be obtained at any time using a JWT assertion generated by the application.