Cache Storm Handling #740

Closed
jjxtra opened this issue Dec 27, 2019 · 3 comments

Comments

@jjxtra

jjxtra commented Dec 27, 2019

The code path where the caching factory function must be executed is vulnerable to a cache storm: many concurrent requests for the same key will each execute the factory function.

Imagine this scenario for a busy API server that caches expensive SQL queries using only an in-memory cache.

1] Cache expires for key 'ExpensiveQuery'.
2] The 'ExpensiveQuery' key is a very common request, so right after it expires, within the span of a few seconds, hundreds of requests come in for 'ExpensiveQuery'.
3] Hundreds of database calls all execute 'ExpensiveQuery', bringing the database server to its knees.
4] These queries may eventually succeed after a long time, but more likely they will time out and hit a retry policy, further exacerbating the problem unless the timeout is handled specially in the policy.

Ideally, any subsequent request for a key would simply block or be handed the existing task for that key. Polly caching relies on a get followed later by a put when the cache value is not found. Since this get-then-put is not atomic, the factory method is unfortunately executed multiple times until the key finally gets put into the cache.

The solution to a cache storm is to prevent multiple factory methods from executing simultaneously for the same key, and to have subsequent requests simply receive a task from the cache that represents the currently executing factory method. When the task completes, everyone who was waiting on it gets the exact same result.
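The pattern described above is essentially per-key request coalescing. Below is a minimal sketch of the idea, not Polly code; `SingleFlightCache` and its members are hypothetical names. `Lazy<Task<T>>` is used so that even if two callers race inside `GetOrAdd`, only one factory actually starts:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch (not Polly code): concurrent requests for the same key
// share one in-flight Task, so the factory runs at most once per key.
public sealed class SingleFlightCache<T>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<T>>>();

    public Task<T> GetOrCreateAsync(string key, Func<Task<T>> factory)
    {
        // GetOrAdd may construct more than one Lazy wrapper under a race,
        // but Lazy guarantees the factory itself starts at most once.
        var entry = _entries.GetOrAdd(key, _ => new Lazy<Task<T>>(factory));
        return entry.Value;
    }

    // Call on expiry (or after a failed factory) so the next request re-runs it.
    public void Invalidate(string key) => _entries.TryRemove(key, out _);
}
```

With this shape, hundreds of concurrent requests for 'ExpensiveQuery' would trigger one database call; the rest await the same task and receive the same result.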

There are at least two options:

1] Change Polly caching implementations to prevent simultaneous factory methods from executing for the same key at a given time, which does change existing behavior.

2] Add a new Polly caching interface with a contract that guarantees that only one factory method per key executes at any given time:

// Interface that ensures only one cache factory method executes per key simultaneously
public interface IGetOrCreateCacheAsync
{
    // Get an existing cache item if it exists, or call the factory method and add the result to the cache,
    // ensuring that only one factory method per key executes at a time.
    Task<(bool, T)> GetOrCreateAsync<T>(string key, Func<Task<(bool, T)>> factory, CancellationToken cancelToken = default);
}
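A hedged sketch of how that contract could be implemented, assuming the tuple-returning signature `Task<(bool, T)>` (the valid C# form of `Task<bool, T>`) and interpreting the factory's bool as "cache this value". This is not Polly code; `GetOrCreateCache` and its internals are hypothetical:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// The proposed contract, with a valid C# signature.
public interface IGetOrCreateCacheAsync
{
    Task<(bool, T)> GetOrCreateAsync<T>(string key, Func<Task<(bool, T)>> factory,
        CancellationToken cancelToken = default);
}

// Hypothetical sketch backed by a plain in-memory dictionary. A per-key
// semaphore serializes factory execution; a double-check after the wait
// lets queued callers pick up the value the first caller stored.
public sealed class GetOrCreateCache : IGetOrCreateCacheAsync
{
    private readonly ConcurrentDictionary<string, object> _store =
        new ConcurrentDictionary<string, object>();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public async Task<(bool, T)> GetOrCreateAsync<T>(string key,
        Func<Task<(bool, T)>> factory, CancellationToken cancelToken = default)
    {
        if (_store.TryGetValue(key, out var hit))
            return (true, (T)hit);

        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync(cancelToken);
        try
        {
            if (_store.TryGetValue(key, out hit))       // double-check after waiting
                return (true, (T)hit);

            var (shouldCache, value) = await factory(); // only one factory per key runs here
            if (shouldCache)
                _store[key] = value!;
            return (shouldCache, value);
        }
        finally
        {
            gate.Release();
        }
    }
}
```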

Please let me know your thoughts and if I have misread the code and missed something in the caching where this kind of behavior is prevented...

EDIT As I think about this more, the GetOrCreateAsync pattern perhaps applies better at a higher level than caching, as any Polly operation that can be tied to an identical key would be an ideal candidate for preventing multiple factory methods from executing simultaneously, regardless of whether caching is in the policy or not.
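The generalization described in the edit amounts to keyed single-flight execution rather than caching: concurrent executions that share a key share one in-flight task, and nothing is retained once it completes. A rough sketch under that assumption (again, not Polly code; all names are hypothetical):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch of keyed execution coalescing (not caching): concurrent calls with
// the same key share one in-flight Task; the entry is dropped on completion,
// so the next call after that runs the operation again.
public static class KeyedSingleFlight
{
    private static readonly ConcurrentDictionary<string, object> _inFlight =
        new ConcurrentDictionary<string, object>();

    public static Task<T> ExecuteAsync<T>(string key, Func<Task<T>> operation)
    {
        // Lazy defers starting the operation until after GetOrAdd has
        // published the entry, so the finally below never removes an
        // entry that was not yet added.
        var entry = (Lazy<Task<T>>)_inFlight.GetOrAdd(
            key,
            _ => new Lazy<Task<T>>(async () =>
            {
                try { return await operation(); }
                finally { _inFlight.TryRemove(key, out _); }
            }));
        return entry.Value;
    }
}
```

Wrapping a policy execution in `KeyedSingleFlight.ExecuteAsync(key, ...)` would deduplicate concurrent executions per key on one machine, whether or not a cache policy is involved.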

@reisenberger
Member

Thanks @jjxtra for the detailed description. This looks to me like a duplicate of #657. (Please say if you disagree.)

@jjxtra
Author

jjxtra commented Jan 3, 2020

It is similar. The one thing I would ask is whether there is a way to use a key per policy execution? That way you could prevent multiple policy executions per key on the same machine, which would be a better solution than fixing the caching policy by itself.

We could close as duplicate and I'll add this bit of info to the other issue.

@reisenberger
Member

> if there is a way to use a key per policy execution?

Closing: moving commentary to #657; made some commentary on the above also on #657. @jjxtra Also aware of the discussion here
