Cache Storm Handling #740

Closed
jjxtra opened this issue Dec 27, 2019 · 3 comments

Comments

@jjxtra

jjxtra commented Dec 27, 2019

The code path where the caching factory function must be executed is vulnerable to a cache storm: many concurrent requests for the same key will each execute the factory function.

Imagine this scenario for a busy API server that caches expensive SQL queries using only an in-memory cache.

1] Cache expires for key 'ExpensiveQuery'.
2] The 'ExpensiveQuery' key is a very common request, so right after it expires, within the span of a few seconds, hundreds of requests come in for 'ExpensiveQuery'.
3] Hundreds of database calls all execute 'ExpensiveQuery', bringing the database server to its knees.
4] These queries may eventually succeed after a long time, but more likely they will time out and hit a retry policy, further exacerbating the problem unless the timeout is handled specially in the policy.

Ideally, any subsequent request for a key would simply block or be handed the existing task for that key. Polly caching relies on a get followed later by a put when the cache value is not found. Since this get-then-put is not atomic, the factory method is unfortunately executed multiple times until the key finally gets put into the cache.

The solution to a cache storm is to prevent multiple factory methods from executing simultaneously for the same key, and to have subsequent requests simply receive a task from the cache that represents the currently executing factory method. When the task completes, everyone who was waiting on it gets the exact same result.
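The pattern described above is essentially per-key request coalescing. Below is a minimal sketch of the idea, not Polly code; `SingleFlightCache` and its members are hypothetical names. `Lazy<Task<T>>` is used so that even if two callers race inside `GetOrAdd`, only one factory actually starts:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch (not Polly code): concurrent requests for the same key
// share one in-flight Task, so the factory runs at most once per key.
public sealed class SingleFlightCache<T>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<T>>>();

    public Task<T> GetOrCreateAsync(string key, Func<Task<T>> factory)
    {
        // GetOrAdd may construct more than one Lazy wrapper under a race,
        // but Lazy guarantees the factory itself starts at most once.
        var entry = _entries.GetOrAdd(key, _ => new Lazy<Task<T>>(factory));
        return entry.Value;
    }

    // Call on expiry (or after a failed factory) so the next request re-runs it.
    public void Invalidate(string key) => _entries.TryRemove(key, out _);
}
```

With this shape, hundreds of concurrent requests for 'ExpensiveQuery' would trigger one database call; the rest await the same task and receive the same result.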

There are at least two options:

1] Change Polly caching implementations to prevent simultaneous factory methods from executing for the same key at a given time, which does change existing behavior.

2] Add a new Polly caching interface with a contract that guarantees that only one factory method per key executes at any given time:

// Interface that ensures only one cache factory method executes per key simultaneously
public interface IGetOrCreateCacheAsync
{
    // Get an existing cache item if it exists, or call the factory method and add the result to the cache,
    // ensuring that only one factory method per key executes at a time.
    Task<(bool, T)> GetOrCreateAsync<T>(string key, Func<Task<(bool, T)>> factory, CancellationToken cancelToken = default);
}
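A hedged sketch of how that contract could be implemented, assuming the tuple-returning signature `Task<(bool, T)>` (the valid C# form of `Task<bool, T>`) and interpreting the factory's bool as "cache this value". This is not Polly code; `GetOrCreateCache` and its internals are hypothetical:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// The proposed contract, with a valid C# signature.
public interface IGetOrCreateCacheAsync
{
    Task<(bool, T)> GetOrCreateAsync<T>(string key, Func<Task<(bool, T)>> factory,
        CancellationToken cancelToken = default);
}

// Hypothetical sketch backed by a plain in-memory dictionary. A per-key
// semaphore serializes factory execution; a double-check after the wait
// lets queued callers pick up the value the first caller stored.
public sealed class GetOrCreateCache : IGetOrCreateCacheAsync
{
    private readonly ConcurrentDictionary<string, object> _store =
        new ConcurrentDictionary<string, object>();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public async Task<(bool, T)> GetOrCreateAsync<T>(string key,
        Func<Task<(bool, T)>> factory, CancellationToken cancelToken = default)
    {
        if (_store.TryGetValue(key, out var hit))
            return (true, (T)hit);

        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync(cancelToken);
        try
        {
            if (_store.TryGetValue(key, out hit))       // double-check after waiting
                return (true, (T)hit);

            var (shouldCache, value) = await factory(); // only one factory per key runs here
            if (shouldCache)
                _store[key] = value!;
            return (shouldCache, value);
        }
        finally
        {
            gate.Release();
        }
    }
}
```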

Please let me know your thoughts and if I have misread the code and missed something in the caching where this kind of behavior is prevented...

EDIT As I think about this more, the GetOrCreateAsync pattern perhaps applies better at a higher level than caching, as any Polly operation that can be tied to an identical key would be an ideal candidate for preventing multiple factory methods from executing simultaneously, regardless of whether caching is in the policy or not.
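The generalization described in the edit amounts to keyed single-flight execution rather than caching: concurrent executions that share a key share one in-flight task, and nothing is retained once it completes. A rough sketch under that assumption (again, not Polly code; all names are hypothetical):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch of keyed execution coalescing (not caching): concurrent calls with
// the same key share one in-flight Task; the entry is dropped on completion,
// so the next call after that runs the operation again.
public static class KeyedSingleFlight
{
    private static readonly ConcurrentDictionary<string, object> _inFlight =
        new ConcurrentDictionary<string, object>();

    public static Task<T> ExecuteAsync<T>(string key, Func<Task<T>> operation)
    {
        // Lazy defers starting the operation until after GetOrAdd has
        // published the entry, so the finally below never removes an
        // entry that was not yet added.
        var entry = (Lazy<Task<T>>)_inFlight.GetOrAdd(
            key,
            _ => new Lazy<Task<T>>(async () =>
            {
                try { return await operation(); }
                finally { _inFlight.TryRemove(key, out _); }
            }));
        return entry.Value;
    }
}
```

Wrapping a policy execution in `KeyedSingleFlight.ExecuteAsync(key, ...)` would deduplicate concurrent executions per key on one machine, whether or not a cache policy is involved.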

@reisenberger
Member

Thanks @jjxtra for the detailed description. This looks to me like a duplicate of #657. (Please say if you disagree.)

@jjxtra
Author

jjxtra commented Jan 3, 2020

It is similar. The one thing I would ask is whether there is a way to use a key per policy execution? That way you could prevent multiple policy executions per key on the same machine, which would be a better solution than fixing the caching policy by itself.

We could close as duplicate and I'll add this bit of info to the other issue.

@reisenberger
Member

> if there is a way to use a key per policy execution?

Closing: moving commentary to #657; made some commentary on the above also on #657. @jjxtra Also aware of the discussion here
