This repository has been archived by the owner on Jan 21, 2022. It is now read-only.

Metric Types

PlamenDoychev edited this page Apr 20, 2018 · 12 revisions

There are two metric types in Abacus:

  • discrete, also known as stateless, "historical" and "log-like"
  • time-based, also known as stateful

Below you can find the characteristics of the two metric types in Abacus:

Discrete

These metrics are stateless. Usage records are submitted to Abacus in a log-like manner. When you request an aggregated result, Abacus goes through the history of events and performs the calculations defined by the metric's formulas. Discrete metrics are usually simple and deal with plain numbers in both measures and metrics.

Sample Discrete Metric

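{
  name: 'storage',
  unit: 'GIGABYTE',
  type: 'discrete',
  // meter: convert the submitted storage measure from bytes to gigabytes
  meter: (m) => new BigNumber(m.storage).div(1073741824).toNumber(),
  // aggregate: add the current value to the running total
  aggregate: (a, prev, curr, aggTwCell, accTwCell) => new BigNumber(a || 0).add(curr).toNumber()
}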

In this example, the meter function converts the storage value from bytes to gigabytes by dividing it by 1073741824 (1024^3), and the aggregate function adds each metered value to the running total. The sample does not define an accumulate function. When one is defined and the end time of the current measure is between the “from” and “to” timestamps, it can, for example, return the maximum amount that has been used so far. Here, “to” is the time until which the resource usage must be accumulated. Currently, Abacus predefines the “from” and “to” times as the start and end of a month. Although the dimension of the measures is predefined to contain only “name” and “unit”, these are sufficient to meter any kind of resource usage. In addition, the accumulate and aggregate functions can be used to perform different kinds of calculations.
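To make the two steps concrete, here is a minimal sketch that runs a meter and an aggregate function over two usage records. It uses plain JavaScript numbers instead of BigNumber so that it runs without dependencies. The `usage` array and its values are assumptions made for illustration.

```javascript
// Bytes in one gigabyte (1024^3), the divisor used in the sample metric.
const BYTES_PER_GB = 1073741824;

// meter: convert the submitted storage measure from bytes to gigabytes.
const meter = (m) => m.storage / BYTES_PER_GB;

// aggregate: add the current value to the running total (0 when undefined).
const aggregate = (a, prev, curr) => (a || 0) + curr;

// Hypothetical usage records, submitted in a log-like manner.
const usage = [{ storage: 1073741824 }, { storage: 2147483648 }];

let total;
for (const record of usage) {
  total = aggregate(total, undefined, meter(record));
}

console.log(total); // 3 (gigabytes)
```

Because each record is metered on its own and then summed, no state about the resource instance has to be kept between submissions.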

Time-Based

Time-based metrics are stateful. Abacus stores the state of the resource instance and uses it to calculate the result on request.

The linux-container plan, for example, meters memory in gigabytes per hour (GB/h). The usage is ongoing and grows over time.

It is important to note that time-based metrics often use compound data structures to keep track of the usage. For example, you submit the previous and the current measures of the container resource to calculate the GB/h usage.

Sample Time-Based Metric

The basic linux-container example metering plan is time-based. It measures memory consumption over time.

To start application A with an instance of 1 GB, a Resource Provider submits these measures:

current_running_instances: 1,
current_instance_memory: 1073741824,
previous_running_instances: 0,
previous_instance_memory: 0

To update application A from 1 instance of 1 GB to 2 instances of 2 GB, a Resource Provider submits these measures:

current_running_instances: 2,
current_instance_memory: 2147483648,
previous_running_instances: 1,
previous_instance_memory: 1073741824

To stop application A, a Resource Provider submits the following measures:

current_running_instances: 0,
current_instance_memory: 0,
previous_running_instances: 2,
previous_instance_memory: 2147483648
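The three submissions above can be reduced to memory totals before any time arithmetic happens. The sketch below is an assumption about how a meter function for these measures might look; the plan's actual meter function is not shown on this page. It multiplies the instance count by the per-instance memory and converts bytes to GB. The output field names `previous` and `current` are hypothetical.

```javascript
const BYTES_PER_GB = 1073741824;

// Hypothetical meter: total memory in GB before and after the change.
const meter = (m) => ({
  previous: (m.previous_running_instances * m.previous_instance_memory) / BYTES_PER_GB,
  current: (m.current_running_instances * m.current_instance_memory) / BYTES_PER_GB
});

const start = meter({
  current_running_instances: 1,
  current_instance_memory: 1073741824,
  previous_running_instances: 0,
  previous_instance_memory: 0
});

const update = meter({
  current_running_instances: 2,
  current_instance_memory: 2147483648,
  previous_running_instances: 1,
  previous_instance_memory: 1073741824
});

const stop = meter({
  current_running_instances: 0,
  current_instance_memory: 0,
  previous_running_instances: 2,
  previous_instance_memory: 2147483648
});

console.log(start);  // { previous: 0, current: 1 }
console.log(update); // { previous: 1, current: 4 }
console.log(stop);   // { previous: 4, current: 0 }
```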

The algorithm works like this:

  • When the application consumed memory in the past before it was stopped (or will consume memory in the future after it is started), the algorithm adds negative consumption
  • When the application did not consume memory in the past before it was started (or will not consume memory in the future after it is stopped), the algorithm adds positive consumption

The plan works with out-of-order data submission and guarantees correctness, provided that no usage submission is missing. In practice, this means that the previous usage has to be submitted together with the current one.

Furthermore, it works only within the time window, meaning that the calculated numbers are wrong if:

  • The usage falls outside the period between from (start of the month) and to (end of the month)
  • The earliest usage event submitted for that time period ('from' -> 'to') is not a start (with previous values set to 0)

Internally, the metrics use a compound data structure consisting of:

  • consuming: the GB in use as of the latest event time
  • consumed: the "memory balance" that the app has consumed. The number is relative to the time boundaries (from and to) described above.
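A minimal sketch of how such a structure could be updated, assuming the contribution formula used in Example 3 below (the quantity multiplied by the signed distances to both window boundaries). The function name `accumulate`, the event shape `{ time, previous, current }` and the day-based numbers are assumptions for illustration. Since consumed is a plain sum in this sketch, the order of submission does not change it.

```javascript
// Window boundaries, here in days of the month.
const from = 1;
const to = 30;

// Fold one event into the state { consuming, since, consumed }.
// event.previous / event.current: GB in use before / after the event.
function accumulate(state, event) {
  const td = (from - event.time) + (to - event.time);
  const isLatest = !state || event.time >= state.since;
  return {
    // consuming and since follow the event with the latest event time
    consuming: isLatest ? event.current : state.consuming,
    since: isLatest ? event.time : state.since,
    // the new quantity counts positively, the previous one negatively
    consumed: (state ? state.consumed : 0) + (event.current - event.previous) * td
  };
}

const start = { time: 20, previous: 0, current: 1 };
const stop = { time: 25, previous: 1, current: 0 };

const inOrder = [start, stop].reduce(accumulate, null);
const outOfOrder = [stop, start].reduce(accumulate, null);

console.log(inOrder);    // { consuming: 0, since: 25, consumed: 10 }
console.log(outOfOrder); // { consuming: 0, since: 25, consumed: 10 }
```

The `since` field is what lets a late-arriving older event add to consumed without overwriting the most recent consuming value.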

Example 1:

Let's go through the formula with a simple example:

  1. If the time period is from the 1st to the 30th of a given month, we have start=1 and end=30
  2. An application starts consuming 1 GB on the 20th of the given month.
  3. consumed starts from the amount that the app is not consuming (from the start of the month till the 20th): 20.
  4. From the 20th till the end of the month the app will consume for 10 days, so consumed = (20 - 10) * direction(+1) = 10

If a Report Consumer requests a report on the 30th, the result is the stored consumed, minus the period from the start of the month till the 30th, plus the period that the app would be idle after the report (none), multiplied by direction(-1) and divided by 2: (10 - 30 + 0) * -1 / 2 = 10.

Example 2:

If there is a stop event on the 25th (consuming = 0), then consumed will be: the previous consumed - the amount that the app has been consuming (start -> 25th) + the amount that the app would be idle (25th -> end), multiplied by the direction: (10 - 25 + 5) * direction(-1) = -10 * -1 = 10

If a Report Consumer requests a report on the 30th, consuming is 0, so the result is calculated as 10 / 2 = 5.
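The same two scenarios can be replayed with the formulas given in Example 3 below, using days instead of milliseconds. This is a sketch under that assumption; the helper names `span`, `contribution` and `report` are hypothetical. Intermediate consumed values depend on the sign convention and on the window start, so they may differ from the hand calculation above, but the reported totals of 10 and 5 agree.

```javascript
const from = 1; // start of the month
const to = 30;  // end of the month

// Signed distance of a point in time to both window boundaries.
const span = (t) => (from - t) + (to - t);

// consumed contribution of a change of deltaGb GB at day t.
const contribution = (deltaGb, t) => deltaGb * span(t);

// Report at day `now` for a state { consuming, consumed }.
const report = (state, now) =>
  (state.consumed + state.consuming * -1 * span(now)) / 2;

// Example 1: 1 GB starts on the 20th, report on the 30th.
const running = { consuming: 1, consumed: contribution(1, 20) };
console.log(report(running, 30)); // 10 (GB-days)

// Example 2: the app stops on the 25th, report on the 30th.
const stopped = {
  consuming: 0,
  consumed: running.consumed + contribution(-1, 25)
};
console.log(report(stopped, 30)); // 5 (GB-days)
```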

Example 3:

Let's use a real example of a submission:

  1. An hour window from: 1467280800000 (Thu Jun 30 2016 03:00:00 GMT-0700 (PDT)) and to: 1467284400000 (Thu Jun 30 2016 04:00:00 GMT-0700 (PDT))
  2. event time: 1467283200000 (Thu Jun 30 2016 03:40:00 GMT-0700 (PDT))
  3. consuming = 1 GB
  4. A Report Consumer requests a report at the end of the time window (to)
  5. The application has been consuming 1 GB for 20 minutes: 1 GB * 20 minutes / 1 hour = 0.33333 GB/h

The result of this submission in the pipeline would be:

  1. consuming = 1
  2. consumed = 1 * ((1467280800000 - 1467283200000) + (1467284400000 - 1467283200000)) = -1200000
  3. since: 1467283200000 (used to keep track of the most up-to-date consuming)

The consumed value is negative because it is relative to the from and to boundaries of the window. If the event time is past the midpoint of the window, the result is a negative number. This is fine, because on report generation the summarize function makes sense of the number.

If a Report Consumer requests a summary at the end of the window to: 1467284400000 (Thu Jun 30 2016 04:00:00 GMT-0700 (PDT)), we will get:

  1. consumed = current consuming * -1 * ((1467280800000 - 1467284400000) + (1467284400000 - 1467284400000)) = 3600000
  2. summary = (current consumed + consumed) / 2 / 3600000 = (-1200000 + 3600000) / 2 / 3600000 = 0.33 GB/h

That's exactly the amount the instance has consumed in the window: 20 / 60 = 0.33333 GB/h
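The arithmetic of this example can be checked with a short sketch. The function names `accumulate` and `summarize` mirror the terms used on this page, but their signatures here are assumptions, not the plan's actual code.

```javascript
const HOUR = 3600000; // milliseconds per hour

const from = 1467280800000;      // window start
const to = 1467284400000;        // window end
const eventTime = 1467283200000; // the app starts consuming 1 GB

// consumed contribution of `gb` GB starting at time t.
const accumulate = (gb, t) => gb * ((from - t) + (to - t));

// GB/h for a state { consuming, consumed } when the report is requested at `now`.
const summarize = (state, now) => {
  const open = state.consuming * -1 * ((from - now) + (to - now));
  return (state.consumed + open) / 2 / HOUR;
};

const state = {
  consuming: 1,
  consumed: accumulate(1, eventTime),
  since: eventTime
};

console.log(state.consumed);       // -1200000
console.log(summarize(state, to)); // approximately 0.3333 GB/h
```

Adding the two terms cancels the window boundaries and leaves twice the elapsed consuming time, which is why the result is divided by 2.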

Carry-over

The time-based usage metrics are carried over into each new monthly database partition by the cf-renewer application. It transfers the active resource consumption from the previous month into the current one.

Warning:

The cf-renewer application supports only plans with "pure" time-based metrics. This means that any usage documents with a metering plan that has both discrete and time-based metrics will be ignored!
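A check for this condition could look like the following sketch. The `type` field follows the sample discrete metric above. The `metrics` array on the plan, the `'time-based'` type value, the function name `isPureTimeBased` and both example plans are assumptions for illustration and are not taken from cf-renewer.

```javascript
// True only when every metric of the plan is time-based.
const isPureTimeBased = (plan) =>
  plan.metrics.length > 0 &&
  plan.metrics.every((metric) => metric.type === 'time-based');

// Hypothetical plans.
const purePlan = { metrics: [{ name: 'memory', type: 'time-based' }] };
const mixedPlan = {
  metrics: [
    { name: 'memory', type: 'time-based' },
    { name: 'storage', type: 'discrete' }
  ]
};

console.log(isPureTimeBased(purePlan));  // true
console.log(isPureTimeBased(mixedPlan)); // false
```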
