From 068b9a8951a8e80343027b8432b77d5962b63add Mon Sep 17 00:00:00 2001
From: Roman Blanco
Date: Mon, 17 Oct 2016 09:55:34 +0200
Subject: [PATCH] Improved readability for Capacity and Utilization docs
---
 ..._and_utilization_collection_explanation.md | 41 +++++++++++--------
 1 file changed, 24 insertions(+), 17 deletions(-)
diff --git a/architecture/capacity_and_utilization_collection_explanation.md b/architecture/capacity_and_utilization_collection_explanation.md
index 1b0dabd8..96bfd9d3 100644
--- a/architecture/capacity_and_utilization_collection_explanation.md
+++ b/architecture/capacity_and_utilization_collection_explanation.md
@@ -1,23 +1,23 @@
 # Capacity And Utilization
-1. A capture request is kicked off by the schedule worker every so often (configurable, I think 10 minutes with a 50 minute threshold or something).
-1. The capture request finds all VMS and Hosts (and Storages, but those are different) enabled for capture (configurable).
-1. For each target it then queues up a capture.
-1. A collector worker will pick up one of these work items, collect the data since the last capture, and write new records in the metrics table.
-1. Then the worker queues up a rollup (explained below).
-1. A processor worker will pick up one of these rollup work items and queue up ANOTHER rollup for the next stage of the rollup chain.
+1. A capture request is kicked off by the schedule worker every so often (configurable, 10 minutes with a 50 minute threshold).
+1. The capture request finds all VMs and Hosts (and Storages, but those are different) enabled for capture (configurable).
+1. For each target it then queues up a capture.
+1. A collector worker will pick up one of these work items, collect the data since the last capture, and write new records in the metrics table.
+1. Then the worker queues up a rollup (explained below).
+1. A processor worker will pick up one of these rollup work items and queue up ANOTHER rollup for the next stage of the rollup chain.
 1. This continues until we hit the end of the chain.
-After enabling the Capacity & Utilization Collector Role, data collection begins immediately. However, the first collection begins 5 minutes after the CFME Server is started, and every 10 minutes after that. Therefore, the longest the collection will take after enabling the Capacity & Utilization Collector CFME Server Role is 10 minutes. The first collection from a particular management system may take a few minutes since CFME is gathering data points going one month back in time
+After enabling the Capacity & Utilization Collector Role, data collection begins immediately. However, the first collection begins 5 minutes after the CFME Server is started, and every 10 minutes after that. Therefore, the longest the collection will take after enabling the Capacity & Utilization Collector CFME Server Role is 10 minutes. The first collection from a particular management system may take a few minutes since CFME is gathering data points going one month back in time
 ## Rollups
-There are two types of rollups, **time-based** and **infrastructure-based**.
-* **Time-based** rollups go from realtime -> hourly -> daily.
-* **Infrastructure-based** rollups for hourly go from VM -> Host -> EmsCluster -> ExtManagementSystem -> MiqRegion (and maybe to MiqEnterprise?).
+There are two types of rollups, **time-based** and **infrastructure-based**.
+* **Time-based** rollups go from realtime → hourly → daily.
+* **Infrastructure-based** rollups for hourly go from `Vm` → `Host` → `EmsCluster` → `ExtManagementSystem` → `MiqRegion` (and maybe to `MiqEnterprise`?).
 ### Example
-Say we do a capture on a VM and we get back data with timestamps between 4:05 and 4:15. This would cause records to be written in the metrics for each interval. Then, it would put a rollup on the queue for that VM for the 4:00 hour (time-based). A processor worker will pick up that queue item, gather all of the real-time records for that VM for the 4:00 hour, and write a rollup hourly record for that VM. Then it will queue up 2 more rollups. One rollup is for the parent Host of that VM for the 4:00 hour (infrastructure-based), and another is for that VM for the day (time-based).
+Say we do a capture on a VM and we get back data with timestamps between 4:05 and 4:15. This would cause records to be written in the metrics for each interval. Then, it would put a rollup on the queue for that VM for the 4:00 hour *(time-based)*. A processor worker will pick up that queue item, gather all of the real-time records for that VM for the 4:00 hour, and write a rollup hourly record for that VM. Then it will queue up 2 more rollups. One rollup is for the parent Host of that VM for the 4:00 hour *(infrastructure-based)*, and another is for that VM for the day *(time-based)*.
 Below is the full tree of rollups that will occur:
@@ -36,7 +36,7 @@ Vm (realtime collected)
 ~~~
-That is the simplest description. In reality, there are some nuances that should be mentioned.
+That is the simplest description. In reality, there are some nuances that should be mentioned.
 * **We collect data that spans hours** (e.g. we collect 3:50-4:15), so a separate rollup is put on the queue for each hour in question, and the chain begins separately for each.
@@ -46,14 +46,21 @@ That is the simplest description. In reality, there are some nuances that shoul
 * **Storages are slightly different. Their information is collected from our storage scans**, so if you've never scanned a Storage, you won't get any data. Storages rollup directly to their EMS, I think. Also, they are run on a different schedule.
-* **Cloud rollups** (coming soon) will go from Vm -> Availability Zone -> ExtManagementSystem -> MiqRegion.
+* **Cloud rollups** *(coming soon)* will go from `Vm` → `Availability Zone` → `ExtManagementSystem` → `MiqRegion`.
-##Notes on Testing Rollups
+## Notes on Testing Rollups
 * **Rollups are automatic behind the scenes and are triggered by a collection**. Therefore, if you try to manually inject data, you are not really running a collection and thus won't get rollups.
-* **A capture of a Vm can be kicked off in a rails console with vm.perf_capture("realtime")**. The rollups on the queue can be executed without a worker by just delivering them from the queue If you want to fake creating rollups, you can just do vm.perf_rollup_to_parent("realtime", start_time, end_time), which queues them up and starts the chain.
+* **A capture of a Vm can be kicked off in a rails console with `vm.perf_capture("realtime")`**. The rollups on the queue can be executed without a worker by just delivering them from the queue
-* **Database tables are not ordered sets of data, so if you did a straight query they are not guaranteed to appear in any particular order.** In addition, due to the nature of multiple workers, data may get written in different orders, especially if records have to be updated. It may be helpful to order by timestamp and filter against resource_type, resource_id, capture_interval_name.
+ ``` ruby
+ q = MiqQueue.find
+ q.delivered(*q.deliver)
+ ```
+ If you want to fake creating rollups, you can just do `vm.perf_rollup_to_parent("realtime", start_time, end_time)`, which queues them up and starts the chain.
-##TODO: Notes on why we use Postgres inheritance, and why metrics and metrics_rollups are in separate tables.
+* **Database tables are not ordered sets of data, so if you did a straight query they are not guaranteed to appear in any particular order.** In addition, due to the nature of multiple workers, data may get written in different orders, especially if records have to be updated. It may be helpful to order by timestamp and filter against `resource_type`, `resource_id`, `capture_interval_name`.
+
+
+## TODO: Notes on why we use Postgres inheritance, and why metrics and metrics_rollups are in separate tables.
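
For readers of this patch, the rollup chain that the document describes can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the actual implementation: `INFRA_PARENT`, `hours_spanned`, `next_rollups`, and `full_chain` are hypothetical names introduced here, the parent map is copied from the infrastructure chain in the text (without the uncertain `MiqEnterprise` step), and the rule "an hourly rollup queues the parent's hourly rollup plus the same resource's daily rollup" is taken from the 4:00 example.

```ruby
# Hypothetical parent map, copied from the infrastructure-based chain in the doc.
INFRA_PARENT = {
  "Vm"                  => "Host",
  "Host"                => "EmsCluster",
  "EmsCluster"          => "ExtManagementSystem",
  "ExtManagementSystem" => "MiqRegion"
}.freeze

# Hours touched by a capture window. Per the doc, a separate hourly rollup
# is queued for each hour in question (e.g. 3:50-4:15 touches 3:00 and 4:00).
def hours_spanned(start_time, end_time)
  t = Time.utc(start_time.year, start_time.month, start_time.day, start_time.hour)
  hours = []
  while t <= end_time
    hours << t
    t += 3600 # advance one hour
  end
  hours
end

# Rollups queued after one step finishes, following the simple description:
#   realtime capture -> hourly rollup for the same resource (time-based)
#   hourly rollup    -> hourly rollup for the parent (infrastructure-based)
#                       and daily rollup for the same resource (time-based)
#   daily rollup     -> end of the chain
def next_rollups(klass, interval)
  return [[klass, "hourly"]] if interval == "realtime"
  return [] unless interval == "hourly"

  steps = []
  parent = INFRA_PARENT[klass]
  steps << [parent, "hourly"] if parent
  steps << [klass, "daily"]
  steps
end

# Walks the whole chain breadth-first, the way queue items would be
# picked up one after another, and returns every rollup performed.
def full_chain(klass, interval)
  queue = next_rollups(klass, interval)
  done = []
  until queue.empty?
    item = queue.shift
    done << item
    queue.concat(next_rollups(*item))
  end
  done
end

full_chain("Vm", "realtime").each { |k, i| puts "#{k} (#{i})" }
```

The sketch ignores the nuances listed in the document (Storages, cloud rollups, and multiple workers writing in different orders); it only shows why a single realtime capture fans out into one hourly and one daily rollup per level of the hierarchy.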