Signed-off-by: Jie WU <wujie@google.com>
add request metrics
Signed-off-by: Jie WU <wujie@google.com>
rename api and metrics fix go mod Adding metrics handler
Signed-off-by: Jie WU <wujie@google.com>
Adding metrics handler
Signed-off-by: Jie WU <wujie@google.com>
add request metrics rename api and metrics fix mod Updated request metrics to be handled in server processing loop
Signed-off-by: Jie WU <wujie@google.com>
Updated request metrics to be handled in server processing loop
Signed-off-by: Jie WU <wujie@google.com>
fix go mod
Signed-off-by: Jie WU <wujie@google.com>
fix go mod
Signed-off-by: Jie WU <wujie@google.com>
remove preconfigured buffered response
Signed-off-by: Jie WU <wujie@google.com>
Add streamed response
Signed-off-by: Jie WU <wujie@google.com>
Handle latency with response
Signed-off-by: Jie WU <wujie@google.com>
refactor
Signed-off-by: Jie WU <wujie@google.com>
fmt
Signed-off-by: Jie WU <wujie@google.com>
fmt
Signed-off-by: Jie WU <wujie@google.com>
fmt
Signed-off-by: Jie WU <wujie@google.com>
refactor server
Signed-off-by: Jie WU <wujie@google.com>
metrics auth add docs and go mod tidy
Showing 13 changed files with 684 additions and 6 deletions.
# Documentation

This document describes the metrics that are currently exposed.

## Table of Contents
* [Exposed Metrics](#exposed-metrics)
* [Scrape Metrics](#scrape-metrics)

## Exposed Metrics

| Metric name | Metric type | Description | Labels | Status |
| ----------- | ----------- | ----------- | ------ | ------ |
| inference_model_request_total | Counter | Number of requests, broken out by model. | `model_name=<model-name>` <br> `target_model_name=<target-model-name>` | ALPHA |
| inference_model_request_duration_seconds | Distribution | Distribution of response latency. | `model_name=<model-name>` <br> `target_model_name=<target-model-name>` | ALPHA |
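
As a rough illustration of how the `model_name` and `target_model_name` labels can be consumed, the following Go sketch sums `inference_model_request_total` per `model_name`. It assumes the endpoint serves the Prometheus text exposition format. The `requestsByModel` helper and the sample values are hypothetical and not part of this project, and the regular expressions assume label values contain neither `}` nor escaped quotes.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sampleLine matches one counter sample, for example:
//   inference_model_request_total{model_name="a",target_model_name="b"} 3
var sampleLine = regexp.MustCompile(`^inference_model_request_total\{([^}]*)\}\s+(\S+)`)

// modelLabel extracts the model_name value from a label set. The (?:^|,)
// prefix keeps it from matching inside target_model_name.
var modelLabel = regexp.MustCompile(`(?:^|,)model_name="([^"]*)"`)

// requestsByModel (hypothetical helper) sums the request counter across
// target models, keyed by the model_name label.
func requestsByModel(exposition string) (map[string]float64, error) {
	totals := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		m := sampleLine.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // comments, blank lines, and other metrics
		}
		name := modelLabel.FindStringSubmatch(m[1])
		if name == nil {
			continue // sample without a model_name label
		}
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", m[2], err)
		}
		totals[name[1]] += v
	}
	return totals, sc.Err()
}

func main() {
	// Made-up sample data; real scrape output will differ.
	sample := `# TYPE inference_model_request_total counter
inference_model_request_total{model_name="chat",target_model_name="chat-v1"} 3
inference_model_request_total{model_name="chat",target_model_name="chat-v2"} 2
inference_model_request_total{model_name="embed",target_model_name="embed-v1"} 7
`
	totals, err := requestsByModel(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(totals["chat"], totals["embed"]) // prints: 5 7
}
```

For anything beyond a quick check, a full exposition-format parser is likely a better choice than regular expressions.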

## Scrape Metrics

The metrics endpoint is exposed on port 9090 by default. To scrape metrics, the client needs a ClusterRole with the following rule:
`nonResourceURLs: "/metrics", verbs: get`.

Here is an example for a client that mounts the Secret to act as the service account:
```
---
# Grants read access to the /metrics endpoint.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: inference-gateway-metrics-reader
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: inference-gateway-sa-metrics-reader
  namespace: default
---
# ClusterRoleBinding is cluster-scoped, so its metadata carries no namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: inference-gateway-sa-metrics-reader-role-binding
subjects:
- kind: ServiceAccount
  name: inference-gateway-sa-metrics-reader
  namespace: default
roleRef:
  kind: ClusterRole
  name: inference-gateway-metrics-reader
  apiGroup: rbac.authorization.k8s.io
---
# Token Secret bound to the service account above.
apiVersion: v1
kind: Secret
metadata:
  name: inference-gateway-sa-metrics-reader-secret
  namespace: default
  annotations:
    kubernetes.io/service-account.name: inference-gateway-sa-metrics-reader
type: kubernetes.io/service-account-token
```
Then you can curl port 9090 as follows:
```
# Read the service account token from the Secret and decode it.
TOKEN=$(kubectl -n default get secret inference-gateway-sa-metrics-reader-secret -o jsonpath='{.data.token}' | base64 --decode)
# Forward local port 9090 to the ext-proc pod (substitute the actual pod name).
# This command blocks, so run it in the background or in another terminal before the curl.
kubectl -n default port-forward inference-gateway-ext-proc-pod-name 9090
curl -H "Authorization: Bearer $TOKEN" localhost:9090/metrics
```
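
The same request can be made from a program instead of curl. The Go sketch below is a minimal illustration using only the standard library. It assumes the port-forward from the previous step is active and that the decoded token is available in a `TOKEN` environment variable. The helper names `newScrapeRequest` and `scrape` are hypothetical and not part of this project.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// newScrapeRequest (hypothetical helper) builds GET <baseURL>/metrics with
// the service account token as a bearer credential.
func newScrapeRequest(baseURL, token string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/metrics"
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// scrape sends the request and returns the response body as text.
func scrape(client *http.Client, baseURL, token string) (string, error) {
	req, err := newScrapeRequest(baseURL, token)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return string(body), nil
}

func main() {
	// Assumption: TOKEN holds the decoded token from the Secret.
	token := os.Getenv("TOKEN")
	if token == "" {
		fmt.Fprintln(os.Stderr, "set TOKEN to the decoded service account token first")
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	body, err := scrape(client, "http://localhost:9090", token)
	if err != nil {
		fmt.Fprintln(os.Stderr, "scrape failed:", err)
		os.Exit(1)
	}
	fmt.Print(body)
}
```

If the token was set with a plain `TOKEN=$(...)` assignment in the shell, run `export TOKEN` before starting the program so that the variable is visible to it.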