API infrastructure to support InstructLab tab display of Crucible metric statistics #129

Draft: wants to merge 11 commits into main

@dbutenhof (Collaborator) commented Nov 6, 2024

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

The InstructLab tab currently supports graphing multiple Crucible metrics when a run row is expanded. This PR adds backend API support for extending that to pull and display statistics for those metrics. This involves some refactoring of the multigraph API for commonality, along with minor adjustments to the UI action code to avoid breakage.

This is chained from #122 (Crucible service) -> #140 (unit test framework) -> #146 (crucible unit tests) -> #123 (ilab API) -> #155 (API unit tests) -> #158 (functional test framework) -> #124 (ilab UI) -> #153 (date picker) -> #125 (multi-run graphing API) -> #127 (multi-run graphing UI) -> #129 (statistics aggregation)

Related Tickets & Documents

PANDA-626: CPT dashboard backend, support metrics summary aggregation

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

Tested manually against a local deployment. The UI behavior is unchanged.

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the Stale label Dec 18, 2024
@dbutenhof removed the Stale label Dec 19, 2024
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the Stale label Jan 18, 2025
@dbutenhof removed the Stale label Jan 20, 2025
dbutenhof and others added 11 commits January 28, 2025 10:55
This encapsulates substantial logic for interpreting the Crucible Common
Data Model (CDM) OpenSearch schema for use by CPT dashboard API
components. By itself, it does nothing.

This uses `black`, `isort` and `flake8` to check code quality, although
failures are ignored until we've cleaned up the code (which has begun in
PR cloud-bulldozer#139 against the `revamp` branch).

Minimal unit testing is introduced, generating a code coverage report.
The text summary is added to the Action summary page, and the more
detailed HTML report is stored as an artifact for download.

NOTE: The GitHub Action environment is unhappy with `uvicorn` 0.15;
upgrading to the latest 0.32.x seems to work and hasn't obviously
broken anything else.

`crucible_svc.py` test coverage is now at 97%. While the remaining 3% is
worth some effort later, the law of diminishing returns means it would take
significant additional effort; and since subsequent ILAB PRs will change
some of the service code anyway, this is good enough for now.

Provide the `api/v1/ilab` API endpoint to allow a client to query
collected data on a Crucible CDM OpenSearch instance through the
`crucible_svc` service layer. It is backed by the Crucible layer added
in cloud-bulldozer#122, so only the final commit represents changes in this PR.

This covers 100% of the `ilab.py` API module using `FastAPI`'s `TestClient`.

This proved ... interesting ... as the FastAPI and Starlette versions we use
are incompatible with the underlying httpx version ... TestClient init fails
in a way that can't be worked around. (Starlette passes an unknown keyword
parameter.)

After some experimentation, I ended up "unlocking" all the API-related
packages in `pyproject.toml` to `"*"` and letting `poetry update` resolve them,
then "re-locked" them to those versions. The resulting combination of modules
works for unit testing, and appears to work in a real `./local-compose.sh`
deployment as well.

This adds a mechanism to "can" and restore a small prototype ILAB (Crucible
CDM) OpenSearch database in a pod along with the dashboard back end, front
end, and functional tests. The functional tests run entirely within the pod,
with no exposed ports and with unique container and pod names, allowing
simultaneous runs (e.g., in CI) on the same system.

This also has utilities for diagnosing a CDM (v7) datastore and cloning a
limited subset, along with creating an OpenSearch snapshot from that data
to bootstrap the functional test pod.

Only a few functional test cases are implemented here, as demonstration. More
will be added separately.

This relies on the ilab API in cloud-bulldozer#123, which in turn builds on the crucible
service in cloud-bulldozer#122.

The `fetchILabJobs` action wasn't updating the date picker values from the API
response unless a non-empty list of jobs was returned. This means that on the
initial load, if the default API date range (1 month) doesn't find any jobs,
the displayed list is empty and the date range isn't updated to tell the user
what we've done.

I've seen no ill effects in local testing from simply removing the length
check, and now the date picker is updated correctly.

When graphing metrics from two runs, the timestamps rarely align; so we add a
`relative` option to convert the absolute metric timestamps into relative
delta seconds from each run's start.
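
A minimal Python sketch of what the `relative` conversion does: each sample's absolute timestamp becomes seconds elapsed since its own run's start, so two runs recorded days apart line up on a shared x-axis. The `to_relative` function, the `(timestamp, value)` sample shape, and the example timestamps are made up for illustration and are not the API's actual data model.

```python
from datetime import datetime, timezone


def to_relative(
    points: list[tuple[datetime, float]], run_start: datetime
) -> list[tuple[float, float]]:
    """Convert (timestamp, value) samples to (seconds since run_start, value)."""
    return [((ts - run_start).total_seconds(), value) for ts, value in points]


# Two made-up runs recorded days apart; after conversion both samples sit at 10 s.
run_a_start = datetime(2024, 11, 1, 9, 0, tzinfo=timezone.utc)
run_b_start = datetime(2024, 11, 5, 14, 30, tzinfo=timezone.utc)
run_a = [(datetime(2024, 11, 1, 9, 0, 10, tzinfo=timezone.utc), 0.8)]
run_b = [(datetime(2024, 11, 5, 14, 30, 10, tzinfo=timezone.utc), 0.9)]
print(to_relative(run_a, run_a_start))  # [(10.0, 0.8)]
print(to_relative(run_b, run_b_start))  # [(10.0, 0.9)]
```

`timedelta.total_seconds()` keeps sub-second precision, and using timezone-aware datetimes throughout avoids mixing naive and aware values, which Python refuses to subtract.
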

This adds the basic UI to support comparison of the metrics of two InstructLab
runs. This compares only the primary metrics of the two runs, in a relative
timeline graph.

This is backed by cloud-bulldozer#125, which is backed by cloud-bulldozer#124, which is backed by cloud-bulldozer#123,
which is backed by cloud-bulldozer#122. These represent a series of steps towards a complete
InstructLab UI and API, and will be reviewed and merged from cloud-bulldozer#122 forward.

This PR is primarily CPT dashboard backend API (and Crucible service) changes
to support pulling and displaying multiple Crucible metric statistics. Only
minor UI changes are included to support API changes. The remaining UI changes
to pull and display statistics will be pushed separately.