Public API / documentation for # generations executed #161

Open
chimaerase opened this issue Nov 12, 2024 · 2 comments

@chimaerase

chimaerase commented Nov 12, 2024

The number of generations actually executed (as opposed to the planned number set by the generations param) is prominently featured in TPOTEstimator's output, but doesn't appear to be accessible via its public API. I am testing progressively longer hyperparameter optimization runs for several ML models using TPOT2, and it would be useful to get this information directly from a public, documented API that I have some assurance will remain stable over time.

The best (and only) way I've found to get this so far is estimator._evolver_instance.generation. estimator.evaluated_individuals seems to record only the generation in which each model was first introduced, which doesn't tell me when or why my optimization hit an early-stop limit.

@perib
Collaborator

perib commented Jan 3, 2025

New individuals must be created every generation, so the number of generations executed is the highest generation number in estimator.evaluated_individuals. One caveat: the last generation may not have been evaluated if the timeout hit after a new population was created but before it was evaluated. Strictly speaking, then, the answer is the highest generation that has either a score or an eval error. If both the score and the eval error are None, the pipeline was not evaluated.
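
For illustration, here is a minimal sketch of that rule on toy data. It is not TPOT2 code. The column names "Generation" and "Eval Error" and the score column "score" are assumptions made for this example, as is the helper name; with a real run you would check the actual columns of estimator.evaluated_individuals and could pass in evaluated_individuals.to_dict("records").

```python
import math


def is_missing(value):
    """True for None or a float NaN (how a DataFrame usually represents gaps)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def highest_evaluated_generation(records, score_key,
                                 error_key="Eval Error",
                                 generation_key="Generation"):
    """Hypothetical helper: highest generation that has a score or an eval error.

    Returns None when no row was evaluated at all.
    """
    evaluated = [
        row[generation_key]
        for row in records
        # A row counts as evaluated unless BOTH score and eval error are missing.
        if not (is_missing(row.get(score_key)) and is_missing(row.get(error_key)))
    ]
    return max(evaluated) if evaluated else None


# Toy data (assumed layout): generation 2 was created, but the timeout hit
# before any of its individuals were evaluated.
records = [
    {"Generation": 0, "score": 0.81, "Eval Error": None},
    {"Generation": 1, "score": float("nan"), "Eval Error": "TIMEOUT"},
    {"Generation": 2, "score": float("nan"), "Eval Error": None},
]

print(highest_evaluated_generation(records, score_key="score"))
```

Treating None and NaN the same way is a deliberate choice here, since a DataFrame may use either for an unevaluated cell.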

@chimaerase
Author

This seems like very important information given how prominently the generation numbers are displayed in the output. And it's important for my use case.

Given the number of special cases you describe, my suggestion is to implement this in the package itself so users don't have to know these details and write and maintain their own code against them. I assume the details could reasonably change during maintenance over the lifetime of the package.

E.g. something simple like:

@property
def generations_attempted(self) -> int:
    # Find the max generation number with either a score or an eval error.
    ...

@property
def generations_completed(self) -> int:
    # Find the max generation number in which all individuals have scores.
    ...
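
To make the proposal concrete, here is a rough sketch of how the two properties could behave, written as a standalone class over plain records and not as a patch to TPOTEstimator. The class name GenerationStats, the column names "Generation" and "Eval Error", and the score column "score" are all hypothetical. It also assumes that "completed" means every individual in that generation has a score, and it returns None when nothing qualifies, which differs from the plain int in the stubs above.

```python
import math
from collections import defaultdict
from typing import Optional


def _missing(value) -> bool:
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class GenerationStats:
    """Hypothetical helper, not part of TPOT2."""

    def __init__(self, records, score_key,
                 error_key="Eval Error", generation_key="Generation"):
        # Map generation -> list of (has_score, has_error) flags, one per individual.
        self._by_generation = defaultdict(list)
        for row in records:
            has_score = not _missing(row.get(score_key))
            has_error = not _missing(row.get(error_key))
            self._by_generation[row[generation_key]].append((has_score, has_error))

    @property
    def generations_attempted(self) -> Optional[int]:
        """Highest generation with at least one score or eval error."""
        attempted = [g for g, rows in self._by_generation.items()
                     if any(s or e for s, e in rows)]
        return max(attempted, default=None)

    @property
    def generations_completed(self) -> Optional[int]:
        """Highest generation in which every individual has a score."""
        completed = [g for g, rows in self._by_generation.items()
                     if all(s for s, _ in rows)]
        return max(completed, default=None)


# Toy data (assumed layout): generation 1 hit an eval error, generation 2
# was created but never evaluated.
nan = float("nan")
records = [
    {"Generation": 0, "score": 0.80, "Eval Error": None},
    {"Generation": 0, "score": 0.75, "Eval Error": None},
    {"Generation": 1, "score": 0.82, "Eval Error": None},
    {"Generation": 1, "score": nan, "Eval Error": "TIMEOUT"},
    {"Generation": 2, "score": nan, "Eval Error": None},
]

stats = GenerationStats(records, score_key="score")
print(stats.generations_attempted, stats.generations_completed)
```

Under this strict reading, a single eval error keeps a generation from counting as completed. Whether that is the desired semantics is exactly the kind of detail that would be better decided once inside the package.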
