Add pre-generated prompts option for benchmark #1091
@@ -232,6 +240,9 @@ def run_benchmark(args, batch_size, prompt_length, generation_length, max_length
        # use random tokens instead of generating a prompt using the model and then tokenizing it
        tokens = np.random.randint(100, size=(batch_size, prompt_length))
        prompt = [tokenizer.decode(tokens[0])] * batch_size
    elif args.use_prompt_set:
        prompt = get_prompt_by_length(prompt_length)
        tokens = tokenizer.encode_batch(prompt)
Different tokenizers can encode the same prompt into different numbers of tokens, so some additional work is needed to get the desired prompt length. You can see an example of how to do this here.
So basically, we will check the length of tokens against the requested prompt_length and pad or trim as needed. Is that correct?
Yes, that's correct.
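The pad-or-trim step agreed on above could look roughly like the following. This is only a minimal sketch, not the example linked in the review: the helper names fit_to_length and fit_batch are hypothetical, and padding with a fixed pad_id is an assumption (a real benchmark might repeat prompt tokens or use the tokenizer's own pad token instead).

```python
from typing import List, Sequence


def fit_to_length(token_ids: Sequence[int], prompt_length: int, pad_id: int = 0) -> List[int]:
    """Trim or pad one encoded prompt so it has exactly prompt_length tokens.

    pad_id is an assumed filler token; choose one that is valid for your tokenizer.
    """
    if prompt_length < 0:
        raise ValueError("prompt_length must be non-negative")
    ids = list(token_ids)[:prompt_length]               # trim if the prompt is too long
    ids.extend([pad_id] * (prompt_length - len(ids)))   # pad if the prompt is too short
    return ids


def fit_batch(batch: Sequence[Sequence[int]], prompt_length: int, pad_id: int = 0) -> List[List[int]]:
    """Apply fit_to_length to every encoded prompt in a batch."""
    return [fit_to_length(ids, prompt_length, pad_id) for ids in batch]
```

In the benchmark, such a helper would run on plain lists of token IDs after encoding, so whatever tokenizer.encode_batch returns would first need converting to lists.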
@@ -83,6 +83,14 @@ def generate_prompt(model, tokenizer, prompt_length, use_graph_capture) -> str:
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))


# Use prompt length to get pre-defined prompt
def get_prompt_by_length(prompt_length):
    json_path = "prompts.json"
Instead of uploading another copy of prompts.json, can we download it from here and save it to disk using requests or urllib? That way, only one location has to be updated when adding other prompts.
Would it make sense to rely on an external file? If that file changes, we may have problems with this. Besides, in a benchmark environment, it seems more logical to use local files than to rely on internet connectivity. What do you think?
If connectivity is a concern in your benchmark environment, then I think it's fine to include a copy of the file.
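One way to reconcile both positions is to prefer the local copy and download the file only when it is missing. The sketch below assumes a hypothetical helper named load_prompts; the download URL is left as a parameter because the link from the review is not reproduced here, and nothing is assumed about the layout of prompts.json beyond it being valid JSON.

```python
import json
import os
import urllib.request


def load_prompts(json_path="prompts.json", url=None, timeout=10.0):
    """Load the prompt set from disk, downloading it first if it is missing.

    url is optional; with url=None the function never touches the network,
    which suits benchmark machines without internet connectivity.
    """
    if not os.path.exists(json_path):
        if url is None:
            raise FileNotFoundError(f"{json_path} not found and no download URL was provided")
        # Fetch the file once and cache it next to the benchmark script.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
        with open(json_path, "wb") as f:
            f.write(data)
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

With a checked-in copy of the file, the download branch never runs, so the behavior matches the local-only approach in the PR.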
During benchmarking, we wanted pre-generated prompts that have been prepared in advance for better benchmark results, so this option can be handy. In our tests, we wanted to focus only on token generation and sampling on an SLM.