Cuda / DirectML question #1037
Comments
Which model are you using? When you download a model or use the model builder, you get a genai_config.json file in the model folder, and that file specifies which provider to use. We are working on being able to specify the provider at runtime, but currently you will wind up with models that only run on one particular provider (for example, because they contain CUDA-specific ops that don't exist on CPU).
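To see which provider a downloaded model folder is tied to, you can inspect its genai_config.json. The sketch below is only an illustration: it assumes the provider is listed under model.decoder.session_options.provider_options as single-key objects such as {"cuda": {}}, and it treats an empty or missing list as CPU. The helper name configured_providers is hypothetical and not part of the library.

```python
import json
from pathlib import Path


def configured_providers(model_dir):
    """Return the provider names listed in <model_dir>/genai_config.json.

    Assumed layout (not confirmed in this thread):
    model.decoder.session_options.provider_options is a list of single-key
    objects, e.g. [{"cuda": {}}]. An empty or missing list is treated as CPU.
    """
    config_path = Path(model_dir) / "genai_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    options = (
        config.get("model", {})
        .get("decoder", {})
        .get("session_options", {})
        .get("provider_options", [])
    )
    # Each entry is a dict whose key is the provider name.
    names = [name for entry in options for name in entry]
    return names or ["cpu"]
```

Calling configured_providers on a model folder returns a list of provider names, which makes it easy to check what a given build targets before loading it.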
Do you know when it will be released? I would like to plan updates for my software. Currently I'm using Qwen 2.5 0.5B and 1.5B, and I think Llama 3.2 1B, all instruct variants.
@RyanUnderhill hey, I just saw this release: https://github.com/microsoft/onnxruntime-genai/releases/tag/v0.5.1 Is there a manual for setting the provider at runtime in C#, or can you provide one if possible? Thanks.
The example here can be a start if you want to try it out.
@skyline75489 it's not. If I understand correctly, it should be possible to determine the provider at runtime in the latest version; what you addressed is the same old way. Can you take another look at this thread and answer my question properly? Thanks ✌️ I see there is some OgaHandle which was not present in previous versions, with no docs saying what it does, but nevertheless the point is how to specify the provider at runtime 🤷🏼♂️ I don't know if my user has CUDA or DirectX installed, because it's a Mac user.
@RyanUnderhill any news on this topic?
Hi there, you made a fantastic framework for LLMs. But what I find very confusing is how to run this on CUDA and DirectML. I simply don't know how to do it in C#.
Is there any example? Second question: do I have to provide a different model for each of CUDA, CPU, and DirectML, or can one model run seamlessly on all of them? Or is there a way to convert a model to support all or a combination of providers? As far as I know, ONNX itself provides seamless support, which is why it's a bit confusing.
My use case is to deploy a model to the user's device and, based on the device's capabilities, choose the provider that gives the best performance. But not the other way around, because I expect my user to know nothing about ML itself.
Thank you ✌️
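Given the maintainer's reply that a model currently runs on only one provider, one way to approach this use case is to ship one model build per provider and pick the folder at startup. The following is a minimal sketch under that assumption, not an official API: pick_model_dir, the preference order, and the example folder names are all hypothetical, and detecting which providers the device actually supports is left to the caller.

```python
def pick_model_dir(available, model_dirs, preference=("cuda", "dml", "cpu")):
    """Pick the first preferred provider that the device supports and that
    has a model build shipped for it.

    available:  set of provider names detected on the device (caller-supplied).
    model_dirs: dict mapping provider name to the folder of the model built
                for that provider (hypothetical layout).
    Returns a (provider, model_dir) tuple.
    """
    for provider in preference:
        if provider in available and provider in model_dirs:
            return provider, model_dirs[provider]
    raise LookupError("no shipped model matches an available provider")


# Example: a Windows machine without CUDA falls back to the DirectML build.
shipped = {"cuda": "models/qwen-cuda", "dml": "models/qwen-dml", "cpu": "models/qwen-cpu"}
choice = pick_model_dir({"dml", "cpu"}, shipped)
```

Keeping a CPU build in the shipped set gives a fallback for devices such as Macs, where neither CUDA nor DirectML is available.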