Implement make_opt_flags function for XPU
This PR brings intel-xpu-backend-for-triton changes into this repo so that the MXFP4 version of GPT-OSS can run with transformers on XPU. The current torch 2.8 + triton 3.4 stack for XPU cannot run kernels-community/triton_kernels directly, so this PR is implemented against torch 2.9 + triton 3.5 for XPU.
I tested this PR on an A100 with no errors.
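To illustrate the shape of the change, here is a minimal sketch of how a `make_opt_flags`-style function can dispatch to an XPU-specific heuristic alongside the existing CUDA path. All names, defaults, and tile sizes below are hypothetical placeholders, not the actual triton_kernels implementation:

```python
# Hedged sketch: backend-dispatching opt-flag selection.
# The OptFlags fields and the per-backend numbers are illustrative only.
from dataclasses import dataclass


@dataclass
class OptFlags:
    block_m: int
    block_n: int
    num_warps: int
    num_stages: int


def make_opt_flags_cuda(m: int, n: int) -> OptFlags:
    # Placeholder CUDA heuristics.
    return OptFlags(block_m=128, block_n=128, num_warps=8, num_stages=4)


def make_opt_flags_xpu(m: int, n: int) -> OptFlags:
    # Placeholder XPU heuristics; Intel GPUs may prefer different
    # tile sizes and pipeline depths than CUDA devices.
    return OptFlags(block_m=64, block_n=64, num_warps=4, num_stages=2)


def make_opt_flags(backend: str, m: int, n: int) -> OptFlags:
    # Dispatch on the active Triton backend name.
    if backend == "xpu":
        return make_opt_flags_xpu(m, n)
    if backend == "cuda":
        return make_opt_flags_cuda(m, n)
    raise ValueError(f"unsupported backend: {backend}")
```

The point of the sketch is that an XPU branch can be added without touching the CUDA heuristics, which is why the PR is self-contained to XPU behavior.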
Hey @YangKai0616, thanks for the PR! I'm happy to merge this, but it would be better to upstream the modification into Triton directly, since you implemented this against triton 3.5 and torch 2.9. When those versions are out, I will most likely update the kernels, so any modification made beforehand will be overwritten and you will have to open a new PR. Another option would be to wait until I make the changes once the new versions are released, wdyt? Right now, most users won't be able to use this unless they manage to build Triton from source.
Thanks very much @marcsun13 for your kind review. The tricky part is that the Intel Triton backend hasn't been upstreamed into Triton yet: the Triton XPU backend is compatible with Triton, but it isn't built in. This makes it hard (or impossible) for us to upstream this change into Triton directly. So, to make transformers GPT-OSS work on XPU, we need to do the manual work of adding the XPU-related opt_flags to this kernel repo. I think this is unavoidable effort on Intel's side until the XPU backend is upstreamed into Triton. So, would the following be possible:
- You help merge this PR
- You update the kernels for the triton 3.5 changes when needed (without having to consider XPU)
- We Intel engineers will monitor model health on XPU (we already do), and will submit new PRs for your review if we find that XPU breaks.
Thanks very much