
@frreiss Please take a look at the model card for query rewrite and comment.

IBM Granite org

Thanks for creating this model card!

Review comments follow:

  • Please refer to this intrinsic as an intrinsic, not an adapter. The current set of LoRA and aLoRA adapters are implementations of the intrinsic. Given the difficulty that users have had loading LoRA adapters, it is likely that we will be releasing non-LoRA implementations of some of our intrinsics, for example by supervised fine-tuning of Granite 4.0 Micro.
  • The quickstart example should be self-contained. There should be step-by-step instructions for downloading the LoRA adapter, installing vLLM, starting vLLM, and installing granite-common. These instructions should be followed by a self-contained code snippet that users can copy and paste into a Python file or Jupyter notebook. It is ok to reference the general-purpose notebook in addition to providing these instructions, but a reference to the notebook is not a substitute for providing self-contained "getting started" instructions in the model card. (A rough sketch of such a quickstart follows this list.)
  • Please do not tell users to use files in our regression test data directory tests/granite_common/intrinsics/rag/testdata. The contents and locations of these files may change at any time. For all of our intrinsics, the io.yaml file of record is located in the same directory as the LoRA or aLoRA adapter files. For example, the IO configuration file for the query rewrite intrinsic implemented as a Granite 3.3 8b LoRA adapter is located at https://huggingface.co/ibm-granite/rag-intrinsics-lib/blob/main/query_rewrite/lora/granite-3.3-8b-instruct/io.yaml. Some of our gpt-oss intrinsics have different IO configurations, so it is very important to provide clear instructions to use the YAML file from the LoRA adapter's directory. The example model input of record for each intrinsic should be in a code block inside the model card of the intrinsic. Please do not reference the test case input file, even if that file is currently identical to what is in the model card. (See the io.yaml sketch after this list.)
  • The title for the second quickstart section should be "Example using Hugging Face Transformers". Hugging Face (not "HuggingFace") is a corporation in New York City. "Transformers" is the name of a library that Hugging Face maintains. We need to ensure that we use our business partners' trademarks correctly.
  • The quickstart example of Transformers usage should not perform any manipulation of string prompts and should not show raw string outputs of the model. We do not want to encourage new users to interact with our intrinsics in this way. As with the OpenAI quickstart example, the Transformers example should describe the input and output of the intrinsic in terms of OpenAI-compatible chat completion requests and should show how to apply our standard input and output processing to these requests. There is example code for running all of our intrinsics on Transformers in the notebook at this URL. You may, if you wish, include a reference to this notebook in lieu of a quickstart example for Transformers usage.
    If you would like, you may also include an additional advanced example for advanced users who wish to manipulate low-level prompts and low-level model outputs (a sketch of such an example follows this list). This additional example should be preceded by text that indicates at a minimum the following:
    • Users should only do this kind of raw string manipulation if they know what they are doing. There are many subtle ways to produce a wrong result when interacting directly with low-level APIs.
    • The input to the tokenizer's chat template is different for different base models. For example, gpt-oss-20b does not accept a documents parameter.
    • The raw string input prompt format of the intrinsic's LoRA adapter is different for different base models.
    • The raw string output format is different for different base models, with different special tokens and different output "channels", sometimes even within the same model family.
    • Without constrained decoding, the raw output format of the intrinsic is not guaranteed to be valid JSON that adheres to the expected schema. Some of the other intrinsics in this library do not produce valid JSON at all if constrained decoding is not enabled.
  • The section on "robustness to system prompts" probably needs to be updated to reflect the existence of gpt-oss variants of this intrinsic.
  • The evaluation section references an image file that appears not to be present.
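
To make the quickstart request concrete, here is a rough sketch of what a self-contained OpenAI/vLLM quickstart could look like. The adapter path, the served LoRA module name, and the setup commands are illustrative assumptions, and the granite-common input/output processing step is elided because its exact API should be taken from the library's own documentation:

```python
# Setup (shell), roughly:
#   pip install vllm openai granite-common "huggingface_hub[cli]"
#   huggingface-cli download ibm-granite/rag-intrinsics-lib \
#       --local-dir ./rag-intrinsics-lib
#   vllm serve ibm-granite/granite-3.3-8b-instruct --enable-lora \
#       --lora-modules query_rewrite=./rag-intrinsics-lib/query_rewrite/lora/granite-3.3-8b-instruct

import openai

# vLLM exposes an OpenAI-compatible endpoint on port 8000 by default.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A multi-turn conversation whose last user turn needs to be rewritten into a
# self-contained query before retrieval.
messages = [
    {"role": "user", "content": "What does IBM Granite Guardian do?"},
    {"role": "assistant", "content": "It screens model inputs and outputs for risks."},
    {"role": "user", "content": "Can I fine-tune it?"},
]

# In the real quickstart, granite-common would transform this request (and the
# response below) according to the intrinsic's io.yaml; that step is omitted here.
response = client.chat.completions.create(
    model="query_rewrite",  # the LoRA module name registered with vLLM above
    messages=messages,
)
print(response.choices[0].message.content)
```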
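
And a minimal sketch of fetching the io.yaml of record from the adapter's own directory rather than from the test data tree, using the repo and path from the link above:

```python
import yaml
from huggingface_hub import hf_hub_download

# The IO configuration of record lives next to the adapter weights,
# not under tests/granite_common/intrinsics/rag/testdata.
io_yaml_path = hf_hub_download(
    repo_id="ibm-granite/rag-intrinsics-lib",
    filename="query_rewrite/lora/granite-3.3-8b-instruct/io.yaml",
)
with open(io_yaml_path, encoding="utf-8") as f:
    io_config = yaml.safe_load(f)
print(io_config)
```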
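
Finally, a sketch of the kind of advanced, low-level Transformers example described above, with the required warnings inline. The adapter path is an assumed local directory, and the prompt handling shown is specific to Granite 3.3 8b; it will not transfer to other base models:

```python
# WARNING: low-level usage. Only interact with raw prompts and raw model
# output if you know what you are doing. Prompt formats, special tokens, and
# output channels differ across base models (gpt-oss-20b, for instance, does
# not accept a `documents` chat-template parameter), and without constrained
# decoding the raw output is not guaranteed to be valid JSON matching the
# expected schema.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "ibm-granite/granite-3.3-8b-instruct"
ADAPTER = "./rag-intrinsics-lib/query_rewrite/lora/granite-3.3-8b-instruct"  # assumed local path

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)

messages = [
    {"role": "user", "content": "What does IBM Granite Guardian do?"},
    {"role": "assistant", "content": "It screens model inputs and outputs for risks."},
    {"role": "user", "content": "Can I fine-tune it?"},
]

# The exact instruction the intrinsic expects appended to the conversation is
# defined by its io.yaml and is not reproduced here.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Raw decoded output; the format is model-specific and may not be valid JSON.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```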

@frreiss I modified the file in light of your comments. Please take a look at the new version.

kgreenewald changed pull request status to merged
