jerryzh168 committed
Commit 52e01da · verified · 1 Parent(s): 8ecebaf

Update README.md

Files changed (1): README.md +9 -2
README.md CHANGED
@@ -98,9 +98,10 @@ lm_eval --model hf --model_args pretrained=jerryzh168/phi4-mini-int4wo-hqq --tas
 Our int4wo is only optimized for batch size 1, so we'll only benchmark the batch size 1 performance with vllm.
 For batch size N, please see our [gemlite checkpoint](https://huggingface.co/jerryzh168/phi4-mini-int4wo-gemlite).
 
-# Install latest vllm to get the most recent changes
+# Download vllm source code and install vllm
 ```
-pip install git+https://github.com/vllm-project/vllm.git
+git clone git@github.com:vllm-project/vllm.git
+VLLM_USE_PRECOMPILED=1 pip install .
 ```
 
 # Download dataset
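
The new install flow builds vllm from a source checkout. A minimal sketch of the full sequence, assuming an HTTPS clone (the committed SSH URL requires GitHub keys), a `cd` into the checkout before `pip install .` (which the committed snippet omits), and that `VLLM_USE_PRECOMPILED=1` reuses prebuilt binaries so the CUDA kernels are not compiled locally:

```
# Clone vllm over HTTPS (no SSH keys needed) and enter the source tree
git clone https://github.com/vllm-project/vllm.git
cd vllm
# Install from source; VLLM_USE_PRECOMPILED=1 pulls precompiled binaries
# instead of building the kernels locally (assumption: current vllm behavior)
VLLM_USE_PRECOMPILED=1 pip install .
```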
@@ -108,6 +109,9 @@ Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/
 
 Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
 # benchmark_latency
+
+Run the following under vllm source code root folder:
+
 ## baseline
 ```
 python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model microsoft/Phi-4-mini-instruct --batch-size 1
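
The hunk shows only the baseline latency command. A hedged sketch of the matching int4wo run, substituting the `jerryzh168/phi4-mini-int4wo-hqq` checkpoint named in the first hunk header and assuming `benchmark_latency.py` accepts any Hugging Face model id via `--model`:

```
# Hypothetical int4wo counterpart of the baseline above; flags mirror the
# committed command, only the model id differs
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 \
    --model jerryzh168/phi4-mini-int4wo-hqq --batch-size 1
```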
@@ -122,6 +126,9 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 We also benchmarked the throughput in a serving environment.
 
+
+Run the following under `vllm` source code root folder:
+
 ## baseline
 Server:
 ```
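
The server command itself falls outside this hunk's context. A sketch of a typical vllm serving benchmark pair under stated assumptions: the model id is the baseline from above, and the client flags (`--dataset-name`, `--dataset-path`, `--num-prompts`) are assumed from vllm's `benchmarks/benchmark_serving.py`, not taken from this README:

```
# Server (terminal 1): serve the baseline model with the OpenAI-compatible API
vllm serve microsoft/Phi-4-mini-instruct

# Client (terminal 2, vllm source root): replay ShareGPT prompts against it
# DATASET is a placeholder for the sharegpt json downloaded earlier
DATASET=./sharegpt.json
python benchmarks/benchmark_serving.py --backend vllm \
    --model microsoft/Phi-4-mini-instruct \
    --dataset-name sharegpt --dataset-path "$DATASET" \
    --num-prompts 1000
```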