Pull requests: vllm-project/vllm
[Bugfix] Fix KeyError: 1 When Using LoRA adapters
#5164
opened May 31, 2024 by
BlackBird-Coding
[Kernel] Pass a device pointer into the quantize kernel for the scales
#5159
opened May 31, 2024 by
tlrmchlsmth
[Core] Optimize gpu_memory_utilization parameter to be applied on the GPU memory available after peak_memory
#5158
opened May 31, 2024 by
alexm-neuralmagic
[Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size
#5157
opened May 31, 2024 by
tlrmchlsmth
•
Draft
[Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py
#5151
opened May 31, 2024 by
dashanji
[KERNEL] int8 quantization kernel refactoring & optimization WIP
#5146
opened May 31, 2024 by
ZelboK
add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088
#5145
opened May 30, 2024 by
alexm-neuralmagic
[Bugfix] Fix KV head calculation for MPT models when using GQA
#5142
opened May 30, 2024 by
bfontain
[Frontend]token_ids are useless param sent to the logit_bias_logits_processor.
#5139
opened May 30, 2024 by
Etelis
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU
#5137
opened May 30, 2024 by
tlrmchlsmth
[Feature][Frontend]: Add support for stream_options in ChatCompletionRequest
#5135
opened May 30, 2024 by
Etelis
[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier
#5131
opened May 30, 2024 by
sroy745
[Misc] Simplify code and fix type annotations in conftest.py
#5118
opened May 30, 2024 by
DarkLight1337
[Kernel] Add w4a16 support for compressed_tensors models
#5116
opened May 30, 2024 by
dsikka
[Speculative Decoding] Enable arbitrary model inputs
#5101
opened May 29, 2024 by
abhigoyal1997
•
Draft
1 of 8 tasks
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint
#5074
opened May 27, 2024 by
youkaichao
[CI/Build][Misc] Add scripts that performs a fair comparison between vLLM and alternatives (TGI and TRT)
#5073
opened May 27, 2024 by
KuntaiDu
1 of 4 tasks