vllm-project / vllm Public

Fork 2.8k
Star 20.3k

Pull requests: vllm-project/vllm

Labels 41 Milestones 0

263 Open 1,794 Closed

Pull requests list

[Bugfix] Fix illegal memory access for lora

#5169 opened May 31, 2024 by sfc-gh-zhwang • Draft

[Bugfix] Fix KeyError: 1 When Using LoRA adapters

#5164 opened May 31, 2024 by BlackBird-Coding

[Kernel] Pass a device pointer into the quantize kernel for the scales

#5159 opened May 31, 2024 by tlrmchlsmth

[Core] Optimize gpu_memory_utilization parameter to be applied on the GPU memory available after peak_memory

#5158 opened May 31, 2024 by alexm-neuralmagic

[Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size

#5157 opened May 31, 2024 by tlrmchlsmth • Draft

[Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py

#5151 opened May 31, 2024 by dashanji

[KERNEL] int8 quantization kernel refactoring & optimization WIP

#5146 opened May 31, 2024 by ZelboK

add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088

#5145 opened May 30, 2024 by alexm-neuralmagic

[Kernel] Update Cutlass fp8 configs

#5144 opened May 30, 2024 by varun-sundar-rabindranath

[Bugfix] Fix KV head calculation for MPT models when using GQA

#5142 opened May 30, 2024 by bfontain

[CI/Build] Test buildkite monorepo plugin

#5140 opened May 30, 2024 by dgoupil

[Frontend]token_ids are useless param sent to the logit_bias_logits_processor.

#5139 opened May 30, 2024 by Etelis

[Core] Remove unnecessary copies in flash attn backend

#5138 opened May 30, 2024 by Yard1

[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU

#5137 opened May 30, 2024 by tlrmchlsmth

[Feature][Frontend]: Add support for stream_options in ChatCompletionRequest

#5135 opened May 30, 2024 by Etelis

[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier

#5131 opened May 30, 2024 by sroy745

[Misc] Simplify code and fix type annotations in conftest.py

#5118 opened May 30, 2024 by DarkLight1337

[Kernel] Add w4a16 support for compressed_tensors models

#5116 opened May 30, 2024 by dsikka

New CI template on AWS stack

#5110 opened May 29, 2024 by khluu

[Speculative Decoding] Enable arbitrary model inputs

#5101 opened May 29, 2024 by abhigoyal1997 • Draft

1 of 8 tasks

[CI/Build] Simplify OpenAI server setup in tests

#5100 opened May 29, 2024 by DarkLight1337 • Draft

[Misc] Add vLLM version getter to utils

#5098 opened May 29, 2024 by DarkLight1337

New vllm CLI

#5090 opened May 28, 2024 by EthanqX

[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint

#5074 opened May 27, 2024 by youkaichao

[CI/Build][Misc] Add scripts that performs a fair comparison between vLLM and alternatives (TGI and TRT)

#5073 opened May 27, 2024 by KuntaiDu

1 of 4 tasks
