vLLM
Speculative Decoding in the Wild
How vLLM, SGLang, and TensorRT-LLM implement Eagle speculation: tree attention, KV cache tricks, and CUDA graph trade-offs.