LLMs
Speculative Decoding in the Wild
How vLLM, SGLang, and TensorRT-LLM implement EAGLE speculation: tree attention, KV cache tricks, and CUDA graph trade-offs.
Lean In: As Code Becomes Cheap
As code becomes cheap, leverage shifts from scaling through culture to scaling through specifications. Lean 4 fundamentals, context engineering, and a verified proof that speculative decoding preserves the target distribution.