vLLM in Practice: A Developer’s Guide to High-Performance Inference, Scalable Serving, and Efficient Large Language Model Deployment Kindle Edition

Limited Time Sale

US$4.03 cheaper than the new price.

Free shipping for purchases over $99.
No cash-on-delivery fee for purchases over $99.
Please note that the sale price and tax displayed online may differ from those in-store, and the product may be out of stock in-store.
Used: US$2.68

Product details

Management number: 222055232
Release date: 2026/05/04
List price: US$2.68
Model number: 222055232

This book provides a clear and practical introduction to working with vLLM, a modern framework designed for efficient large language model inference and serving. Written for developers, engineers, and technical practitioners, it focuses on building a strong understanding of how to deploy and optimize models in real-world environments.

Starting with the fundamentals of large language model inference, the book explains how vLLM improves throughput and memory efficiency through advanced scheduling and execution strategies. Readers will explore core concepts such as tokenization pipelines, batching techniques, and latency optimization, all presented in a structured and accessible manner.

As the material progresses, the focus shifts toward hands-on implementation. You will learn how to configure vLLM for different workloads, integrate it into existing systems, and manage performance across a variety of deployment scenarios. Practical examples illustrate how to balance resource usage with responsiveness, making it easier to build scalable AI-powered applications.

The book also addresses important operational considerations, including monitoring, debugging, and maintaining reliability in production systems. By the end, readers will have a solid foundation for using vLLM effectively, whether for experimentation, prototyping, or full-scale deployment.

This guide is intended for those who want a focused, technically grounded resource without unnecessary complexity, providing a reliable pathway into modern LLM serving workflows.

X-Ray: Not enabled
Language: English
File size: 441 KB
Page Flip: Enabled
Word Wise: Not enabled
Print length: 152 pages
Accessibility: Screen reader supported
Publication date: March 27, 2026
Enhanced typesetting: Enabled
