# LitServe
**Repository Path**: July1921/LitServe
## Basic Information
- **Project Name**: LitServe
- **Description**: https://github.com/Lightning-AI/LitServe
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: 2-zmq
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-02-24
- **Last Updated**: 2025-02-24
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Easily serve AI models Lightning fast ⚡
Lightning-fast serving engine for AI models.
Easy. Flexible. Enterprise-scale.
----
**LitServe** is an easy-to-use, flexible serving engine for AI models built on FastAPI. It augments FastAPI with features like batching, streaming, and GPU autoscaling, so you don't have to rebuild a FastAPI server for each model.
LitServe is at least [2x faster](#performance) than plain FastAPI due to AI-specific multi-worker handling.
- ✅ (2x)+ faster serving
- ✅ Easy to use
- ✅ LLMs, non-LLMs, and more
- ✅ Bring your own model
- ✅ PyTorch/JAX/TF/...
- ✅ Built on FastAPI
- ✅ GPU autoscaling
- ✅ Batching, streaming
- ✅ Self-host or ⚡️ managed
- ✅ Compound AI
- ✅ Integrate with vLLM and more
# Quick start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```
### Define a server
This toy example with 2 models (a compound AI system) shows LitServe's flexibility ([see real examples](#examples)):
```python
# server.py
import litserve as ls


# (STEP 1) - DEFINE THE API (compound AI system)
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # setup is called once at startup. Build a compound AI system (1+ models), connect DBs, load data, etc.
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Convert the request payload to model input.
        return request["input"]

    def predict(self, x):
        # Easily build compound systems. Run inference and return the raw output.
        squared = self.model1(x)
        cubed = self.model2(x)
        return squared + cubed

    def encode_response(self, output):
        # Convert the model output to a response payload, e.g. {"output": 80.0} for an input of 4.0.
        return {"output": output}


# (STEP 2) - START THE SERVER
if __name__ == "__main__":
    # scale with advanced features (batching, GPUs, etc.)
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto", max_batch_size=1)
    server.run(port=8000)
```
Now run the server from the command line:
```bash
python server.py
```
### Test the server
Run the auto-generated test client:
```bash
python client.py
```
Or use this terminal command:
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
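If you'd rather call the endpoint from your own Python code than from the generated `client.py`, here is a minimal sketch that uses only the standard library. It assumes the quick-start server is running on `127.0.0.1:8000`; `build_predict_request` and `predict` are hypothetical helper names for illustration, not part of LitServe.

```python
# client_sketch.py: a minimal stdlib client for the quick-start /predict endpoint.
# ASSUMPTION: server.py is running locally on port 8000. Helper names are hypothetical.
import json
import urllib.request

PREDICT_URL = "http://127.0.0.1:8000/predict"


def build_predict_request(value: float, url: str = PREDICT_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the {"input": ...} payload that decode_request reads."""
    body = json.dumps({"input": value}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(value: float, url: str = PREDICT_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_predict_request(value, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # The quick-start server computes 4.0**2 + 4.0**3 = 80.0; this prints the decoded JSON response.
    print(predict(4.0))
```

Splitting request construction from sending keeps the payload logic testable without a running server.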
### LLM serving
LitServe isn’t *just* for LLMs, unlike vLLM or Ollama; it serves any AI model with full control over internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).
```bash
litgpt serve microsoft/phi-2
```
### Summary
- LitAPI lets you easily build complex AI systems with one or more models ([docs](https://lightning.ai/docs/litserve/api-reference/litapi)).
- Use the setup method for one-time tasks like connecting models, DBs, and loading data ([docs](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- LitServer handles optimizations like batching, GPU autoscaling, and streaming ([docs](https://lightning.ai/docs/litserve/api-reference/litserver)).
- Self host on your own machines or use Lightning Studios for a fully managed deployment ([learn more](#hosting-options)).
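To make the hook contract above concrete, here is a plain-Python sketch of the order in which the hooks run: `setup` once, then `decode_request` → `predict` → `encode_response` for every request. `ToyAPI` and `serve_requests` are hypothetical stand-ins modeled on the quick-start example; this is not LitServe's implementation, which adds HTTP handling, worker processes, batching, and streaming around this flow.

```python
# lifecycle_sketch.py: illustrates LitAPI hook ordering without depending on litserve.
# ToyAPI and serve_requests are hypothetical; they only mimic the documented hook contract.
from typing import Iterable, List


class ToyAPI:
    """Modeled on the quick-start SimpleLitAPI."""

    def __init__(self) -> None:
        self.setup_calls = 0

    def setup(self, device: str) -> None:
        # One-time work: load models, open DB connections, load data, etc.
        self.setup_calls += 1
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request: dict) -> float:
        return request["input"]

    def predict(self, x: float) -> float:
        return self.model1(x) + self.model2(x)

    def encode_response(self, output: float) -> dict:
        return {"output": output}


def serve_requests(api: ToyAPI, requests: Iterable[dict], device: str = "cpu") -> List[dict]:
    """Hypothetical driver: call setup once, then push each request through the three hooks."""
    api.setup(device)
    responses = []
    for request in requests:
        x = api.decode_request(request)
        y = api.predict(x)
        responses.append(api.encode_response(y))
    return responses


if __name__ == "__main__":
    api = ToyAPI()
    print(serve_requests(api, [{"input": 2.0}, {"input": 4.0}]))  # [{'output': 12.0}, {'output': 80.0}]
    print(api.setup_calls)  # 1
```

The key point is that expensive initialization lives in `setup`, so it is paid once rather than on every request.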
[Learn how to make this server 200x faster](https://lightning.ai/docs/litserve/home/speed-up-serving-by-200x).
# Featured examples
Use LitServe to deploy any model or AI service: compound AI, gen AI, classic ML, embeddings, LLMs, vision, audio, and more.
## Examples
- **Toy model:** Hello world
- **LLMs:** Llama 3.2, LLM Proxy server, Agent with tool use
- **RAG:** vLLM RAG (Llama 3.2), RAG API (LlamaIndex)
- **NLP:** Hugging Face, BERT, Text embedding API
- **Multimodal:** OpenAI CLIP, MiniCPM, Phi-3.5 Vision Instruct, Qwen2-VL, Pixtral
- **Audio:** Whisper, AudioCraft, StableAudio, Noise cancellation (DeepFilterNet)
- **Vision:** Stable Diffusion 2, AuraFlow, Flux, Image Super Resolution (Aura SR), Background Removal, Control Stable Diffusion (ControlNet)
- **Speech:** Text-to-speech (XTTS V2), Parler-TTS
- **Classical ML:** Random forest, XGBoost
- **Miscellaneous:** Media conversion API (ffmpeg), PyTorch + TensorFlow in one API, LLM proxy server
[Browse 100+ community-built templates](https://lightning.ai/studios?section=serving)
# Features
State-of-the-art features:
- ✅ [(2x)+ faster than plain FastAPI](#performance)
- ✅ [Bring your own model](https://lightning.ai/docs/litserve/features/full-control)
- ✅ [Build compound systems (1+ models)](https://lightning.ai/docs/litserve/home)
- ✅ [GPU autoscaling](https://lightning.ai/docs/litserve/features/gpu-inference)
- ✅ [Batching](https://lightning.ai/docs/litserve/features/batching)
- ✅ [Streaming](https://lightning.ai/docs/litserve/features/streaming)
- ✅ [Worker autoscaling](https://lightning.ai/docs/litserve/features/autoscaling)
- ✅ [Self-host on your machines](https://lightning.ai/docs/litserve/features/hosting-methods#host-on-your-own)
- ✅ [Host fully managed on Lightning AI](https://lightning.ai/docs/litserve/features/hosting-methods#host-on-lightning-studios)
- ✅ [Serve all models: (LLMs, vision, etc.)](https://lightning.ai/docs/litserve/examples)
- ✅ [Scale to zero (serverless)](https://lightning.ai/docs/litserve/features/streaming)
- ✅ [Supports PyTorch, JAX, TF, etc.](https://lightning.ai/docs/litserve/features/full-control)
- ✅ [OpenAPI compliant](https://www.openapis.org/)
- ✅ [OpenAI compatibility](https://lightning.ai/docs/litserve/features/open-ai-spec)
- ✅ [Authentication](https://lightning.ai/docs/litserve/features/authentication)
- ✅ [Dockerization](https://lightning.ai/docs/litserve/features/dockerization-deployment)

[10+ more features...](https://lightning.ai/docs/litserve/features)
**Note:** We prioritize scalable, enterprise-level features over hype.
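Streaming works by turning `predict` and `encode_response` into generators, so each chunk is sent as soon as it is produced instead of after the whole output is ready. The sketch below shows that chaining in plain Python so it runs without a server; `ToyStreamAPI` and `stream_chunks` are hypothetical names. In LitServe itself the same two methods would live on a `LitAPI` subclass, with streaming switched on when the `LitServer` is created (the streaming docs show a `stream=True` argument; confirm it against your installed version).

```python
# streaming_sketch.py: the generator-chaining pattern behind streamed responses, in plain Python.
# ToyStreamAPI and stream_chunks are hypothetical; they do not import or reimplement litserve.
from typing import Iterator


class ToyStreamAPI:
    def decode_request(self, request: dict) -> str:
        return request["input"]

    def predict(self, prompt: str) -> Iterator[str]:
        # Stand-in "model": emit the prompt one word at a time instead of all at once.
        for token in prompt.split():
            yield token

    def encode_response(self, output_stream: Iterator[str]) -> Iterator[dict]:
        # Wrap each chunk as it arrives so a client can render partial output.
        for token in output_stream:
            yield {"output": token}


def stream_chunks(api: ToyStreamAPI, request: dict) -> Iterator[dict]:
    """Hypothetical driver: decode runs immediately; predict and encode_response are
    generators, so tokens are produced only as the caller iterates."""
    return api.encode_response(api.predict(api.decode_request(request)))


if __name__ == "__main__":
    for chunk in stream_chunks(ToyStreamAPI(), {"input": "hello streaming world"}):
        print(chunk)
```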
# Performance
LitServe is designed for AI workloads. Specialized multi-worker handling delivers a minimum **2x speedup over FastAPI**.
Additional features like batching and GPU autoscaling can drive performance well beyond 2x, scaling efficiently to handle more simultaneous requests than FastAPI and TorchServe.
Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/benchmarks) (higher is better).
These results are for image and text classification tasks. The same performance relationships hold for other ML tasks (embeddings, LLM serving, audio, segmentation, object detection, summarization, etc.).
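Batching helps because one model call over several queued requests is usually much cheaper than one call per request, especially on GPUs. Below is a plain-Python sketch of the decode → batch → predict → unbatch → encode flow that LitServe's batching docs describe for `max_batch_size > 1`. `ToyBatchAPI` and `serve_batched` are hypothetical; the real scheduler also waits up to a batch timeout to fill a batch and runs concurrently, which this sketch omits.

```python
# batching_sketch.py: why batching cuts the number of model calls, in plain Python.
# ToyBatchAPI and serve_batched are hypothetical; no timeouts, workers, or HTTP are modeled.
from typing import List


class ToyBatchAPI:
    def __init__(self) -> None:
        self.predict_calls = 0

    def decode_request(self, request: dict) -> float:
        return request["input"]

    def batch(self, inputs: List[float]) -> List[float]:
        # With real tensors this step would typically stack inputs into one array.
        return list(inputs)

    def predict(self, xs: List[float]) -> List[float]:
        # One model call for the whole batch: the source of the throughput gain.
        self.predict_calls += 1
        return [x**2 + x**3 for x in xs]

    def unbatch(self, outputs: List[float]) -> List[float]:
        # Split the batched output back into one result per request.
        return list(outputs)

    def encode_response(self, output: float) -> dict:
        return {"output": output}


def serve_batched(api: ToyBatchAPI, requests: List[dict], max_batch_size: int) -> List[dict]:
    """Group requests into chunks of at most max_batch_size and run one predict per chunk."""
    responses: List[dict] = []
    for start in range(0, len(requests), max_batch_size):
        chunk = requests[start:start + max_batch_size]
        inputs = api.batch([api.decode_request(r) for r in chunk])
        outputs = api.unbatch(api.predict(inputs))
        responses.extend(api.encode_response(y) for y in outputs)
    return responses


if __name__ == "__main__":
    api = ToyBatchAPI()
    print(serve_batched(api, [{"input": float(i)} for i in range(5)], max_batch_size=4))
    print(api.predict_calls)  # 2 model calls instead of 5
```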
***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your own vLLM-like server with LitServe. Optimizations like KV caching, which you can implement with LitServe, are needed to maximize LLM performance.
# Hosting options
LitServe can be hosted independently on your own machines or fully managed via Lightning Studios.
Self-hosting is ideal for hackers, students, and DIY developers, while fully managed hosting is ideal for enterprise developers who need easy autoscaling, security, release management, observability, and 99.995% uptime.
| Feature | Self Managed | Fully Managed on Studios |
|----------------------------------|-----------------------------------|-------------------------------------|
| Deployment | ✅ Do it yourself deployment | ✅ One-button cloud deploy |
| Load balancing | ❌ | ✅ |
| Autoscaling | ❌ | ✅ |
| Scale to zero | ❌ | ✅ |
| Multi-machine inference | ❌ | ✅ |
| Authentication | ❌ | ✅ |
| Own VPC | ❌ | ✅ |
| AWS, GCP | ❌ | ✅ |
| Use your own cloud commits | ❌ | ✅ |
# Community
LitServe is a [community project accepting contributions](https://lightning.ai/docs/litserve/community). Let's make the world's most advanced AI inference engine.
💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)