Can I Run This LLM?

An interactive calculator that estimates whether a given LLM fits in your GPU's VRAM. Its inputs:

- Precision: lower precision reduces VRAM requirements but may affect quality.
- Batch size: the number of inputs to process simultaneously.
- Sequence length (default 2,048): the maximum number of tokens to process (affects the KV cache), with presets at 4K, 16K, 32K, 64K, and 128K.

The readout reports estimated usage against the available budget, e.g. "0 GB of 12 GB VRAM (0.0%), with FP16 precision, Mode: Inference, Ready".
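An estimate behind a readout like this typically comes from two terms: the model weights at the chosen precision, plus the KV cache, which grows linearly with both sequence length and batch size. Below is a minimal TypeScript sketch of that arithmetic under common simplifying assumptions; the model configuration, function names, and the choice to store the KV cache at the same precision as the weights are all illustrative, not taken from the page.

```typescript
// Illustrative sketch of an inference VRAM estimate.
// All model parameters below are assumptions for a 7B-class model.

interface ModelConfig {
  numParams: number;  // total weights, e.g. 7e9 for a 7B model
  numLayers: number;  // transformer layers
  numKvHeads: number; // KV attention heads (fewer than query heads under GQA)
  headDim: number;    // dimension per attention head
}

// Bytes per element at each supported precision.
const BYTES_PER_ELEMENT: Record<string, number> = {
  FP32: 4,
  FP16: 2,
  INT8: 1,
  INT4: 0.5,
};

function estimateInferenceVramGb(
  model: ModelConfig,
  precision: keyof typeof BYTES_PER_ELEMENT,
  batchSize: number,
  seqLen: number,
): number {
  const bytes = BYTES_PER_ELEMENT[precision];

  // Model weights: one element per parameter.
  const weightBytes = model.numParams * bytes;

  // KV cache: 2 tensors (K and V) per layer, each shaped
  // [batch, kvHeads, seqLen, headDim]. This is why sequence length
  // and batch size push memory use beyond the weights alone.
  // Simplification: real runtimes often keep the KV cache in FP16
  // even when weights are quantized.
  const kvCacheBytes =
    2 * model.numLayers * model.numKvHeads * model.headDim *
    seqLen * batchSize * bytes;

  return (weightBytes + kvCacheBytes) / 1024 ** 3;
}

// Example: a hypothetical 7B model at FP16, batch 1, 2,048 tokens.
const model7b: ModelConfig = {
  numParams: 7e9,
  numLayers: 32,
  numKvHeads: 32,
  headDim: 128,
};
const usedGb = estimateInferenceVramGb(model7b, "FP16", 1, 2048);
console.log(
  `${usedGb.toFixed(1)} GB of 12 GB VRAM (${((100 * usedGb) / 12).toFixed(1)}%)`,
);
```

For this hypothetical configuration the result is roughly 14 GB (about 13 GB of weights plus about 1 GB of KV cache), which exceeds the 12 GB budget; dropping to INT8 or INT4 precision brings the same model comfortably under it, which is exactly the trade-off the precision input exposes.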
