[Interactive VRAM-calculator widget: controls for precision ("Lower precision reduces VRAM requirements but may affect quality"), batch size ("Number of inputs to process simultaneously"), and sequence length, set to 2,048 ("Max tokens to process; affects KV cache"); the readout shows 0 GB (0.0%) of 12 GB VRAM used with FP16 precision in Inference mode.]
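The calculator's inputs (precision, batch size, sequence length) map onto a standard back-of-the-envelope inference VRAM estimate: model weights plus the KV cache, whose size grows with batch size and sequence length. A minimal sketch of that arithmetic, assuming illustrative model dimensions and a rough 10% activation overhead (all names and constants here are assumptions, not the tool's actual code):

```python
def estimate_inference_vram_gb(
    n_params: float,           # total model parameters
    n_layers: int,             # number of transformer layers
    hidden_size: int,          # model hidden dimension
    seq_len: int = 2048,       # max tokens to process (drives KV cache size)
    batch_size: int = 1,       # inputs processed simultaneously
    bytes_per_param: float = 2,  # 2 for FP16; lower precision shrinks this
) -> float:
    """Rough inference VRAM: weights + KV cache + small overhead buffer."""
    weights = n_params * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, one hidden-size vector
    # per token, per batch item, at the chosen precision
    kv_cache = 2 * n_layers * batch_size * seq_len * hidden_size * bytes_per_param
    overhead = 0.1 * weights  # ~10% for activations/workspace, an assumption
    return (weights + kv_cache + overhead) / 1024**3

# Example: a 7B-parameter model, 32 layers, hidden size 4096, FP16
print(round(estimate_inference_vram_gb(7e9, 32, 4096), 1))  # → 15.3
```

At FP16 this hypothetical 7B model would not fit in the widget's 12 GB budget; halving `bytes_per_param` (INT8) brings the estimate under it, which is the trade-off the precision tooltip describes.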