ai:formats-faq · last modified 2023/12/16 16:13 by naptastic
  * Intel: Newer than (???)
  * AMD: Zen architecture.
    * William Schaub has [[https:// ]] (compiling from source may also work around some of the CPU requirements).
  * Most users have more CPU-attached DRAM than GPU-attached VRAM, so more models can run via CPU inference.
  * CPU/DRAM inference is orders of magnitude slower than GPU/VRAM inference. (More info needed.)
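A back-of-envelope sketch (not from this page) of why the gap is so large: token generation is memory-bandwidth bound, since roughly every weight must be read once per generated token, so throughput is bounded by bandwidth divided by model size. The bandwidth figures below are illustrative assumptions, not measurements.

```python
# Rough, bandwidth-bound upper bound on generation speed (tokens/s).
# Assumes every model weight is read once per token, which is
# approximately true for single-stream token generation.

def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound tokens/s for a memory-bandwidth-bound workload."""
    return bandwidth_gb_s / model_size_gb

model_gb = 4.1  # e.g. a 7B model quantized to ~4.5 bits per weight

# Illustrative bandwidths (assumed, check your own hardware):
dram_ddr4_dual_channel = 50.0   # GB/s, typical dual-channel DDR4
vram_high_end_gpu = 1000.0      # GB/s, high-end consumer GPU

print(f"CPU/DRAM: ~{tokens_per_sec(dram_ddr4_dual_channel, model_gb):.0f} tok/s upper bound")
print(f"GPU/VRAM: ~{tokens_per_sec(vram_high_end_gpu, model_gb):.0f} tok/s upper bound")
```

Real throughput is lower than this bound (compute, cache effects, prompt processing), but the ratio between the two lines is roughly the ratio you see in practice.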
=====How much DRAM/VRAM do I need?=====
**Just to keep things clear** I will use the term **DRAM** to refer to CPU-attached RAM, which is generally DDR4 or DDR5, and **VRAM** to refer to GPU-attached RAM. Being much closer to the compute units, VRAM is much faster.
(Or, "what models am I limited to?")
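As a rough sizing rule (my sketch, not from this page): the weights alone take about parameters × bits-per-weight ÷ 8 bytes, and you need headroom on top of that for the KV cache and runtime overhead. The example sizes below follow directly from that formula.

```python
# Approximate memory needed for just the weights, in GB.
# size_gb = params_in_billions * bits_per_weight / 8
# (1e9 params and 1 GB = 1e9 bytes cancel out.)

def model_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return n_params_billion * bits_per_weight / 8

examples = [
    (7,  4.5, "7B  @ ~4.5 bpw quant"),
    (13, 4.5, "13B @ ~4.5 bpw quant"),
    (7,  16,  "7B  @ fp16"),
]
for params, bpw, label in examples:
    print(f"{label}: ~{model_size_gb(params, bpw):.1f} GB + KV cache/overhead")
```

So a 7B model at roughly 4.5 bits per weight wants about 4 GB for weights, which is why such models fit on 6–8 GB GPUs with room for context.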
  * Quantization can vary within the same model.
  * The filename tells you what you need to know about a model before you download it. The end will be something like:
    * Q♣_0 use the same bpw as they say on the tin for all tensors.
    * Q♣_K_S — "K-quants", small: a mix of quantization levels, with most tensors at the stated bpw.
    * Q♣_K_M — K-quants, medium: like _S but keeps some tensors at higher precision.
    * Q♣_K_L — K-quants, large: keeps even more tensors at higher precision; largest files, best quality of the three.
  * exl2 - GPU only. New hotness. (Why though?) At time of writing (2023-11-11),
ai/formats-faq.1700967694.txt.gz · Last modified: 2023/11/26 03:01 by naptastic