ai:formats-faq
  * Intel: Newer than (???)
  * AMD: Zen architecture.
  * William Schaub has [[https://
  * Most users have more CPU-attached DRAM than GPU-attached VRAM, so more models can run via CPU inference.
  * CPU/DRAM inference is orders of magnitude slower than GPU/VRAM inference. (More info needed.)
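The DRAM-vs-VRAM point can be made concrete with a back-of-the-envelope estimate. This is only a sketch: the 20% overhead factor (KV cache, activations, runtime buffers) is an assumption, and `model_size_gib` is a hypothetical helper, not part of any library.

```python
def model_size_gib(n_params_billion, bpw, overhead=1.2):
    """Rough memory footprint of a quantized model.

    n_params_billion: parameter count in billions
    bpw: bits per weight of the quantization
    overhead: fudge factor for KV cache/activations (assumed, not measured)
    """
    bytes_total = n_params_billion * 1e9 * bpw / 8 * overhead
    return bytes_total / 2**30  # GiB

# A 70B model at ~4.5 bpw needs roughly 44 GiB: too big for a single
# 24 GB consumer GPU, but it fits in 64 GB of system DRAM.
print(round(model_size_gib(70, 4.5), 1))  # → 44.0
```

This is why "more models can run via CPU inference": commodity desktops take far more DRAM than any consumer GPU carries VRAM, even though the resulting token rate is much lower.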
  * Quantization can vary within the same model.
  * The filename tells you what you need to know about a model before you download it. The end will be something like:
    * Qx_0 quants use the same bpw as they say on the tin for all tensors.
    * Qx_K_S quants use k-quants: bpw varies by tensor, with "S" (small) keeping the fewest tensors at higher precision.
    * Qx_K_M ("M", medium) keeps more tensors at higher precision than _S.
    * Qx_K_L ("L", large) keeps the most, trading file size for quality.
  * exl2 - GPU only. New hotness. (Why though?) At time of writing (2023-11-11),
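For illustration, the quant tag at the end of a GGUF filename can be pulled out with a small helper. This is a best-effort sketch of the llama.cpp naming convention described above; `quant_from_filename` and its regex are assumptions for this example, not an official parser.

```python
import re

def quant_from_filename(name):
    """Extract a quantization tag like 'Q4_K_M' or 'Q8_0' from a
    GGUF filename, following the common llama.cpp naming scheme."""
    m = re.search(r"(Q\d+_(?:K(?:_[SML])?|0|1))", name, re.IGNORECASE)
    return m.group(1).upper() if m else None

print(quant_from_filename("llama-2-13b-chat.Q4_K_M.gguf"))   # → Q4_K_M
print(quant_from_filename("mistral-7b-instruct.Q8_0.gguf"))  # → Q8_0
```

Checking this tag before downloading tells you the approximate bits per weight, and therefore (together with the parameter count) whether the file will fit in your VRAM or DRAM.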
ai/formats-faq.1700968055.txt.gz · Last modified: 2023/11/26 03:07 by naptastic