=====What hardware works?=====

      * Intel: Haswell (4th-generation Core, 2013) or newer; Haswell was the first Intel generation with AVX2.
      * AMD: Zen architecture.
      * William Schaub has [[https://blog.longearsfor.life/blog/2023/11/26/building-pytorch-for-systems-without-avx2-instructions/|this blog post]] for people who don't have AVX2 support. He adds: "I ended up doing the same for torchaudio and torchvision because it turns out that the C++ API ended up mismatched from the official packages. It's the same process except no changes needed in the cmake config."
     * Most users have more CPU-attached DRAM than GPU-attached VRAM, so more models can run via CPU inference.
     * CPU/DRAM inference is orders of magnitude slower than GPU/VRAM inference. (More info needed.)
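
If you're not sure whether your CPU has AVX2, you can ask the kernel instead of digging through spec sheets. A minimal sketch, assuming Linux (it reads ''/proc/cpuinfo'', which lists the feature flags the kernel detected):

<code python>
# Check whether this CPU advertises AVX2.
# Linux-only: /proc/cpuinfo lists the CPU feature flags the kernel detected.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())

print("AVX2 supported" if "avx2" in flags else "No AVX2 -- see the blog post above")
</code>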
=====How much DRAM/VRAM do I need?=====
  
**Just to keep things clear:** I will use the term **DRAM** to refer to CPU-attached RAM, which is generally DDR4 or DDR5, and **VRAM** to refer to GPU-attached RAM. Being much closer to the GPU itself, and on the same circuit board, GDDR can connect with a much faster, wider bus. [[https://en.wikipedia.org/wiki/High_Bandwidth_Memory|High-bandwidth memory]] goes even faster and wider.
  
 (Or, "what models am I limited to?") (Or, "what models am I limited to?")
=====LLM Format Comparison=====

     * Quantization can vary within the same model.
     * The filename tells you what you need to know about a model before you download it (a rough file-size estimate from bpw is sketched after this list). The end will be something like:
      * Q♣_0 (where ♣ stands for the bit count, e.g. Q4_0) uses the same bpw as it says on the tin for all tensors.
      * Q♣_K_S ("small") uses ♣ bpw for all tensors.
      * Q♣_K_M ("medium") uses more bpw for some specific tensors.
      * Q♣_K_L ("large") uses more bpw for a larger set of tensors.
  * exl2 - GPU only. New hotness. (Why though?) At time of writing (2023-11-11), TheBloke isn't publishing exl2 quants.
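
As a sanity check before downloading, bpw translates into an approximate file size: parameter count times bpw, divided by 8 bits per byte. A minimal sketch; the bpw figures are rough averages for the mixed-precision K-quants, and real GGUF files carry some metadata on top:

<code python>
# Approximate model file size: parameters * bits-per-weight / 8 bits per byte.
# The bpw values are rough averages; K-quants mix precisions per tensor.
def approx_size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9

print(f"7B  at Q4_K_M (~4.8 bpw): ~{approx_size_gb(7e9, 4.8):.1f} GB")
print(f"13B at Q5_K_M (~5.7 bpw): ~{approx_size_gb(13e9, 5.7):.1f} GB")
print(f"70B at Q4_K_M (~4.8 bpw): ~{approx_size_gb(70e9, 4.8):.1f} GB")
</code>

Remember the weights aren't the whole story: the KV cache for your context window needs DRAM/VRAM on top of this.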