=====What hardware works?=====

      * Intel: Haswell (4th-generation Core, 2013) or newer; Haswell was the first Intel generation with AVX2.
      * AMD: Zen architecture.
      * William Schaub has [[https://blog.longearsfor.life/blog/2023/11/26/building-pytorch-for-systems-without-avx2-instructions/|this blog post]] for people who don't have AVX2 support. He adds: "I ended up doing the same for torchaudio and torchvision because it turns out that the C++ API ended up mismatched from the official packages. It's the same process except no changes needed in the cmake config."
     * Most users have more CPU-attached DRAM than GPU-attached VRAM, so more models can run via CPU inference.
     * CPU/DRAM inference is orders of magnitude slower than GPU/VRAM inference. (More info needed.)
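
If you're not sure whether your CPU has AVX2, you can ask the kernel instead of digging through spec sheets. A minimal sketch, assuming Linux (it reads ''/proc/cpuinfo'', which lists the feature flags the kernel detected):

<code python>
# Check whether this CPU advertises AVX2.
# Linux-only: /proc/cpuinfo lists the CPU feature flags the kernel detected.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())

print("AVX2 supported" if "avx2" in flags else "No AVX2 -- see the blog post above")
</code>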
=====How much DRAM/VRAM do I need?=====
  
**Just to keep things clear:** I will use the term **DRAM** to refer to CPU-attached RAM, which is generally DDR4 or DDR5, and **VRAM** to refer to GPU-attached RAM. Being much closer to the GPU itself, and on the same circuit board, GDDR can connect with a much faster, wider bus. [[https://en.wikipedia.org/wiki/High_Bandwidth_Memory|High-bandwidth memory]] goes even faster and wider.
  
 (Or, "what models am I limited to?") (Or, "what models am I limited to?")
=====LLM Format Comparison=====

     * Quantization can vary within the same model.
     * The filename tells you what you need to know about a model before you download it (a rough file-size estimate from bpw is sketched after this list). The end will be something like:
      * Q♣_0 (where ♣ stands for the bit count, e.g. Q4_0) uses the same bpw as it says on the tin for all tensors.
      * Q♣_K_S ("small") uses ♣ bpw for all tensors.
      * Q♣_K_M ("medium") uses more bpw for some specific tensors.
      * Q♣_K_L ("large") uses more bpw for a larger set of tensors.
  * exl2 - GPU only. New hotness. (Why though?) At time of writing (2023-11-11), TheBloke isn't publishing exl2 quants.
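
As a sanity check before downloading, bpw translates into an approximate file size: parameter count times bpw, divided by 8 bits per byte. A minimal sketch; the bpw figures are rough averages for the mixed-precision K-quants, and real GGUF files carry some metadata on top:

<code python>
# Approximate model file size: parameters * bits-per-weight / 8 bits per byte.
# The bpw values are rough averages; K-quants mix precisions per tensor.
def approx_size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9

print(f"7B  at Q4_K_M (~4.8 bpw): ~{approx_size_gb(7e9, 4.8):.1f} GB")
print(f"13B at Q5_K_M (~5.7 bpw): ~{approx_size_gb(13e9, 5.7):.1f} GB")
print(f"70B at Q4_K_M (~4.8 bpw): ~{approx_size_gb(70e9, 4.8):.1f} GB")
</code>

Remember the weights aren't the whole story: the KV cache for your context window needs DRAM/VRAM on top of this.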