VRAM guide
Which local large models can run on different amounts of VRAM?
Covering 6GB, 8GB, 12GB, 24GB, and 48GB cards, this article explains how parameter count, quantization level, KV cache, and system overhead together determine whether a model can be loaded.
Read article
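As a rough illustration of how the four factors in the VRAM guide combine, the sketch below adds up weight memory, KV cache, and a fixed overhead. It is a back-of-the-envelope estimate, not the article's own method: the function names, the 1 GB overhead default, and the example model shape (8B parameters, 32 layers, 8 KV heads, head dimension 128, about 4.5 bits per weight) are all assumptions chosen for illustration.

```python
def estimate_vram_gb(params_billion, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, context_len, kv_bytes=2, overhead_gb=1.0):
    """Rough memory estimate in GB (1 GB = 1e9 bytes).

    overhead_gb is an assumed flat allowance for runtime buffers and the
    desktop; real overhead varies by backend and operating system.
    """
    # Weights: parameter count times bits per weight, converted to bytes.
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    # KV cache: one key and one value vector per layer, KV head, and token.
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1e9 + overhead_gb


def fits(vram_gb, **model):
    """True if the estimate is within the available VRAM."""
    return estimate_vram_gb(**model) <= vram_gb


# Hypothetical 8B model at roughly 4.5 bits per weight with an 8K context.
example = dict(params_billion=8, bits_per_weight=4.5, n_layers=32,
               n_kv_heads=8, head_dim=128, context_len=8192)

if __name__ == "__main__":
    print(f"estimated: {estimate_vram_gb(**example):.2f} GB")
    for card in (6, 8, 12, 24, 48):
        print(f"{card} GB card: {'fits' if fits(card, **example) else 'does not fit'}")
```

Under these assumptions the example needs about 6.6 GB, so it fits on an 8GB card but not on a 6GB one. Lengthening the context or choosing a higher-bit quantization raises the estimate accordingly.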
Apple silicon
How does Apple unified memory affect local LLMs?
Explains why a Mac's total memory cannot all be used as video memory, and how to choose a suitable model for 16GB, 32GB, 64GB, and 128GB machines.
Read article
Quantization
Q4, Q5, Q6, Q8: which quantization should I choose?
Compares the memory usage, quality loss, and speed trade-offs of the most common GGUF quantizations, and helps users choose among three priorities: quality first, balanced, and long context.
Read article
Coding models
How do I choose a local LLM suitable for programming?
Using four scenarios (code generation, code explanation, refactoring, and long context), this article explains why choosing a model for programming cannot rest on model size and download count alone.
Read article
Multimodal
How do I run local vision models and multimodal models?
Introduces the extra factors that vision models involve compared with text-only models: VRAM, the image encoder, context length, and inference backend support.
Read article
Tool selection
What are the differences between Ollama, LM Studio, and llama.cpp?
Explains, for everyday users, the installation experience, model management, performance tuning, and target audience of three common ways to run models locally.
Read article