Local LLM

Blog hub

Local LLM Blog

Practical guides to the questions people actually search for: what fits in your VRAM, how to choose models on a Mac, how quantization works, and which coding and vision models to run locally.

VRAM Guide

Which local LLMs can you run with different amounts of VRAM?

Covering 6GB, 8GB, 12GB, 24GB, and 48GB cards, this guide explains how parameter count, quantization level, KV cache, and system overhead combine to determine whether a model will load.

Read article

Apple Silicon

How does Apple unified memory affect local LLMs?

Explains why the total memory on a Mac cannot be treated as if it were all video memory, and how to choose a suitable model for 16GB, 32GB, 64GB, and 128GB machines.

Read article

Quantization

Q4, Q5, Q6, or Q8: which quantization should you choose?

Compares the memory usage, quality loss, and speed trade-offs of the most common GGUF quantizations, and helps you decide among three priorities: quality first, balanced, and long context.

Read article

Coding models

How do you choose a local LLM for coding?

Looking at four scenarios (code generation, code explanation, refactoring, and long context), this guide explains why you cannot pick a coding model on model size and download count alone.

Read article

Multimodal

How do you run vision and multimodal models locally?

Covers the extra considerations that vision models bring compared with text-only models: VRAM, the image encoder, context length, and inference backend support.

Read article

Tool selection

What are the differences between Ollama, LM Studio, and llama.cpp?

Explains, for everyday users, how three common ways of running models locally compare on installation experience, model management, performance tuning, and who each one suits.

Read article