Quick answer (why it happens)
If LM Studio isn’t using your GPU, it’s usually one of these:
- You’re running a CPU-only runtime (or the wrong backend)
- Your GPU drivers aren’t installed correctly (common after Windows updates)
- The model is running in a way that falls back to CPU (context too big, VRAM too low, incompatible settings)
- You’re expecting GPU acceleration on a setup that doesn’t support it (especially some AMD paths, older GPUs, or certain backends)
This guide walks you through how to verify GPU usage first, then fix the root cause.
1) Confirm LM Studio is actually using CPU (not “feels slow”)
Before changing anything, confirm what’s happening.
Check Windows Task Manager
- Press Ctrl + Shift + Esc → Task Manager
- Go to Performance → GPU
- Start generating text in LM Studio
- Watch:
- GPU 3D might not move much (that’s normal)
- Look for Compute / CUDA / Graphics_1 (varies by driver)
- Watch Dedicated GPU memory (VRAM). If it stays flat, you’re likely CPU-only.
Watch VRAM and GPU load with a better tool (recommended)
- For NVIDIA: nvidia-smi (installed with the driver)
- Open Command Prompt and run: nvidia-smi
- While LM Studio generates, check whether LM Studio (or its backend process) appears in the process list and is using GPU memory.
- For AMD: use AMD Software: Adrenalin performance metrics (or GPU-Z).
If VRAM usage is near zero and your CPU spikes, continue below.
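If you'd rather watch numbers than eyeball Task Manager, a short script can poll nvidia-smi once a second and print VRAM and GPU utilization while you generate. This is a minimal Python sketch, assuming an NVIDIA GPU and that nvidia-smi is on your PATH; the query flags used are standard nvidia-smi options.

    import subprocess, time

    # Poll nvidia-smi once per second and print VRAM + GPU utilization.
    # Run this in one window, start a generation in LM Studio in another,
    # and watch whether memory.used actually climbs. Press Ctrl+C to stop.
    QUERY = [
        "nvidia-smi",
        "--query-gpu=memory.used,memory.total,utilization.gpu",
        "--format=csv,noheader,nounits",
    ]

    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True)
        if out.returncode != 0:
            print("nvidia-smi failed - is the NVIDIA driver installed?")
            break
        for i, line in enumerate(out.stdout.strip().splitlines()):
            used, total, util = [x.strip() for x in line.split(",")]
            print(f"GPU {i}: {used}/{total} MiB VRAM, {util}% utilization")
        time.sleep(1)

If memory.used stays flat while tokens are streaming out, you're almost certainly generating on the CPU.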
2) Common reason #1: Wrong runtime / backend selected
LM Studio can run models using different backends. If you’re on CPU, you either:
- selected a CPU runtime, or
- the GPU runtime failed and silently fell back to CPU.
Fix: Re-check LM Studio settings (high impact)
In LM Studio:
- Go to Settings (or Model / Runtime settings, depending on your version)
- Look for Acceleration / Backend / Device
- Choose:
- NVIDIA (CUDA) if available and you have an NVIDIA GPU
- Metal (macOS only—not relevant here)
- AMD support depends on the build/backend; sometimes it’s ROCm (often Linux-first), or it may still run via CPU on Windows
If you don’t see any GPU option at all, jump to Drivers and GPU requirements below.

3) Common reason #2: NVIDIA driver or CUDA stack issues (Windows)
If you have an NVIDIA GPU (RTX series especially), this is usually fixable quickly.
Fix checklist (NVIDIA)
- Update to the latest NVIDIA driver
- Use the NVIDIA App (formerly GeForce Experience) or download the driver directly from NVIDIA (a quick driver check is sketched after this list).
- Restart Windows
- In Windows Settings → System → Display → Graphics
- Add LM Studio (or the LM Studio backend executable if listed)
- Set to High performance (forces discrete GPU on laptops)
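Before blaming LM Studio, it's worth confirming that the driver stack itself responds. The quick check below is a minimal Python sketch, assuming nvidia-smi is on your PATH (it ships with the NVIDIA driver); if the command is missing or errors out, fix the driver first.

    import subprocess

    # Ask the driver for the GPU name, driver version, and total VRAM.
    # A clean answer means the NVIDIA driver is installed and responding.
    try:
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        print(out.stdout.strip())  # e.g. "NVIDIA GeForce RTX 4060, <driver version>, 8188 MiB"
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi not found or failed - reinstall/update the NVIDIA driver.")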
Laptop warning: iGPU vs dGPU
On many laptops you have Intel integrated graphics + NVIDIA GPU. Windows may run LM Studio on the iGPU unless you force it.
Symptoms: GPU usage stays low; dedicated VRAM doesn’t increase; CPU is high.
Fix: the “Graphics → High performance” setting above is the simplest.
4) Common reason #3: VRAM is too low for your model + context
Even with correct drivers, LM Studio may still run CPU-only if:
- the model is too large
- the context length is too high
- offloading can’t fit into VRAM
Rule of thumb (very rough)
- 7B model (4-bit): often OK on 6–8GB VRAM
- 13B model (4-bit): typically needs 10–12GB+ VRAM for comfortable GPU offload
- Context size increases memory a lot (especially at 8k+)
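To turn the rule of thumb into an actual number, you can do a rough estimate: weights (roughly the GGUF file size) plus KV cache. The sketch below is back-of-the-envelope only; the layer and head counts are typical for a 7B-class model and are assumptions, so check your model card for the real values.

    # Back-of-the-envelope VRAM estimate: weights (~GGUF file size) + KV cache.
    def kv_cache_gib(n_layers, n_kv_heads, head_dim, context, bytes_per_elem=2):
        # factor of 2 = one K tensor and one V tensor per layer
        return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / 1024**3

    gguf_file_gib = 4.1  # size of your 4-bit 7B GGUF on disk (example value)
    ctx = 4096           # context length set in LM Studio

    # 32 layers, 8 KV heads, head dim 128 are typical 7B-class values (assumed)
    total = gguf_file_gib + kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, context=ctx)
    print(f"Rough VRAM needed: {total:.1f} GiB (plus roughly 0.5-1 GiB overhead)")

If the total lands above your card's VRAM, drop the context or the model size before tweaking anything else.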
Fix: Use these settings for speed
Try in this order:
- Use a smaller model (7B or even 3B)
- Use 4-bit quantized GGUF models (common for local inference)
- Reduce context length (e.g., 2048–4096)
- Reduce batch size (if configurable)
- Turn off heavy extras (like large system prompts or long chat history)
If your goal is “fast responses” more than “maximum intelligence,” a well-chosen 7B model can feel dramatically better.
5) Common reason #4: Your GPU isn’t supported (or is too old)
Not all GPUs are equal for local LLM inference.
NVIDIA
- Modern RTX cards are best.
- Older GTX cards can work, but results vary, and VRAM is often the limit.
AMD on Windows
AMD GPU acceleration for local LLM apps on Windows can be inconsistent depending on:
- backend support,
- ROCm availability (often easier on Linux),
- model format/backends.
If you’re on AMD and LM Studio has no working GPU option, you may:
- still run fast on CPU with smaller quantized models, or
- consider the Windows-friendly local runtime options compared in our related guide:
- Ollama vs LM Studio: https://logixcontact.site/run-llm-locally-windows-ollama-vs-lm-studio-2026/
6) Make sure Windows isn’t putting LM Studio in “Power Saving” mode
This sounds silly, but it matters a lot on laptops.
Fix
- Control Panel → Power Options → High performance (or Best performance)
- Windows 11: Settings → System → Power & battery → Power mode: Best performance
Then rerun LM Studio and re-check GPU/VRAM activity.
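If you prefer the command line, or the High performance plan isn't showing in Control Panel, powercfg can switch plans directly. A minimal sketch, assuming the built-in High performance plan exists on your machine (some Windows 11 laptops only expose Balanced, in which case use the Power mode slider instead):

    import subprocess

    # Show the active Windows power plan, then switch to the built-in
    # "High performance" plan. The GUID below is the standard one Windows
    # ships for that plan; run from an elevated prompt if it fails.
    HIGH_PERFORMANCE = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

    print(subprocess.run(["powercfg", "/getactivescheme"],
                         capture_output=True, text=True).stdout.strip())
    subprocess.run(["powercfg", "/setactive", HIGH_PERFORMANCE], check=True)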
7) Speed settings that actually move the needle (2026)
Once the GPU is working, you can squeeze out extra performance; a quick way to verify each change is sketched after the list below.
Best speed wins
- Choose a smaller model that still meets your needs (7B often ideal)
- Lower context (don’t default to maximum)
- Prefer 4-bit GGUF for local runs if you’re resource limited
- Keep prompts tight; avoid huge chat history
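To know whether any of these changes actually helped, measure tokens per second rather than judging by feel. The snippet below is a minimal sketch: it assumes you've enabled LM Studio's local server (the OpenAI-compatible endpoint, which defaults to http://localhost:1234), that a model is loaded, and that the requests package is installed; "local-model" is a placeholder name, and the usage fields follow the OpenAI-style response format the server mimics.

    import time
    import requests  # pip install requests

    # Rough tokens/sec benchmark against LM Studio's local server.
    URL = "http://localhost:1234/v1/chat/completions"
    payload = {
        "model": "local-model",  # placeholder; LM Studio answers with the loaded model
        "messages": [{"role": "user", "content": "Explain VRAM in about 200 words."}],
        "max_tokens": 256,
        "stream": False,
    }

    start = time.time()
    resp = requests.post(URL, json=payload, timeout=300).json()
    elapsed = time.time() - start

    completion_tokens = resp["usage"]["completion_tokens"]
    print(f"{completion_tokens} tokens in {elapsed:.1f}s = "
          f"{completion_tokens / elapsed:.1f} tokens/sec")

Run it once before and once after each settings change; comparing the numbers removes the guesswork.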
What not to over-optimize
- Tiny prompt tweaks won’t fix a CPU-only situation.
- If VRAM is maxed, performance will tank—solve VRAM fit first.

8) Troubleshooting: “It says GPU, but it’s still slow”
If GPU usage shows activity but generation is still slow, check:
A) VRAM maxed out
If VRAM is near 100%, you may be swapping or partially offloading.
- Lower context
- Smaller model
- Different quantization
B) Thermal throttling
On laptops, sustained loads can throttle CPU/GPU quickly.
- Improve cooling
- Use performance mode
- Plug in power
C) Background apps using GPU
Chrome tabs, video apps, games—close them and test again.
9) Recommended internal links (good for SEO + session depth)
Add these links in the blog to keep users on-site:
- Run LLM locally (Ollama vs LM Studio): https://logixcontact.site/run-llm-locally-windows-ollama-vs-lm-studio-2026/
- Run a private AI assistant locally: https://logixcontact.site/run-private-ai-assistant-locally-2026-guide/
- Optional: Prompt quality: https://logixcontact.site/get-better-answers-from-chatgpt/
FAQ (helps SEO + featured snippets)
Why is LM Studio using CPU instead of GPU?
Usually because the GPU backend isn’t enabled/available, drivers aren’t installed correctly, or the model + context can’t fit in VRAM so it falls back to CPU.
How do I check if LM Studio is using my NVIDIA GPU?
Use Task Manager → Performance → GPU and confirm Dedicated GPU memory rises during generation, or run nvidia-smi and check that LM Studio (or its backend process) is consuming VRAM.
What model size should I use for GPU acceleration?
Start with a 7B model in 4-bit. If you have 8GB VRAM, 7B is usually the sweet spot. Larger models often require 12GB+ VRAM for comfortable offload depending on context.
Does LM Studio support AMD GPU on Windows?
It depends on the backend and your setup. If you can’t select an AMD GPU option, you may be limited to CPU on Windows, or you may need a different toolchain/platform.
Conclusion (what to do next)
If LM Studio is slow, don’t guess—verify GPU usage, then fix it systematically:
- Confirm CPU-only using Task Manager / VRAM
- Ensure the correct GPU backend is selected
- Update drivers and force High performance GPU on Windows
- Pick a model + context that actually fits your VRAM
If you're still stuck, share the following in the comments:
- your GPU model (e.g., RTX 4060 8GB / RX 7600),
- LM Studio version,
- the model you’re running (name + size + quantization),
and we'll suggest the settings and model(s) most likely to hit fast tokens/sec on your exact hardware.