Local AI Memory (No Leaks, No Drama) on Your DGX Spark
I’ve been knee-deep in setting up local embeddings on my DGX Spark (that Grace Blackwell beast), and it turned into a multi-hour slog of Docker pulls, config fiddling, and chasing GPU spikes. But it was worth it for one big reason: keeping my data locked down.
By default, OpenClaw handles vectorization via remote OpenAI or Google Gemini embedding models. Convenient, yes, but it also means OpenClaw sends all your memories to OpenAI or Google — memories you might think are only between you and your personal assistant…
On a beast like the DGX Spark there's no excuse to cut corners: local vectorization models are perfectly capable (SOTA tier) with low-latency (20–50 ms) responses. No sending notes or code to some cloud server that might leak everything.
Especially after yesterday’s ClawHub mess — that top skill turned out to be malware bait. Daniel Lockyer called it out on X, and Elon’s “Here we go” pretty much summed up the agentic wild west. Check it out:
The skill looked innocent, but the install tricked your agent into running obfuscated crap that bypassed macOS Gatekeeper. Scary stuff.
OpenClaw and similar frameworks let agents do real work — search memory, call tools, chain actions, even when you sleep (cron). It’s like LangChain or LlamaIndex, but baked in. But that’s why Grok and Gemini keep things tame: limited tools, no reckless chaining. One bad skill, and boom — data leak or backdoor. With regulations like GDPR looming, they’re playing safe. OpenClaw opens the door but doesn’t push you through; it’s local-first, with explicit configs. Still, ClawHub shows we need better vetting in these marketplaces.
From my session with Grok (shoutout — it helped debug), here’s what I learned: local is king for privacy. Sandboxed models like Qwen3 don’t phone home, and running on DGX Spark keeps everything on your hardware. Pair it with my GLM v4.7 guide from last month (perfect for Blackwell’s memory setup) for a full local AI loop.
Here’s the step-by-step I ended up with — dummy-proof, no fancy terms. Tested on my spark-9045 Ubuntu box.
Step 1: Fire Up Ollama for Embeddings
Ollama handles the heavy lifting with GPU support out of the box. It’s basically a wrapper around llama.cpp, so it offloads to CUDA automatically when it detects your hardware.
docker run -d --gpus all \
  -v ~/.openclaw/tei-data:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
Check the logs to make sure CUDA is detected on your GB10:
docker logs ollama | grep -i cuda
You should see something like "ggml_cuda_init: found 1 CUDA devices" with your NVIDIA GB10 listed. If not, double-check your NVIDIA drivers and the NVIDIA Container Toolkit.
Step 2: Grab Qwen3-Embedding-8B
Solid model, no leaks, great for notes/code. It’s open-source and vetted, so no worries about data phoning home.
docker exec -it ollama ollama pull qwen3-embedding:8b
Test it to confirm it's working:
curl http://localhost:11434/api/embeddings -d '{
  "model": "qwen3-embedding:8b",
  "prompt": "Test local embedding"
}'
This should spit back a vector array. If it does, Ollama is ready.
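To see why that vector array matters: memory search compares the query's embedding against stored embeddings, typically by cosine similarity. A toy sketch in plain Python (3-dim vectors stand in for the real 4096-dim ones; none of this is OpenClaw's actual code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-dim "embeddings"; a real qwen3-embedding:8b vector has 4096 dims.
note_crypto  = [0.9, 0.1, 0.0]
note_cooking = [0.0, 0.2, 0.9]
query        = [0.8, 0.2, 0.1]   # pretend this embeds "crypto stuff"

print(cosine_similarity(query, note_crypto))   # high: semantically close
print(cosine_similarity(query, note_cooking))  # low: unrelated
```

Semantically similar texts land on nearby vectors, which is why ranking by cosine similarity surfaces relevant notes even without keyword overlap.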
Step 3: Plug It Into OpenClaw
Edit ~/.openclaw/openclaw.json (under agents.defaults.memorySearch). This points OpenClaw at Ollama as the embedding provider; the provider stays "openai" because Ollama exposes an OpenAI-compatible API under /v1.
"memorySearch": {
  "provider": "openai",
  "model": "qwen3-embedding:8b",
  "remote": {
    "baseUrl": "http://localhost:11434/v1",
    "apiKey": "ollama"
  }
}
Restart OpenClaw after editing so the change takes effect:
openclaw gateway restart
Step 4: Verify Setup & Check GPU
Before indexing, run a status check:
openclaw memory status --deep --verbose
Look for "Embeddings: ready", "Vector: ready", and the sqlite-vec path. Dims should be 4096 for Qwen3.
To test GPU (nvidia-smi should show spikes during embeds):
watch -n 0.1 nvidia-smi
In another terminal, run a loop to force embeddings:
for i in {1..50}; do
  curl -s http://localhost:11434/api/embeddings -d '{
    "model": "qwen3-embedding:8b",
    "prompt": "GPU test prompt repeated: DGX Spark, Qwen3, local AI."
  }' > /dev/null
done
You should see 20–80% utilization spikes. If not, check the Ollama logs for CUDA mentions.
Step 5: Reset DB + Re-Index & Test
If switching models or fixing issues, reset the DB first (clears old vectors to avoid dimension mismatches):
rm -f ~/.openclaw/memory/*.sqlite
Then re-index your notes:
openclaw memory index --verbose
Watch nvidia-smi — expect spikes during the "embeddings" lines. Time it; for small notes, it's quick.
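For intuition on what the indexer is doing while nvidia-smi spikes: each note gets split into overlapping chunks, and each chunk gets its own embedding, which is why status reports more chunks than files. A toy chunker, not OpenClaw's real one (real chunkers work on tokens, not characters):

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks that overlap, so no snippet
    gets cut off without surrounding context (toy character-based version)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

note = "DGX Spark runs Qwen3 embeddings locally. No cloud round-trips, no leaked notes."
chunks = chunk_text(note)
print(f"{len(chunks)} chunks from one note")  # each chunk is embedded separately
```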
Post-index status:
openclaw memory status --deep --verbose
It should show indexed files/chunks > 0 and dirty=no.
Test search:
openclaw memory search "crypto stuff"
If it pulls relevant snippets, success. Now ask your agent in Telegram: "What's my crypto interest?" — it should recall from your notes.
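Under the hood, that search is just: embed the query, rank every stored chunk vector by cosine similarity, return the top k. A self-contained sketch with toy 3-dim vectors (the real store is sqlite-vec holding 4096-dim qwen3-embedding:8b vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy index: chunk text -> pretend embedding.
index = {
    "bought more BTC, watching ETH gas fees":  [0.9, 0.1, 0.0],
    "grandma's lasagna recipe":                [0.0, 0.1, 0.9],
    "cold wallet seed phrase is in the safe":  [0.8, 0.3, 0.1],
}
query_vec = [0.85, 0.2, 0.05]  # pretend embedding of "crypto stuff"

# Rank all chunks by similarity to the query and keep the top 2.
top2 = sorted(index, key=lambda chunk: cosine(query_vec, index[chunk]), reverse=True)[:2]
print(top2)  # the two crypto-related chunks outrank the recipe
```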
Step 6: Link with GLM v4.7 for Reasoning
From my GLM post: pull glm:4.7 in Ollama and set it as the primary model under agents. GLM loves Blackwell for local reasoning over your embeddings. Update the config:
"model": {
  "primary": "glm:4.7"
}
Restart and test agent queries.
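The full local loop these two configs give you is retrieval-augmented generation: embed the query, fetch the closest memory chunks, and hand them to GLM as context. A toy sketch of the prompt-assembly half (no model calls; retrieval stubbed with pre-ranked chunks, and the prompt shape is illustrative, not OpenClaw's actual template):

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a RAG prompt: retrieved memory chunks as context, then the question."""
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return f"Use these notes to answer:\n{context}\n\nQuestion: {question}"

# Chunks as memory search would return them (stubbed; normally ranked by similarity).
chunks = ["bought more BTC, watching ETH gas fees",
          "cold wallet seed phrase is in the safe"]
prompt = build_prompt("What's my crypto interest?", chunks)
print(prompt)
```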
That’s it — local, private, no drama. If you hit snags (I did with ARM64 stuff), watch nvidia-smi during index for GPU spikes. Questions? Hit me on X (@ivelini).


