Web search not working #1127
Open
Labels
bug: Something isn't working
Description
Describe the bug
I intend to use web search.
When searching with Startpage or Google, the web search side panel opens and loads the appropriate pages, but then just keeps running without ever producing a result. I noticed that all CPU cores are in use.
When searching with DuckDuckGo or another custom search engine, the side panel likewise opens and loads the appropriate pages, but then keeps running without a result. I noticed that initially many CPU cores are used, but afterwards only one.
If I prompt something without web search, I get an output.
I have tested several models.
The expected behavior is that the prompt is processed along with the information from the web search.
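As a sanity check outside Alpaca, the same chat endpoint can be exercised directly against the bundled Ollama instance. This is only a sketch: the host/port (127.0.0.1:11435) and the model ("Qwen3 14B", so presumably the tag qwen3:14b) are taken from the debug log below, and the request is merely printed here rather than sent.

```python
import json

# Hypothetical reproduction helper: build the /api/chat request that Alpaca
# sends to its local Ollama instance (host/port taken from the log below).
payload = {
    "model": "qwen3:14b",  # assumed tag for the "Qwen3 14B" model in the log
    "messages": [{"role": "user", "content": "test prompt"}],
    "stream": False,
}
body = json.dumps(payload)
print(body)

# To actually send it while Alpaca's Ollama instance is running:
#   curl http://127.0.0.1:11435/api/chat -d "$BODY"
# A normal completion here, with web search failing, would point at the
# web-search/scraping step rather than the model backend itself.
```

Since plain prompts without web search do complete (see above), the direct API call would be expected to succeed as well, narrowing the hang to the search/result-processing path.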
Debugging information
flatpak run com.jeffser.Alpaca
INFO [main.py | main] Alpaca version: 9.2.0
MESA-INTEL: warning: ../src/intel/vulkan/anv_formats.c:993: FINISHME: support more multi-planar formats with DRM modifiers
MESA-INTEL: warning: ../src/intel/vulkan/anv_formats.c:959: FINISHME: support YUV colorspace with DRM format modifiers
INFO [ollama_instances.py | start] Starting Alpaca's Ollama instance...
INFO [ollama_instances.py | start] Started Alpaca's Ollama instance
time=2026-03-02T10:30:35.499+01:00 level=INFO source=routes.go:1663 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/alpaca/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://127.0.0.1:11435 http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-02T10:30:35.499+01:00 level=INFO source=routes.go:1665 msg="Ollama cloud disabled: false"
time=2026-03-02T10:30:35.500+01:00 level=INFO source=images.go:473 msg="total blobs: 5"
time=2026-03-02T10:30:35.500+01:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-03-02T10:30:35.500+01:00 level=INFO source=routes.go:1718 msg="Listening on 127.0.0.1:11435 (version 0.17.4)"
time=2026-03-02T10:30:35.500+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
INFO [ollama_instances.py | start] Ollama version is 0.17.4
time=2026-03-02T10:30:35.609+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="/home/alpaca/.var/app/com.jeffser.Alpaca/data/ollama_installation/bin/ollama runner --ollama-engine --port 44991"
time=2026-03-02T10:30:35.721+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="/home/alpaca/.var/app/com.jeffser.Alpaca/data/ollama_installation/bin/ollama runner --ollama-engine --port 36621"
time=2026-03-02T10:30:35.747+01:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-02T10:30:35.747+01:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="62.5 GiB" available="50.2 GiB"
time=2026-03-02T10:30:35.747+01:00 level=INFO source=routes.go:1768 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/03/02 - 10:30:35 | 200 | 347.976µs | 127.0.0.1 | GET "/api/tags"
INFO [_client.py | _send_single_request] HTTP Request: GET http://127.0.0.1:11435/api/tags "HTTP/1.1 200 OK"
[GIN] 2026/03/02 - 10:30:35 | 200 | 85.634145ms | 127.0.0.1 | POST "/api/show"
INFO [_client.py | _send_single_request] HTTP Request: POST http://127.0.0.1:11435/api/show "HTTP/1.1 200 OK"
INFO [_client.py | _send_single_request] HTTP Request: GET http://127.0.0.1:11435/api/tags "HTTP/1.1 200 OK"
[GIN] 2026/03/02 - 10:31:14 | 200 | 198.193µs | 127.0.0.1 | GET "/api/tags"
INFO [_client.py | _send_single_request] HTTP Request: POST http://127.0.0.1:11435/api/show "HTTP/1.1 200 OK"
[GIN] 2026/03/02 - 10:31:14 | 200 | 78.336689ms | 127.0.0.1 | POST "/api/show"
time=2026-03-02T10:31:14.925+01:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-03-02T10:31:15.011+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="/home/alpaca/.var/app/com.jeffser.Alpaca/data/ollama_installation/bin/ollama runner --ollama-engine --model /home/alpaca/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e --port 35245"
time=2026-03-02T10:31:15.012+01:00 level=INFO source=sched.go:491 msg="system memory" total="62.5 GiB" free="50.1 GiB" free_swap="8.0 GiB"
time=2026-03-02T10:31:15.012+01:00 level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1
time=2026-03-02T10:31:15.020+01:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-02T10:31:15.020+01:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:35245"
time=2026-03-02T10:31:15.023+01:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:16384 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-02T10:31:15.040+01:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3 file_type=Q4_K_M name="Qwen3 14B" description="" num_tensors=443 num_key_values=28
load_backend: loaded CPU backend from /home/alpaca/.var/app/com.jeffser.Alpaca/data/ollama_installation/lib/ollama/libggml-cpu-alderlake.so
time=2026-03-02T10:31:15.044+01:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-03-02T10:31:15.070+01:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:16384 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:16384 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=ggml.go:494 msg="offloaded 0/41 layers to GPU"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.6 GiB"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="2.5 GiB"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="148.0 MiB"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=device.go:272 msg="total memory" size="11.3 GiB"
time=2026-03-02T10:31:15.935+01:00 level=INFO source=sched.go:566 msg="loaded runners" count=1
time=2026-03-02T10:31:15.935+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-03-02T10:31:15.936+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-02T10:31:16.700+01:00 level=INFO source=server.go:1388 msg="llama runner started in 1.69 seconds"
INFO [_client.py | _send_single_request] HTTP Request: POST http://127.0.0.1:11435/api/chat "HTTP/1.1 200 OK"
[GIN] 2026/03/02 - 10:31:23 | 200 | 8.89461562s | 127.0.0.1 | POST "/api/chat"
INFO [_client.py | _send_single_request] HTTP Request: POST http://127.0.0.1:11435/api/chat "HTTP/1.1 200 OK"
[GIN] 2026/03/02 - 10:32:06 | 200 | 51.833608294s | 127.0.0.1 | POST "/api/chat"