🤖 AI News Summary - 2026-05-04 22:11 GMT+8
Focused AI/dev subreddit roundup.
Full site: https://kkklobsterfarming.github.io/ai-news-summary-site/
What changed since last run
- Made a Skill for creating Open Webui Tools, try it with Qwen3.6 – r/OpenWebUI
- Back to Basics: making the scroll bar visible – r/OpenWebUI
- A Qwen finetune, that feels VERY human – r/LocalLLaMA
- Knowledge base vs LLM WIKI? How best to implement “context caching”? – r/OpenWebUI
- Self-hosted document & email search: Need a lightweight RAG indexer with hybrid search – r/selfhosted
- Added Ollama / LM Studio / llama.cpp support to my dataset generator app – fine-tune your model fully offline (or mix local + cloud) – r/llmdevs
- Building an Auto-Restart Mechanism for Claude Code – r/ClaudeCode
- Pushing a 5-Year-Old 6GB VRAM laptop to Its Limits: Qwen3.6-35B-A3B – r/LocalLLaMA
- Reminder: Have you checked your context lately? – r/ClaudeAI
- Arr Stack Question? – r/selfhosted
- How much will it cost to host something like qwen3.6 35b a3b in a cloud? – r/LocalLLaMA
- How to optimise my OpenAI API response time? (gpt-4o-mini) – r/llmdevs
r/openai
- No non-pinned/newsworthy posts fetched after filtering.
r/LocalLLaMA
A Qwen finetune, that feels VERY human
- Hello guys, So TL;DR, I was asked by multiple people to make an Assistant_Pepe_32B version, but the best base model contender was Qwen3-32B, a model that is very hard to tune on anything other than STEM. The concept of Assistant_Pepe is an…
- Timestamp: 2026-05-04 01:20 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between joking and positive. Top reactions focus on Why qwen 3 and not 3.6? Also, make the ggufs so that people can test them to see if they’re actually any better than the base model | Try to break the IBM newest model. Fuck it why not bro…..
- Author: /u/Sicarius_The_First
Pushing a 5-Year-Old 6GB VRAM laptop to Its Limits: Qwen3.6-35B-A3B
- For the past few weeks, I have been trying to get this model working on my hardware. It still feels incredible how much better open models have become.
- Timestamp: 2026-05-04 06:16 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on Very nice work. My main dev machine is an old Dell XPS w/ a 2060, 64GB of RAM - basically just a front end for the inference server running… | I’m running models okay on 32gigs ddr4 3200 and a 6600xt(8gigs)….
- Author: /u/abhinand05
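The feasibility here is mostly arithmetic: a 35B model at ~4.5 bits per weight is far larger than 6 GB of VRAM, but an A3B-style MoE only touches ~3B parameters per token, so most weights can sit in system RAM. A back-of-envelope sketch (the quant width and overhead factor are assumptions, not measurements from the post):

```python
def quantized_size_gb(params_b: float, bits_per_weight: float,
                      overhead: float = 1.1) -> float:
    """Approximate in-memory size of a quantized model.

    params_b: parameter count in billions; bits_per_weight: ~4.5 for a
    Q4_K_M-style quant; overhead: rough factor for embeddings/scales kept
    at higher precision (both numbers are assumptions).
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 * overhead

print(quantized_size_gb(35, 4.5))  # whole model: ~21.7 GB, far beyond 6 GB VRAM
print(quantized_size_gb(3, 4.5))   # ~3B active params per token: ~1.9 GB streamed
```

The per-token working set is small even though the model is not, which is why expert offload to system RAM stays usable on old hardware.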
How much will it cost to host something like qwen3.6 35b a3b in a cloud?
- I keep hearing the model is good, I don’t have the hardware for it, and I will wait to the end of the year for the hardware to evolve. But, I still need coding, people are saying qwen3.6 35b a3b is good, so the question is now how much…
- Timestamp: 2026-05-04 07:47 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between concerned and joking. Top reactions focus on Don’t lol. If you’re going to use a model in the cloud you might as well use the subsidized models that are extremely cheap like Minimax,… | This this this deepseek v4 flash is basically free at…
- Author: /u/Euphoric_North_745
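The cost question is mostly utilization arithmetic. A hedged sketch with placeholder numbers (the GPU rate and throughput are illustrative assumptions, not real quotes):

```python
def gpu_hour_cost_per_mtok(gpu_usd_per_hour: float, tokens_per_second: float) -> float:
    """USD per million generated tokens when renting a GPU by the hour,
    assuming the GPU is kept busy for the whole hour."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1e6

# Placeholder inputs (assumptions): a $0.50/hr cloud GPU sustaining
# 60 tok/s on a 35B-A3B-class model.
print(gpu_hour_cost_per_mtok(0.50, 60))  # ~$2.31 per 1M tokens at full utilization
```

If the rented box sits idle most of the day, divide the utilization into that figure; that gap is why the commenters steer casual coding use toward cheap per-token APIs instead.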
Mistral-Medium-3.5-128B-Q3_K_M on 3x3090 (72GB VRAM)
- [Image: Mistral-Medium-3.5-128B-Q3_K_M on 3x3090 (72GB VRAM)] Here is the actual speed of Mistral Medium Q3 running locally on 3x3090 first some Python…
- Timestamp: 2026-05-04 08:46 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on Pelican is not great. But I believe svg benchmarks are overfitted anyways. | Yes, I wanted to check quality of Q3 quant. At least all three files are valid code. Overall sentiment – post: positive; author: mixed….
- Author: /u/jacek2023
What a time to be alive from 1tk/sec to 20-100tk/sec for huge models
- https://www.reddit.com/r/LocalLLaMA/comments/1eb6to7/llama_405b_q4_k_m_quantization_running_locally/…
- Timestamp: 2026-05-04 01:46 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between concerned and positive. Top reactions focus on Wasn’t that a dense model? The others are MoE, that’s why you’re able to run them fast. 405B would be just as slow today. If you mean the… | Yeah, it was. OP probably doesn’t understand the…
- Author: /u/segmond
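The top comment's point can be made concrete: single-stream decode is roughly memory-bandwidth-bound, so speed scales with the bytes of *active* weights read per generated token, which is exactly where MoE models win over dense ones. An illustrative sketch (the bandwidth figure and quant width are assumed numbers, not benchmarks):

```python
def decode_tok_per_s(mem_bw_gb_s: float, active_params_b: float,
                     bytes_per_weight: float = 0.55) -> float:
    """Bandwidth-bound decode estimate: each generated token streams all
    *active* weights through memory once. 0.55 bytes/weight corresponds
    to roughly a 4.4-bit quant (assumption)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_weight
    return mem_bw_gb_s * 1e9 / bytes_per_token

# Same illustrative machine (100 GB/s of usable memory bandwidth):
print(decode_tok_per_s(100, 405))  # dense 405B: ~0.45 tok/s
print(decode_tok_per_s(100, 3))    # MoE with 3B active: ~61 tok/s
```

By this estimate the speedup is not hardware evolving so much as architecture: a dense 405B would still crawl today, while a 3B-active MoE flies on the same bandwidth.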
r/llmdevs
Added Ollama / LM Studio / llama.cpp support to my dataset generator app – fine-tune your model fully offline (or mix local + cloud)
- [Image: Added Ollama / LM Studio / llama.cpp support to my dataset generator app – fine-tune your model fully offline (or mix local + cloud)] A while back I shipped a desktop app that generates fine-tuning datasets via OpenRouter. Got my…
- Timestamp: 2026-05-04 01:22 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on the token accounting thing with think blocks is real. ollama not separating reasoning_tokens properly messed up my budgets too | Yeah, openrouter actually breaks reasoning_tokens out in completion_tokens_details…
- Author: /u/AronSan
How to optimise my OpenAI API response time? (gpt-4o-mini)
- I’m currently using gpt-4o-mini as the model for my openai api in my project. Even getting a response from a short prompt such as “What is your name?” takes 5-10 seconds.
- Timestamp: 2026-05-04 19:08 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on I see. What do you think could be the issue? My connection couldn’t be the issue either; I constantly get around 300 MBPS. I’d really… | To see if it’s your network you could do a ping against OpenAI’s API server. To see if it’s your app framework write…
- Author: /u/FindingOk1094
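For latency questions like this, the usual first step is separating network/queue time from generation time, and streaming exposes that as time-to-first-token. A minimal sketch using the standard openai-python client (the model and prompt come from the post; requires OPENAI_API_KEY and is not run here):

```python
import time

def median(samples):
    """Median of a list of latency samples in seconds (take several runs)."""
    s = sorted(samples)
    return s[len(s) // 2]

def measure_ttft(prompt="What is your name?", model="gpt-4o-mini"):
    """Time-to-first-token vs total time for one streamed request.
    Uses the standard openai-python client; needs OPENAI_API_KEY set."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    t0 = time.time()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    ttft = None
    for _ in stream:
        if ttft is None:
            ttft = time.time() - t0  # network + queueing + prefill, no decode yet
    return ttft, time.time() - t0
```

If the median TTFT accounts for most of the 5-10 seconds, the bottleneck is connection setup, routing, or the app framework rather than gpt-4o-mini's generation; streaming also cuts perceived latency regardless.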
r/OpenWebUI
Made a Skill for creating Open Webui Tools, try it with Qwen3.6
- https://openwebui.com/posts/531f5d48-691f-41bc-95c7-bd0cad98d095 https://pastebin.com/u6vZQXj2 Made this with Hermes Agent, using the…
- Timestamp: 2026-05-04 01:27 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on Is that screenshot Qwen Coder or something (never used)… I notice a task checklist that I don’t see in Open WebUI | This is all native OpenWebui the task checklist is a new feature of open webui. Overall sentiment – post: mixed; author: mixed. Reply…
- Author: /u/iChrist
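For context, Open WebUI tools are plain Python files exposing a `Tools` class; the method type hints and docstrings are what Open WebUI turns into the function spec shown to the model. A minimal hypothetical example of the format such a Skill would generate (not taken from the linked post):

```python
class Tools:
    """Sketch of the Open WebUI tool format (hypothetical example):
    callable methods on a `Tools` class, with type hints and docstrings
    that Open WebUI uses to build the tool spec shown to the model."""

    def word_count(self, text: str) -> str:
        """
        Count the words in a piece of text.
        :param text: The text to count words in.
        """
        return f"{len(text.split())} words"
```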
Back to Basics: making the scroll bar visible
- One of the most annoying things about Open WebUI is the scroll bar - or the lack of it. As the page gets longer with a response, the scroll bar doesn’t appear until you start randomly clicking on the right edge of the page, and you might…
- Timestamp: 2026-05-04 18:03 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on custom css can be put in static folder, but i agree its borderline invisible, i will open an issue to track this. Overall sentiment – post: mixed; author: mixed. Reply threads: 2026-05-04 18:20 GMT+8: post=mixed, author=mixed – custom css can be put in…
- Author: /u/BringOutYaThrowaway
Knowledge base vs LLM WIKI? How best to implement “context caching”?
- Has anyone implemented Karpathy's LLM wiki idea in Open WebUI with a knowledge base? I used open terminal and qwen3-coder-next to implement a folder structure there.
- Timestamp: 2026-05-04 12:27 GMT+8
- Community: Community reaction (heuristic-fallback): Top reactions focus on You’re talking about the Open Terminal integration (github.com/open-webui/open-terminal) with… | Knowledge base is RAG basically. People are interested in LLM Wiki because they want something different. You…
- Author: /u/Last_Bad_2687
r/selfhosted
Self-hosted document & email search: Need a lightweight RAG indexer with hybrid search
- Hi everyone, I am looking for a locally hostable application to get a comprehensive search across all my documents, emails, and files. I am currently using Paperless, plus a self-hosted mail server that fetches and stores all my emails via…
- Timestamp: 2026-05-04 14:00 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on no one really seems to have done this yet. from what I understand you would have to give Paperless a pretty significant refactoring…
- Author: /u/Alarmed_Bug3762
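"Hybrid search" here usually means merging a keyword (BM25) ranking with an embedding ranking, and reciprocal rank fusion is the common lightweight way to combine them without tuning score scales. A small self-contained sketch with made-up document ids:

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Merge several ranked result lists (e.g. one from BM25, one from
    embedding search). Each list holds doc ids, best first. Documents get
    1 / (k + rank) per list; higher summed score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["invoice.pdf", "mail_0042.eml", "notes.md"]       # keyword ranking
vec = ["invoice.pdf", "notes.md", "contract.pdf"]         # embedding ranking
print(reciprocal_rank_fusion([bm25, vec]))  # invoice.pdf first: tops both lists
```

The appeal for a lightweight self-hosted indexer is that RRF only needs ranks, not comparable scores, so a SQLite FTS5 table and any small embedding model can be fused this way.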
Arr Stack Question?
- Need some best practice advice on building out a media server. I already have QBittorrent and Jellyfin installed in separate Proxmox LXCs.
- Timestamp: 2026-05-04 15:00 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on Separation is good. I like having my containers in various VMs. No LXC at all. Using Ansible/Terraform/Docker Swarm to automate and…
- Author: /u/pagem
n8n + Paperless-ngx + Paperless-GPT for adding RAG to your documents!
- [Image: n8n + Paperless-ngx + Paperless-GPT for adding RAG to your documents!] Paperless-ngx is undoubtedly one of the most important and useful containers in my self-hosted stack. I have a modest collection of documents, ranging from…
- Timestamp: 2026-05-04 16:06 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on That’s what I’m implementing actually. If I only would have enough time.. Overall sentiment – post: positive; author: mixed. Reply…
- Author: /u/hackslashX
Postiz Self-Hosted - All working, but API access does not
- I have self-hosted Postiz on my server at home via Docker. I have it published to the Internet via NGINX Proxy Manager.
- Timestamp: 2026-05-04 16:58 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is mostly positive. Top reactions focus on Sounds like a proxy or config issue rather than Postiz itself. Make sure NGINX Proxy Manager is forwarding the correct path to the…
- Author: /u/Patient_Scale9438
Speakr v0.8.19 - Local audio/video transcription app update
- [Image: Speakr v0.8.19 - Local audio/video transcription app update] Hey r/selfhosted (/r/selfhosted), another Speakr update. If you haven’t seen this before, Speakr is a self-hosted audio transcription app: record or upload audio/video,…
- Timestamp: 2026-05-04 14:49 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between positive and skeptical. Top reactions focus on ok dumb question. if i run this as a docker on my server, how do i transcribe, f.e. a meeting in ms teams on…
- Author: /u/hedonihilistic
r/ClaudeAI
Reminder: Have you checked your context lately?
- [Image: Reminder: Have you checked your context lately?] Just a reminder to run /context. I like to think I was on top of this!
- Timestamp: 2026-05-04 03:19 GMT+8
- Community: Community reaction (heuristic-fallback): The comment section is split between concerned and joking. Top reactions focus on When you start up a conversation, Claude code will pull in your plugins, mcp, extra data for you prompt so it knows to use them. These will… | This is the response I immediately expected coming…
- Author: /u/Arona_Daal
r/ClaudeCode
Building an Auto-Restart Mechanism for Claude Code
- [Image: Building an Auto-Restart Mechanism for Claude Code] Claude Code requires a manual session restart every time you install an MCP server or change a config, which breaks your momentum. I built claude-resurrect to fix this.
- Timestamp: 2026-05-04 14:50 GMT+8
- Author: /u/emnoleg
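The post doesn't show claude-resurrect's internals, but the general mechanism it describes (watch config files, restart the session when one changes) can be sketched with nothing but stdlib mtime polling; the watched path below is hypothetical:

```python
import os
import time

def snapshot(paths):
    """Record the current mtime of each watched file."""
    return {p: os.stat(p).st_mtime for p in paths}

def changed_files(snap):
    """Return watched paths whose mtime differs from the snapshot (or that vanished)."""
    changed = []
    for path, old_mtime in snap.items():
        try:
            if os.stat(path).st_mtime != old_mtime:
                changed.append(path)
        except FileNotFoundError:
            changed.append(path)
    return changed

def watch(paths, on_change, poll_s=2.0):
    """Poll the given config files; invoke on_change whenever any is touched."""
    snap = snapshot(paths)
    while True:
        time.sleep(poll_s)
        hits = changed_files(snap)
        if hits:
            on_change(hits)  # e.g. save session state, kill and relaunch the CLI
            snap = snapshot(paths)

# Usage sketch (".mcp.json" is a hypothetical config path):
# watch([".mcp.json"], lambda hits: print("changed:", hits, "- restarting session"))
```

Polling keeps it dependency-free; a tool wanting instant reaction would swap in inotify/FSEvents via a library like watchdog.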
r/Codex
- No non-pinned/newsworthy posts fetched after filtering.
Generated 2026-05-04 22:11 GMT+8 | Next update in 2 hours