GLM Behavior
This page covers the toggles and options that control how IntenseRP interacts with GLM Chat (chat.z.ai).
Beta quality
The GLM driver is still beta quality. It is usable for daily driving, but you may occasionally run into quirks or instability. For a deep dive into known quirks and timing settings, see GLM Quirks.
Modes (model IDs)
In IntenseRP Next v2, the model you select in SillyTavern is mostly a mode selector, not a true model picker.
For GLM, these model IDs map to simple behavior presets:
| Model ID | Behavior |
|---|---|
| glm-auto | Uses your IntenseRP settings |
| glm-chat | Forces Deep Think off and never emits <think> |
| glm-reasoner | Forces Deep Think on (Send Deep Think follows your setting) |
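The presets above can be read as a small override step applied on top of your saved settings. The following is a minimal sketch of that idea, not IntenseRP's actual code; `GlmSettings`, its field names, and `resolve_mode` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GlmSettings:
    # Hypothetical fields mirroring the Enable Deep Think / Send Deep Think toggles.
    enable_deep_think: bool = False
    send_deep_think: bool = False


def resolve_mode(model_id: str, settings: GlmSettings) -> GlmSettings:
    """Apply the preset implied by the model ID on top of the saved settings."""
    if model_id == "glm-chat":
        # Deep Think off, and no <think> output.
        return replace(settings, enable_deep_think=False, send_deep_think=False)
    if model_id == "glm-reasoner":
        # Deep Think on; Send Deep Think keeps the user's value.
        return replace(settings, enable_deep_think=True)
    # glm-auto (and, in this sketch, any unrecognized ID) uses the settings as-is.
    return settings
```

For example, `resolve_mode("glm-reasoner", GlmSettings())` turns Deep Think on but leaves Send Deep Think off, because that field still follows the saved setting.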
About real GLM model selection
IntenseRP can also switch GLM's real model picker in the web UI.
Settings -> Provider Behavior -> GLM Chat -> Model
Supported options:
- GLM-5.1 (recommended for RP)
- GLM-5-Turbo
- GLM-5V-Turbo
- GLM-5
- GLM-4.7
Right now, GLM-5V-Turbo is the only GLM model where IntenseRP exposes the separate Enable Tools toggle. On other GLM models, that setting is forced off.
Fallback behavior
If your selected model is not present in the dropdown (UI changes / rollout), IntenseRP logs a warning and selects the first available model instead.
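The fallback rule can be sketched in a few lines. This is an illustration under assumptions: `pick_model`, the logger name, and the empty-dropdown error are hypothetical and not taken from IntenseRP.

```python
import logging

logger = logging.getLogger("glm_sketch")  # hypothetical logger name


def pick_model(selected: str, available: list[str]) -> str:
    """Return the model to click in the dropdown, falling back to the first entry."""
    if selected in available:
        return selected
    if not available:
        # Assumption: an empty dropdown is treated as an error in this sketch.
        raise ValueError("model dropdown is empty")
    logger.warning(
        "Model %r not found in dropdown; falling back to %r", selected, available[0]
    )
    return available[0]
```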
Deep Think
Deep Think is GLM's reasoning mode. When enabled, GLM produces an internal reasoning trace before (or alongside) the final answer.
Enable Deep Think
Toggles the Deep Think button in GLM's interface.
Settings -> Provider Behavior -> GLM Chat -> Enable Deep Think
Send Deep Think
When enabled, IntenseRP includes GLM's reasoning in the response, wrapped in <think> tags.
Settings -> Provider Behavior -> GLM Chat -> Send Deep Think
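As a rough picture of what Send Deep Think changes in the output, here is a minimal sketch. The function name `format_reply` and the exact whitespace between the reasoning and the answer are assumptions; only the <think> wrapping comes from the description above.

```python
def format_reply(reasoning: str, answer: str, send_deep_think: bool) -> str:
    """Return the text sent to the client for one completed GLM response."""
    if send_deep_think and reasoning:
        # Reasoning goes first, wrapped in <think> tags, followed by the answer.
        return f"<think>{reasoning}</think>\n{answer}"
    # Send Deep Think off (or no reasoning produced): answer only.
    return answer
```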
Search
GLM Chat Search can be toggled via IntenseRP.
GLM streams internal tool/search payloads into the same response stream (wrapped in <glm_block>...</glm_block>). IntenseRP strips these blocks, so search results are not sent to the client.
Settings -> Provider Behavior -> GLM Chat -> Enable Search
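The stripping step can be illustrated with a regular expression over a complete response text. This is a simplified sketch, assuming the blocks appear as literal `<glm_block>...</glm_block>` pairs; `strip_glm_blocks` is a hypothetical name, and a real streaming implementation would also need to buffer blocks that are split across chunks.

```python
import re

# Non-greedy match so each block is removed separately; DOTALL lets blocks span lines.
GLM_BLOCK = re.compile(r"<glm_block>.*?</glm_block>", re.DOTALL)


def strip_glm_blocks(text: str) -> str:
    """Remove GLM's internal tool/search payload blocks from a response text."""
    return GLM_BLOCK.sub("", text)
```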
Advanced Search
GLM also has an Advanced Search switch tucked inside the Search hover menu. IntenseRP can toggle it too, but it is off by default and only applies when Deep Think and Search are both enabled for the request.
Settings -> Provider Behavior -> GLM Chat -> Enable Advanced Search
Dependency behavior
In Settings, Enable Advanced Search is forced off while Enable Deep Think or Enable Search is off.
At runtime, IntenseRP also checks the resolved request settings. If a model ID, loadout, or request override asks for Advanced Search without both required modes, the driver logs a warning and sends the request without Advanced Search.
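The runtime check described above amounts to a single guard on the resolved settings. The sketch below assumes a hypothetical `resolve_advanced_search` helper and logger name; the warning text is illustrative, not IntenseRP's actual log line.

```python
import logging

logger = logging.getLogger("glm_sketch")  # hypothetical logger name


def resolve_advanced_search(deep_think: bool, search: bool, advanced_search: bool) -> bool:
    """Return whether Advanced Search should actually be used for this request."""
    if advanced_search and not (deep_think and search):
        # Both Deep Think and Search are required; drop Advanced Search otherwise.
        logger.warning(
            "Advanced Search requested without Deep Think and Search; sending without it"
        )
        return False
    return advanced_search
```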
Same output filtering
Advanced Search can make GLM search more deeply, but IntenseRP still strips GLM's internal search/tool payloads from the API stream. You may still see citations in the final answer.
Tools
Some GLM models now expose a separate Tools button next to Search and Deep Think.
IntenseRP supports that toggle too, but currently only on GLM-5V-Turbo. If you pick any other GLM model, Enable Tools is automatically forced off instead of pretending everything is fine and then wandering into the UI looking confused.
Settings -> Provider Behavior -> GLM Chat -> Enable Tools
Strongly recommended to leave off
Leave Enable Tools disabled unless you are intentionally poking at it.
GLM's Tools UI is still unstable, and IntenseRP does not support those tool outputs yet. The toggle exists mostly so advanced users can experiment, not because this is a polished or recommended workflow.
Separate from Search
Enable Tools is its own toggle. It does not replace Enable Search, and turning on one does not automatically turn on the other.
Loadout-aware
This setting works with Loadouts just like the other GLM Behavior fields. If a loadout enables Tools but the selected GLM model does not support it, IntenseRP still forces it back off at runtime.
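The model gate for Tools can be sketched as a simple membership test. `TOOLS_CAPABLE_MODELS` and `resolve_tools` are hypothetical names used for illustration; the only fact taken from this page is that GLM-5V-Turbo is currently the one supported model.

```python
# Models on which the Tools toggle is honored (currently only one, per this page).
TOOLS_CAPABLE_MODELS = {"GLM-5V-Turbo"}


def resolve_tools(model: str, enable_tools: bool) -> bool:
    """Force Tools off unless the selected model supports it."""
    return enable_tools and model in TOOLS_CAPABLE_MODELS
```

The same check applies no matter where the request came from: a saved setting, a loadout, or a per-message override.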
Count Tokens
GLM's backend reports token usage near the end of a response stream. When enabled, IntenseRP captures these values and returns them in the OpenAI-style usage fields (prompt_tokens, completion_tokens, total_tokens). This is enabled by default.
Settings -> Provider Behavior -> GLM Chat -> Count Tokens
Caching
Sometimes GLM reports cached prompt tokens as usage.prompt_tokens_details.cached_tokens.
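For orientation, the usage object has roughly the following shape. This is a sketch with a hypothetical `build_usage` helper; computing `total_tokens` as a sum is an assumption made here, since GLM may report its own total.

```python
from typing import Optional


def build_usage(
    prompt_tokens: int, completion_tokens: int, cached_tokens: Optional[int] = None
) -> dict:
    """Build an OpenAI-style usage object from the values GLM reported."""
    usage = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        # Assumption: total is the sum of the two; GLM may report its own total.
        "total_tokens": prompt_tokens + completion_tokens,
    }
    if cached_tokens is not None:
        # Only present when GLM reported cached prompt tokens.
        usage["prompt_tokens_details"] = {"cached_tokens": cached_tokens}
    return usage
```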
File Upload Mode
Instead of typing your message into GLM's chat box, IntenseRP can upload it as a text file attachment. This is useful for very long prompts that might hit input limits.
Settings -> Provider Behavior -> GLM Chat -> Send As Text File
File Upload Timeout
When uploading files, GLM can take a moment before the send button becomes active. This setting controls how long IntenseRP waits (in seconds) before giving up.
Settings -> Provider Behavior -> GLM Chat -> File Upload Timeout
Default is 15 seconds. Increase it if you're on a slow connection/PC or uploading very large prompts.
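Conceptually, the timeout is a poll-until-ready loop with a deadline. The sketch below shows that pattern in isolation; `wait_until`, the polling interval, and the `condition` callback (standing in for "the send button is active") are all assumptions, not IntenseRP internals.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout: float = 15.0, interval: float = 0.25
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check at the deadline before giving up.
    return condition()
```

A larger timeout only matters when the condition is slow to become true; when the button activates quickly, the loop returns right away.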
Text File Filler
GLM won't let you send a file with an empty text box; it needs some text alongside the attachment. By default, IntenseRP pastes a single . (dot) as filler, but you can change this to any text you like.
Settings -> Provider Behavior -> GLM Chat -> Text File Filler
This setting only appears when Send As Text File is enabled.
Reuse Matching Chat
Reuse Matching Chat tries to keep chats tidy: when you send the exact same prompt twice in a row, IntenseRP clicks GLM's "Regenerate" instead of creating a brand new chat. This reduces clutter in the chat history and speeds up the workflow.
Settings -> Provider Behavior -> GLM Chat -> Reuse Matching Chat
Pick one
On GLM, you can use either Reuse Matching Chat (+ optional Search Older Matching Chats) or Repetition Buster.
They are opposite strategies, so IntenseRP only uses one of them at a time.
Known issue (GLM)
Reuse Matching Chat is currently unreliable with GLM Chat. The option may error out even though your request still completes normally.
If you want to experiment with it anyway, try enabling Refresh After Generation under Settings -> Provider Behavior -> GLM Chat -> Quirks. This reloads the page after each response and can sometimes restore the UI state so Regenerate becomes available again.
Search Older Matching Chats
GLM also supports Settings -> Provider Behavior -> GLM Chat -> Search Older Matching Chats.
That lets IntenseRP keep up to 7 older cached GLM chats per account and try reopening one of those when the current prompt matches.
Same point as above, though: if GLM's regenerate UI is being annoying that day, Search Older Matching Chats inherits the same annoyance because it still depends on the Regenerate button working.
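A bounded per-account cache like the one described above could look roughly like this. It is a sketch under assumptions: `OlderChatCache`, hashing the prompt with SHA-256, and evicting the oldest entry first are illustrative choices, not a description of how IntenseRP stores its cache.

```python
import hashlib
from collections import OrderedDict, defaultdict
from typing import Optional

MAX_OLDER_CHATS = 7  # limit stated on this page


class OlderChatCache:
    """Remember up to MAX_OLDER_CHATS chats per account, keyed by prompt hash."""

    def __init__(self) -> None:
        self._by_account: dict = defaultdict(OrderedDict)

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def remember(self, account: str, prompt: str, chat_id: str) -> None:
        chats = self._by_account[account]
        key = self._key(prompt)
        chats[key] = chat_id
        chats.move_to_end(key)  # most recently used goes last
        while len(chats) > MAX_OLDER_CHATS:
            chats.popitem(last=False)  # assumption: evict the oldest entry first

    def find(self, account: str, prompt: str) -> Optional[str]:
        """Return the cached chat ID for an exact prompt match, if any."""
        return self._by_account[account].get(self._key(prompt))
```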
Repetition Buster
Repetition Buster is basically the opposite of Reuse Matching Chat.
Instead of trying to reuse the same chat, IntenseRP checks whether the current prompt matches the immediately previous one for the active GLM account/profile. That last-prompt memory is kept across restarts, because GLM's own caching can survive them too. If it matches, IntenseRP opens a throwaway fresh chat, sends a random 128-character string there, and then opens another fresh chat for your real prompt.
That random string is just a cache buster. The whole idea is to disturb GLM's context caching before the real request goes out, which can help if you're worried about suspiciously repetitive duplicate generations.
Settings -> Provider Behavior -> GLM Chat -> Repetition Buster
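The flow can be sketched as "compare with the previous prompt, then decide which chats to open". In the sketch below, `cache_buster` and `plan_chats` are hypothetical names, and the alphanumeric alphabet is an assumption; this page only states that the string is random and 128 characters long.

```python
import secrets
import string
from typing import Optional

# Assumption: letters and digits; the real character set is not documented here.
ALPHABET = string.ascii_letters + string.digits


def cache_buster(length: int = 128) -> str:
    """Generate a random throwaway string to send in a separate fresh chat."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def plan_chats(prompt: str, previous_prompt: Optional[str]) -> list:
    """Return the messages to send, each one in its own fresh chat, in order."""
    if prompt == previous_prompt:
        # Duplicate prompt: burn one throwaway chat, then send the real prompt.
        return [cache_buster(), prompt]
    return [prompt]
```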
No Search Older Matching Chats here
Search Older Matching Chats only works with Reuse Matching Chat, because it reopens an older chat and presses Regenerate there.
Repetition Buster does the opposite. It intentionally burns one throwaway chat and then starts another brand new one for the real request.
Delete Chat After Reply
If you want GLM's chat history cleaned up automatically, IntenseRP can delete the completed GLM chat after a successful reply finishes.
Settings -> Provider Behavior -> GLM Chat -> Delete Chat After Reply
Slower requests
This adds extra cleanup work after each request, so it can slow requests down quite a bit.
No Reuse Matching Chat here
This does not work together with Reuse Matching Chat or Search Older Matching Chats.
Repetition Buster still works
GLM's Repetition Buster is still compatible with this.
IntenseRP deletes the temporary cache-buster chat too before it sends the real request.
See also: Chat Auto-Deletion
Login notes (CAPTCHA)
GLM requires a CAPTCHA during login. Even with Auto Login enabled, you must complete the CAPTCHA in the browser window, because that step cannot be reliably automated.
Use Persistent Sessions
Persistent Sessions are strongly recommended for GLM. They help you avoid solving the CAPTCHA on every start.
See: Login & Sessions
UI language requirement
The GLM driver currently expects the GLM web UI language to be English (en-US). If GLM is set to another language, IntenseRP may fail to find buttons/toggles reliably.
If you see a warning about GLM UI language:
- Change GLM language to English (en-US) in the GLM browser window
- Reload the page (F5 / Ctrl+R)
- Retry / restart the browser from IntenseRP if needed
Per-message macros
You can add simple [[...]] macros to the latest user message in SillyTavern to override certain GLM Behavior settings for that request only.
All macros are stripped from the message before sending it to GLM.
| Macro | Effect |
|---|---|
| [[think]], [[r1]] | Force Deep Think on |
| [[nothink]], [[r0]] | Force Deep Think off |
| [[search]] | Force Search on |
| [[nosearch]], [[no_search]] | Force Search off |
| [[tool]], [[tools]] | Force Tools on |
| [[notool]], [[notools]], [[no_tool]], [[no_tools]] | Force Tools off |
| [[file]] | Force Send As Text File on |
| [[nofile]] | Force Send As Text File off |
Search macros
Search macros like [[search]] / [[nosearch]] override the Enable Search setting for that request only.
Tools macros
Tools macros only do anything on GLM-5V-Turbo. On unsupported GLM models, IntenseRP ignores the request and keeps Tools off.
Scope
Only macros from the latest user message apply. They do not persist across requests.
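Macro handling can be pictured as "find the tags, record the overrides, remove the tags". The sketch below is illustrative only: `extract_macros`, the override key names, leaving unknown `[[...]]` tags untouched, and letting the last conflicting macro win are all assumptions.

```python
import re

# Macro name -> (hypothetical override key, forced value), following the table above.
MACROS = {
    "think": ("deep_think", True), "r1": ("deep_think", True),
    "nothink": ("deep_think", False), "r0": ("deep_think", False),
    "search": ("search", True),
    "nosearch": ("search", False), "no_search": ("search", False),
    "tool": ("tools", True), "tools": ("tools", True),
    "notool": ("tools", False), "notools": ("tools", False),
    "no_tool": ("tools", False), "no_tools": ("tools", False),
    "file": ("send_as_text_file", True),
    "nofile": ("send_as_text_file", False),
}
MACRO_RE = re.compile(r"\[\[(\w+)\]\]")


def extract_macros(message: str) -> tuple:
    """Return (message without macros, per-request overrides)."""
    overrides = {}

    def _replace(match):
        name = match.group(1)
        if name not in MACROS:
            return match.group(0)  # assumption: unknown tags are left untouched
        key, value = MACROS[name]
        overrides[key] = value  # assumption: the last conflicting macro wins
        return ""

    cleaned = MACRO_RE.sub(_replace, message).strip()
    return cleaned, overrides
```

The overrides returned here would then pass through the same runtime checks as any other setting, so a Tools macro on an unsupported model still ends up off.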
Quirks & Timing
GLM has a few quirks that can look like breakage but are usually easy to work around. They are covered briefly on this page (see the individual sections above); for the full picture, including all the timing knobs and workarounds, see GLM Quirks.
Quick Reference
| Setting | What It Does | Default |
|---|---|---|
| Model | Selects the model in GLM's real model picker (UI) | GLM-5 |
| Enable Deep Think | Toggles GLM reasoning mode | Off |
| Send Deep Think | Includes thinking in response | Off |
| Count Tokens | Returns token usage in API responses | On |
| Enable Search | Enables GLM search | Off |
| Enable Advanced Search | Enables GLM's deeper Search mode when Deep Think + Search are on | Off |
| Enable Tools | Enables GLM's separate Tools button on GLM-5V-Turbo | Off |
| Send As Text File | Uploads prompt as .txt | Off |
| File Upload Timeout | Seconds to wait for upload | 15 |
| Text File Filler | Text pasted alongside the uploaded file | . |
| Reuse Matching Chat | Regenerates on duplicate prompts | Off (unstable for GLM) |
| Delete Chat After Reply | Deletes the completed GLM chat after a successful reply | Off |
| Repetition Buster | Sends a throwaway cache-buster prompt before duplicate prompts | Off |
| First Chunk Timeout | Seconds to wait for the response stream to start | 45 |
| Refresh After Generation | Reloads the GLM page after each response | Off |