Architecture fix: voice and text modes now use completely separate prompts.
Backend:
- VoiceAssistantProfileSupport.buildTextSystemRole: dedicated text-mode system
role that inherits all business rules (identity, KB-first, sensitive topics,
sales guidance, personal info) but removes voice-specific constraints (short
sentences, colloquial, single-line conclusion).
- DEFAULT_TEXT_SPEAKING_STYLE: text-specific style demanding detailed,
structured, Markdown-formatted answers with complete information.
- VoiceGatewayService.handleStart: switch between voice/text system role and
speaking style based on state.textMode.
- VoiceGatewayService.buildStartSessionPayload: preserve Markdown in text mode
(voice mode still strips asterisks/backticks via normalizeTextForSpeech so
that TTS does not read formatting characters aloud).
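The mode-dependent Markdown handling above can be sketched as follows. This is only a TypeScript illustration of logic that lives in the Java backend: preparePayloadText is a hypothetical name, and the stripping rule shown for normalizeTextForSpeech is an assumption based on the asterisk/backtick description above (the real method may do more).

```typescript
// Assumption: only asterisks and backticks are stripped, then whitespace is
// collapsed. The real normalizeTextForSpeech is Java and may differ.
function normalizeTextForSpeech(text: string): string {
  return text.replace(/[*`]/g, "").replace(/\s+/g, " ").trim();
}

// Hypothetical helper: text mode keeps Markdown untouched, voice mode
// normalizes the text before it reaches TTS.
function preparePayloadText(text: string, textMode: boolean): string {
  return textMode ? text : normalizeTextForSpeech(text);
}
```

With this shape, the same prompt text can be reused for both modes and only the final normalization step differs.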
Frontend:
- Added react-markdown@9 + remark-gfm@4 dependencies.
- ChatPanel renders assistant messages (non-voice) with ReactMarkdown:
headings, lists (ul/ol), bold, italic, inline/block code, tables, blockquote,
links, horizontal rules — all styled with Tailwind classes matching the dark
theme.
- User messages and voice-handoff messages remain plain text.
Verification: mvn test VoiceGatewaySmokeTest 20/20 pass, vite build succeeds.
Backend (VoiceGatewayService):
- [P0 Bug-1] handleAssistantChunk/Final: text mode must never apply the
blockUpstreamAudio gate to text events (the gate is for audio frames only).
Non-KB text queries now correctly stream subtitles back to the client.
- [P0 Bug-3] sendUpstreamChatTextQuery: when upstream is not ready, send
assistant_pending:false before the error so the client loading spinner can clear.
- [P1 Bug-6] handleUserPartial/handleUserFinal: return early in text mode to
guard against spurious ASR echoes from S2S.
Frontend (ChatPanel S2S effect):
- [P0 Bug-2] Connection success now clears error; added cancelled flag to all
async setState paths to prevent state reversal on unmount.
- [P1 Bug-4] onAssistantPending: (false) always clears isLoading; (true) only
sets isLoading if not already streaming (streamingId drives UI, pending is
advisory).
- [P1 Bug-5] S2S mode now loads session history via getSessionHistory
(previously Coze-only).
- [P2 Bug-8] When the s2sService ref is null, also remove the placeholder
assistant bubble to avoid leaving a stale empty bubble in the chat.
- [P2 Bug-9] All callbacks guard on cancelled flag to prevent React setState
warnings after unmount (cleanup triggers svc.disconnect which emits
'disconnected' state).
Verification: mvn test VoiceGatewaySmokeTest 20/20 pass, no voice regression.
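The Bug-4 pending rule and the Bug-2/Bug-9 cancelled-flag guard can be sketched as pure TypeScript, independent of React. ChatUiState, applyAssistantPending and createCancellableSetter are hypothetical names for illustration; the actual ChatPanel code may be organised differently.

```typescript
// Hypothetical slice of ChatPanel state relevant to the loading indicator.
interface ChatUiState {
  isLoading: boolean;
  streamingId: string | null; // id of the assistant message being streamed
}

// pending=false always clears isLoading; pending=true only sets it when no
// message is streaming, because streamingId drives the UI.
function applyAssistantPending(state: ChatUiState, pending: boolean): ChatUiState {
  if (!pending) {
    return { ...state, isLoading: false };
  }
  if (state.streamingId !== null) {
    return state; // already streaming: pending is advisory only
  }
  return { ...state, isLoading: true };
}

// Wraps a setter so that calls made after cancel() are ignored, mirroring the
// cancelled flag checked by every async callback.
function createCancellableSetter<T>(setState: (value: T) => void) {
  let cancelled = false;
  return {
    set(value: T): void {
      if (!cancelled) {
        setState(value);
      }
    },
    cancel(): void {
      cancelled = true;
    },
  };
}
```

In an effect, cancel() would be called from the cleanup function before disconnecting, so the 'disconnected' event emitted during cleanup cannot touch state.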
- localStorage-persistent textEngine state ('coze' | 's2s')
- Header button toggles between the two engines when in chat mode
- ChatPanel remounts on engine switch via key=sessionId-textEngine
- Voice mode completely unaffected
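A minimal sketch of the persisted engine toggle, under stated assumptions: the storage key name, the 'coze' default, and the helper names are all hypothetical. The store is typed as a small subset of the Web Storage API, so window.localStorage can be passed in directly in the browser.

```typescript
type TextEngine = "coze" | "s2s";

// Subset of the Web Storage API that the sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const TEXT_ENGINE_KEY = "textEngine"; // hypothetical storage key

// Assumption: anything other than a stored "s2s" falls back to "coze".
function loadTextEngine(store: KeyValueStore): TextEngine {
  return store.getItem(TEXT_ENGINE_KEY) === "s2s" ? "s2s" : "coze";
}

// Flip the engine and persist the new value.
function toggleTextEngine(store: KeyValueStore, current: TextEngine): TextEngine {
  const next: TextEngine = current === "coze" ? "s2s" : "coze";
  store.setItem(TEXT_ENGINE_KEY, next);
  return next;
}

// React key in the sessionId-textEngine form, so that switching the engine
// changes the key and remounts ChatPanel.
function chatPanelKey(sessionId: string, engine: TextEngine): string {
  return `${sessionId}-${engine}`;
}
```

Remounting through the key means all per-engine state (connection, streaming message, loading flag) starts clean after a switch rather than having to be reset by hand.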
Dual-channel S2S architecture with full isolation between voice and text links:
Backend (Java):
- VolcRealtimeProtocol: add createChatTextQueryMessage (event 501)
- VoiceSessionState: add textMode / playAudioReply / disableGreeting fields
- VoiceWebSocketConfig: register second path /ws/realtime-text (same handler)
- VoiceWebSocketHandler: detect text mode from URL path
- VoiceGatewayService:
* afterConnectionEstablished: overload with textMode flag
* handleStart: parse playAudioReply / disableGreeting from client
* buildStartSessionPayload: inject input_mod=text for text mode
* handleDirectText: text mode sends event 501 directly, skip processReply
* handleBinaryMessage: reject client audio in text mode
* handleUpstreamBinary: drop S2S audio if text mode + no playback
* startAudioKeepalive: skip entirely in text mode (no audio channel)
* sendGreeting: skip greeting if disableGreeting=true
Frontend (test2 + delivery):
- nativeVoiceService: connect accepts clientMode/playAudioReply/disableGreeting
* resolveWebSocketUrl accepts wsPath param
* Text mode: no microphone capture, no playback context (unless playAudioReply)
* New sendText() method for event 501 payload
* handleAudioMessage drops audio in text mode without playback
* Export NativeVoiceService class for multi-instance usage
- ChatPanel (test2): new useS2S / playAudioReply props
* useS2S=true: creates NativeVoiceService instance, connects to /ws/realtime-text
* subtitle events drive streaming UI, assistant_pending drives loading state
* handleSend routes to WebSocket in S2S mode, HTTP/SSE in Coze mode
* Voice link code path unchanged
Verification: mvn test VoiceGatewaySmokeTest 20/20 pass, voice link regression-free
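The isolation rules listed above reduce to a few boolean decisions. The sketch below restates them in TypeScript for illustration only: the gateway itself is Java, every function name here is hypothetical, and the path-matching rule is an assumption about how VoiceWebSocketHandler detects text mode.

```typescript
const TEXT_WS_PATH = "/ws/realtime-text";

interface SessionFlags {
  textMode: boolean;
  playAudioReply: boolean;
}

// Assumption: text mode is detected by the request path (query string
// ignored) ending in the text endpoint.
function isTextModePath(path: string): boolean {
  const withoutQuery = path.split("?")[0];
  return withoutQuery.slice(-TEXT_WS_PATH.length) === TEXT_WS_PATH;
}

// handleBinaryMessage: client audio is rejected in text mode.
function acceptsClientAudio(flags: SessionFlags): boolean {
  return !flags.textMode;
}

// handleUpstreamBinary: S2S audio is dropped in text mode unless playback
// was requested.
function shouldForwardUpstreamAudio(flags: SessionFlags): boolean {
  return !flags.textMode || flags.playAudioReply;
}

// startAudioKeepalive: skipped entirely in text mode.
function needsAudioKeepalive(flags: SessionFlags): boolean {
  return !flags.textMode;
}

// ChatPanel.handleSend: WebSocket in S2S mode, HTTP/SSE in Coze mode.
type SendRoute = "websocket" | "http-sse";
function routeForSend(useS2S: boolean): SendRoute {
  return useS2S ? "websocket" : "http-sse";
}
```

Each decision depends only on the session flags, which is what keeps the voice link untouched: with textMode=false every function returns the pre-existing voice behaviour.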