User 4b78f81cbc fix(s2s-text): 9 review bugs - text stream, loading, history, unmount safety
Backend (VoiceGatewayService):
- [P0 Bug-1] handleAssistantChunk/Final: text mode must never apply blockUpstreamAudio
  gate to text events (blockUpstreamAudio is for audio frames only). Non-KB text
  queries now correctly stream subtitle back to client.
- [P0 Bug-3] sendUpstreamChatTextQuery: when upstream not ready, send
  assistant_pending:false before error so client loading spinner can clear.
- [P1 Bug-6] handleUserPartial/handleUserFinal: early-return if textMode, guard
  against spurious ASR echoes from S2S.

Frontend (ChatPanel S2S effect):
- [P0 Bug-2] Connection success now clears error; added cancelled flag to all
  async setState paths to prevent state reversal on unmount.
- [P1 Bug-4] onAssistantPending: (false) always clears isLoading; (true) only
  sets isLoading if not already streaming (streamingId drives UI, pending is
  advisory).
- [P1 Bug-5] S2S mode loads session history via getSessionHistory (was Coze-only).
- [P2 Bug-8] When s2sService ref is null, also remove the placeholder assistant
  bubble to avoid stale empty bubble in chat.
- [P2 Bug-9] All callbacks guard on cancelled flag to prevent React setState
  warnings after unmount (cleanup triggers svc.disconnect which emits
  'disconnected' state).

Verification: mvn test VoiceGatewaySmokeTest 20/20 pass, no voice regression.
2026-04-17 09:44:36 +08:00
2026-03-12 12:47:56 +08:00

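AI Knowledge Base Document Smart Chunking Tool

Parses documents in multiple formats into text, splits the text into semantic chunks via the DeepSeek API, and writes the result to a Markdown file.

Supported formats

PDF, Word (.docx), Excel (.xlsx/.xls), CSV, HTML, TXT/MD, images (PNG/JPG/BMP/GIF/WEBP)

Installation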
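
As an illustration of how the formats listed above might be told apart, the sketch below maps file extensions to a format family. It is not taken from this repository: the names `FORMAT_BY_EXTENSION` and `detect_format` are hypothetical, and only the extensions named in the list are included.

```python
from pathlib import Path

# Hypothetical mapping from file extension to format family,
# built only from the extensions named in the supported-formats list.
FORMAT_BY_EXTENSION = {
    ".pdf": "pdf",
    ".docx": "word",
    ".xlsx": "excel",
    ".xls": "excel",
    ".csv": "csv",
    ".html": "html",
    ".txt": "text",
    ".md": "text",
    ".png": "image",
    ".jpg": "image",
    ".bmp": "image",
    ".gif": "image",
    ".webp": "image",
}


def detect_format(path: str) -> str:
    """Return the format family for a path, or raise ValueError if unsupported."""
    ext = Path(path).suffix.lower()  # case-insensitive match on the extension
    try:
        return FORMAT_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext or '(no extension)'}") from None
```

Rejecting unknown extensions up front means no API call is spent on a file that cannot be parsed; whether the actual tool behaves this way is an assumption.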

cd ai-knowledge-splitter
pip install -r requirements.txt

Usage

python main.py <input_file> -k <DeepSeek API Key> [-o output_path] [-d delimiter]

Examples:

# Basic usage (output is written to a .md file with the same name)
python main.py report.pdf -k sk-xxxxxxxx

# Specify the output path
python main.py data.docx -k sk-xxxxxxxx -o output/result.md

# Custom delimiter
python main.py notes.txt -k sk-xxxxxxxx -d "==="
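
To make the effect of the -d option more concrete, here is a minimal sketch of how chunks could be joined with a delimiter into Markdown text. The function `join_chunks` is hypothetical, and placing the delimiter on its own line between chunks is an assumption; the tool's real output layout may differ.

```python
def join_chunks(chunks: list[str], delimiter: str = "---") -> str:
    """Join non-empty chunks, putting the delimiter on its own line between them."""
    # Assumption: blank chunks are dropped and surrounding whitespace is trimmed.
    cleaned = [chunk.strip() for chunk in chunks if chunk.strip()]
    return f"\n\n{delimiter}\n\n".join(cleaned) + "\n"


if __name__ == "__main__":
    print(join_chunks(["First chunk.", "Second chunk."], delimiter="==="))
```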

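Parameters

Parameter         Required   Description
input_file        Yes        Path to the input file
-k, --api-key     Yes        DeepSeek API Key
-o, --output      No         Output file path (default: same name with a .md extension)
-d, --delimiter   No         Chunk delimiter (default: ---)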
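
The options documented here could be declared with `argparse` roughly as follows. This is a sketch inferred from the usage line, not the actual contents of main.py: `build_parser` and `resolve_output` are hypothetical names, and treating the default output as the input path with its extension replaced by .md is an assumption.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Split a document into semantic chunks and write Markdown.",
    )
    parser.add_argument("input_file", help="path to the input file")
    parser.add_argument("-k", "--api-key", required=True, help="DeepSeek API Key")
    parser.add_argument("-o", "--output", default=None, help="output file path")
    parser.add_argument("-d", "--delimiter", default="---", help="chunk delimiter")
    return parser


def resolve_output(input_file: str, output: Optional[str]) -> Path:
    # Assumption: the default output replaces the input extension with .md.
    return Path(output) if output else Path(input_file).with_suffix(".md")


# Parse the "basic usage" example from the README.
args = build_parser().parse_args(["report.pdf", "-k", "sk-xxxxxxxx"])
out = resolve_output(args.input_file, args.output)
```

Keeping the default-path rule in its own function makes it easy to test separately from argument parsing.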

Running tests

cd ai-knowledge-splitter
pytest tests/ -v