e145f1d97e25d4ba9306657bff060c7bbba12733
Architecture fix: voice and text mode now have completely separate prompts.

Backend:
- VoiceAssistantProfileSupport.buildTextSystemRole: dedicated text-mode system role that inherits all business rules (identity, KB-first, sensitive topics, sales guidance, personal info) but removes voice-specific constraints (short sentences, colloquial, single-line conclusion).
- DEFAULT_TEXT_SPEAKING_STYLE: text-specific style demanding detailed, structured, Markdown-formatted answers with complete information.
- VoiceGatewayService.handleStart: switch between voice/text system role and speaking style based on state.textMode.
- VoiceGatewayService.buildStartSessionPayload: preserve Markdown in text mode (voice mode still strips asterisks/backticks via normalizeTextForSpeech to avoid TTS pronouncing format chars).

Frontend:
- Added react-markdown@9 + remark-gfm@4 dependencies.
- ChatPanel renders assistant messages (non-voice) with ReactMarkdown: headings, lists (ul/ol), bold, italic, inline/block code, tables, blockquote, links, horizontal rules, all styled with Tailwind classes matching the dark theme.
- User messages and voice-handoff messages remain plain text.

Verification: mvn test VoiceGatewaySmokeTest 20/20 pass, vite build succeeds.
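AI Knowledge Base Document Chunking Tool
Parses documents in multiple formats into plain text, splits the text into semantically coherent chunks via the DeepSeek API, and writes the result to a Markdown file.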
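The parse, chunk, and write flow described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `chunk_to_markdown`, `semantic_chunker`, and `paragraph_chunker` are hypothetical names, and the chunker callable stands in for the DeepSeek API call so that the sketch runs offline.

```python
from typing import Callable, List


def chunk_to_markdown(
    text: str,
    semantic_chunker: Callable[[str], List[str]],
    delimiter: str = "---",
) -> str:
    """Join the chunks returned by `semantic_chunker` into one Markdown string.

    `semantic_chunker` is a placeholder for whatever produces the chunks;
    in the real tool this step is delegated to the DeepSeek API.
    """
    # Drop empty chunks and trim surrounding whitespace.
    chunks = [c.strip() for c in semantic_chunker(text) if c.strip()]
    # Put the delimiter on its own line between consecutive chunks.
    separator = f"\n\n{delimiter}\n\n"
    return separator.join(chunks) + "\n"


def paragraph_chunker(text: str) -> List[str]:
    """Offline stand-in for the model call: split on blank lines."""
    return text.split("\n\n")


markdown = chunk_to_markdown("Intro text.\n\nDetails here.", paragraph_chunker)
```

Passing the chunker in as a callable keeps the joining logic testable without network access or an API key.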
Supported formats
PDF, Word (.docx), Excel (.xlsx/.xls), CSV, HTML, TXT/MD, images (PNG/JPG/BMP/GIF/WEBP)
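One plausible way to route each supported format to a parser is a lookup keyed on the file extension. The sketch below is an assumption about how such a dispatch could look, not a description of the tool's internals: `EXTENSION_TO_KIND`, `detect_kind`, and the category names are hypothetical, and only the extensions come from the list above.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a parser category.
# The extensions mirror the supported-formats list; the category
# names are illustrative only.
EXTENSION_TO_KIND = {
    ".pdf": "pdf",
    ".docx": "word",
    ".xlsx": "excel",
    ".xls": "excel",
    ".csv": "csv",
    ".html": "html",
    ".txt": "text",
    ".md": "text",
    ".png": "image",
    ".jpg": "image",
    ".bmp": "image",
    ".gif": "image",
    ".webp": "image",
}


def detect_kind(path: str) -> str:
    """Return the parser category for `path`; raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    try:
        return EXTENSION_TO_KIND[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

For example, `detect_kind("report.PDF")` returns `"pdf"`, while an unlisted extension such as `.zip` raises `ValueError`.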
Installation
cd ai-knowledge-splitter
pip install -r requirements.txt
Usage
python main.py <input_file> -k <DeepSeek API Key> [-o output_path] [-d delimiter]
Examples:
# Basic usage (output is written to a .md file with the same name)
python main.py report.pdf -k sk-xxxxxxxx
# Specify the output path
python main.py data.docx -k sk-xxxxxxxx -o output/result.md
# Use a custom delimiter
python main.py notes.txt -k sk-xxxxxxxx -d "==="
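Parameters
| Parameter | Required | Description |
|---|---|---|
| input_file | Yes | Path to the input file |
| -k, --api-key | Yes | DeepSeek API key |
| -o, --output | No | Output file path (default: same name with a .md extension) |
| -d, --delimiter | No | Chunk delimiter (default: ---) |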
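The parameter table above maps naturally onto Python's standard `argparse` module. The following is a sketch under that assumption; `build_parser` and `parse_args` are hypothetical names, and the actual `main.py` may be structured differently.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented parameters (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("input_file", help="Path to the input file")
    parser.add_argument("-k", "--api-key", required=True, help="DeepSeek API key")
    parser.add_argument("-o", "--output", default=None,
                        help="Output file path (default: same name with .md)")
    parser.add_argument("-d", "--delimiter", default="---",
                        help="Chunk delimiter (default: ---)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv` and fill in the default output path when -o is omitted."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Same directory and base name as the input, with a .md extension.
        args.output = str(Path(args.input_file).with_suffix(".md"))
    return args


args = parse_args(["report.pdf", "-k", "sk-xxxxxxxx"])
```

With these arguments, `args.output` resolves to `report.md` and `args.delimiter` keeps its default of `---`.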
Running the tests
cd ai-knowledge-splitter
pytest tests/ -v